
Python + AI
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series
Python + AI
↖️Vector embeddings
Pamela Fox
Python Cloud Advocate
www.pamelafox.org
Today we'll cover...
• What are vector embeddings?
• Vector similarity space
• Vector search
• Vector distance metrics
• Vector quantization
• Dimension reduction
Vector embeddings 101
Want to follow along?
1. Open this GitHub repository:
   https://github.com/pamelafox/vector-embeddings-demos
2. Use the "Code" button to create a GitHub Codespace
3. Wait a few minutes for the Codespace to start up


Vector embeddings
An embedding encodes an input as a list of floating-point numbers.
"dog" → [0.017198, -0.007493, -0.057982,…]

Different embedding models output different embeddings, with varying lengths.
Embedding model                  Encodes                    Vector length   MTEB Avg.
word2vec                         words                      300
SBERT (Sentence-Transformers)    text (up to ~400 words)    768
OpenAI text-embedding-ada-002    text (up to 8191 tokens)   1536            61.0%
OpenAI text-embedding-3-small    text (up to 8191 tokens)   256 - 1536      62.3%
OpenAI text-embedding-3-large    text (up to 8191 tokens)   256 - 3072      64.6%

MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Generating an embedding with the OpenAI SDK
Use the OpenAI SDK with OpenAI.com, Azure, Ollama, or GitHub Models:

import os
import openai

openai_client = openai.OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"]
)

Generate embeddings for single or multiple inputs:

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    dimensions=1536,
    input="hello world"
)
print(embeddings_response.data[0].embedding)

Notebook: generate_embedding.ipynb
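
A minimal sketch of embedding several inputs in one call (the word list is illustrative; assumes the openai_client configured above):

# Embed several inputs in one request; the SDK accepts a list of strings
# as well as a single string.
words = ["dog", "cat", "tortoise"]

embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=words
)

# response.data preserves the order of the inputs
for word, item in zip(words, embeddings_response.data):
    print(word, len(item.embedding), item.embedding[:3])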
Vector embeddings vary across models
The same input ("queen") produces different vectors in different models:

word2vec-google-news-300 (300 dimensions):
  [0.0052490234375, -0.1435546875, -0.0693359375, ...]
text-embedding-ada-002 (1536 dimensions):
  [-0.00449855113402009, -0.006737332791090012, -0.002418933203443885, ...]
text-embedding-3-small (1536 dimensions):
  [0.04379640519618988, -0.03982372209429741, 0.044741131365299225, ...]

Notebook: comparison.ipynb
Vector similarity
We compute embeddings so that we can calculate similarity between inputs.
The most common distance measurement is cosine similarity.

def cosine_similarity(v1, v2):
    # Dot product of the two vectors
    dot_product = sum([a * b for a, b in zip(v1, v2)])
    # Product of the two vector magnitudes
    magnitude = (
        sum([a**2 for a in v1]) *
        sum([a**2 for a in v2])) ** 0.5
    return dot_product / magnitude

Notebook: similarity.ipynb
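
A small usage sketch comparing two words with the function above (the embed() helper and word choices are illustrative, reusing the openai_client configured earlier):

def embed(text):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=text)
    return response.data[0].embedding

print(cosine_similarity(embed("dog"), embed("cat")))    # relatively high
print(cosine_similarity(embed("dog"), embed("pizza")))  # lower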
Similarity space varies across models
Cosine similarity of other words to "dog":

text-embedding-ada-002          text-embedding-3-small (1536)
word      cosine                word      cosine
dog       1.0000                dog       1.0000
animal    0.8855                animal    0.6619
god       0.8660                cat       0.6502
cat       0.8635                car       0.6185
fish      0.8566                horse     0.5927
bird      0.8555                boat      0.5737
...       ...                   ...       ...
Similarity values range across models
(Chart: distribution of cosine similarity of "dog" to 1000 other words, for text-embedding-ada-002 vs. text-embedding-3-small (1536).)

Business uses for vector similarity
Recommendation system:
https://learn.microsoft.com/azure/postgresql/flexible-server/generative-ai-recommendation-system

Fraud detection:
https://www.redpanda.com/blog/fraud-detection-pipeline-redpanda-pinecone
Vector search
Vector search
1. Compute the embedding vector for the query
2. Find the K closest vectors for the query vector
   • Search exhaustively or using approximations

Example flow for the query "tortoise":
Query ("tortoise")
  → Compute embedding vector (OpenAI create embedding)
  → Query vector: [-0.003335318, -0.0176891904, …]
  → Search existing vectors
  → K closest vectors: [["snake", [-0.122, ..]], ["frog", [-0.045, ..]]]
Exhaustive vector search in Python
An exhaustive search checks every single vector for the closest one.

def exhaustive_search(query_vector, vectors):
    similarities = []
    for title, vector in vectors.items():
        # Compare the query to each stored vector
        similarity = cosine_similarity(query_vector, vector)
        similarities.append((title, similarity))
    # Sort from most to least similar
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities
Notebook: search.ipynb
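
A usage sketch (the movies dict mapping titles to embedding vectors is an assumed stand-in for the notebook's data; embed() is the helper sketched earlier):

# Find the stored vectors most similar to a query
query_vector = embed("a movie about the ocean")

results = exhaustive_search(query_vector, movies)
for title, similarity in results[:5]:
    print(f"{title}: {similarity:.4f}")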
ANN (Approximate Nearest Neighbor) search
There are multiple ANN search algorithms that can speed up search time.

Algorithm   Python package   Example database support
HNSW        hnswlib          PostgreSQL pgvector extension, Azure AI Search, Chromadb, Weaviate
DiskANN     diskannpy        Cosmos DB
IVFFlat     faiss            PostgreSQL pgvector extension
Faiss       faiss            None, in-memory index only
HNSW: Hierarchical Navigable Small Worlds
The HNSW algorithm is great for situations where your index may be frequently updated, and it scales logarithmically even with large indexes.

import hnswlib

# Create an index that uses cosine distance on 1536-dimensional vectors
p = hnswlib.Index(space='cosine', dim=1536)
p.init_index(
    max_elements=len(movies),
    ef_construction=200,
    M=16)

# Add all movie vectors, using their positions as integer IDs
vectors = list(movies.values())
ids = list(range(len(vectors)))
p.add_items(vectors, ids)

# ef controls the accuracy/speed trade-off at query time
p.set_ef(50)
Diagram from the HNSW research paper. Library: https://github.com/nmslib/hnswlib
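
Once the index is built, it can answer approximate nearest-neighbor queries. A minimal sketch (k is illustrative, reusing the query_vector from the exhaustive-search sketch):

# Find the 5 approximate nearest neighbors of a query vector.
# hnswlib returns (labels, distances); with space='cosine' the
# distances are cosine distances (1 - cosine similarity).
labels, distances = p.knn_query(query_vector, k=5)

titles = list(movies.keys())
for label, distance in zip(labels[0], distances[0]):
    print(titles[label], 1 - distance)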
Business use: Retrieval Augmented Generation
Vector search can greatly improve the retrieval step in RAG.

Sample app stack: Azure OpenAI + Azure AI Search + Azure AI Vision + Azure App Service
Code: aka.ms/ragchat
Demo: aka.ms/ragchat/demo

Join upcoming stream on RAG on 3/18! aka.ms/PythonAI/series


Vector distance metrics
Common distance metrics
Four common distance metrics between two vectors are:

1. Euclidean distance
2. Manhattan distance
3. Inner product
4. Cosine distance

The metric that we pick may depend on whether the vectors are unit vectors.
Notebook: distance_metrics.ipynb
Unit vectors
A unit vector is a vector with a magnitude of 1.
def magnitude(vector):
return sum([a**2 for a in vector]) ** 0.5

Two vectors with the same magnitude of 3.7416573867739413:
[1, 2, 3]
[3, 1, 2]

After normalization, the same two vectors with a magnitude of 1:
[0.26726124, 0.53452248, 0.80178373]
[0.80178373, 0.26726124, 0.53452248]
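
To normalize a vector, divide each component by its magnitude; a minimal sketch using the magnitude function above:

def normalize(vector):
    # Scale every component so the resulting vector has magnitude 1
    mag = magnitude(vector)
    return [a / mag for a in vector]

print(normalize([1, 2, 3]))             # [0.2672..., 0.5345..., 0.8017...]
print(magnitude(normalize([1, 2, 3])))  # ~1.0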
Euclidean distance
The straight-line distance between two points in Euclidean space.

import numpy as np

def euclidean(v1, v2):
    # v1 - v2 is elementwise subtraction, so v1 and v2 are numpy arrays here
    return magnitude(v1 - v2)

euclidean(
    np.array([0.26726124, 0.53452248, 0.80178373]),
    np.array([0.80178373, 0.26726124, 0.53452248])
)
# 0.655
Manhattan distance
The "taxicab" distance between two points in Euclidean space.

def manhattan(v1, v2):
    return sum(abs(a - b)
               for a, b in zip(v1, v2))

manhattan(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
# 1.07
Dot product
The sum of products of corresponding vector elements.

def dot_product(v1, v2):
    return sum(a * b
               for a, b in zip(v1, v2))

# x1*y1 + x2*y2 + x3*y3

dot_product(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
# 0.786
Cosine distance
The complement of the cosine of the angle between two vectors in Euclidean space.

def cosine_similarity(v1, v2):
    return dot_product(v1, v2) / (
        magnitude(v1) * magnitude(v2))

def cosine_distance(v1, v2):
    return 1 - cosine_similarity(v1, v2)

cosine_distance(
    [0.26726124, 0.53452248, 0.80178373],
    [0.80178373, 0.26726124, 0.53452248]
)
# 0.21
Cosine similarity vs. Dot product
For unit vectors, the cosine similarity is the same as the dot product.

>>> cosine_similarity(v1, v2) == dot_product(v1, v2)

True

>>> 1 - cosine_distance(v1, v2) == dot_product(v1, v2)

True

In some vector databases, the dot product operator will be slightly faster than cosine distance operators, since it does not need to calculate the magnitude.

If your embeddings are unit vectors, consider using dot product as the metric.
OpenAI embedding models currently all output unit vectors!
Vector quantization
Vector quantization
Most vector embeddings are stored as floating-point numbers (64-bit in Python). We can use quantization to reduce the size of the embeddings.

• Scalar quantization: Reduce each number to an integer
  [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [53, 40, 20, ...]

• Binary quantization: Reduce each number to a single bit
  [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [1, 1, 0, ...]

Notebook: quantization.ipynb
Scalar quantization: The process
float32 → int8
[0.03265173360705376, 0.01370371412485838, ...]      → [53, 40, ...]
[-0.00786194484680891, -0.018985141068696976, ...]   → [27, 19, ...]
[-0.0039056178648024797, 0.019039113074541092, ...]  → [29, 44, ...]

1. Calculate the min/max of all the embeddings
2. Normalize each embedding's values to the [0, 1] range
3. Map normalized values into integer buckets from -128 to +127
   (the ~min observed float maps to -128, the ~max observed float maps to +127)
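
A minimal sketch of this process with numpy (the variable names, sample values, and bucket math are illustrative, not the notebook's exact code):

import numpy as np

def scalar_quantize(embeddings):
    # embeddings: 2D numpy array of float values
    min_val, max_val = embeddings.min(), embeddings.max()
    # Normalize all values to the [0, 1] range...
    normalized = (embeddings - min_val) / (max_val - min_val)
    # ...then map them into int8 buckets from -128 to +127
    return np.round(normalized * 255 - 128).astype(np.int8)

vectors = np.array([[0.0326, 0.0137, -0.0177],
                    [-0.0078, -0.0189, 0.0190]], dtype=np.float32)
print(scalar_quantize(vectors))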
Scalar quantization: Before & after
"Moana" embedding, float32 → int8 after quantization:
[0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [53, 40, 20, ...]
Scalar quantization: Effects on similarity
Similarity rankings for "Moana", before (float32) and after (int8) quantization:

float32                                   int8
movie                       similarity    movie                       similarity
Moana                       1.000000      Moana                       1.000000
Mulan                       0.546800      ✅ Mulan                     0.903532
Lilo & Stitch               0.502114      The Little Mermaid          0.894227
The Little Mermaid          0.498209      Lilo & Stitch               0.893718
Big Hero 6                  0.491800      ✅ Big Hero 6                0.890959
Monsters University         0.484857      Monsters University         0.890915
The Princess and the Frog   0.471984      ✅ The Princess and the Frog 0.889009
Finding Dory                0.471386      ✅ Finding Dory              0.888350
Maleficent                  0.461029      Ice Princess                0.885539
Ice Princess                0.457817      Maleficent                  0.885364
Binary quantization: The process
float32 → bit
[0.03265173360705376, 0.01370371412485838, ...]      → [1, 1, ...]
[-0.00786194484680891, -0.018985141068696976, ...]   → [0, 0, ...]
[-0.0039056178648024797, 0.019039113074541092, ...]  → [0, 1, ...]

1. Pick a center C based on an average, a sample, or offline knowledge
2. If a value is >= C, map it to 1; otherwise map it to 0
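
A minimal sketch, assuming the center C is taken as the mean of all values (one of the options listed above; sample values are illustrative):

import numpy as np

def binary_quantize(embeddings):
    # embeddings: 2D numpy array of float values
    center = embeddings.mean()  # pick C as the overall average
    return (embeddings >= center).astype(np.uint8)

vectors = np.array([[0.0326, 0.0137, -0.0177],
                    [-0.0078, -0.0189, 0.0190]], dtype=np.float32)
print(binary_quantize(vectors))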
Binary quantization: Before & after
"Moana" embedding, float32 → bit after quantization:
[0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...] → [1, 1, 0, ...]
Binary quantization: Effects on similarity
Similarity rankings for "Moana", before (float32) and after (bit) quantization:

float32                                   bit
movie                       similarity    movie                       similarity
Moana                       1.000000      Moana                       1.000000
Mulan                       0.546800      ✅ Mulan                     0.686634
Lilo & Stitch               0.502114      The Little Mermaid          0.666260
The Little Mermaid          0.498209      The Princess and the Frog   0.659825
Big Hero 6                  0.491800      Lilo & Stitch               0.657599
Monsters University         0.484857      ❌ Big Hero 6                0.655869
The Princess and the Frog   0.471984      Ice Princess                0.648046
Finding Dory                0.471386      ✅ Finding Dory              0.643830
Maleficent                  0.461029      The Lion King               0.643088
Ice Princess                0.457817      Maleficent                  0.642270
Quantization: effects on storage size
float32                        int8            bit
[0.03265173360705, ...]        [53, 40, ...]   [1, 1, ...]
[-0.00786194484680891, ...]    [27, 19, ...]   [0, 0, ...]
[-0.00390561786480247, ...]    [29, 44, ...]   [0, 1, ...]

Storage size (bytes):
Python built-in number type    12728           12728           12728
numpy typed arrays             12400           1648            1648

Databases with vector storage support can often save more space with bits, using techniques such as bit packing.
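
A rough illustration of where the numpy savings come from (exact byte counts depend on how they are measured; this sketch only looks at the raw data buffer, and the vector is illustrative):

import numpy as np

embedding = np.random.rand(1536)  # illustrative 1536-dimension embedding

# Size of the underlying data buffer for different element types
print(embedding.astype(np.float64).nbytes)  # 1536 * 8 bytes = 12288
print(embedding.astype(np.int8).nbytes)     # 1536 * 1 byte  = 1536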
Quantization: effects on index size in AI Search
Azure AI Search supports quantization as a way to reduce the vector storage space needed.

                         float32    int8                  bit
Vector index size (MB)   1177.12    298.519               41.8636
                                    (74.64% reduction!)   (96.44% reduction!)

AI Search has two storage locations for vectors: the HNSW index used for searching, and the actual data storage. The stats above are for index size.

Learn more in the RAG Time series: https://aka.ms/rag-time/journey3
MRL dimension reduction
MRL: Matryoshka Representation Learning
MRL is a technique that lets you reduce the dimensions of a vector while still retaining much of the original semantic representation.

The OpenAI text-embedding-3-large model has default dimensions of 3072, but can be truncated all the way down to 256 (3072 → 1024 → 512 → 256).

⚠️ Only some models support MRL!

You can truncate either:
• when first generating embeddings
• or when storing in a database (if supported)
Dimension reduction with OpenAI SDK
Specify dimensions when generating an embedding:
embeddings_response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="hello world",
    dimensions=256
)

print(embeddings_response.data[0].embedding)

Notebook: dimension_reduction.ipynb
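
If you already have full-size embeddings, you can also truncate them yourself; a minimal sketch, assuming an MRL-capable model, which re-normalizes so dot product and cosine similarity still behave as expected:

import numpy as np

def truncate_embedding(embedding, dimensions=256):
    # Keep only the first `dimensions` values, then re-normalize to unit length
    truncated = np.array(embedding[:dimensions])
    return truncated / np.linalg.norm(truncated)

# full_embedding is assumed to be a full-length embedding from an
# MRL-capable model (e.g. 1536 values from text-embedding-3-small)
short_vector = truncate_embedding(full_embedding, 256)
print(len(short_vector))  # 256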
Dimension reduction: Before & after
"Moana" embedding with dimensions=1536 vs. dimensions=256:

dimensions=1536: [0.03265173360705376, 0.01370371412485838, -0.017748944461345673, ...]
dimensions=256:  [0.06316128373146057, 0.02650836855173111, -0.03433343395590782, ...]
Dimension reduction: Effects on similarity
Similarity rankings for "Moana" at dimensions=1536 vs. dimensions=256:

dimensions=1536                           dimensions=256
movie                       similarity    movie                       similarity
Moana                       1.000000      Moana                       1.000000
Mulan                       0.546800      The Little Mermaid          0.587367
Lilo & Stitch               0.502114      Mulan                       0.583428
The Little Mermaid          0.498209      Lilo & Stitch               0.575990
Big Hero 6                  0.491800      ✅ Big Hero 6                0.574590
Monsters University         0.484857      The Princess and the Frog   0.568726
The Princess and the Frog   0.471984      Finding Dory                0.549391
Finding Dory                0.471386      The Lion King               0.521125
Maleficent                  0.461029      Tangled                     0.513131
Ice Princess                0.457817      Maleficent                  0.511412

Dimension reduction plus quantization
For maximum vector compression, combine both techniques!

1. MRL dimension reduction
2. Scalar or binary quantization

To keep high accuracy, only compress the vectors in the index, oversample when retrieving, and rescore using the originals. That's how Azure AI Search can handle billions of vectors.

Learn more in the RAG Time series: https://aka.ms/rag-time/journey3
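
Putting the pieces together, an illustrative sketch that reuses the truncate_embedding and binary_quantize helpers sketched earlier (not how Azure AI Search implements this internally):

# Compress one full embedding: first MRL truncation to 256 dimensions,
# then binary quantization of the truncated vector.
compressed = binary_quantize(
    np.array([truncate_embedding(full_embedding, 256)]))
print(compressed.shape)  # (1, 256)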
Dive even deeper into vector embeddings!

Vector embeddings 101:
• Embedding projector
• Why are Cosine Similarities of Text embeddings almost always positive?
• Expected Angular Differences in Embedding Random Text?
• Embeddings: What they are and why they matter

ANN algorithms:
• HNSW tutorial
• Video: HNSW for Vector Search Explained

Distance metrics:
• Two Forms of the Dot Product
• Is Cosine-Similarity of Embeddings Really About Similarity?

Quantization:
• Scalar quantization 101
• Product quantization 101
• Binary and scalar quantization

MRL dimension reduction:
• Unboxing Nomic Embed v1.5: Resizable Production Embeddings with MRL
• MRL from the Ground Up
Next steps
Join upcoming streams!
🧠 3/11: LLMs
↖️ 3/13: Vector embeddings
🔍 3/18: RAG
3/20: Vision models
3/25: Structured outputs
3/27: Quality & Safety
Register @ aka.ms/PythonAI/series

Come to office hours on Thursdays in Discord: aka.ms/pythonai/oh

Get more Python AI resources: aka.ms/thesource/Python_AI
Thank you!
