p2d2 Pgvector Workshop

The document discusses the use of PostgreSQL as a vector database through the pgvector extension, which enables specialized vector data types, indexing, and search capabilities. It outlines the steps for creating a schema, loading embeddings, and querying data, as well as the advantages of using PostgreSQL over specialized vector databases. The presentation emphasizes the integration of PostgreSQL with existing software and data, highlighting its established community and ongoing improvements.

PostgreSQL as a Vector database

Gülçin Yildirim Jelinek, Staff Engineer
Boriss Mejías, Solutions Architect
4th of June, 2024

©EDB 2024 — ALL RIGHTS RESERVED.


The Muffin Chihuahua Challenge

pgvector



Vectors in your PostgreSQL database (pgvector)


Specialized vector data type

Indexing

Search capabilities

Distance operators



pgvector is an extension

CREATE EXTENSION vector;

vector is a data type

CREATE SCHEMA p2d2;


CREATE TABLE p2d2.pictures (
id BIGSERIAL PRIMARY KEY
, filename text NOT NULL
, embedding vector(768) NOT NULL
);



Storage

p2d2.pictures
Column     Type         Storage
id         BIGSERIAL    PLAIN
filename   text         EXTENDED
embedding  vector(768)  EXTERNAL



Go for PLAIN

ALTER TABLE p2d2.pictures
ALTER COLUMN embedding
SET STORAGE PLAIN;



Storage

p2d2.pictures
Column     Type         Storage
id         BIGSERIAL    PLAIN
filename   text         EXTENDED
embedding  vector(768)  PLAIN
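
A quick size check suggests why the storage mode matters for this column. The sketch below assumes the sizing rule from the pgvector documentation (4 bytes per dimension plus 8 bytes of overhead), PostgreSQL's default 8 kB page, and a TOAST threshold of roughly 2 kB; the constant and function names are illustrative only.

```python
# Back-of-the-envelope sizing for a pgvector column.
PAGE_SIZE = 8192        # default PostgreSQL block size in bytes
TOAST_THRESHOLD = 2048  # assumption: "normally 2 kB"; the exact value is build-dependent


def vector_bytes(dimensions: int) -> int:
    """Size of one pgvector `vector` value: 4 bytes per dimension plus 8 bytes."""
    return 4 * dimensions + 8


size = vector_bytes(768)
print(f"vector(768) takes {size} bytes")
print("wider than the TOAST threshold:", size > TOAST_THRESHOLD)
print("fits in a single page:", size < PAGE_SIZE)
```

At about 3 kB, the embedding is wide enough to be moved out of line under EXTERNAL storage, yet it still fits in a page, so PLAIN can keep it inline with the rest of the row.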



Lab – Part I



Each Participant
IP address: https://42.0.0.101 (self-signed certificate)

Credentials: user42 - secrettowel


Database: p2d2db42

psql -d p2d2db42
CREATE EXTENSION vector;
\dx



CREATE TABLE
psql -d p2d2db42
CREATE SCHEMA p2d2;
CREATE TABLE p2d2.pictures (
id BIGSERIAL PRIMARY KEY
, filename text NOT NULL
, embedding vector(768) NOT NULL
);
ALTER TABLE p2d2.pictures
ALTER COLUMN embedding SET STORAGE PLAIN;



Lab – Part II

python3 with imgbeddings
OpenAI’s CLIP model via HuggingFace transformers

/home/dataset/chihuahua_muffin

https://www.kaggle.com/datasets/samuelcortinhas/muffin-vs-chihuahua-image-classification/

Load Embeddings
Check /home/user42/.load_embeddings
load_embeddings.py

ibed = imgbeddings()
ibed.to_embeddings(img)

psycopg2.connect(…)
conn.commit()



Load Embeddings
./load_embeddings.py \
/home/dataset/chihuahua_muffin/chihuahua \
/var/run/postgresql 5432 p2d2db42 user42

./load_embeddings.py \
/home/dataset/chihuahua_muffin/muffin \
/var/run/postgresql 5432 p2d2db42 user42
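
For orientation, a loader with the same command-line shape could look roughly like the sketch below. It is not the workshop's load_embeddings.py: the function names, the `*.jpg` glob and the text-literal cast to `vector` are assumptions made for illustration, and only the `imgbeddings()`, `to_embeddings()`, `psycopg2.connect()` and `commit()` calls come from the slides.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of an embedding loader (not the workshop script)."""
import sys
from pathlib import Path


def to_vector_literal(values):
    """Format floats as pgvector text input, e.g. '[0.2,0.1,0.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def load_directory(directory, host, port, dbname, user):
    # Third-party imports stay local so the helper above works without them.
    import psycopg2
    from PIL import Image
    from imgbeddings import imgbeddings

    ibed = imgbeddings()
    conn = psycopg2.connect(host=host, port=port, dbname=dbname, user=user)
    try:
        with conn.cursor() as cur:
            # Assumption: the dataset images are .jpg files directly in `directory`.
            for path in sorted(Path(directory).glob("*.jpg")):
                with Image.open(path) as img:
                    embedding = ibed.to_embeddings(img)[0]  # first row = this image
                cur.execute(
                    "INSERT INTO p2d2.pictures (filename, embedding) "
                    "VALUES (%s, %s::vector)",
                    (str(path), to_vector_literal(embedding)),
                )
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) == 6:
    # Arguments: directory, socket directory or host, port, database, user
    load_directory(*sys.argv[1:6])
```

Storing the full path in `filename` keeps the class directory (`/chihuahua/` or `/muffin/`) visible to later queries.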



Lab – Part III

Querying

-- The 3-element literal is illustrative; against vector(768) the query vector needs 768 dimensions
SELECT embedding <-> '[0.2, 0.1, 0.0]' AS distance
FROM p2d2.pictures;
SELECT embedding <+> '[0.2, 0.1, 0.0]' AS l1_distance
FROM p2d2.pictures;
SELECT embedding <=> '[0.2, 0.1, 0.0]' AS cosine_distance
FROM p2d2.pictures;
SELECT embedding <#> '[0.2, 0.1, 0.0]' AS neg_inner_prod
FROM p2d2.pictures;
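
What the four operators compute can be spelled out in plain Python. This is a reference sketch of the formulas, not pgvector's implementation, and the function names are illustrative. Note that `<=>` returns cosine distance (1 minus cosine similarity) and `<#>` returns the negative inner product, so for every operator a smaller value means a closer match.

```python
import math


def l2_distance(a, b):
    """<-> : Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """<+> : taxicab (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """<=> : 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """<#> : inner product multiplied by -1."""
    return -sum(x * y for x, y in zip(a, b))


query = [0.2, 0.1, 0.0]
stored = [0.1, 0.1, 0.2]  # made-up example vector
for fn in (l2_distance, l1_distance, cosine_distance, negative_inner_product):
    print(fn.__name__, fn(stored, query))
```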



Querying

SELECT id, filename FROM p2d2.pictures
WHERE filename ~ '/chihuahua/' LIMIT 10;
SELECT id, filename FROM p2d2.pictures
WHERE filename ~ '/muffin/' LIMIT 10;

SELECT embedding <-> (SELECT embedding
FROM p2d2.pictures WHERE id = 1)
FROM p2d2.pictures WHERE id = 2600;



Lab – Part IV



indexing

pgvector has indexing


HNSW
Hierarchical Navigable Small Worlds

IVFFlat
InVerted File with Flat storage (vectors kept uncompressed)
Inverted lists
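
The inverted-list idea behind IVFFlat can be illustrated with a toy example: vectors are grouped under their nearest centroid, and a search scans only the lists closest to the query. This is a simplified sketch, not pgvector's code; the centroids are fixed by hand here, whereas a real index derives them with k-means, and all names are illustrative.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_inverted_lists(vectors, centroids):
    """Assign each vector index to the list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists


def search(query, vectors, centroids, lists, probes=1, k=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [idx for c in order[:probes] for idx in lists[c]]
    return sorted(candidates, key=lambda idx: l2(query, vectors[idx]))[:k]


vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.1, 0.05], [5.1, 5.0]]  # hand-picked for the example
lists = build_inverted_lists(vectors, centroids)
print(lists)
print(search([4.9, 5.0], vectors, centroids, lists))
```

Probing fewer lists is faster but can miss neighbours that fall just across a list boundary, which is why the search is approximate.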



Creating indexes for vectors

Creating HNSW indexes for vectors

CREATE INDEX ON p2d2.pictures
USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON p2d2.pictures
USING hnsw (embedding vector_l1_ops);
CREATE INDEX ON p2d2.pictures
USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON p2d2.pictures
USING hnsw (embedding vector_ip_ops);



Creating IVFFlat indexes for vectors

CREATE INDEX ON p2d2.pictures
USING ivfflat (embedding vector_l2_ops);
-- No vector_l1_ops here: as of pgvector 0.7, L1 distance indexes are available for hnsw only
CREATE INDEX ON p2d2.pictures
USING ivfflat (embedding vector_cosine_ops);
CREATE INDEX ON p2d2.pictures
USING ivfflat (embedding vector_ip_ops);

Querying

SELECT embedding <-> '[0.2, 0.1, 0.0]' AS distance
FROM p2d2.pictures;
SELECT embedding <+> '[0.2, 0.1, 0.0]' AS l1_distance
FROM p2d2.pictures;
SELECT embedding <=> '[0.2, 0.1, 0.0]' AS cosine_distance
FROM p2d2.pictures;
SELECT embedding <#> '[0.2, 0.1, 0.0]' AS neg_inner_prod
FROM p2d2.pictures;
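
A plain SELECT of distances computes a value for every row, so it does not benefit from an index. According to the pgvector documentation, an approximate index is considered only for queries that ORDER BY a distance operator and apply a LIMIT, and the operator has to match the operator class the index was built with. The helper below sketches that pairing; the function names are hypothetical and `%s` is a psycopg2-style placeholder for the query vector.

```python
# Distance operator -> operator class, as paired on the index slides.
OPCLASS_FOR_OPERATOR = {
    "<->": "vector_l2_ops",
    "<+>": "vector_l1_ops",
    "<=>": "vector_cosine_ops",
    "<#>": "vector_ip_ops",
}


def create_index_sql(operator, method="hnsw"):
    """Build the CREATE INDEX statement that serves the given operator."""
    opclass = OPCLASS_FOR_OPERATOR[operator]
    return f"CREATE INDEX ON p2d2.pictures USING {method} (embedding {opclass});"


def nearest_neighbours_sql(operator, k=5):
    """Build a k-nearest-neighbour query in the ORDER BY ... LIMIT form."""
    if operator not in OPCLASS_FOR_OPERATOR:
        raise ValueError(f"unknown distance operator: {operator}")
    return (
        "SELECT id, filename FROM p2d2.pictures "
        f"ORDER BY embedding {operator} %s::vector LIMIT {int(k)};"
    )


print(create_index_sql("<=>"))
print(nearest_neighbours_sql("<=>", k=3))
```

Running the generated query under EXPLAIN is a simple way to check whether the planner actually chose the index.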



Lab – Part V

The Challenge
Check /home/user42/.the_challenge
chihuahua_or_muffin.py

connect
close
cursor
fetchall
Vector operator
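
One possible shape for the solution, using the pieces listed above, is a nearest-neighbour vote: embed the challenge image, fetch the k closest stored pictures, and take the majority class from their file paths. This is a hypothetical sketch, not the workshop's chihuahua_or_muffin.py; the function names, the choice of `<->` and the value of k are assumptions.

```python
from collections import Counter


def label_from_filename(filename):
    """Derive the class from the directory name kept in the filename column."""
    if "/chihuahua/" in filename:
        return "chihuahua"
    if "/muffin/" in filename:
        return "muffin"
    return "unknown"


def majority_label(filenames):
    """Return the most frequent label among the neighbours' filenames."""
    votes = Counter(label_from_filename(f) for f in filenames)
    if not votes:
        return "unknown"
    return votes.most_common(1)[0][0]


def chihuahua_or_muffin(image_path, host, port, dbname, user, k=5):
    # Third-party imports stay local so the helpers above work without them.
    import psycopg2
    from PIL import Image
    from imgbeddings import imgbeddings

    with Image.open(image_path) as img:
        embedding = imgbeddings().to_embeddings(img)[0]
    literal = "[" + ",".join(str(float(v)) for v in embedding) + "]"

    conn = psycopg2.connect(host=host, port=port, dbname=dbname, user=user)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT filename FROM p2d2.pictures "
            "ORDER BY embedding <-> %s::vector LIMIT %s",
            (literal, k),
        )
        rows = cur.fetchall()
        cur.close()
    finally:
        conn.close()
    return majority_label(row[0] for row in rows)
```

An odd k avoids ties between the two classes.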

The Challenge
./chihuahua_or_muffin.py \
/home/dataset/chihuahua_muffin/challenge/img1a.jpg \
/var/run/postgresql 5432 p2d2db42 user42

python3 -m http.server 8042



Closing Words

Why not PostgreSQL?


Relational database designed for OLTP/OLAP

Multi-Version Concurrency Control

Vertical Scalability

Why PostgreSQL?


Shortcomings of specialized vector databases
– Lack of transactional support
– Difficult to combine queries with other filters

Integration with existing software

Integration with existing data

Established community – to be improved



What’s the buzz?

Take Away


Multi-disciplinary work

pgvector already turns PostgreSQL into a vector database

pgvector keeps improving

Look at the larger picture and collaborate



Thank you
