===================================================================================
VECTOR SEARCH DEMO COMMANDS
===================================================================================
===================================================
Sample Table Creation With Vector Datatype:
===================================================
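A minimal sketch of such a table, assuming the DOC_CHUNKS layout used in the PDF
workflow below (table and column names are illustrative):

```sql
-- Table holding document chunks and their embeddings. The VECTOR
-- datatype can optionally be constrained, e.g. VECTOR(384, FLOAT32);
-- left unconstrained here so any model dimension fits.
CREATE TABLE doc_chunks (
  doc_id          NUMBER,
  chunk_id        NUMBER,
  chunk_data      VARCHAR2(4000),
  chunk_embedding VECTOR
);
```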
===================================================================================
DEMO TO ILLUSTRATE THE WORKFLOW OF VECTOR SEARCH ARCHITECTURE USING PDF DOCUMENTS
===================================================================================
Using the DBMS_VECTOR package, load your embedding model into the Oracle Database.
You must specify the directory where you stored your model in ONNX format as well
as describe what type of model it is and how you want to use it.
$ sqlplus vector/<passwd>@<pdb_service_name>
Note:
At minimum, the JSON metadata must describe the machine learning 'function'
supported by the model.
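The load call itself is not shown above; a minimal sketch using
DBMS_VECTOR.LOAD_ONNX_MODEL (the DM_DUMP directory object, file name, and
doc_model name are assumptions consistent with the rest of this demo):

```sql
-- Load the ONNX embedding model from the DM_DUMP directory object.
-- The JSON metadata declares the ML "function" (mandatory) and maps
-- the model's input and output names.
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'DM_DUMP',
    file_name  => 'my_embedding_model.onnx',
    model_name => 'doc_model',
    metadata   => JSON('{"function" : "embedding",
                         "embeddingOutput" : "embedding",
                         "input" : {"input" : ["DATA"]}}'));
END;
/
```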
SQL>
insert into doc_chunks
  select dt.id doc_id, et.embed_id chunk_id, et.embed_data chunk_data,
         to_vector(et.embed_vector) chunk_embedding
  from
    documentation_tab dt,
    dbms_vector_chain.utl_to_embeddings(
      dbms_vector_chain.utl_to_chunks(dbms_vector_chain.utl_to_text(dt.data),
        json('{"normalize":"all"}')),
      json('{"provider":"database", "model":"doc_model"}')) t,
    JSON_TABLE(t.column_value, '$[*]'
      COLUMNS (embed_id     NUMBER         PATH '$.embed_id',
               embed_data   VARCHAR2(4000) PATH '$.embed_data',
               embed_vector CLOB           PATH '$.embed_vector')) et;
SQL> commit;
For a similarity search, you need query vectors. Here you enter your query text
and generate its associated vector embedding.
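The generation step is not shown above; a sketch using SQL*Plus bind variables
and the VECTOR_EMBEDDING SQL function (doc_model follows the earlier load step):

```sql
-- Prompt for a question and embed it with the in-database ONNX model.
ACCEPT text_input CHAR PROMPT 'Ask a question: '
VARIABLE text_variable VARCHAR2(1000)
VARIABLE query_vector CLOB
BEGIN
  :text_variable := '&text_input';
  -- VECTOR_EMBEDDING generates the embedding for the bound text
  -- using the imported model.
  SELECT VECTOR_EMBEDDING(doc_model USING :text_variable AS data)
    INTO :query_vector;
END;
/
```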
PRINT query_vector
6. Run a similarity search to find, within your books, the first four most relevant
chunks that talk about backup and recovery.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using the generated query vector, you search similar chunks in the DOC_CHUNKS
table. For this, you use the VECTOR_DISTANCE SQL function and the FETCH SQL clause
to retrieve the most similar chunks.
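A sketch of that query (FETCH EXACT is the optional default; COSINE matches the
index distance shown later):

```sql
-- Exact (full-scan) similarity search: return the 4 chunks whose
-- embeddings are closest to the query vector by cosine distance.
SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY VECTOR_DISTANCE(chunk_embedding, TO_VECTOR(:query_vector), COSINE)
FETCH EXACT FIRST 4 ROWS ONLY;
```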
7. Run a multi-vector similarity search to find, within your books, the first four
most relevant chunks in the first two most relevant books.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
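A sketch using the multi-vector FETCH PARTITIONS BY syntax, partitioning on
doc_id so that at most two books contribute rows:

```sql
-- Multi-vector similarity search: the 4 best chunks drawn from the
-- 2 most relevant documents.
SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY VECTOR_DISTANCE(chunk_embedding, TO_VECTOR(:query_vector), COSINE)
FETCH FIRST 2 PARTITIONS BY doc_id, 4 ROWS ONLY;
```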
8. Create an In-Memory Neighbor Graph Vector Index on the vector embeddings that
you created
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When dealing with huge vector embedding spaces, you may want to create vector
indexes to accelerate your similarity searches. Instead of scanning every vector
embedding in your table, a vector index uses heuristics to reduce the search
space and speed up the search. This is called approximate similarity search.
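The index creation itself is not shown here; a sketch of an HNSW in-memory
neighbor graph index matching the parameter dump below, followed by the
dictionary query that produces that dump (the index name is an assumption):

```sql
-- Create an HNSW in-memory neighbor graph vector index on the chunk
-- embeddings with a 95% target accuracy.
CREATE VECTOR INDEX docs_hnsw_idx
  ON doc_chunks (chunk_embedding)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  DISTANCE COSINE
  WITH TARGET ACCURACY 95
  PARAMETERS (type HNSW, neighbors 32, efconstruction 300);

-- Inspect the index parameters recorded in the data dictionary.
SELECT JSON_SERIALIZE(IDX_PARAMS RETURNING VARCHAR2 PRETTY)
FROM VECSYS.VECTOR$INDEX
WHERE IDX_NAME = 'DOCS_HNSW_IDX';
```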
JSON_SERIALIZE(IDX_PARAMS RETURNING VARCHAR2 PRETTY)
________________________________________________________________
{
"type" : "HNSW",
"num_neighbors" : 32,
"efConstruction" : 300,
"distance" : "COSINE",
"accuracy" : 95,
"vector_type" : "FLOAT32",
"vector_dimension" : 384,
"degree_of_parallelism" : 1,
"pdb_id" : 3,
"indexed_col" : "CHUNK_EMBEDDING"
}
To get an idea about the size of your In-Memory Neighbor Graph Vector Index in
memory, you can use the V$VECTOR_MEMORY_POOL view.
See Size the Vector Pool for more information about sizing the vector pool to allow
for vector index creation and maintenance.
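For example, to see allocation and usage in MB:

```sql
-- Show vector pool allocation and usage per container, in MB.
SELECT CON_ID, POOL,
       ALLOC_BYTES/1024/1024 AS ALLOC_BYTES_MB,
       USED_BYTES/1024/1024  AS USED_BYTES_MB
FROM V$VECTOR_MEMORY_POOL
ORDER BY CON_ID, POOL;
```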
Use the VECTOR_DISTANCE function and the FETCH APPROX SQL clause to retrieve the
most similar chunks using your vector index.
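A sketch of the approximate search; the per-query TARGET ACCURACY overrides the
default set at index creation:

```sql
-- Approximate similarity search driven by the HNSW vector index.
SELECT doc_id, chunk_id, chunk_data
FROM doc_chunks
ORDER BY VECTOR_DISTANCE(chunk_embedding, TO_VECTOR(:query_vector), COSINE)
FETCH APPROX FIRST 4 ROWS ONLY WITH TARGET ACCURACY 80;
```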
11. Determine your vector index performance for your approximate similarity
searches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After a vector index is created, you may want to know how accurate your
approximate vector searches are.
The index accuracy reporting feature allows you to determine the accuracy of your
vector indexes.
SQL>
SET SERVEROUTPUT ON
declare
  report varchar2(128);
begin
  report := dbms_vector.index_accuracy_query(
    owner_name      => 'VECTOR',
    index_name      => 'DOCS_HNSW_IDX',
    qv              => :query_vector,
    top_K           => 10,
    target_accuracy => 90);
  dbms_output.put_line(report);
end;
/
===================================================================================
A comprehensive example of importing a pretrained ONNX embedding model and
generating vector embeddings
===================================================================================
Instead of the DBMS_VECTOR package, you can also use the DBMS_DATA_MINING package
to import a pretrained ONNX embedding model.
Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure to import the model and
declare the input name.
The following code gives an example:
CONN dmuser/<password>@<pdbname>;
DECLARE
  m_blob    BLOB default empty_blob();
  m_src_loc BFILE;
BEGIN
  DBMS_LOB.createtemporary(m_blob, FALSE);
  m_src_loc := BFILENAME('DM_DUMP', 'my_embedding_model.onnx');
  DBMS_LOB.fileopen(m_src_loc, DBMS_LOB.file_readonly);
  DBMS_LOB.loadfromfile(m_blob, m_src_loc, DBMS_LOB.getlength(m_src_loc));
  DBMS_LOB.close(m_src_loc);
  DBMS_DATA_MINING.import_onnx_model('doc_model', m_blob,
    JSON('{"function" : "embedding",
           "embeddingOutput" : "embedding",
           "input" : {"input" : ["DATA"]}}'));
  DBMS_LOB.freetemporary(m_blob);
END;
/
DECLARE
  model_source BLOB := NULL;
BEGIN
  -- get BLOB holding the ONNX model from object storage
  model_source := DBMS_CLOUD.GET_OBJECT(
    credential_name => 'myCredential',
    object_uri      => 'https://objectstorage.us-phoenix-1.oraclecloud.com/' ||
                       'n/namespace-string/b/bucketname/o/myONNXmodel.onnx');
  DBMS_DATA_MINING.IMPORT_ONNX_MODEL(
    'myonnxmodel',
    model_source,
    JSON('{"function" : "embedding"}'));
END;
/
===================================================================================
EASY TO UNDERSTAND EXAMPLE WITHOUT THE PDF DOCUMENT SCENARIO: DIRECT TEXT CONTENT
===================================================================================
conn docuser/password@CDB_PDB;
SET ECHO ON
SET FEEDBACK 1
SET NUMWIDTH 10
SET LINESIZE 80
SET TRIMSPOOL ON
SET TAB OFF
SET PAGESIZE 10000
SET LONG 10000
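The table setup and chunking step for this scenario is not shown above; a minimal
sketch consistent with the columns queried below, using
DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS on direct text (the table layout, doc_model name,
chunking parameters, and source text are all assumptions):

```sql
-- Chunk a piece of text directly (no PDF) and store each chunk with
-- its embedding. Column names match the SELECT statements below.
CREATE TABLE doc_chunks (
  id           NUMBER,
  chunk_id     NUMBER,
  chunk_offset NUMBER,
  chunk_length NUMBER,
  chunk        VARCHAR2(4000),
  vector       VECTOR
);

INSERT INTO doc_chunks
  SELECT 1 AS id, jt.chunk_id, jt.chunk_offset, jt.chunk_length, jt.chunk,
         VECTOR_EMBEDDING(doc_model USING jt.chunk AS data) AS vector
  FROM (SELECT 'Your long text goes here ...' AS txt FROM dual) src,
       dbms_vector_chain.utl_to_chunks(src.txt,
         json('{"by":"words","max":"100","overlap":"10","split":"sentence"}')) t,
       JSON_TABLE(t.column_value, '$[*]'
         COLUMNS (chunk_id     NUMBER         PATH '$.chunk_id',
                  chunk_offset NUMBER         PATH '$.chunk_offset',
                  chunk_length NUMBER         PATH '$.chunk_length',
                  chunk        VARCHAR2(4000) PATH '$.chunk_data')) jt;
```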
commit;
desc doc_chunks;
set linesize 100
set long 1000
col id for 999
col chunk_id for 99999
col chunk_offset for 99999
col chunk_length for 99999
col chunk for a30
col vector for a100
select id, chunk_id, chunk_offset, chunk_length, chunk from doc_chunks;
select vector from doc_chunks where rownum <= 1;
===================================================================================
SQL RAG EXAMPLE
===================================================================================
This scenario runs a similarity search for documentation content relevant to a
user query. Once documentation chunks are retrieved, they are concatenated and a
prompt is generated asking an LLM to answer the user's question using the
retrieved chunks.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
conn sys/password AS sysdba

BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => '*',
    ace  => xs$ace_type(privilege_list => xs$name_list('connect'),
                        principal_name => 'VECTOR',
                        principal_type => xs_acl.ptype_db));
END;
/
conn docuser/password;
BEGIN
DBMS_VECTOR_CHAIN.DROP_CREDENTIAL(credential_name => 'OCI_CRED');
EXCEPTION
WHEN OTHERS THEN NULL;
END;
/
DECLARE
jo json_object_t;
BEGIN
jo := json_object_t();
jo.put('user_ocid', '<user ocid>');
jo.put('tenancy_ocid', '<tenancy ocid>');
jo.put('compartment_ocid', '<compartment ocid>');
jo.put('private_key', '<private key>');
jo.put('fingerprint', '<fingerprint>');
DBMS_OUTPUT.PUT_LINE(jo.to_string);
DBMS_VECTOR_CHAIN.CREATE_CREDENTIAL(
credential_name => 'OCI_CRED',
params => json(jo.to_string));
END;
/
DECLARE
input CLOB;
params CLOB;
output CLOB;
BEGIN
input := :prompt;
params := '{
  "provider"        : "ocigenai",
  "credential_name" : "OCI_CRED",
  "url"             : "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/generateText",
  "model"           : "cohere.command"
}';
output := DBMS_VECTOR_CHAIN.UTL_TO_GENERATE_TEXT(input, json(params));
DBMS_OUTPUT.PUT_LINE(output);
IF output IS NOT NULL THEN
DBMS_LOB.FREETEMPORARY(output);
END IF;
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE(SQLERRM);
DBMS_OUTPUT.PUT_LINE(SQLCODE);
END;
/
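The :prompt bind variable above is assumed to hold the user question plus the
retrieved chunks; a minimal sketch of building it (the question text and the
concatenation format are assumptions):

```sql
VARIABLE prompt CLOB

-- Build a RAG prompt: the user question followed by the top-4 chunks
-- retrieved by approximate similarity search.
DECLARE
  user_question  VARCHAR2(4000) := 'How do I back up my database?';
  context_chunks CLOB := EMPTY_CLOB();
BEGIN
  FOR rec IN (
    SELECT chunk_data
    FROM doc_chunks
    ORDER BY VECTOR_DISTANCE(chunk_embedding, TO_VECTOR(:query_vector), COSINE)
    FETCH APPROX FIRST 4 ROWS ONLY)
  LOOP
    context_chunks := context_chunks || rec.chunk_data || chr(10);
  END LOOP;
  :prompt := 'Answer the question using only the provided context.' || chr(10) ||
             'Question: ' || user_question || chr(10) ||
             'Context: '  || chr(10) || context_chunks;
END;
/
```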
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++