Oracle AI Vector Search Professional

The document contains a series of questions and answers related to vector embeddings, similarity search, and the use of Oracle Database 23ai. It covers topics such as storage options for vector embeddings, the importance of using the same embedding model for similarity searches, and SQL operations related to vector columns. Additionally, it discusses the integration of Generative AI services and the use of specific PL/SQL packages and functions for managing vector data.

Uploaded by

shibijith.m24

1. When generating vector embeddings outside the database, what is the most suitable option
for storing the embeddings for later use?

In a CSV file.
In a binary FVEC file with the relational data in a CSV file.
In the database as BLOB (Binary Large Object) data.
In a dedicated vector database.

2. When generating vector embeddings for a new dataset outside of Oracle Database 23ai,
which factor is crucial to ensure meaningful similarity search results?

The choice of programming language used to process the dataset (for example, Python,
Java).
The physical location where the vector embeddings are stored.
The storage format of the new dataset (for example, CSV, JSON).
The same vector embedding model must be used for vectorizing the data and creating a
query vector.

3. You are working with vector search in Oracle Database 23ai and need to ensure the integrity
of your vector data during storage and retrieval. Which factor is crucial for maintaining the
accuracy and reliability of your vector search results?

Using the same embedding model for both vector creation and similarity search.
Regularly updating vector embeddings to reflect changes in the source data.
The specific distance algorithm employed for vector comparisons.
The physical storage location of the vector data.

4. Which DDL operation is NOT permitted on a table containing a VECTOR column in Oracle
Database 23ai?

Creating a new table using CTAS (CREATE TABLE AS SELECT) that includes the VECTOR column
from the original table.
Dropping an existing VECTOR column from the table.
Modifying the data type of an existing VECTOR column to a non-VECTOR type.
Adding a new VECTOR column to the table.

5. Which SQL statement correctly adds a VECTOR column named v with 4 dimensions and
FLOAT32 format to an existing table named my_table?

ALTER TABLE my_table MODIFY (v VECTOR(4, FLOAT32));
ALTER TABLE my_table ADD (v VECTOR(4, FLOAT32));
UPDATE my_table SET v = VECTOR(4, FLOAT32);
ALTER TABLE my_table ADD v VECTOR(4, FLOAT32);

6. A machine learning team is using IVF indexes in Oracle Database 23ai to find similar images
in a large dataset. During testing, they observe that the search results are often incomplete,
missing relevant images. They suspect the issue lies in the number of partitions probed. How
should they improve the search accuracy?

Add the TARGET ACCURACY clause to the query with a higher value for the accuracy.
Change the index type to HNSW for better accuracy.
Increase the VECTOR_MEMORY_SIZE initialization parameter.
Re-create the index with a higher EFCONSTRUCTION value.

7. What happens when querying with an IVF index if you increase the value of the NEIGHBOR
PARTITION PROBES parameter?
The number of centroids decreases.
Accuracy decreases.
Index creation time is reduced.
More partitions are probed, improving accuracy, but also increasing query latency.
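
The effect of probing more partitions can be illustrated with a toy inverted-file search. This is only a sketch under stated assumptions: the centroids are hand-picked rather than learned, the function names are hypothetical, and nothing here reflects how Oracle implements IVF internally. It shows that a vector sitting just across a partition boundary is missed with one probe and found with two.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_partitions(vectors, centroids):
    """Assign every vector to its nearest centroid (one partition per centroid)."""
    partitions = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
        partitions[nearest].append(v)
    return partitions

def ivf_search(query, centroids, partitions, k, probes):
    """Scan only the `probes` partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in partitions[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]

# Hand-picked centroids and data (assumptions for illustration only).
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (4.9, 0.0), (5.2, 0.0), (9.0, 0.0)]
partitions = build_partitions(vectors, centroids)
query = (5.1, 0.0)

one_probe = ivf_search(query, centroids, partitions, k=2, probes=1)
two_probes = ivf_search(query, centroids, partitions, k=2, probes=2)
print(one_probe)   # misses (4.9, 0.0), which lives in the other partition
print(two_probes)  # scans more candidates and finds both true neighbors
```

The second call does more distance computations, which is the latency cost that comes with the better recall.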

8. Which PL/SQL package is primarily used for interacting with Generative AI services in Oracle
Database 23ai?
DBMS_AI.
DBMS_ML.
DBMS_VECTOR_CHAIN.
DBMS_GENAI.

9. Which SQL function is used to create a vector embedding for a given text string in Oracle
Database 23ai?

GENERATE_EMBEDDING.
CREATE_VECTOR_EMBEDDING.
EMBED_TEXT.
VECTOR_EMBEDDING.

10. Which PL/SQL function converts documents such as PDF, DOC, JSON, XML, or HTML to plain
text?

DBMS_VECTOR.TEXT_TO_PLAIN.
DBMS_VECTOR_CHAIN.UTL_TO_TEXT.
DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS.
DBMS_VECTOR.CONVERT_TO_TEXT.

11. What is the primary purpose of the DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS function in a
RAG application?

To generate vector embeddings from a text document.
To load a document into the database.
To split a large document into smaller chunks to improve vector quality by minimizing token
truncation.
To convert a document into a single, large text string.
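
The chunking idea behind this question can be sketched in a few lines. The splitter below is a hypothetical illustration, not the behavior of UTL_TO_CHUNKS: it counts whitespace-separated words as a rough stand-in for tokens (real tokenizers count differently), and the `max_words` and `overlap` parameters are assumptions chosen for the example.

```python
def split_into_chunks(text, max_words=100, overlap=10):
    """Split text into word-based chunks with a small overlap between neighbors."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks

# A synthetic 250-word document (illustrative data only).
document = " ".join(f"word{i}" for i in range(250))
chunks = split_into_chunks(document, max_words=100, overlap=10)
print(len(chunks), [len(c.split()) for c in chunks])
```

Because every chunk stays under the chosen limit, an embedding model with a matching token limit would not need to truncate any of them.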

12. What is the first step in setting up the practice environment for Select AI?

Optionally create an OCI compartment.
Create a policy to enable access to OCI Generative AI.
Drop any compartment that does not use OCI Generative AI.

13. How is the security interaction between Autonomous Database and OCI Generative AI
managed in the context of Select AI?

a) By encrypting all communication between the Autonomous Database and OCI
Generative AI using TLS/SSL protocols.
b) By utilizing Resource Principals, which grant the Autonomous Database instance
access to OCI Generative AI without exposing sensitive credentials.
c) By establishing a secure VPN tunnel between the Autonomous Database and OCI
Generative AI service.
d) By requiring users to manually enter their OCI API keys each time they execute a
natural language query.

14. You are storing 1,000 embeddings in a VECTOR column, each with 256 dimensions using
FLOAT32. What is the approximate size of the data on disk?
a) 1 MB.
b) 4 MB.
c) 256 KB.
d) 1 GB.
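
The arithmetic behind this question can be checked directly. The sketch below counts only the raw payload (number of vectors times dimensions times bytes per dimension) and ignores any per-row or per-vector overhead the database adds; the helper name is hypothetical.

```python
# Bytes used by one dimension in each VECTOR storage format.
BYTES_PER_DIMENSION = {"INT8": 1, "FLOAT32": 4, "FLOAT64": 8}

def raw_vector_bytes(num_vectors, dimensions, fmt="FLOAT32"):
    """Raw payload size in bytes, ignoring storage overhead."""
    return num_vectors * dimensions * BYTES_PER_DIMENSION[fmt]

size = raw_vector_bytes(1_000, 256, "FLOAT32")
print(size, "bytes, roughly", round(size / 1_000_000, 2), "MB")
```

With FLOAT32, each 256-dimension vector takes 1,024 bytes of payload, so 1,000 of them come to about 1 MB.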

15. Which Oracle Cloud Infrastructure (OCI) service is directly integrated with Select AI?

a) OCI Language.
b) OCI Generative AI.
c) OCI Vision.
d) OCI Data Science.

16. Which is NOT a feature or capability related to AI and Vector Search in Exadata?
a) Native Support for Vector Search Only within the Database Server.
b) Vector Replication with GoldenGate.
c) Loading Vector Data using SQL*Loader.
d) AI Smart Scan.

17. Which statement best describes the core functionality and benefit of Retrieval Augmented
Generation (RAG) in Oracle Database 23ai?

a) It empowers LLMs to interact with private enterprise data stored within the
database, leading to more context-aware and precise responses to user queries.
b) It primarily aims to optimize the performance and efficiency of LLMs by using
advanced data retrieval techniques, thus minimizing response times, and reducing
computational overhead.
c) It allows users to train their own specialized LLMs directly within the Oracle
Database environment using their internal data, thereby reducing the reliance on
external AI providers.
d) It enables Large Language Models (LLMs) to access and process real-time data
streams from diverse sources to generate the most up-to-date insights.

18. If a query vector uses a different distance metric than the one used to create the index, what
happens?
The query fails.
An exact match search is triggered.
The index automatically updates.
A warning is logged, but the query executes.

19. What are the key advantages and considerations of using Retrieval Augmented Generation
(RAG) in the context of Oracle AI Vector Search?

It excels at optimizing the performance and efficiency of LLM inference through advanced
caching and precomputation techniques, leading to faster response times but potentially
increased storage requirements.
It prioritizes real-time data extraction and summarization from various sources to ensure the
LLM always has the most up-to-date information.
It focuses on training specialized LLMs within the database environment for specific tasks,
offering greater control over model behavior and data privacy but potentially requiring more
development effort.
It leverages existing database security and access controls, thereby enabling secure and
controlled access to both the database content and the LLM.

20. Which Python library is used to vectorize text chunks and the user's question in the following
example?

import oracledb

connection = oracledb.connect(user=un, password=pw, dsn=cs)
table_name = 'Page'

with connection.cursor() as cursor:
    # Create the table
    create_table_sql = f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id NUMBER PRIMARY KEY,
            payload CLOB CHECK (payload IS JSON),
            vector VECTOR
        )"""
    try:
        cursor.execute(create_table_sql)
    except oracledb.DatabaseError as e:
        raise
    connection.autocommit = True

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('all-MiniLM-L12-v2')

a) sentence_transformers.
b) oci.
c) oracledb.
d) json.

21. What is the function of the COSINE parameter in the SQL query used to retrieve similar
vectors?
```
topk = 3
sql = f"""select payload, vector_distance(vector, :vector, COSINE) as score
from {table_name} order by score fetch approximate first {topk} rows only"""
```

It filters out vectors with a cosine similarity below a certain threshold.
It converts the vectors to a format compatible with the SQL database.
It indicates that the cosine distance metric should be used to measure similarity
between vectors.
It specifies the type of vector encoding used in the database.
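
To make the metric concrete, cosine distance can be computed by hand as one minus the cosine similarity of the two vectors. The function below is a plain-Python sketch of that formula, not the database's implementation; it assumes neither vector is all zeros.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # same direction, different length
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Only the angle between the vectors matters, which is why the first pair has distance zero despite the different magnitudes.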

22. You are tasked with finding the closest matching sentences across books, where each book
has multiple paragraphs and sentences. Which SQL structure should you use?
a. GROUP BY with vector operations.
b. FETCH PARTITIONS BY clause.
c. A nested query with ORDER BY.
d. Exact similarity search with a single query vector.

23. In the following Python code, what is the significance of prepending the source filename to
each text chunk before storing it in the vector database?
```
docs = [{"text": filename + "/" + section, "path": filename} for filename, sections in faqs.items()
        for section in sections]
# Sample the resulting data
docs[:2]
```

a) It preserves context and aids in the retrieval process by associating each
vectorized chunk with its original source file.
b) It helps differentiate between chunks from different files but has no impact
on vectorization.
c) It speeds up the vectorization process by providing a unique identifier for
each chunk.
d) It improves the accuracy of the LLM by providing additional training data.
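
The snippet in the question depends on a `faqs` mapping that is not shown. A runnable version, assuming `faqs` maps each filename to a list of text sections (the sample filenames and sections below are invented for illustration), looks like this:

```python
# Hypothetical sample data: filename -> list of text sections.
faqs = {
    "billing.txt": ["How do I pay?", "When is my invoice due?"],
    "login.txt": ["How do I reset my password?"],
}

# Prefix each section with its source filename and keep the path alongside it.
docs = [
    {"text": filename + "/" + section, "path": filename}
    for filename, sections in faqs.items()
    for section in sections
]

# Sample the resulting data
print(docs[:2])
```

Since the filename is part of the `text` field, it is embedded together with the section and can influence which chunks are retrieved.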

24. How does an application use vector similarity search to retrieve relevant information from a
database, and how is this information then integrated into the generation process?

a) Encodes the question and database chunks into vectors, finds the most
similar using cosine similarity, and includes them in the LLM prompt.
b) Trains a separate LLM on the database and uses it to answer, ignoring the
general LLM.
c) Converts the question to keywords, searches for matches, and inserts the
text into the response.
d) Clusters similar text chunks and randomly selects one from the most
relevant cluster.
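
The retrieve-then-generate flow described in this question can be sketched end to end. Everything below is a simplified assumption: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the chunks live in a Python list rather than a database, and the prompt template is invented for the example.

```python
import math
from collections import Counter

def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocabulary]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def build_prompt(question, chunks, vocabulary, top_k=2):
    """Embed the question, rank chunks by similarity, and put the best ones in the prompt."""
    q_vec = toy_embed(question, vocabulary)
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(q_vec, toy_embed(c, vocabulary)),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

vocabulary = ["vector", "index", "invoice", "password", "reset"]
chunks = [
    "reset your password from the login page",
    "an invoice is sent every month",
    "a vector index speeds up similarity search",
]
prompt = build_prompt("how do I reset my password", chunks, vocabulary, top_k=1)
print(prompt)
```

The resulting prompt string is what would be sent to the LLM, so the model answers from the retrieved chunk rather than from its training data alone.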

25. When using SQL*Loader to load vector data for search applications, what is a critical
consideration regarding the formatting of the vector data within the input CSV file?

a) Enclose vector components in curly braces ({}).
b) As FVEC is a binary format and the vector dimensions have a known width,
fixed offsets can be used to make parsing the vectors fast and efficient.
c) Use sparse format for vector data.
d) Rely on SQL*Loader's automatic normalization of vector data.

26. Which function is used to generate vector embeddings within an Oracle database?

a) DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS.
b) DBMS_VECTOR_CHAIN.UTL_TO_TEXT.
c) DBMS_VECTOR_CHAIN.UTL_TO_EMBEDDINGS.
d) DBMS_VECTOR_CHAIN.UTL_TO_GENERATE_TEXT.

27. Which statement best describes the capability of Oracle Data Pump for handling vector data
in the context of vector search applications?

a) Data Pump can only export and import vector data if the vector embeddings are stored
as BLOB (Binary Large Object) data types in the database.
b) Data Pump treats vector embeddings as regular text strings, which can lead to data
corruption or loss of precision when transferring vector data for vector search.
c) Data Pump provides native support for exporting and importing tables containing
vector data types, facilitating the transfer of vector data for vector search
applications.
d) Because of the complexity of vector data, Data Pump requires a specialized plug-in
to handle the export and import operations involving vector data types.

28. What happens when you attempt to insert a vector with an incorrect number of dimensions
into a VECTOR column with a defined number of dimensions?

a) The database truncates the vector to fit the defined dimensions.
b) The database pads the vector with zeros to match the defined dimensions.
c) The insert operation fails, and an error message is thrown.
d) The database ignores the defined dimensions and inserts the vector as is.

29. In Oracle Database 23ai, which data type is used to store vector embeddings for similarity
search?

a) VECTOR2.
b) BLOB.
c) VECTOR.
d) VARCHAR2.

30. What is created to facilitate the use of OCI Generative AI with Autonomous Database?

a) An AI profile for OCI Generative AI.
b) A dedicated OCI compartment.
c) A new user account with elevated privileges.

31. Why would you choose to NOT define a specific size for the VECTOR column during
development?

a. It impacts the accuracy of similarity searches.
b. It restricts the database to a single embedding model.
c. It limits the length of text that can be vectorized.
d. Different external embedding models produce vectors with varying dimensions and
data types.

32. What is the correct order of steps for building a RAG application using PL/SQL in Oracle
Database 23ai?

a. Load ONNX Model, Vectorize Question, Load Document, Split Text into Chunks,
Create Embeddings, Perform Vector Search, Generate Output.
b. Load Document, Split Text into Chunks, Load ONNX Model, Create Embeddings,
Vectorize Question, Perform Vector Search, Generate Output.
c. Vectorize Question, Load ONNX Model, Load Document, Split Text into
Chunks, Create Embeddings, Perform Vector Search, Generate Output.
d. Load Document, Load ONNX Model, Split Text into Chunks, Create Embeddings,
Vectorize Question, Perform Vector Search, Generate Output.

33. What is the primary purpose of a similarity search in Oracle Database 23ai?

a. To optimize relational database operations.
b. To compute distances between all data points in a database.
c. To find exact matches in BLOB data.
d. To retrieve the most semantically similar entries using distance metrics between
different vectors.

34. What is the advantage of using Euclidean Squared Distance rather than Euclidean Distance in
similarity search queries?

a. It is the default distance metric for Oracle AI Vector Search.
b. It supports hierarchical partitioning of vectors.
c. It is simpler and faster because it avoids square-root calculations.
d. It guarantees higher accuracy than Euclidean Distance.
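
A short check shows why dropping the square root is safe for ranking: the square root is monotonic, so sorting by squared distance gives the same order as sorting by the true distance. The vectors below are made up for the example.

```python
import math

def euclidean_squared(a, b):
    """Sum of squared differences; no square root needed."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """True Euclidean distance, one extra sqrt per comparison."""
    return math.sqrt(euclidean_squared(a, b))

query = (0.0, 0.0)
vectors = [(3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]

by_squared = sorted(vectors, key=lambda v: euclidean_squared(query, v))
by_plain = sorted(vectors, key=lambda v: euclidean(query, v))
print(by_squared == by_plain)
```

The distance values differ (25 versus 5 for the first vector), but the nearest-neighbor order is identical.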

35. You need to prioritize accuracy over speed in a similarity search for a dataset of images.
Which should you use?

a. Approximate similarity search with HNSW indexing and target accuracy of 70%.
b. Multivector similarity search with partitioning.
c. Exact similarity search using a full table scan.
d. Approximate similarity search with IVF indexing and target accuracy of 70%.

36. What is the significance of splitting text into chunks in the process of loading data into Oracle
AI Vector Search?
a. To reduce the computational burden on the embedding model.
b. To facilitate parallel processing of the data during vectorization.
c. To minimize token truncation as each vector embedding model has its own
maximum token limit.

37. What is the purpose of the VECTOR_DISTANCE function in Oracle Database 23ai similarity
search?

a. To fetch rows that match exact vector embeddings.
b. To create vector indexes for efficient searches.
c. To group vectors by their exact scores.
d. To calculate the distance between vectors using a specified metric.

38. You are tasked with creating a table to store vector embeddings with the following
characteristics: Each vector must have exactly 512 dimensions. The dimensions should be
stored as 32-bit floating-point numbers. Which SQL statement should you use?

a. CREATE TABLE vectors (id NUMBER, embedding VECTOR(512));
b. CREATE TABLE vectors (id NUMBER, embedding VECTOR);
c. CREATE TABLE vectors (id NUMBER, embedding VECTOR(*, INT8));
d. CREATE TABLE vectors (id NUMBER, embedding VECTOR(512, FLOAT32));

39. Which function should you use to determine the storage format of a vector?

a. VECTOR_DIMENSION_FORMAT.
b. VECTOR_CHUNKS.
c. VECTOR_NORM.
d. VECTOR_EMBEDDING.

40. What security enhancement is introduced in Exadata System Software 24ai?

a. Integration with third-party security tools.
b. Enhanced encryption algorithm for data at rest.
c. SNMP security.

41. You need to generate a vector from the string [1.2, 3.4] in FLOAT32 format with 2
dimensions. Which function will you use?

a. TO_VECTOR.
b. VECTOR_DISTANCE.
c. FROM_VECTOR.
d. VECTOR_SERIALIZE.
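
As a client-side analogue of turning the text '[1.2, 3.4]' into a two-dimensional 32-bit vector, the sketch below parses the string into a typed array. It is not the TO_VECTOR implementation, and the helper name is hypothetical. The python-oracledb driver is documented to accept `array.array` values when binding to VECTOR columns, but treat that as an assumption to verify against your driver version.

```python
import array
import json

def parse_vector(text, typecode="f"):
    """Parse '[1.2, 3.4]' into a typed array ('f' = 32-bit float, 'd' = 64-bit float)."""
    return array.array(typecode, json.loads(text))

vec = parse_vector("[1.2, 3.4]")
print(len(vec), vec.typecode, vec.itemsize)
```

Stored as 32-bit floats, the values are only approximately 1.2 and 3.4, which is the usual precision trade-off of the FLOAT32 format.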

42. What is the primary purpose of the VECTOR_EMBEDDING function in Oracle Database 23ai?

a. To calculate vector dimensions.
b. To calculate vector distances.
c. To serialize vectors into a string.
d. To generate a single vector embedding for data.

43. What is a key characteristic of HNSW vector indexes?

a. They are hierarchical with multilayered connections.
b. They require exact match for searches.
c. They are disk-based structures.
d. They use hash-based clustering.

44. What is the primary function of AI Smart Scan in Exadata System Software 24ai?

a. To provide real-time monitoring and diagnostics for AI applications.
b. To accelerate AI workloads by leveraging Exadata RDMA Memory (XRMEM), Exadata
Smart Cache, and on-storage processing.
c. To automatically optimize database queries for improved performance.

45. Which parameter is used to define the number of closest vector candidates considered
during HNSW index creation?

a. EFCONSTRUCTION.
b. VECTOR_MEMORY_SIZE.
c. NEIGHBOURS.
d. TARGET_ACCURACY.

46. You want to quickly retrieve the top-10 matches for a query vector from a dataset of billions
of vectors, prioritizing speed over exact accuracy. What is the best approach?

a. Exact similarity search using flat search.
b. Approximate similarity search with a low target accuracy setting.
c. Relational filtering combined with an exact search.
d. Exact similarity search with a high target accuracy setting.

47. Which is a characteristic of an approximate similarity search in Oracle Database 23ai?

a. It compares every vector in the dataset.
b. It trades off accuracy for faster performance.
c. It always guarantees 100% accuracy.
d. It is slower than exact similarity search.

48. Which operation is NOT permitted on tables containing VECTOR columns?

a. SELECT.
b. UPDATE.
c. DELETE.
d. JOIN ON VECTOR columns.

49. Which is a characteristic of an approximate similarity search in Oracle Database 23ai?

a. It compares every vector in the dataset.
b. It trades off accuracy for faster performance.
c. It always guarantees 100% accuracy.
d. It is slower than exact similarity search.

50. You are asked to fetch the top five vectors nearest to a query vector, but only for a specific
category of documents. Which query structure should you use?

a. Use UNION ALL with vector operations.
b. Perform the similarity search without a WHERE clause.
c. Apply relational filters and a similarity search in the query.
d. Use VECTOR_INDEX_HINT and NO WHERE clause.
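
Combining a relational filter with a similarity search amounts to restricting the candidate rows first and ranking what remains by distance. In SQL that is a WHERE clause together with ORDER BY VECTOR_DISTANCE(...) and a FETCH FIRST 5 ROWS ONLY clause; the plain-Python sketch below mirrors the same logic on an in-memory list. The row layout, column names, and squared Euclidean metric are assumptions made for the example.

```python
def top_k_in_category(rows, query, category, k=5):
    """Keep rows of one category, then rank them by squared Euclidean distance."""
    def dist(v):
        return sum((x - y) ** 2 for x, y in zip(v, query))

    filtered = [r for r in rows if r["category"] == category]
    return sorted(filtered, key=lambda r: dist(r["embedding"]))[:k]

# Illustrative rows; id 2 is the closest vector overall but has the wrong category.
rows = [
    {"id": 1, "category": "manual", "embedding": (0.0, 0.1)},
    {"id": 2, "category": "invoice", "embedding": (0.0, 0.0)},
    {"id": 3, "category": "manual", "embedding": (0.9, 0.9)},
]
nearest = top_k_in_category(rows, query=(0.0, 0.0), category="manual", k=5)
print([r["id"] for r in nearest])
```

The globally nearest row is excluded because it fails the category filter, which is exactly the behavior the question asks for.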

51. What is the primary function of an embedding model in the context of vector search?

a. To define the schema for a vector database.
b. To execute similarity search operations within a database.
c. To transform text or data into numerical vector representations.
d. To store vectors in a structured format for efficient retrieval.

52. What is the significance of using local ONNX models for embedding within the database?

a. Support for legacy SQL*Plus clients.
b. Improved accuracy compared to external models.
c. Reduced embedding dimensions for faster processing.
d. Enhanced security because data remains within the database.

53. Which of the following actions will result in an error when using
VECTOR_DIMENSION_COUNT() in Oracle Database 23ai?
a. Providing a vector with a dimensionality that exceeds the specified dimension count.
b. Using a vector with a data type that is not supported by the function.
c. Providing a vector with duplicate values for its components.
d. Calling the function on a vector that has been created with TO_VECTOR().

54. An application needs to fetch the top-3 matching sentences from a dataset of books while
ensuring a balance between speed and accuracy. Which query structure should you use?

a. Approximate similarity search with the VECTOR_DISTANCE function.
b. Exact similarity search with Euclidean distance.
c. Multivector similarity search with approximate fetching and target accuracy.
d. A combination of relational filters and similarity search.

55. You are tasked with finding the closest matching sentences across books, where each book
has multiple paragraphs and sentences. Which SQL structure should you use?

a. A nested query with ORDER BY.
b. Exact similarity search with a single query vector.
c. GROUP BY with vector operations.
d. FETCH PARTITIONS BY clause.

56. What is the primary difference between the HNSW and IVF vector indexes in Oracle
Database 23ai?

a. Both operate identically but differ in memory usage.
b. HNSW guarantees accuracy, whereas IVF sacrifices performance for accuracy.
c. HNSW uses an in-memory neighbor graph for faster approximate searches,
whereas IVF uses the buffer cache with partitions.
d. HNSW is partition based, whereas IVF uses neighbor graphs for indexing.

57. A database administrator wants to change the VECTOR_MEMORY_SIZE parameter for a
pluggable database (PDB) in Oracle Database 23ai. Which SQL command is correct?

a. ALTER SYSTEM SET vector_memory_size=1G SCOPE=BOTH;
b. ALTER DATABASE SET vector_memory_size=1G SCOPE=VECTOR;
c. ALTER SYSTEM SET vector_memory_size=1G SCOPE=SGA;
d. ALTER SYSTEM RESET vector_memory_size;

58. Which vector index available in Oracle Database 23ai is known for its speed and accuracy,
making it a preferred choice for vector search?
a. Binary Tree (BT) index.
b. Inverted File System (IFS) index.
c. Inverted File System (IFS) index.
d. Hierarchical Navigable Small World (HNSW) index.

59. What is the purpose of the Vector Pool in Oracle Database 23ai?

a. To manage database partitioning.
b. To store HNSW vector indexes and IVF index metadata.
c. To enable longer SQL execution.
d. To store non-vector data types.

60. What is the default distance metric used by the VECTOR_DISTANCE function if none is
specified?
a. Euclidean.
b. Hamming.
c. Cosine.
d. Manhattan.

61. In Oracle Database 23ai, which SQL function calculates the distance between two vectors
using the Euclidean metric?

a. L1_DISTANCE.
b. L2_DISTANCE.
c. HAMMING_DISTANCE.
d. COSINE_DISTANCE.
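
The metrics named in these options can be written out directly. The functions below are plain-Python sketches of the standard definitions, named after the SQL functions only for readability; they are not the database implementations.

```python
import math

def l1_distance(a, b):
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

a, b = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(l1_distance(a, b), l2_distance(a, b), hamming_distance(a, b))
```

For this pair the differences are 3, 4, and 0, so the three metrics give 7, 5, and 2 respectively.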

62. What is a key advantage of using GoldenGate 23ai for managing and distributing vector data
for AI applications?

a. Real-time vector data updates across locations.
b. Automatic translation of vector embeddings between formats.
c. Specialized vector embedding compression.
d. Built-in version control for vector data.

63. What happens when you attempt to insert a vector with an incorrect number of dimensions
into a VECTOR column with a defined number of dimensions?

a. The database pads the vector with zeros to match the defined dimensions.
b. The database ignores the defined dimensions and inserts the vector as is.
c. The database truncates the vector to fit the defined dimensions.
d. The insert operation fails, and an error message is thrown.

64. Which function should you use to determine the storage format of a vector?

a. VECTOR_CHUNKS
b. VECTOR_EMBEDDING
c. VECTOR_NORM
d. VECTOR_DIMENSION_FORMAT

65. What is a key characteristic of HNSW vector indexes?

a. They are hierarchical with multilayered connections.
b. They require exact match for searches.
c. They are disk-based structures.
d. They use hash-based clustering.

66. Which operation is NOT permitted on tables containing VECTOR columns?
a. SELECT
b. UPDATE
c. DELETE
d. JOIN ON VECTOR columns

67. What is the first step in setting up the practice environment for Select AI?
a. Optionally create an OCI compartment
b. Create a policy to enable access to OCI Generative AI
c. Drop any compartment that does not use OCI Generative AI
d. Create a new user account with elevated privileges

68. You are tasked with creating a table to store vector embeddings with the following
characteristics: Each vector must have exactly 512 dimensions, and the dimensions should be
stored as 32-bit floating-point numbers. Which SQL statement should you use?

a. CREATE TABLE vectors (id NUMBER, embedding VECTOR(512));
b. CREATE TABLE vectors (id NUMBER, embedding VECTOR);
c. CREATE TABLE vectors (id NUMBER, embedding VECTOR(*, INT8));
d. CREATE TABLE vectors (id NUMBER, embedding VECTOR(512, FLOAT32));
