Vector Database


Getting Started With Vector Databases

CONTENTS

• About Vector Databases
• Key Concepts of Vector Databases
  − Embeddings and Dimensions
  − Distance Metrics and Similarity
  − Vector Indexes
  − Scalability
  − Use Cases
• Getting Started
• Conclusion

MIGUEL GARCÍA LORENZO

VP OF ENGINEERING, NEXTAIL

Vector databases are specialized databases designed for scenarios where understanding context, similarity, or patterns is more important than matching exact values. They leverage the mathematics of vectors and the principles of geometry to understand and organize data, and these capabilities are essential to boosting the power of analytical and generative artificial intelligence (AI).

Figure 1: Vector database overview

The explosion of AI and machine learning (ML) technologies is the key driver behind the rapid growth of vector databases in the last two years, providing greater value via performance, agility, and cost.

Unlike other evolutions in databases, vector databases were not made to replace any technology but to solve new cases for which there was no existing technological alternative. The main purpose of this Refcard is to provide a clear and accessible overview of vector databases, outlining their importance, applications, and underlying principles.

In addition, we will use a functional example throughout to better demonstrate key points and objectives.

ABOUT VECTOR DATABASES

A vector database is a specialized database for storing, searching, and managing information as vectors, which are numerical representations of objects (e.g., documents, text, images, videos, audio) in a high-dimensional space that capture certain features of the objects themselves.

This numerical representation is called a vector embedding, or simply an embedding, which we will cover in more detail later on.

© DZONE | REFCARD | APRIL 2024 1

REFCARD | GETTING STARTED WITH VECTOR DATABASES

Vector embeddings are created using ML models that are able to translate the semantic and qualitative value of the object into a numerical representation. There are a variety of ML models for each data type, such as text, audio, image, and other embedding models.

A vector database is not required in order to generate or use vector embeddings, because there are many vector index libraries focused on storing embeddings with in-memory indexes. However, vector databases are highly recommended for enterprise architectures, production, and when working with high concurrency and data volume.

Nowadays, vector databases are designed to support the association of each embedding with the object's metadata, which can include a variety of information such as the structured definition and object definition. Having this information alongside vectors enables more sophisticated querying, filtering, and management capabilities that are similar to the queries made in traditional databases. This makes vector databases more integrable, versatile, and interpretable for end users and within data architectures.

Figure 2: Metadata

Vector databases are a complete system designed to manage embeddings at scale. Here are the key differentiators and advantages of using vector databases:

• Persistence and durability: Allow data to be stored on disk as well as in-memory and provide fault-tolerant features like data replication or regular backups.
• High availability and reliability: Operate continuously and provide tolerance to failures and errors based on clustering and data replication architectures.
• Scalability: Scale horizontally across multiple nodes.
• Optimized performance and cost effectiveness: Handle and organize data through high-dimensional vectors that can contain thousands of dimensions.
• Support complex queries and APIs: Enable complex queries that combine vector similarity searches with traditional database queries.
• Security and access control: Contain built-in security features, such as authentication and authorization, data encryption, data isolation, and access control mechanisms, that are essential for enterprise applications and compliance with data protection regulations.
• Seamless integration and SDKs: Integrate seamlessly with existing data ecosystems, providing integration libraries for several programming languages, a variety of APIs (e.g., GraphQL, RESTful), and integrations with Apache Kafka.
• Support for CRUD operations: Vector databases allow you to add, update, and delete objects with their vectors, so users don't have to reindex the entire database when any underlying data changes.

TRADITIONAL RELATIONAL vs. VECTOR DATABASE

Traditional or relational databases are indispensable for applications requiring structured and semi-structured data that will return the exact match to the query. These databases store the information in rows or documents, and each row or document is a record that provides structured information such as product attributes or customer details.

Vector databases, on the other hand, are optimized for storing and searching through high-dimensional vector data that will return items based on similarity metrics rather than exact matches.

Figure 3: Differences between traditional and vector databases
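The difference between exact matching and similarity-based retrieval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product names, the two-dimensional vectors, and the helpers `exact_match`, `cosine_similarity`, and `most_similar` are made up for the example, and real embeddings come from an ML model and have many more dimensions.

```python
import math

# Hypothetical catalog: each product has a toy 2-D embedding (illustrative values only).
catalog = {
    "red t-shirt": [0.9, 0.1],
    "green t-shirt": [0.85, 0.15],
    "denim jacket": [0.6, 0.5],
    "black jeans": [0.1, 0.95],
}

def exact_match(name):
    """Traditional lookup: returns a value only when the key matches exactly."""
    return catalog.get(name)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vector, k=2):
    """Similarity lookup: rank every product by closeness to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query_vector, catalog[name]),
        reverse=True,
    )
    return ranked[:k]

print(exact_match("crimson tee"))    # None: there is no key with this exact name
print(most_similar([0.88, 0.12]))    # ['red t-shirt', 'green t-shirt']
```

The exact lookup fails on a name that is not stored verbatim, while the similarity lookup still returns the closest items. A vector database would use an index to do this instead of the linear scan shown here.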

KEY CONCEPTS OF VECTOR DATABASES

Using vector databases involves understanding their fundamental concepts: embeddings, indexes, and distance and similarity.

EMBEDDINGS AND DIMENSIONS

As we explained previously, embeddings are numerical representations of objects that capture their semantic meaning and relationships in a high-dimensional space, including semantic relationships, contextual usage, or features. This numerical representation is composed of an array of numbers in which each element corresponds to a specific dimension.

Figure 4: Embedding representation

The number of dimensions in an embedding is important because each dimension corresponds to a feature that we capture from the object. It is represented as a numerical and quantitative value, and it also defines the dimensional map where each object will be located.

Let's consider a simple example with a numerical representation of words, where the words are the definition of each fashion retail product stored in our transaction database. Imagine if we could capture the essence of these targets with only two dimensions.

Figure 5: Array of embeddings

In Figure 6, we can see the dimensional representation of these objects to visualize their similarity. The t-shirts are closest together because both are the same product in different colors. The jacket is close to the t-shirts because they share attributes like sleeves and a collar. Furthest to the right are the jeans, which don't share attributes with the other products.

Figure 6: Dimensional map

Obviously, with two dimensions, we cannot capture the essence of the products. Dimensionality plays a crucial role in how well these embeddings can capture the relevant features of the products. More dimensions may provide more accuracy but also require more resources in terms of compute, memory, latency, and cost.

VECTOR EMBEDDING MODELS INTEGRATION

Some vector databases provide seamless integration with embedding models, allowing us to generate vector embeddings from raw data and seamlessly integrate ML models into database operations. This feature simplifies the development process and abstracts away the complexities involved in generating and using vector embeddings for both data insertion and querying processes.

Figure 7: Embeddings generation patterns
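Generating embeddings with and without a model integration can be sketched with a toy in-memory store. Everything in this sketch is hypothetical: `toy_embed` stands in for a real embedding model (it only counts letters and carries no semantics), and `ToyVectorStore` with its `vectorizer` argument is invented for illustration and does not correspond to any product's API.

```python
from typing import Callable, List, Optional

def toy_embed(text: str) -> List[float]:
    """Stand-in for an embedding model: [vowel count, consonant count]."""
    letters = [c for c in text.lower() if c.isalpha()]
    vowels = sum(c in "aeiou" for c in letters)
    return [float(vowels), float(len(letters) - vowels)]

class ToyVectorStore:
    """Hypothetical store; `vectorizer` models a built-in model integration."""

    def __init__(self, vectorizer: Optional[Callable[[str], List[float]]] = None):
        self.vectorizer = vectorizer
        self.rows = []  # list of (text, vector) pairs

    def insert(self, text: str, vector: Optional[List[float]] = None) -> None:
        if vector is None:
            if self.vectorizer is None:
                raise ValueError("no vector supplied and no vectorizer configured")
            # The transformation is delegated to the store.
            vector = self.vectorizer(text)
        self.rows.append((text, vector))

# Without integration: the caller generates the embedding first, then inserts it.
manual = ToyVectorStore()
manual.insert("Relaxed Fit Tee", vector=toy_embed("Relaxed Fit Tee"))

# With integration: the caller inserts raw data and the store calls the model.
integrated = ToyVectorStore(vectorizer=toy_embed)
integrated.insert("Relaxed Fit Tee")

print(manual.rows == integrated.rows)  # True: same stored result, different caller effort
```

Both paths store the same vector. The difference is who is responsible for calling the model, which is why the integrated pattern tends to simplify application code.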

Table 1: Embedding generation comparative

EXAMPLES | WITHOUT INTEGRATION | WITH MODEL INTEGRATIONS
Data ingestion | 1. Before we can insert each object, we must call our model to generate a vector embedding. 2. Then we can insert our data with the vector. | We can insert each object directly in the vector database, delegating the transformation to the database.
Query | 1. Before we run a query, we must call our model to generate a vector embedding from our query first. 2. Then we can run a query with that vector. | We can run a query directly in the vector database, delegating the transformation to the database.

DISTANCE METRICS AND SIMILARITY

Distance metrics are mathematical measures and functions used to determine the distance (similarity) between two elements in a vector space. In the context of embeddings, distance metrics evaluate how far apart two embeddings are. A similarity query search retrieves the embeddings that are similar to a given input based on a distance metric; this input can be a vector embedding, text, or another object. There are several distance metrics. The most popular ones are the following.

COSINE SIMILARITY

Cosine similarity measures the cosine of the angle between two vector embeddings, and it's often used as a distance metric in text analysis and other domains where the magnitude of the vector is less important than the direction.

Figure 8: Cosine

EUCLIDEAN DISTANCE

Euclidean distance measures the straight-line distance between two points in Euclidean space.

Figure 9: Euclidean

MANHATTAN DISTANCE

Manhattan distance (L1 norm) sums the absolute differences between the coordinates of two points.

Figure 10: Manhattan

The choice of distance metric and similarity measure has a profound impact on the behavior and performance of ML models. The general recommendation is to use the same distance metric that was used to train the given model.

VECTOR INDEXES

Vector indexes are specialized data structures designed to efficiently store, organize, and query high-dimensional vector embeddings. These indexes provide fast search queries in a cost-effective way. There are several indexing strategies that are optimized for handling the complexity and scale of the vector space. Some examples include:

• Approximate nearest neighbor (ANN)
• Inverted index
• Locality-sensitive hashing (LSH)

Generally, each database implements a subset of these index strategies, and in some cases, they are customized for better performance.

SCALABILITY

Vector databases are usually highly scalable solutions that support vertical and horizontal scaling. Horizontal scaling is based on two fundamental strategies: sharding and replication. Both strategies are crucial for managing large-scale and distributed databases.

SHARDING

Sharding involves dividing a database into smaller, more manageable pieces called shards. Each shard contains a subset of the database's data, making it responsible for a particular segment of the data.
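How data is split across shards, and how a similarity query is then answered across them, can be sketched as follows. This is a simplified illustration rather than the routing scheme of any specific database: the shard count, the hash-based routing in `shard_for`, the object IDs, and the toy vectors are all assumptions, and each shard is just a Python dictionary scanned linearly.

```python
import hashlib
import math

NUM_SHARDS = 3  # assumed cluster size for the example

def shard_for(object_id: str) -> int:
    """Route an object to a shard using a stable hash of its ID."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each shard holds only the objects routed to it.
shards = [dict() for _ in range(NUM_SHARDS)]

def insert(object_id, vector):
    shards[shard_for(object_id)][object_id] = vector

def search(query, k=2):
    """Scatter-gather: every shard returns its local top-k, then results are merged."""
    candidates = []
    for shard in shards:
        local = sorted(shard.items(), key=lambda item: euclidean(query, item[1]))[:k]
        candidates.extend(local)
    candidates.sort(key=lambda item: euclidean(query, item[1]))
    return [object_id for object_id, _ in candidates[:k]]

for object_id, vector in {
    "tee-red": [0.9, 0.1],
    "tee-green": [0.85, 0.15],
    "jacket": [0.6, 0.5],
    "jeans": [0.1, 0.95],
}.items():
    insert(object_id, vector)

print(search([0.88, 0.12]))  # ['tee-red', 'tee-green']
```

Because every shard contributes its own top-k before the merge, the final result is the same as searching one unsharded collection, but each query has to touch every shard.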
Table 2: Key sharding advantages and considerations

ADVANTAGES | CONSIDERATIONS
By distributing the data across multiple servers, sharding can reduce the load on any single server, leading to improved performance. | Implementing sharding can be complex, especially in terms of data distribution, shard management, and query processing across shards.
Sharding allows a database to scale by adding more shards across additional servers, effectively handling more data and users without degradation in performance. | Ensuring even distribution of data and avoiding hotspots where one shard receives significantly more queries than others can be challenging.
It can be more cost effective to add more servers with moderate specifications than to scale up a single server with high specifications. | Query throughput does not improve when adding more sharded nodes.

REPLICATION

Replication involves creating copies of a database on multiple nodes within the cluster.

Table 3: Key advantages and considerations for replication

ADVANTAGES | CONSIDERATIONS
Replication ensures that the database remains available for read operations even if some servers are down. | Maintaining data consistency across replicas, especially in write-heavy environments, can be challenging and may require sophisticated synchronization mechanisms.
Replication provides a mechanism for disaster recovery as data is backed up across multiple locations. | Replication requires additional storage and network resources, as data is duplicated across multiple servers.
Replication can improve the read scalability of a database system by allowing read queries to be distributed across multiple replicas. | In asynchronous replication setups, there can be a lag between when data is written to the primary index and when it is replicated to the secondary indexes. This lag can impact applications that require real-time or near-real-time data consistency across replicas.

USE CASES

Vector databases and embeddings are crucial for several key use cases, including semantic search, vector data in generative AI, and more.

SEMANTIC SEARCH

You can retrieve information by leveraging the capabilities of vector embeddings to understand and match the semantic context of queries with relevant content.

Searches are performed by calculating the similarity between the query vector and document vectors in the database, using some of the previously explained metrics, such as cosine similarity. Some of the applications would be:

• Recommendation systems: Perform similarity searches to find items that match a user's interests, providing accurate and timely recommendations to enhance the user experience.
• Customer support: Obtain the most relevant information to solve customers' doubts, questions, or problems.
• Knowledge management: Find relevant information quickly from the organization's knowledge composed of documents, slides, videos, or reports in enterprise systems.

VECTOR DATA IN GENERATIVE AI: RETRIEVAL-AUGMENTED GENERATION

Generative AI and large language models (LLMs) have certain limitations given they must be trained with a large amount of data. Training imposes high costs in terms of time, resources, and money. As a result, these models are usually trained with general contexts and are not constantly updated with the latest information.

Retrieval-augmented generation (RAG) plays a crucial role because it was developed to improve the response quality in specific contexts using a technique that incorporates an external source of relevant and updated information into the generative process. A vector database is particularly well suited for implementing RAG models due to its unique capabilities in handling high-dimensional data, performing efficient similarity searches, and integrating seamlessly with AI/ML workflows.

Figure 11: Overview of RAG architecture

Using vector databases in the RAG integration pattern has the following advantages:

• Semantic understanding: Vector embeddings capture the nuanced semantic relationships within data, whether text, images, or audio. This deep understanding is essential for generative models to produce high-quality, realistic outputs that are contextually relevant to the input or prompt.
• Dimensionality reduction: Representing complex data in a lower-dimensional vector space reduces vast datasets to a form that AI models can feasibly process and learn from.
• Quality and precision: The precision of similarity search in vector databases ensures that the information retrieved for generation is of high relevance and quality.
• Seamless integration: Vector databases provide APIs, SDKs, and tools that make it easy to integrate with various AI/ML frameworks. This flexibility facilitates the development and deployment of RAG models, allowing researchers and developers to focus on model optimization rather than data management challenges.
• Context generation: Vector embeddings capture the semantic essence of text, images, videos, and more, enabling AI models to understand context and generate new content that is contextually similar or related.
• Scalability: Vector databases provide a scalable solution that can manage large-scale information without compromising retrieval performance.
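The retrieval step of a RAG pipeline can be sketched as below. This is a minimal illustration under stated assumptions, not a full RAG implementation: the snippets are paraphrased from the fashion sample data, the three-dimensional vectors and `question_vector` are hand-picked rather than produced by an embedding model, the helpers `retrieve` and `build_prompt` are hypothetical, and the final call to an LLM is left out.

```python
import math

# Hypothetical knowledge base: (text, toy embedding). Real embeddings come from a model.
documents = [
    ("Relaxed Fit Tee: 100% cotton, crewneck, short sleeves", [0.9, 0.1, 0.0]),
    ("Trucker Jacket: 100% cotton denim, point collar, long sleeves", [0.5, 0.6, 0.1]),
    ("Baggy Dad Utility Pants: denim, mid rise, straight leg", [0.0, 0.2, 0.9]),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, k=2):
    """Return the k snippets whose embeddings are most similar to the query."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Augment the user question with the retrieved context."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Assumed embedding of the question; a real pipeline would embed it with the same model.
question = "Which products are made of cotton and have sleeves?"
question_vector = [0.8, 0.2, 0.1]

prompt = build_prompt(question, retrieve(question_vector, k=2))
print(prompt)  # this augmented prompt would then be sent to an LLM
```

In a real deployment, `retrieve` would be a similarity query against the vector database, so the knowledge base can be updated without retraining the model.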

Vector databases provide the technological foundation necessary for the effective implementation of RAG models, which makes them an optimal choice for interaction with large-scale knowledge bases.

OTHER SPECIFIC USE CASES

Beyond the main use cases discussed above are several others, such as:

• Anomaly detection: Embeddings capture nuanced relationships and patterns within data, making it possible to detect anomalies that might not be evident through traditional methods.
• Retail comparable products: By converting product features into vector embeddings, retailers can quickly find products with similar characteristics (e.g., design, material, price, sales).

GETTING STARTED

To get started, we have conducted a practical exercise below that demonstrates the use of a vector database for identifying comparable products in a fashion retail scenario (i.e., semantic search use case). We'll go through setting up the environment, loading fashion product data into the open-source vector database, and querying it to find similar items.

For the environment, ensure the following tools are installed:

• Docker 24 or higher
• Docker Compose v2
• Python 3.8 or higher

DATA SAMPLE
The following is the sample data that we will use during this practical exercise, based on the concepts explained in previous sections:

Table 4: Data sample

NAME | SECTION | FAMILY | FIT | COMPOSITION | COLOR
Relaxed Fit Tee | Men | T-shirts | Non-stretch, Relaxed fit | 100% cotton. Jersey. Crewneck, Short sleeves | Red
Relaxed Fit Tee | Men | T-shirts | Non-stretch, Relaxed fit | 100% cotton. Jersey. Crewneck, Short sleeves | Green
Trucker Jacket | Men | Jackets | Standard fit | 100% cotton, Denim, Point collar, Long sleeves | Gray
Slim Welt Pocket Jeans | Women | Jeans | Mid rise: 8 3/4'', Inseam: 30'', Leg opening: 13'' | 62% cotton, 28% viscose (ECOVERO™), 8% elastomultiester, 2% elastane, Denim, Stretch, Zip fly, 5-pocket styling | Black
Baggy Dad Utility Pants | Women | Jeans | Mid rise, Straight leg | 95% cotton, 5% recycled cotton, Denim, No Stretch | Green
The Perfect Tee | Women | T-shirts | Standard fit, Model wears a size small | 100% cotton, Crewneck, Short sleeves | White
Lelou Shrunken Moto Jacket | Women | Jackets | Slim fit | 100% polyurethane - releases plastic microfibers into the environment during washing, Long sleeves | Black

STEP 1: START UP YOUR VECTOR DATABASE

In this example, we are going to use the following Docker Compose file to locally run our vector database instance, using the open-source Weaviate vector database in the following configuration:

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - [Link]
    - --port
    - '8080'
    - --scheme
    - http
    image: [Link]/semitechnologies/weaviate:1.24.4
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      TRANSFORMERS_INFERENCE_API: [Link]transformers:8080
      ENABLE_MODULES: 'text2vec-transformers'
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: '0'
volumes:
  weaviate_data:
...

In this example, the most relevant part is the modules' configuration:

• DEFAULT_VECTORIZER_MODULE is the vectorization module used by default to transform objects into embeddings (otherwise, you need to provide a vector manually for each data point that you add).
• TRANSFORMERS_INFERENCE_API is the location of the inference API. In our case, we are running this service in another image defined in the Docker Compose file.
• ENABLE_MODULES lists the modules enabled inside Weaviate. We are going to use text2vec-transformers to vectorize the products' data objects.
• t2v-transformers is the image with the text2vec-transformers service.

Once we create the Docker Compose file, all we have to do is execute it:

# Docker Compose runs two images: the Weaviate database and t2v-transformers-1
$ sudo docker compose up -d

To check if our vector database is running, we will run the following commands:

# Check if the container's status is up.
$ sudo docker ps
CONTAINER ID … STATUS
16dbc16744a8 … Up 2 minutes
cb4175cec9a2 … Up 2 minutes

# Check database status by querying the API
$ curl -X GET [Link]

# In case of error, check the logs
$ docker compose logs -f --tail 100 weaviate

STEP 2: INSTALL THE CLIENT LIBRARY

Next, install the Weaviate Python client:

$ pip install weaviate-client

STEP 3: PREPARING YOUR FASHION RETAIL DATA

Prepare a dataset of fashion retail products based on Table 4. Each product should have attributes such as name, family, fit, composition, and color.

products_data = [
    {
        "name": "Relaxed Fit Tee",
        "section": "MEN",
        "family": "T-SHIRTS",
        "fit": "Non-stretch, Relaxed fit",
        "composition": "100% cotton. Jersey. Crewneck, Short sleeves",
        "color": "Red"
    },
    {
        "name": "Relaxed Fit Tee",
        "section": "MEN",
        "family": "T-SHIRTS",
        "fit": "Non-stretch, Relaxed fit",
        "composition": "100% cotton. Jersey. Crewneck, Short sleeves",
        "color": "Green"
    },
    {
        "name": "TRUCKER JACKET",
        "section": "MEN",
        "family": "JACKETS",
        "fit": "Standard fit",
        "composition": "100% cotton, Denim, Point collar, Long sleeves",
        "color": "Gray"
    },
    {
        "name": "SLIM WELT POCKET JEANS",
        "section": "WOMEN",
        "family": "JEANS",
        "fit": "Mid rise: 8 3/4'', Inseam: 30'', Leg opening: 13''",
        "composition": "62% cotton, 28% viscose (ECOVERO™), 8% elastomultiester, 2% elastane, Denim, Stretch, Zip fly, 5-pocket styling",
        "color": "Black"
    },
    {
        "name": "BAGGY DAD UTILITY PANTS",
        "section": "WOMEN",
        "family": "JEANS",
        "fit": "Mid rise, Straight leg",
        "composition": "95% cotton, 5% recycled cotton, Denim, No Stretch",
        "color": "Green"
    },
    {
        "name": "THE PERFECT TEE",
        "section": "WOMEN",
        "family": "T-SHIRTS",
        "fit": "Standard fit, Model wears a size small",
        "composition": "100% cotton, Crewneck, Short sleeves",
        "color": "White"
    },
    {
        "name": "LELOU SHRUNKEN MOTO JACKET",
        "section": "WOMEN",
        "family": "JACKETS",
        "fit": "Slim fit",
        "composition": "100% polyurethane - releases plastic microfibers into the environment during washing, Long sleeves",
        "color": "Black"
    }
]

STEP 4: CREATE A COLLECTION

To create a collection, we need to define the collection and schema for the products' data objects. There are two options here:

1. Create a schema that includes these properties
2. Let your vector database auto-detect and generate the properties automatically

In this case, we are going to use the second option, using Weaviate as our example:

import weaviate

# Defined previously in Step 3
products_data = [{....}]

# Connect with default parameters
client = weaviate.connect_to_local()

# Check if the connection was successful
try:
    client.is_ready()
    print("Successfully connected to Weaviate.")
    products_collection = [Link](
        name="Products",
        vectorizer_config=[Link].Vectorizer.text2vec_transformers(
            vectorize_collection_name=True
        )
    )

    # Copy only the properties we want to store for each product
    products_objs = []
    for d in products_data:
        products_objs.append({
            "name": d["name"],
            "section": d["section"],
            "family": d["family"],
            "fit": d["fit"],
            "composition": d["composition"],
            "color": d["color"],
        })

    products_collection.data.insert_many(products_objs)
finally:
    [Link]()

STEP 5: SIMILARITY QUERY

Once your data is indexed, we can query for similar products using Weaviate's vector search capabilities. For example, to find products similar to a "Red T-Shirt" or "Jeans for women," you can use a search query with its description:

import weaviate
import [Link] as wvc

# Connect with default parameters
client = weaviate.connect_to_local()

# Check if the connection was successful
try:
    client.is_ready()
    print("Successfully connected to Weaviate.")

    products = [Link]("Products")

    response = [Link].near_text(
        query="Red T-Shirt",
        return_metadata=[Link].MetadataQuery(distance=True),
        limit=2,
        return_properties=["name", "family", "color"]
    )

    # Print the returned properties and the distance of each result
    for o in [Link]:
        print([Link])
        print([Link])
finally:
    [Link]()

This query uses the near_text function to find products with descriptions similar to the given concept. Weaviate will return products that it considers semantically similar based on the vector embeddings of their descriptions.

STEP 6: OUTPUT

The output of this query returns the two closest products, including some of the object properties and the distance:

Successfully connected to Weaviate.
{'family': 'T-SHIRTS', 'color': 'Red', 'name': 'Relaxed Fit Tee'}
0.0
{'family': 'T-SHIRTS', 'color': 'White', 'name': 'THE PERFECT TEE'}
0.0

CONCLUSION

This Refcard provides an overview of vector database fundamentals as well as a practical application in fashion retail. By customizing the dataset and queries, you can explore the full potential of vector databases for similarity searches and other AI-driven applications.

This is just a starting point in the world of vectors. ML models and vectors represent powerful tools in the area of machine learning and artificial intelligence, offering a nuanced and high-dimensional representation of complex data. Vector databases are not a magical solution that provides immediate value; as with all good wine, engineers (and wineries alike) must employ careful experimentation, parameter optimization, and ongoing evaluation.

WRITTEN BY MIGUEL GARCÍA LORENZO, VP OF ENGINEERING, NEXTAIL

Miguel is VP of Engineering at Nextail. He has 10+ years in the data space leading teams and building high-performance solutions. He is a book lover and an advocate of platform design as a service and data as a product.

3343 Perimeter Hill Dr, Suite 100
Nashville, TN 37211
888.678.0399 | 919.678.0300

At DZone, we foster a collaborative environment that empowers developers and tech professionals to share knowledge, build skills, and solve problems through content, code, and community. We thoughtfully, and with intention, challenge the status quo and value diverse perspectives so that, as one, we can inspire positive change through technology.

Copyright © 2024 DZone. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means of electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

40 pages
ITE 3106 - Lesson 03 - Application Architectures
100% (1)
ITE 3106 - Lesson 03 - Application Architectures
14 pages
Lecture 6 - CS50's Introduction To Databases With SQL
No ratings yet
Lecture 6 - CS50's Introduction To Databases With SQL
14 pages
NUIX User Guide
100% (1)
NUIX User Guide
338 pages
Splunk Questions
No ratings yet
Splunk Questions
28 pages
Unit 3 Chapter 8
No ratings yet
Unit 3 Chapter 8
3 pages
Python & SQL Practical Guide
No ratings yet
Python & SQL Practical Guide
2 pages
Eulav Technical Guide for Engineers
No ratings yet
Eulav Technical Guide for Engineers
9 pages
Creating A Simple Database Application in Oracle APEX
No ratings yet
Creating A Simple Database Application in Oracle APEX
51 pages
SimaPro Faculty License Overview
No ratings yet
SimaPro Faculty License Overview
2 pages
Comprehensive Guide To Business Analytics
No ratings yet
Comprehensive Guide To Business Analytics
10 pages
Micro-Project Report - PHP
No ratings yet
Micro-Project Report - PHP
32 pages
DBMS Mini Project PPT (Template) .1 Final2
No ratings yet
DBMS Mini Project PPT (Template) .1 Final2
12 pages
4 12th Computer Science Important Questions For Slow Lanners English Medium
No ratings yet
4 12th Computer Science Important Questions For Slow Lanners English Medium
11 pages
SQL Internship Experience at Celebal Tech
No ratings yet
SQL Internship Experience at Celebal Tech
9 pages
Dubai Property
No ratings yet
Dubai Property
11 pages




© DZONE | REFCARD | APRIL 2024 1














