

Gemini API: Question Answering using LangChain and Chroma


Overview
Gemini is a family of generative AI models that lets developers generate content
and solve problems. These models are designed and trained to handle both text
and images as input.

LangChain is a data framework designed to make the integration of Large Language Models (LLMs) like Gemini easier for applications.

Chroma is an open-source embedding database focused on simplicity and developer productivity. Chroma allows users to store embeddings and their metadata, embed documents and queries, and search the embeddings quickly.

In this notebook, you'll learn how to create an application that answers questions
using data from a website with the help of Gemini, LangChain, and Chroma.

Setup
First, you must install the packages and set the necessary environment variables.

Installation
Install LangChain's Python library, langchain, and LangChain's integration package for Gemini, langchain-google-genai. Next, install Chroma's Python client SDK, chromadb.

In [1]: !pip install --quiet langchain-core==0.1.23
!pip install --quiet langchain==0.1.1
!pip install --quiet langchain-google-genai==0.0.6
!pip install --quiet -U langchain-community==0.0.20
!pip install --quiet chromadb


In [2]: from langchain import PromptTemplate
        from langchain import hub
        from langchain.docstore.document import Document
        from langchain.document_loaders import WebBaseLoader
        from langchain.schema import StrOutputParser
        from langchain.schema.prompt_template import format_document
        from langchain.schema.runnable import RunnablePassthrough
        from langchain.vectorstores import Chroma


Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named GOOGLE_API_KEY. If you don't already have an API key, or you're not sure how to create a Colab Secret, see Authentication for an example.

In [3]: import os
        from google.colab import userdata

        GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
        os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
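If you're not running this notebook in Colab, the google.colab Secrets API won't be available. As a minimal alternative sketch (not part of the original notebook), you can read the key from an existing environment variable or prompt for it with Python's standard-library getpass:

        import os
        from getpass import getpass

        # Prompt for the key only if it isn't already set in the environment.
        if "GOOGLE_API_KEY" not in os.environ:
            os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API key: ")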

Basic steps
LLMs are trained offline on a large corpus of public data. Hence they cannot
answer questions based on custom or private data accurately without additional
context.

If you want to make use of LLMs to answer questions based on private data, you
have to provide the relevant documents as context alongside your prompt. This
approach is called Retrieval Augmented Generation (RAG).

You will use this approach to create a question-answering assistant using the Gemini text model integrated through LangChain. The assistant is expected to answer questions about the Gemini model. To make this possible, you will add more context to the assistant using data from a website.

In this tutorial, you'll implement the two main components of a RAG-based architecture:

1. Retriever

Based on the user's query, the retriever retrieves relevant snippets that add context from the document. In this tutorial, the document is the website data. The relevant snippets are passed as context to the next stage, the "Generator".

2. Generator

The relevant snippets from the website data are passed to the LLM along
with the user's query to generate accurate answers.

You'll learn more about these stages in the upcoming sections while
implementing the application.

Retriever
In this stage, you will perform the following steps:

1. Read and parse the website data using LangChain.



2. Create embeddings of the website data.

Embeddings are numerical representations (vectors) of text. Hence, text with similar meaning will have similar embedding vectors. You'll make use of Gemini's embedding model to create the embedding vectors of the website data.

3. Store the embeddings in Chroma's vector store.

Chroma is a vector database. The Chroma vector store helps in the efficient
retrieval of similar vectors. Thus, for adding context to the prompt for the
LLM, relevant embeddings of the text matching the user's question can be
retrieved easily using Chroma.

4. Create a Retriever from the Chroma vector store.

The retriever will be used to pass relevant website embeddings to the LLM
along with user queries.

Read and parse the website data


LangChain provides a wide variety of document loaders. To read the website data
as a document, you will use the WebBaseLoader from LangChain.

To know more about how to read and parse input data from different sources
using the document loaders of LangChain, read LangChain's document loaders
guide.

In [4]: loader = WebBaseLoader("https://blog.google/technology/ai/google-gemini-ai/")
        docs = loader.load()
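Before extracting the relevant portion in the next step, it can help to check what the loader returned. This is an optional sketch, not part of the original notebook; it only uses the page_content and metadata attributes of LangChain's Document class:

        # Inspect the scraped document: its source metadata and its size.
        print(docs[0].metadata)
        print(len(docs[0].page_content))  # Total characters of scraped text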

If you only want to select a specific portion of the website data to add context to
the prompt, you can use regex, text slicing, or text splitting.

In this example, you'll use Python's split() function to extract the required
portion of the text. The extracted text should be converted back to LangChain's
Document format.

In [5]: # Extract the text from the website data document
        text_content = docs[0].page_content

        # The text content between the substrings "code, audio, image and video." and
        # "Cloud TPU v5p" is relevant for this tutorial. You can use Python's `split()`
        # function to select the required content.
        text_content_1 = text_content.split("code, audio, image and video.", 1)[1]
        final_text = text_content_1.split("Cloud TPU v5p", 1)[0]

        # Convert the text to LangChain's `Document` format
        docs = [Document(page_content=final_text, metadata={"source": "local"})]


Initialize Gemini's embedding model


To create the embeddings from the website data, you'll use Gemini's embedding model, embedding-001, which supports creating text embeddings.

To use this embedding model, you have to import GoogleGenerativeAIEmbeddings from LangChain. To know more about the embedding model, read Google AI's language documentation.

In [6]: from langchain_google_genai import GoogleGenerativeAIEmbeddings

        gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
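To sanity-check the model before building the vector store, you can embed a short string directly. This optional sketch is not part of the original notebook; embed_query is part of LangChain's standard Embeddings interface and returns a list of floats:

        # Embed a sample query and inspect the resulting vector.
        sample_vector = gemini_embeddings.embed_query("What is Gemini?")
        print(len(sample_vector))   # Dimensionality of the embedding vector
        print(sample_vector[:5])    # First few components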

Store the data using Chroma

To create a Chroma vector database from the website data, you will use the from_documents function of Chroma. Under the hood, this function creates embeddings from the documents created by the document loader of LangChain using any specified embedding model, and stores them in a Chroma vector database.

You have to specify the docs you created from the website data using LangChain's WebBaseLoader and the gemini_embeddings as the embedding model when invoking the from_documents function to create the vector database from the website data. You can also specify a directory in the persist_directory argument to store the vector store on disk. If you don't specify a directory, the data will be ephemeral and kept in memory only.

In [7]: # Save to disk
        vectorstore = Chroma.from_documents(
            documents=docs,                  # Data
            embedding=gemini_embeddings,     # Embedding model
            persist_directory="./chroma_db"  # Directory to save data
        )

Create a retriever using Chroma

You'll now create a retriever that can retrieve website data embeddings from the newly created Chroma vector store. This retriever can later be used to pass embeddings that provide more context to the LLM for answering the user's queries.

To load the vector store that you previously stored on disk, you can specify the name of the directory that contains the vector store in the persist_directory argument and the embedding model in the embedding_function argument of Chroma's initializer.

You can then invoke the as_retriever function of Chroma on the vector
store to create a retriever.

In [8]: # Load from disk
        vectorstore_disk = Chroma(
            persist_directory="./chroma_db",      # Directory containing the vector store
            embedding_function=gemini_embeddings  # Embedding model
        )
        # Get the Retriever interface for the store to use later.
        # When an unstructured query is given to a retriever, it will return documents.
        # Read more about retrievers in the following link.
        # https://python.langchain.com/docs/modules/data_connection/retrievers/
        #
        # Since only 1 document is stored in the Chroma vector store, search_kwargs
        # is set to 1 to decrease the `k` value of Chroma's similarity search from
        # 4 to 1. If you don't pass this value, you will get a warning.
        retriever = vectorstore_disk.as_retriever(search_kwargs={"k": 1})

        # Check if the retriever is working by trying to fetch the relevant documents
        # related to the word 'MMLU' (Massive Multitask Language Understanding). If
        # the length is greater than zero, the retriever is functioning well.
        print(len(retriever.get_relevant_documents("MMLU")))
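Beyond the count, you can also peek at the retrieved snippet itself. This optional check is not part of the original notebook:

        # Inspect the matched snippet rather than just the number of results.
        retrieved_docs = retriever.get_relevant_documents("MMLU")
        print(retrieved_docs[0].page_content[:200])  # First 200 characters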

Generator
The Generator prompts the LLM for an answer when the user asks a question. The retriever you created in the previous stage from the Chroma vector store will be used to pass relevant embeddings from the website data to the LLM, providing more context for the user's query.

You'll perform the following steps in this stage:

1. Chain together the following: a prompt for extracting the relevant embeddings using the retriever, and the Gemini LLM for answering the question (a hedged sketch of such a chain follows).
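Based on the components built so far and the imports at the top of the notebook (PromptTemplate, StrOutputParser, RunnablePassthrough), such a chain can be assembled with the LangChain Expression Language. The following is a hedged sketch rather than the notebook's exact code: the prompt wording, the gemini-1.5-flash model name, the temperature value, and the format_docs helper are assumptions made here.

        # A sketch of the generator chain (assumptions noted above).
        from langchain_google_genai import ChatGoogleGenerativeAI

        # Assumed model name; the notebook's commit message mentions Gemini 1.5 Flash.
        llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.7)

        # A simple RAG prompt that stuffs the retrieved context and the question
        # into a single prompt for the LLM. The wording here is illustrative.
        llm_prompt = PromptTemplate.from_template(
            "You are an assistant for question-answering tasks. "
            "Use the following context to answer the question. "
            "If you don't know the answer, just say that you don't know.\n\n"
            "Question: {question}\nContext: {context}\nAnswer:"
        )

        # Helper (assumed here) to join retrieved Documents into one context string.
        def format_docs(docs):
            return "\n\n".join(doc.page_content for doc in docs)

        # Chain: retrieve context, fill the prompt, call the LLM, parse the output.
        rag_chain = (
            {"context": retriever | format_docs, "question": RunnablePassthrough()}
            | llm_prompt
            | llm
            | StrOutputParser()
        )

        print(rag_chain.invoke("What is Gemini?"))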

