Gemini_LangChain_QA_Chroma_WebLoad.ipynb · google-gemini/cookbook
Latest commit by shilpakancharla: Migrating langchain integration examples with Gemini 1.5 Flash (#207)
https://github.com/google-gemini/cookbook/blob/main/examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb 1/8
27/7/24, 20:04 cookbook/examples/langchain/Gemini_LangChain_QA_Chroma_WebLoad.ipynb at main · google-gemini/cookbook
Overview
Gemini is a family of generative AI models that lets developers generate content
and solve problems. These models are designed and trained to handle both text
and images as input.
In this notebook, you'll learn how to create an application that answers questions
using data from a website with the help of Gemini, LangChain, and Chroma.
Setup
First, you must install the packages and set the necessary environment variables.
Installation
Install LangChain's Python library, langchain, and LangChain's integration
package for Gemini, langchain-google-genai. Next, install Chroma's Python
client SDK, chromadb.
In [1]: !pip install --quiet langchain-core==0.1.23
!pip install --quiet langchain==0.1.1
!pip install --quiet langchain-google-genai==0.0.6
!pip install --quiet -U langchain-community==0.0.20
!pip install --quiet chromadb
In [3]: import os
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
Basic steps
LLMs are trained offline on a large corpus of public data, so they cannot
accurately answer questions about custom or private data without additional
context.
If you want to make use of LLMs to answer questions based on private data, you
have to provide the relevant documents as context alongside your prompt. This
approach is called Retrieval Augmented Generation (RAG).
You will use this approach to create a question-answering assistant using the
Gemini text model integrated through LangChain. The assistant is expected to
answer questions about the Gemini model. To make this possible, you will add
more context to the assistant using data from a website.
1. Retriever
Based on the user's query, the retriever retrieves relevant snippets that add
context from the document. In this tutorial, the document is the website data.
The relevant snippets are passed as context to the next stage, the "Generator".
2. Generator
The relevant snippets from the website data are passed to the LLM along
with the user's query to generate accurate answers.
You'll learn more about these stages in the upcoming sections while
implementing the application.
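The two stages above can be sketched with plain Python stubs. This is a toy illustration only: the keyword-overlap retriever stands in for Chroma's vector search, and the "generator" just formats the prompt an LLM would receive instead of calling Gemini.

```python
# Toy sketch of the two RAG stages.
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    q_words = set(query.lower().replace("?", "").split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query, context):
    """Stand-in for the LLM call: build the augmented prompt."""
    return f"Context: {' '.join(context)}\nQuestion: {query}"

docs = ["Gemini is a family of generative AI models.",
        "Chroma is a vector database."]
context = retrieve("What is Gemini?", docs)
prompt = generate("What is Gemini?", context)
```

In the real application, `retrieve` becomes a similarity search over embeddings and `generate` becomes a call to the Gemini model with the same augmented prompt.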
Retriever
In this stage, you will load the website data, extract the relevant portion,
embed it into a Chroma vector store, and create a retriever from that store.
Chroma is a vector database. The Chroma vector store enables efficient
retrieval of similar vectors, so embeddings of the text matching the user's
question can be retrieved easily and added as context to the prompt for the
LLM.
The retriever will be used to pass relevant snippets of the website data to
the LLM along with user queries.
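Conceptually, the vector store finds the stored embedding closest to the query embedding. A minimal stdlib sketch of that nearest-neighbor lookup, using toy 2-D vectors as placeholders for real embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "index": document id -> embedding vector.
index = {
    "doc-about-gemini": [0.9, 0.1],
    "doc-about-chroma": [0.1, 0.9],
}
query_embedding = [0.8, 0.2]

# Retrieval = pick the stored vector most similar to the query vector.
best = max(index, key=lambda doc_id: cosine_similarity(index[doc_id], query_embedding))
```

Chroma performs this kind of similarity search at scale over high-dimensional embeddings, such as those produced by the Gemini embedding model.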
To learn more about how to read and parse input data from different sources
using the document loaders of LangChain, read LangChain's document loaders
guide.
If you only want to select a specific portion of the website data to add context to
the prompt, you can use regex, text slicing, or text splitting.
In this example, you'll use Python's split() function to extract the required
portion of the text. The extracted text should be converted back to LangChain's
Document format.
# The text content between the substrings "code, audio, image and video." and
# "Cloud TPU v5p" is relevant for this tutorial. You can use Python's `split()`
# function to select the required content.
text_content_1 = text_content.split("code, audio, image and video.", 1)[1]
final_text = text_content_1.split("Cloud TPU v5p", 1)[0]
gemini_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
When invoking the from_documents function to create the vector database, you
have to pass the docs you created from the website data using LangChain's
WebBaseLoader, with gemini_embeddings as the embedding model. You can also
specify a directory in the persist_directory argument to store the vector
store on disk. If you don't specify a directory, the data is kept in memory
only and is lost when the session ends.
To load a vector store that you previously saved to disk, pass the name of
the directory that contains the vector store in persist_directory and the
embedding model in embedding_function when initializing Chroma.
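The persist-and-reload round trip can be illustrated with a stdlib stand-in. Here a JSON file takes the place of Chroma's on-disk store; the function and file names are illustrative only, not part of Chroma's API.

```python
import json
import os
import tempfile

def save_index(index, directory):
    """Write a toy embedding index to disk, like persist_directory does for Chroma."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "index.json"), "w") as f:
        json.dump(index, f)

def load_index(directory):
    """Reload the toy index, like re-initializing Chroma with persist_directory."""
    with open(os.path.join(directory, "index.json")) as f:
        return json.load(f)

store_dir = tempfile.mkdtemp()
save_index({"doc-1": [0.1, 0.2]}, store_dir)
restored = load_index(store_dir)
```

The design point is the same as with Chroma: anything not written to a persist directory lives only in the current process and must be rebuilt on the next run.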
You can then invoke the as_retriever function of Chroma on the vector
store to create a retriever.
Generator
The Generator prompts the LLM for an answer when the user asks a question. The
retriever you created in the previous stage from the Chroma vector store will
be used to fetch relevant snippets of the website data and pass them to the
LLM as additional context alongside the user's query.
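At its core, the generator step amounts to stuffing the retrieved snippets into the prompt before calling the model. A minimal stdlib sketch of that prompt assembly; the template wording here is illustrative, not the notebook's actual prompt:

```python
def build_prompt(question, snippets):
    """Combine retrieved snippets and the user's question into one LLM prompt."""
    context = "\n".join(snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is Gemini?",
    ["Gemini is a family of generative AI models."],
)
```

In the full application, this assembled prompt is what gets sent to the Gemini model through LangChain, and the model's completion is returned as the answer.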