Discord Taz
Uploaded by Seth Thunder

1) To make it a chatbot that multiple users can ask at the same time, would I have to
make changes to my code beyond switching it to async? You know ChatGPT has millions
of users who each ask it questions from their own account. I want something like
that for mine
2) When I want to load PDF files, for example, should I use PyPDFLoader or
PDFMinerLoader or some other PDF loader? I tried asking LLMs which one is
better but couldn't find an answer, and on YouTube people just use PyPDFLoader
3) When splitting my doc into chunks, what do people split them on? If you
consider any chatbot website where you can upload your doc, there are definitely
several different structures: a PDF that's purely text, one that has sections
with subheaders then text, one that has tables and images with it, etc. How would
I use my separators exactly? I couldn't find anything about that, and I'm not sure
what the optimum chunk size and chunk overlap even are (based on the structure, of course)
4) If I am chatting with a CSV file or MySQL, is it better to use an agent to
answer my questions, or should I use the same idea of chains and all that?
5) If you look at my code from earlier, I did use Chroma, but if I want to deploy it, I
can't have my knowledge base stored and persisted locally, right? How can I
store and persist it on a cloud service, and which one would you recommend?
SethThunder (OP) · Yesterday at 6:23 PM
6) Regarding the loaders, which of these two is better for loading a directory:
DirectoryLoader (with the parameters set to load only PDF files) or
PyPDFDirectoryLoader? The reason I am asking is that DirectoryLoader
has a parameter that lets me use multithreading
7) How can I evaluate my RAG performance as a percentage? I've tried reading on LinkedIn
and watching some YouTube videos, but for some reason there isn't one definitive way. One
source says method X is good while another says it's bad and Y is better
8) For the chain types like stuff, refine, map reduce, and map rerank, how can I
know which one is better for my use case? For my earlier code, since it's just one PDF,
I assume stuff is fine, but when does the input get too big for stuff, so that I should
switch to map rerank?
9) For retrieving from my knowledge base, what's the best retriever type, or what do I
base the choice on? Similarity search, MMR, etc.?
SethThunder (OP) · Yesterday at 6:32 PM
10) Does Chroma have a concept similar to Deeplake's Deep Memory? Deep Memory can
increase my RAG accuracy by 20-30%
@taz I think those are my questions for now. Sorry for asking so many, but I've
been reading literally everywhere and couldn't find answers. I'm not sure if that's
because LC is a new framework or because I'm a beginner in the entire
programming world. I definitely want to make my career in LLMs, as I've been very
hooked on it.
taz · Yesterday at 7:12 PM
1) Other than async, you'll have to think about how you want to isolate user
sessions, e.g. each user should not be able to see another user's chat data. Then
think about the docs they can operate on: are they all going to have access to the
same docs? If not, you need to decide how each user gets their own set
of documents (maybe you can model that as one collection per user, or with metadata, in Chroma)
2) Depends what you want to do with the doc; some libs let you extract images
and tables
I think LC has functionality around that
but my advice is: start simple, use whatever works, and create an abstraction (this can
be as simple as wrapping the PyPDFLoader code in a function) so you can easily
swap out libraries
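The wrapper taz suggests could look like this. The loader class is injected as a parameter purely to keep the example self-contained; with LangChain you would pass `PyPDFLoader` (import path varies by version, so check yours):

```python
# Abstraction sketch: hide the loader behind one function, so swapping
# PyPDFLoader for PDFMinerLoader (or anything else) touches a single place.
def load_documents(path, loader_cls):
    loader = loader_cls(path)
    return loader.load()

# e.g. with LangChain (assumed import path, check your version):
#   from langchain_community.document_loaders import PyPDFLoader
#   docs = load_documents("report.pdf", PyPDFLoader)
```

Because callers only ever see `load_documents`, benchmarking a different PDF library later is a one-line change.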
3) Chunking is an art form. There is no one best solution, but there are a couple of
things you need to keep in mind. First, chunks should be smaller than the maximum input
sequence of your embedding model; if you are using OpenAI, that's large (8,000-ish
tokens), with others it is considerably less. The second thing to keep in mind is the
LLM's context window. For example, if you take the top 10 results and
feed those to an LLM where each chunk (result from the search) is 2,000 tokens, then
you'll end up with 20k tokens, which not many LLMs can handle. To that effect, I
think LC might also have some functionality to help
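taz's 20k-token example is just budget arithmetic, which is worth making explicit. The numbers below are placeholders you would replace with your model's real limits:

```python
# Back-of-the-envelope check: how many retrieved chunks fit in the LLM's
# context window? reserved_tokens covers the prompt template, the question,
# and room for the answer (all numbers here are illustrative assumptions).
def max_chunks_that_fit(context_window, chunk_tokens, reserved_tokens=0):
    usable = context_window - reserved_tokens
    return max(usable // chunk_tokens, 0)

# 10 results x 2,000 tokens = 20,000 tokens; a 16k-context model can only
# take 8 such chunks even with nothing reserved, so top-10 would overflow.
```

Running this before picking `k` (the number of retrieved chunks) and your chunk size avoids silent truncation at query time.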
taz · Yesterday at 7:19 PM
4) Agents, I think, might have an edge here; LC has functionality around that too
5) No specific recommendation on the cloud provider, all of them work fine and
prices are comparable. We have a few cloud deployment blueprints -
https://github.com/chroma-core/chroma/tree/main/examples/deployments
6) If you have a large set of files, then multi-threaded loading might help
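For reference, restricting a directory walk to PDFs is the same thing `DirectoryLoader` does with its glob parameter. A plain-Python equivalent, with the LangChain calls hedged in comments (parameter names assumed, check your version):

```python
from pathlib import Path

# Collect every PDF under a directory tree, the way a directory loader
# with glob="**/*.pdf" would. With LangChain this corresponds roughly to
#   DirectoryLoader(path, glob="**/*.pdf", use_multithreading=True)
# or PyPDFDirectoryLoader(path).
def list_pdfs(root):
    return sorted(str(p) for p in Path(root).rglob("*.pdf"))
```

Multithreading only pays off when per-file parsing is the bottleneck, i.e. many or large files, which matches taz's caveat.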
7) I usually suggest Ragas to people - https://github.com/explodinggradients/ragas
also have a look at this, though you will have to adapt it to LC -
https://docs.ragas.io/en/latest/howtos/integrations/llamaindex.html
taz · Yesterday at 7:27 PM
LC also has its own eval framework, LangSmith I think
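On the "as a %" part of the question: Ragas-style metrics (faithfulness, answer relevancy, etc.) each score between 0 and 1 per example, so one simple way to get a single percentage is to average them. This aggregation is my own illustration, not part of the Ragas API:

```python
# Collapse a dict of metric scores in [0, 1] into one percentage.
# Illustrative only: which metrics to include, and whether to weight
# them, depends entirely on your use case.
def rag_score_percent(metric_scores):
    if not metric_scores:
        raise ValueError("no metrics given")
    return 100 * sum(metric_scores.values()) / len(metric_scores)

# rag_score_percent({"faithfulness": 0.9, "answer_relevancy": 0.7})
# averages to 80%
```

This also explains why there is "no one definitive way": the percentage depends on which metrics you choose to average.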
8) Stuff is fine for a small number of docs.
For complex or numerous docs, use refine or map reduce. Refine works iteratively by
passing each doc to the LLM along with the intermediate answer from the previous doc,
whereas map reduce passes each doc to the LLM to get an answer, then combines the answers
iteratively to arrive at a single answer.
If you need scoring or ranking of the relevancy of results, then map rerank
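The map reduce flow taz describes can be sketched in pure Python with a stub in place of a real LLM call; with LangChain this corresponds to `chain_type="map_reduce"` (exact import path varies by version):

```python
# "map": ask the model about each doc separately;
# "reduce": fold the partial answers pairwise into one final answer.
# llm is any callable taking a prompt string and returning a string.
def map_reduce_qa(docs, question, llm):
    # map step: one partial answer per doc
    partials = [llm(f"Answer '{question}' using: {d}") for d in docs]
    # reduce step: combine partial answers one at a time
    combined = partials[0]
    for p in partials[1:]:
        combined = llm(f"Combine answers: {combined} | {p}")
    return combined
```

Counting the calls makes the cost model clear: n docs means n map calls plus n-1 reduce calls, versus a single call for stuff, which is why stuff wins while everything still fits in context.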
taz · Yesterday at 7:37 PM
9) Use similarity if you want the closest matches; use MMR if you want diversity of
results
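The difference is easiest to see in the MMR selection rule itself. A minimal sketch over plain similarity scores (function name and signature are my own; with LangChain/Chroma this is roughly `vectorstore.as_retriever(search_type="mmr")`):

```python
# Maximal marginal relevance: at each step pick the doc that is relevant
# to the query but least similar to the docs already selected.
# lambda_mult=1.0 reduces to plain similarity search; lower values
# trade relevance for diversity.
def mmr_select(query_sims, doc_sims, k, lambda_mult=0.5):
    """query_sims[i]: similarity of doc i to the query.
    doc_sims[i][j]: similarity between docs i and j."""
    selected = []
    candidates = list(range(len(query_sims)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate top hits, plain similarity returns both, while MMR swaps the second for a less redundant doc, which is exactly the "diversity" taz means.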
10) Not familiar with Deep Memory. There are, however, hyperparameters of the HNSW
lib, which Chroma uses, that can increase the accuracy of results (at a memory tradeoff)
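For concreteness, Chroma exposes those HNSW knobs through collection metadata. The key names below are per Chroma's docs; the values are illustrative starting points to tune, not recommendations:

```python
# HNSW tuning knobs, passed as collection metadata in Chroma.
hnsw_metadata = {
    "hnsw:space": "cosine",        # distance function
    "hnsw:construction_ef": 200,   # build-time quality (more time/memory)
    "hnsw:search_ef": 100,         # query-time quality (slower, more accurate)
    "hnsw:M": 16,                  # graph connectivity (more memory)
}
# collection = client.create_collection("docs", metadata=hnsw_metadata)
# (requires a chromadb client; raising search_ef is the usual first step
# when recall is too low, which is the accuracy/memory tradeoff taz means)
```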
SethThunder (OP) · Yesterday at 8:29 PM
1) Do you have any reference on how I can isolate sessions? I understood the
concept of what you wrote, but the implementation is where I'm sort of lost. In my
case, the users will be able to operate on the same docs, and yes, I am interested
in using Chroma more than any other DB

5) Based on what do people normally pick a cloud provider? What should I look into
before deciding?

8) How small is small for stuff?

9) MMR would work well if my use case involves setting temperature = 1 or a high
number right?

10) Will I not be able to use memory at all, or does it reduce the number of messages it
can remember?
taz · Yesterday at 11:45 PM
10) No, but Chroma will consume more memory in order to give you better results
taz · Today at 12:05 PM
9) If you're looking for diversity in search results, then MMR is a good choice
5) The easiest I find to start with is AWS
SethThunder (OP) · Today at 7:31 PM
Thanks a lot @taz
