Large Language Model Based Search Tool Prototype

This document proposes a prototype search tool that uses a large language model to make research at NASA more efficient. Built with the PandasAI and OpenAI libraries in Google Colab, it allows natural-language queries of NASA's NTRS dataset. A bug in PandasAI limited the use of custom prompts, but the tool demonstrates searching for authors and publications.


LARGE LANGUAGE MODEL BASED SEARCH TOOL PROTOTYPE

NASA is a research-based institution; its primary mission is to conduct research and development in aeronautics, space exploration, and related fields. Research is a very time-consuming process, but with the right tools it can be made more efficient. One such tool is a search tool: the ability to find the data you are looking for quickly, using plain English or even roughly formatted queries, reduces time spent and increases productivity. Here I suggest a prototype search tool that uses a large language model to interact with NASA's NTRS dataset.

My solution is based on an OpenAI LLM and a relatively new library called PandasAI. Using these two libraries, along with a few others, I have built a Google Colab based search tool that talks to NASA's NTRS dataset.

The techniques and tools I have used can be incorporated into NASA's existing search website, used as a standalone app, or even deployed as a chatbot on the search site.

I will now describe what the two main libraries do and what did and did not work for me. OpenAI offers an API platform that provides its latest models for building applications. PandasAI is a Python library that integrates generative AI capabilities into Pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with Pandas and makes data analysis conversational, allowing users to ask questions of their data in natural language. It can also display dataframes and plot graphs and bar charts.
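A minimal sketch of how such a setup can be wired together, assuming `pandasai` and `openai` are installed and an OpenAI API key is available; the PandasAI API has changed between versions, so names like `SmartDataframe` and the `config` keyword may differ in other releases:

```python
import pandas as pd

def build_search_tool(df: pd.DataFrame, api_key: str):
    """Wrap a pandas DataFrame so it can be queried in plain English."""
    # Imports are deferred because pandasai is an optional dependency here.
    from pandasai import SmartDataframe
    from pandasai.llm import OpenAI

    llm = OpenAI(api_token=api_key)
    return SmartDataframe(df, config={"llm": llm})

# Usage (requires a real API key and network access):
# sdf = build_search_tool(ntrs_df, "sk-...")
# sdf.chat("How many unique authors are in the dataframe?")
```

The `chat` call sends the question plus the dataframe's schema to the model, which generates and runs pandas code to produce the answer.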

However, I found a bug in PandasAI that limited my options for developing a more powerful solution: its custom_prompt is not working. Whenever I tried to pass a function to custom_prompt to build more complex queries, PandasAI gave me a generic solution, because it cannot build complex queries from short sentences like "find who is the lead author in the astronomy field". The workaround I found is to send the prompt directly to the chat function, making the prompt very detailed and including some examples.
The downside of this bug is that I was not able to use embeddings or any vector store; however, to the best of my knowledge the developers are working on the bug, and it may be fixed soon.
I tried to use Gradio to display the dataframe, but PandasAI's SmartDataframe did not seem to work with it; I do not know whether the fault lies with my code or with PandasAI. Streamlit can be used instead, as PandasAI provides middleware for it.
Now, a little about the code and the dataset. I have used the ntrs-public-metadata.json.gz dataset, which mixes different structures across its columns: some are plain strings or simple lists, while others have dictionaries nested inside lists. Out of its 28 columns I have used four to build research-related queries: 'authorAffiliations', 'organization', 'subjectCategories', and 'Stitype'. These address problem statements such as who is the lead author in a particular field, what types of publications an author has published, and what organization he or she belongs to. More columns, such as 'curated' and 'publishing date', can also be added; alternatively, instead of building another dataframe, the methods I have used can be applied to the original dataframe. I used the pandas explode function to expand multiple authors into separate rows, and a custom function to convert subjectCategories into strings.
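The preprocessing described above can be sketched in plain pandas. The column names are taken from the text, but the sample data is illustrative; in the real NTRS dump, 'authorAffiliations' holds nested dictionaries rather than bare name strings:

```python
import pandas as pd

# Toy stand-in for the NTRS metadata: list-valued author and subject columns.
raw = pd.DataFrame({
    "authorAffiliations": [["A. Smith", "B. Jones"], ["C. Lee"]],
    "subjectCategories": [["Astronomy", "Physics"], ["Astronomy"]],
})

# explode: one row per author, so per-author counts are easy to compute.
df = raw.explode("authorAffiliations").reset_index(drop=True)

# Convert the list-valued subjectCategories into a plain string so that
# str.contains-style filters work on the column.
df["subjectCategories"] = df["subjectCategories"].apply(
    lambda cats: ", ".join(cats) if isinstance(cats, list) else str(cats)
)
```

After this, `df` has three rows (one per author), and each row's subjects are a single comma-joined string.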

PROMPTS:
Below are a few example prompts I have used for testing:
1. how many rows are in {x1} and {data2}
2. How many unique authors are in {x1}
3. A complex query that addresses the question of who is the lead author in a particular
field: "use the provided dataframe to find the rows where the subject is
astronomy you can use the following query as an example
x1.query(subjectCategories.str.contains(Astronomy)). save them in another
data frame. you can use the code
x1.query(subjectCategories.str.contains(Physics)) to generate python code.
find the value counts of each author in this new dataframe. the following code
shows an example code
x1[x1[subjectCategories]==Astronomy][authorAffiliations].value_counts().
provide the counts of each author as output"
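The pandas code the detailed prompt asks the model to generate can be written directly, assuming the preprocessed one-author-per-row frame with string-valued subjectCategories described earlier (the frame name `x1` comes from the prompts; the data below is illustrative):

```python
import pandas as pd

# Illustrative preprocessed frame: one author per row, subjects as strings.
x1 = pd.DataFrame({
    "authorAffiliations": ["A. Smith", "B. Jones", "A. Smith", "C. Lee"],
    "subjectCategories": ["Astronomy", "Astronomy", "Astronomy", "Physics"],
})

# Rows where the subject mentions Astronomy.
astro = x1[x1["subjectCategories"].str.contains("Astronomy")]

# Per-author publication counts within that field; the top entry is the
# most prolific ("lead") author in Astronomy.
lead_counts = astro["authorAffiliations"].value_counts()
print(lead_counts.idxmax())
```

Note that quoting matters: the prompt's `str.contains(Astronomy)` only works because the model fills in the quotes when it turns the prompt into real code.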
