How To Enhance AI Chatbots With Real-Time Data From Bright Data Using OpenAI and LangChain - by Victor Yakubu - Jan, 2025 - Python in Plain English
https://python.plainenglish.io/how-to-enhance-ai-chatbots-with-real-time-data-from-bright-data-using-openai-and-langchain-793bb88934f8 1/22
2025/2/9 晚上8:11 How to Enhance AI Chatbots with Real-Time Data from Bright Data using OpenAI and LangChain | by Victor Yakubu | Jan, 20…
This article shows how to enhance an AI chatbot using three tools: a Bright Data
dataset built for AI and LLM applications, OpenAI’s language models, and LangChain’s
data processing tools. You will use them to create a chatbot that delivers accurate,
context-rich responses to user queries, grounded in up-to-date data.
1. Enhanced Accuracy
Chatbots can provide precise answers that reflect the latest information available by
incorporating up-to-date datasets like Bright Data’s Wikipedia Articles. This ensures
users receive accurate responses, especially for topics where facts are constantly
changing, such as scientific discoveries, current events, or product updates.
4. Competitive Advantage
Businesses that deploy chatbots backed by real-time data gain an advantage over
competitors using static systems: they can respond faster to evolving customer
needs and adapt more effectively to market changes.
If the specific data you require isn’t available, Bright Data also lets you build a
custom web data pipeline tailored to your needs, using either its code or no-code
option. This flexibility makes the platform a practical choice for developers and
non-developers who need robust, scalable, and relevant data for their AI systems.
For this article, you’ll use Wikipedia articles about engineering from Bright Data’s
Data for AI dataset to build a chatbot. Follow these steps to access and prepare the
dataset:
This will redirect you to a curated list of datasets specifically designed for AI
applications.
Within this category, select “Wikipedia articles about engineering.” This dataset
contains structured information relevant to engineering topics, perfect for
training or enhancing AI chatbots.
You can choose between JSON or CSV formats (this article uses the CSV format).
Once purchased, the dataset will be downloaded to your local machine. Save it
as Wikipedia_articles.csv for easy access.
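Before wiring the file into the chatbot, it can help to confirm which columns the download actually contains. The sketch below uses only the standard library; `preview_csv` is a hypothetical helper, and the column names in your file may differ from the ones used later in this article.

```python
import csv
from typing import Dict, List, Tuple


def preview_csv(path: str, max_rows: int = 3) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return the header names and the first few rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for index, row in enumerate(reader):
            if index >= max_rows:
                break
            rows.append(row)
        # fieldnames is populated once the header line has been read
        return list(reader.fieldnames or []), rows
```

Calling `preview_csv("Wikipedia_articles.csv")` returns the column names and a small sample to inspect. Very long article fields can exceed the `csv` module's default field size limit, which `csv.field_size_limit` can raise.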
What are the benefits of using Bright Data’s Data for AI dataset?
6. Scalability: You can fetch large datasets without manual collection or worrying
about changes in website structure or anti-scraping mechanisms.
Step 1: Prerequisites
Before proceeding, ensure you have the following tools and data set up:
2. OpenAI API Key: Register with OpenAI and generate your API key.
3. Python Environment: Install the necessary Python libraries using the following
command:
4. Project Structure: Organize your files as outlined in the folder structure below:
your_project_folder/
├── app.py
├── wiki_chatbot.py
├── Wikipedia_articles.csv
└── templates/
    └── index.html
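A quick preflight check can confirm that the prerequisites above are in place before running anything. This is a minimal sketch: the module names are read off the import statements in the listings later in this article, while `faiss` and `flask` are assumed from the article's use of FAISS and Flask, and the helper names are hypothetical.

```python
import importlib.util
import os
from typing import Iterable, List

# Assumed module names; adjust to match your own environment.
REQUIRED_MODULES = [
    "pandas",
    "langchain",
    "langchain_openai",
    "langchain_community",
    "langchain_text_splitters",
    "faiss",
    "flask",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def api_key_is_set(variable: str = "OPENAI_API_KEY") -> bool:
    """Report whether the environment variable holds a non-blank value."""
    return bool(os.environ.get(variable, "").strip())
```

`missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed, and `api_key_is_set()` reports whether the key is available to the process.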
Categories: {row['categories']}
""", axis=1)
Use the FAISS library to build a vector database from the processed text,
allowing the chatbot to retrieve relevant information efficiently.
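The "processed text" here is one labelled block per article row, as in the fragment above that ends with `Categories: {row['categories']}`. The sketch below shows one way to build such a block while skipping empty values. It is an illustration, not the article's exact code: `combine_fields` is a hypothetical helper, and apart from `categories` the column names are assumptions.

```python
from typing import Mapping, Sequence


def combine_fields(row: Mapping[str, object], fields: Sequence[str]) -> str:
    """Join the chosen columns of one row into a labelled text block."""
    lines = []
    for field in fields:
        value = row.get(field)
        # Skip missing or blank values (NaN is the only value unequal to itself)
        if value is None or value != value or str(value).strip() == "":
            continue
        label = field.replace("_", " ").title()
        lines.append(f"{label}: {str(value).strip()}")
    return "\n".join(lines)
```

Skipping blanks keeps strings such as "nan" out of the text that is later embedded.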
self.qa_chain = ConversationalRetrievalChain.from_llm(
    llm=self.llm,
    retriever=self.vectorstore.as_retriever(),
    return_source_documents=True
)
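The retriever handed to the chain returns the chunks whose embeddings lie closest to the question's embedding. To make that idea concrete, here is a toy version in plain Python. It swaps in word counts for OpenAI embeddings and a linear scan for FAISS, so it illustrates the principle only; `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

A vector store does the same job over dense vectors with an index built for fast nearest-neighbour search, which is what makes retrieval practical on a large dataset.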
import pandas as pd
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
from typing import List, Dict

class WikipediaChatbot:
    def __init__(self, openai_api_key: str):
        """
        Initialize the Wikipedia chatbot with OpenAI API key
        Args:
            openai_api_key (str): Your OpenAI API key
        """
        self.openai_api_key = openai_api_key
        os.environ["OPENAI_API_KEY"] = openai_api_key
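    def load_and_process_data(self, csv_path: str):
        """
        Load the Wikipedia CSV and prepare it for retrieval
        Args:
            csv_path (str): Path to the CSV file containing Wikipedia data
        """
        # Read CSV file
        df = pd.read_csv(csv_path)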
            chunk_overlap=200,
            length_function=len
        )
        texts = []
        for text in df['combined_text'].tolist():
            texts.extend(text_splitter.split_text(text))
    def get_response(self, query: str) -> Dict:
        """
        Answer a user question with the retrieval chain
        Args:
            query (str): User's question
        Returns:
            Dict: Response containing answer and source documents
        """
        # Using invoke() instead of direct call
        result = self.qa_chain.invoke({
            "question": query,
            "chat_history": self.chat_history
        })
        return {
            "answer": result["answer"],
            "sources": [doc.page_content for doc in result["source_documents"]]
        }
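The `chunk_overlap=200` argument passed to the splitter in `load_and_process_data` means neighbouring chunks share some text, so content near a chunk boundary appears in both. The sketch below shows the idea with a simplified fixed-window splitter. It is not the separator-aware algorithm that `RecursiveCharacterTextSplitter` uses, and `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With a window of 4 and an overlap of 2, the string "abcdefghij" becomes "abcd", "cdef", "efgh", "ghij": every boundary region is covered twice, at the cost of storing and embedding more text.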
Instantiate the chatbot and load the processed dataset during app initialisation.
chatbot.load_and_process_data(csv_path)
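Because a single chatbot instance is created at start-up, the `self.chat_history` it passes to the chain appears to be shared by every visitor. One way to keep conversations apart is a small per-session store. The sketch below is an illustration under assumptions: `SessionHistories`, `session_id`, and `max_turns` are hypothetical names, and it relies on the chain accepting history as a list of `(question, answer)` tuples.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Turn = Tuple[str, str]  # (user question, bot answer)


class SessionHistories:
    """Keep a bounded list of recent turns for each session id."""

    def __init__(self, max_turns: int = 5):
        self._max_turns = max_turns
        self._turns: Dict[str, Deque[Turn]] = defaultdict(
            lambda: deque(maxlen=self._max_turns)
        )

    def add(self, session_id: str, question: str, answer: str) -> None:
        """Record one completed turn; the oldest turn drops off when full."""
        self._turns[session_id].append((question, answer))

    def get(self, session_id: str) -> List[Turn]:
        # Return a copy so callers cannot mutate stored history; dict.get
        # avoids creating an entry for an unknown session id
        return list(self._turns.get(session_id, ()))
```

The history for a session would be passed as `chat_history` when answering, and the new turn added afterwards. Capping the number of turns also keeps the prompt from growing without bound.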
The /api/chat endpoint receives user queries, fetches responses from the
chatbot, and returns answers with sources.
@app.route('/api/chat', methods=['POST'])
def chat():
    data = request.json
    user_message = data.get('message', '')
    response = chatbot.get_response(user_message)
    return jsonify({
        'answer': response['answer'],
        'sources': response['sources'][:3]
    })
from flask import Flask, render_template, request, jsonify

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('index.html')
@app.route('/api/chat', methods=['POST'])
def chat():
    try:
        data = request.json
        user_message = data.get('message', '')
        if not user_message:
            return jsonify({'error': 'No message provided'}), 400
        response = chatbot.get_response(user_message)
        return jsonify({
            'answer': response['answer'],
            'sources': response['sources'][:3]  # Limiting to top 3 sources
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)
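One case the handler above sends to the generic 500 branch is a body that is valid JSON but not an object, such as a list: `data.get` then raises and the client sees a server error instead of a 400. A small pure function can validate the payload first. This is a sketch; `extract_message` and the `max_length` limit are hypothetical additions, not part of the article's code.

```python
from typing import Any, Optional, Tuple


def extract_message(payload: Any, max_length: int = 2000) -> Tuple[Optional[str], Optional[str]]:
    """Return (message, None) for a usable payload, or (None, error_text) otherwise."""
    if not isinstance(payload, dict):
        return None, "Request body must be a JSON object"
    message = payload.get("message", "")
    if not isinstance(message, str) or not message.strip():
        return None, "No message provided"
    message = message.strip()
    if len(message) > max_length:
        return None, f"Message longer than {max_length} characters"
    return message, None
```

Inside the route, a non-empty error string would be returned with status 400, leaving the `except` branch for genuine server-side failures. Keeping the check free of Flask objects makes it easy to unit test.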
1. Chatbox UI:
The interface allows users to type questions, view bot responses, and check
sources.
CSS is used to style the chat container, user and bot messages, and loading
indicators.
JavaScript captures user input and sends it to the /api/chat endpoint using
Axios.
3. Display Responses:
Bot responses, along with truncated source information, are displayed in the
chat window.
You can find the code for the index.html file here.
python app.py
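Once the server is running, the `/api/chat` endpoint can also be exercised without the browser UI. The sketch below builds the same kind of JSON POST that the page sends, using only the standard library. `build_chat_request` is a hypothetical helper, and the base URL is an assumption (Flask's development server listens on `http://127.0.0.1:5000` by default).

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/chat endpoint."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` should yield a dictionary with the `answer` and `sources` keys returned by the route.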
Key Features
Source Attribution: Users can verify answers against the source excerpts returned with each response.
Scalable Design: The setup allows integration with other Bright Data datasets for
diverse applications.
Final Thoughts
This article covered the process of building a chatbot with real-time data obtained
through Bright Data’s Data for AI platform. It demonstrated how straightforward it is
to access structured public data, specifically tailored for AI and LLMs, without the
need for complex scripting or manual extraction.
By integrating the Wikipedia dataset with OpenAI models and LangChain, you built
a chatbot capable of providing accurate, up-to-date responses even in specialised
engineering fields.