Infosys Internship 4.0 Project Documentation NEW
Team Members:
•Introduction:
The Infosys Chatbot project aims to develop an intelligent and interactive chatbot
that enhances customer engagement and operational efficiency across various
business sectors. Leveraging advanced technologies such as artificial intelligence
(AI), machine learning (ML), and natural language processing (NLP), the chatbot is
designed to provide real-time support, automate routine tasks, and deliver
personalized experiences to users.
Objectives:
• Provide instant responses to customer queries.
• Improve customer satisfaction by offering 24/7 assistance.
• Enhance the user experience by enabling real-time interactions through a
user-friendly interface.
Significance:
• Customer Satisfaction: By providing quick and accurate responses, the
chatbot significantly enhances the customer experience.
• Scalability: The chatbot can handle a large volume of interactions
simultaneously, making it scalable for businesses of all sizes.
• Innovation: The project showcases Infosys' commitment to leveraging
cutting-edge technology to solve business challenges.
•Project Scope:
Included:
• Task 1: Web Scraping - Extracting information from Infosys's website.
•Requirements:
Functional Requirements
Web Scraping: Utilize web scraping tools such as requests and BeautifulSoup to
extract relevant content from designated sections of the Infosys website. This
includes sections like the company overview, history, subsidiaries, and newsroom
pages. The scraped data will be structured and cleaned for further processing.
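As a rough illustration, a scraping helper along these lines could fetch a page and keep only its paragraph text (the URL shown and the paragraph-only filtering are assumptions, not the project's exact targets):

import requests
from bs4 import BeautifulSoup

# Illustrative target page; the project scrapes several Infosys sections
# (about, history, subsidiaries, newsroom).
URL = "https://www.infosys.com/about.html"

def extract_paragraphs(html: str) -> list[str]:
    """Parse HTML and return its non-empty paragraph texts, stripped of extra whitespace."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p") if p.get_text(strip=True)]

def scrape_page(url: str) -> list[str]:
    """Fetch a page and return its cleaned paragraph text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return extract_paragraphs(response.text)

if __name__ == "__main__":
    for paragraph in scrape_page(URL)[:5]:
        print(paragraph)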
Data Storage: Store the scraped text data as embeddings in the Qdrant vector
database. This process involves converting the text data into vector
representations (embeddings) that can be efficiently searched and retrieved
when needed.
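A simplified sketch of this storage step is shown below; it assumes a local Qdrant instance, a collection named infosys_docs, and an OpenAI embedding model, none of which are confirmed project settings:

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# Assumptions: a local Qdrant instance, an existing "infosys_docs" collection,
# and an OpenAI embedding model; the report does not pin down these exact choices.
openai_client = OpenAI()                            # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(host="localhost", port=6333)
COLLECTION = "infosys_docs"

def store_chunks(chunks: list[str]) -> None:
    """Embed each scraped text chunk and upsert it into the vector database."""
    embeddings = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    points = [
        PointStruct(id=i, vector=item.embedding, payload={"text": chunks[i]})
        for i, item in enumerate(embeddings.data)
    ]
    qdrant.upsert(collection_name=COLLECTION, points=points)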
Integration: Connect the system with the OpenAI API to leverage the Retrieval-
Augmented Generation (RAG) model. This integration allows the chatbot to use
advanced natural language processing capabilities to generate enhanced,
contextually relevant responses to user queries.
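The retrieval-augmented flow could be sketched roughly as follows, with illustrative model names, collection name, and prompt wording:

from openai import OpenAI
from qdrant_client import QdrantClient

# Sketch of the RAG flow; model names and prompt wording are illustrative assumptions.
openai_client = OpenAI()
qdrant = QdrantClient(host="localhost", port=6333)

def answer(query: str) -> str:
    """Embed the query, retrieve similar chunks from Qdrant, and ask the LLM to answer."""
    query_vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    hits = qdrant.search(collection_name="infosys_docs", query_vector=query_vector, limit=3)
    context = "\n\n".join(hit.payload["text"] for hit in hits)
    completion = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided Infosys context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return completion.choices[0].message.content

Keeping retrieval and generation behind a single helper like this keeps the user interface layer thin.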
User Interface: Develop a responsive and intuitive user interface using the
Streamlit framework. The UI should allow users to interact with the chatbot easily,
input queries, and receive responses in a user-friendly manner.
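A bare-bones version of such an interface might look like the sketch below, where the imported answer helper and its module name are assumptions carried over from the integration sketch:

import streamlit as st

# Minimal UI sketch; `answer` is the hypothetical RAG helper outlined above and
# the module name is an assumption.
from rag_backend import answer

st.title("Infosys Chatbot")

if "history" not in st.session_state:
    st.session_state.history = []

user_query = st.chat_input("Ask something about Infosys")
if user_query:
    st.session_state.history.append(("user", user_query))
    st.session_state.history.append(("assistant", answer(user_query)))

for role, message in st.session_state.history:
    with st.chat_message(role):
        st.write(message)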
Non-Functional Requirements
Performance: Ensure the system delivers quick response times for user queries.
The chatbot should process and return answers promptly, maintaining high
performance under varying loads.
Scalability: Design the system to handle multiple concurrent user interactions.
Security: Implement robust security measures to protect user data and ensure
compliance with data privacy regulations (e.g., GDPR). This includes data
encryption, secure API communication, and regular security audits.
•Technical Stack:
Programming Languages
• Python: Used for backend processing, web scraping, data handling, and
integration with external APIs. Its extensive support for machine learning
and AI integration also makes it ideal for this project.
Frameworks/Libraries:
• requests and BeautifulSoup: Handle the HTTP requests and HTML parsing used by the web scraping module.
• Streamlit: Powers the chatbot's interactive web interface.
• openai: Client library for calling the OpenAI API during response generation.
• qdrant-client: Python client for storing and querying embeddings in Qdrant.
Databases:
• Qdrant: Qdrant is a vector database designed for efficient similarity search
and storage of high-dimensional vector embeddings. It is crucial for
handling the embeddings generated from the scraped text data, enabling
quick and relevant responses to user queries.
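Before embeddings can be stored, the collection is created once with a fixed vector size and distance metric; a minimal setup sketch (the 1536-dimension size matches common OpenAI embedding models and is an assumption) might be:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# One-time collection setup; the vector size must match the embedding model's output.
qdrant = QdrantClient(host="localhost", port=6333)
qdrant.create_collection(
    collection_name="infosys_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)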
Tools/Platforms:
• GitHub: Used for version control and collaboration, allowing for efficient
management of code changes and team collaboration.
• Docker: Used for containerizing the application, ensuring consistency across
different environments and simplifying deployment processes.
• Streamlit: Provides an easy-to-use platform for deploying and sharing the
chatbot's web application, enabling seamless user interactions.
•Architecture/Design:
System Architecture
• Web Scraping Module: Retrieves content from specific sections of the
Infosys website (about, history, subsidiaries, newsroom) using Python's
requests and BeautifulSoup libraries.
• Data Processing and Storage: Converts the extracted text data into
embeddings using a language model. Stores these embeddings in the
Qdrant vector database, allowing efficient retrieval based on similarity
searches.
Challenges and Solutions
Data Extraction:
• Challenge: Handling dynamic and complex web content during the scraping process.
• Solution: Refined the scraping algorithms to manage dynamic content loading and utilized techniques such as parsing JavaScript-rendered data.
Integration Complexity:
•Testing:
Unit Tests:
• Focus on validating individual module functionalities to ensure each
part works correctly in isolation.
• Examples include testing the web scraping functions to ensure accurate
data extraction and checking the API endpoints.
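For example, a unit test for the paragraph-extraction helper could run against a small in-memory HTML fixture so it never depends on the live website (the module and function names are assumptions carried over from the scraping sketch):

# test_scraper.py - pytest-style unit test for the assumed extract_paragraphs helper.
from scraper import extract_paragraphs  # hypothetical module name

SAMPLE_HTML = """
<html><body>
  <p>Infosys is a global consulting and IT services company.</p>
  <p>   </p>
  <p>It was founded in 1981.</p>
</body></html>
"""

def test_extract_paragraphs_skips_empty_blocks():
    paragraphs = extract_paragraphs(SAMPLE_HTML)
    assert paragraphs == [
        "Infosys is a global consulting and IT services company.",
        "It was founded in 1981.",
    ]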
Integration Tests:
• Verify that the different modules (web scraping, data storage, AI
integration, and UI) work together seamlessly.
• Tests include ensuring data flows correctly from scraping to storage and
then to AI for response generation.
System Tests:
• Conduct end-to-end testing to verify the complete functionality of the
system, simulating user interaction scenarios.
• Ensure that user queries are processed correctly, and responses are
generated and displayed accurately in the UI.
Results:
Identified and Resolved Issues:
• Data parsing inconsistencies were detected and corrected to improve the
accuracy of the extracted information.
• API integration mismatches were addressed to ensure smooth
communication between the components.
Performance Tests:
• Conducted to confirm that the system can handle expected user loads with
good scalability and responsiveness.
Installation:
• Ensure Python is installed.
• Install required packages: beautifulsoup4, requests, streamlit, openai, and
qdrant-client.
Configuration:
• Set up API keys and sensitive information using environment variables or a
.env file (a loading sketch follows this list).
• Ensure Qdrant is configured and running locally or on a server.
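A minimal loading sketch for this configuration, using the python-dotenv package (the variable names are assumptions), could be:

import os
from dotenv import load_dotenv  # python-dotenv; an assumed choice for reading .env files

# Load secrets from a local .env file into the environment, then read them at startup.
load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]                   # required by the OpenAI client
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")   # falls back to a local instance
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")                    # optional when Qdrant runs locally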
•Conclusion:
The Infosys Chatbot project successfully developed an intelligent and responsive
chatbot capable of handling a variety of Infosys-related queries. This achievement
was made possible through the integration of several key components:
Effective Web Scraping: Extracted relevant content from various sections of the
Infosys website using BeautifulSoup and requests, providing a robust dataset for
the chatbot.
Efficient Data Storage and Retrieval: Utilized Qdrant, a vector database, to store
text embeddings, ensuring quick and efficient data retrieval for processing user
queries.