Paddu 2
1. Can you explain your project where you extensively used Generative AI technology?
o EDA Chatbot: This project involved developing an Exploratory Data Analysis (EDA)
chatbot to guide users through the data analysis process. By integrating NLP
techniques, the chatbot could understand user queries, generate summary statistics,
and visualize data distributions, making EDA accessible to non-technical users.
2. What problem does your project aim to solve, and why did you choose this approach?
o The EDA chatbot addresses the difficulty non-technical users face in exploring and
analyzing datasets. By automating the EDA process, the tool simplifies data
exploration, making insights more accessible and actionable.
3. How does your project work? Please explain the process in detail (why, where, when, and
how).
o The chatbot uses Python libraries like Pandas and NumPy for data manipulation,
Matplotlib/Seaborn for visualizations, and NLP libraries like NLTK and SpaCy to
interpret user queries. It processes datasets, performs exploratory analyses, and
provides visual outputs, all within a conversational framework.
o Tools and libraries: Python, Pandas, NumPy, Matplotlib, Seaborn, NLTK, SpaCy, and
Flask.
LLM Questions
1. Did you use any LLMs in your project?
o While LLMs were not explicitly mentioned, NLP techniques were employed using
libraries like NLTK and SpaCy to interpret user inputs in the EDA chatbot.
2. What NLP tasks were performed in your project?
o Tasks like query understanding, tokenization, and named entity recognition were
performed using NLP libraries.
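A minimal sketch of how tokenization and named entity recognition on a user query might
look with SpaCy (the query text and model name are illustrative):
import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

doc = nlp('Show me the average sales for 2023')  # illustrative user query
print([token.text for token in doc])                  # tokenization
print([(ent.text, ent.label_) for ent in doc.ents])   # e.g. [('2023', 'DATE')]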
3. Why did you select a particular LLM (e.g., GPT, BERT) for your project?
o The choice of SpaCy and NLTK was influenced by their ease of integration and
capability to handle specific NLP tasks efficiently.
4. How did you handle the limitations or challenges of using LLMs in your project?
o Challenges like handling diverse dataset formats were mitigated by extensive testing
and ensuring flexibility in the chatbot’s architecture.
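One hedged sketch of how such flexibility might be achieved, dispatching on the uploaded
file's extension (the supported formats are assumptions):
import pandas as pd
from pathlib import Path

# Hypothetical loader accepting several common upload formats
READERS = {'.csv': pd.read_csv, '.json': pd.read_json, '.xlsx': pd.read_excel}

def load_dataset(path):
    reader = READERS.get(Path(path).suffix.lower())
    if reader is None:
        raise ValueError('Unsupported file type: ' + path)
    return reader(path)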
RAG Questions
1. Did you use RAG in your project? If yes, how was it implemented?
o Not explicitly mentioned in the project. However, retrieval mechanisms were
indirectly employed to process user queries and generate relevant responses.
2. How did you design the retrieval process to fetch relevant information?
o Query processing was based on NLP techniques to identify user needs and match
them with appropriate analytical responses.
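A minimal sketch of matching a recognized intent to an analytical response, assuming a
simple intent-to-action table (the intents and actions are illustrative):
import pandas as pd

# Hypothetical mapping from recognized intents to Pandas operations
ACTIONS = {
    'summary': lambda df: df.describe(),
    'missing_values': lambda df: df.isna().sum(),
    'correlation': lambda df: df.corr(numeric_only=True),
}

def respond(intent, df):
    # Fall back to a help message for unrecognized intents
    action = ACTIONS.get(intent)
    return action(df) if action else "Sorry, I can't answer that yet."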
3. What data sources or knowledge bases did you use for retrieval?
4. How did you ensure that the generated responses were accurate and contextual?
o Extensive testing and user feedback ensured the chatbot’s responses were relevant
and reliable.
Finetuning Questions
1. Did you fine-tune the LLM or other models in your project? If yes, how?
o Fine-tuning was not explicitly mentioned, but feature engineering was extensively
used in the Customer Churn Prediction project.
2. What datasets were used for fine-tuning, and how were they prepared?
o Cleaned and preprocessed customer behavior data and transaction history were
used.
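A short sketch of what such cleaning might involve (the file and column names are
assumptions):
import pandas as pd

# Hypothetical raw churn data; column names are illustrative
raw = pd.read_csv('customer_transactions.csv')
raw = raw.drop_duplicates()
raw['amount'] = raw['amount'].fillna(raw['amount'].median())  # impute missing amounts
raw['date'] = pd.to_datetime(raw['date'], errors='coerce')    # normalize dates
clean = raw.dropna(subset=['customer_id', 'date'])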
3. What impact did this have on your models?
o Improved model accuracy and reliability, as seen in the Customer Churn Prediction
project.
4. What challenges did you face during this process, and how did you address them?
o Challenges like feature selection and hyperparameter tuning were tackled through
iterative experimentation.
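One common way to systematize such experimentation is a grid search; a hedged sketch with
scikit-learn follows (the grid values and synthetic data are illustrative):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)  # stand-in data

# Try a small hyperparameter grid with 5-fold cross-validation
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={'n_estimators': [100, 200], 'max_depth': [None, 5]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)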
Embedding Questions
1. How did you chunk your data for processing?
o NLP tokenization techniques were used to chunk user inputs for analysis.
2. Did you use embeddings in your project?
o Not directly mentioned, but embeddings were likely not a part of the described
projects.
3. How were the embeddings used in your project (e.g., for search, similarity, clustering)?
o Not applicable, as embeddings were not used.
NLP Questions
3. What were the main challenges you faced in implementing NLP techniques?
Deep Learning Questions
2. How did you optimize the performance of your deep learning model?
o Not applicable.
3. Did you use any pre-trained models? If so, how did you integrate them?
4. How did you ensure that your model generalizes well to unseen data?
ML Questions
1. What machine learning algorithms did you use in your project?
o Logistic Regression and Random Forest were used in the Customer Churn Prediction
project.
2. How did you perform feature engineering or selection for your ML models?
o By identifying and using key customer behavior features to enhance model accuracy.
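As an illustration, behavioural features like these might be derived from transaction
history (the column names are assumptions):
import pandas as pd

# Hypothetical transaction records for two customers
df = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'amount': [20.0, 35.0, 12.5],
    'date': pd.to_datetime(['2023-01-05', '2023-03-10', '2023-02-01']),
})

# Aggregate per-customer behaviour features for the churn model
features = df.groupby('customer_id').agg(
    total_spend=('amount', 'sum'),
    n_transactions=('amount', 'count'),
    last_purchase=('date', 'max'),
)
print(features)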
MLOps Questions
1. How did you deploy your models in your projects?
o Deployment specifics were not described, but Flask was used for hosting the EDA
chatbot.
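A minimal sketch of a Flask endpoint that could host the chatbot (the route and payload
shape are assumptions):
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    # Hypothetical handler: a real one would parse the query and run the EDA pipeline
    query = request.get_json().get('query', '')
    return jsonify({'response': 'Received: ' + query})

if __name__ == '__main__':
    app.run(debug=True)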
3. What tools and techniques did you use for CI/CD in your ML pipeline?
o Not applicable.
4. How did you manage the lifecycle of your machine learning models?
Coding Questions
1. Can you write a Python script to preprocess a dataset for NLP tasks?
o Yes, leveraging libraries like Pandas, NLTK, and SpaCy for tokenization, lemmatization,
and cleaning.
2. How would you implement a basic neural network using Python and a deep learning
library (e.g., TensorFlow, PyTorch)?
3. What is the importance of evaluation metrics like precision, recall, and F1-score?
o These metrics evaluate model performance, balancing false positives and false
negatives, crucial for imbalanced datasets.
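These metrics can be computed with scikit-learn, as in this toy example:
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]  # toy labels
y_pred = [0, 1, 0, 0, 1]  # toy predictions
print(precision_score(y_true, y_pred))  # 1.0  (no false positives)
print(recall_score(y_true, y_pred))     # 0.67 (one positive missed)
print(f1_score(y_true, y_pred))         # 0.8  (harmonic mean of the two)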
4. How would you design a scalable system for deploying your project in a production
environment?
1. Can you explain your project where you extensively used Generative AI technology?
One of the key projects was the development of an Exploratory Data Analysis (EDA)
Chatbot. This chatbot leverages Natural Language Processing (NLP) techniques to interact
with users and assist them in performing EDA tasks. The project aimed to simplify data
exploration for non-technical users by automating tasks like generating summary statistics,
identifying outliers, and visualizing distributions or relationships within datasets.
2. What problem does your project aim to solve, and why did you choose this approach?
EDA Chatbot Problem Statement: Many non-technical users lack the expertise to perform
EDA, which is a critical step in understanding data and preparing it for modeling. Tools like
Python and R require coding knowledge, creating a barrier for business users, analysts, or
small-scale entrepreneurs.
Approach: A chatbot using Generative AI/NLP was chosen to act as a conversational guide,
breaking down complex EDA tasks into simple, understandable actions. By integrating Python
libraries such as Pandas, NumPy, and Matplotlib, and an NLP framework like Rasa, the
chatbot could understand queries and deliver actionable insights in natural language.
3. How does your project work? Please explain the process in detail.
1. User Input: The chatbot receives a query, such as "Can you show me the distribution
of sales data?"
2. Natural Language Understanding (NLU): NLP libraries process the text, extracting
entities (e.g., "sales data") and intent (e.g., "show distribution").
3. Dataset Processing: The chatbot validates the dataset provided by the user, ensuring
compatibility (e.g., handling missing values, normalizing data).
4. EDA Task Execution: Depending on the query, the chatbot performs tasks like
generating histograms, summary tables, or correlation matrices using Matplotlib and
Seaborn (a minimal sketch follows this list).
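As a compact illustration of step 4, a distribution request might reduce to something like
this (the file, column, and output path are assumptions):
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset and column resolved from the user's query
df = pd.read_csv('sales.csv')
ax = df['sales'].plot(kind='hist', bins=20, title='Distribution of sales')
ax.set_xlabel('sales')
plt.savefig('sales_distribution.png')  # the chatbot would return this image to the user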
LLM Questions
The EDA chatbot incorporated NLP techniques, and in a potential upgrade, LLMs like GPT
could be integrated for better contextual understanding and conversational capabilities. For
now, libraries like SpaCy and NLTK were used to process and understand user inputs
effectively.
In this project, NLP was used for:
o Query Understanding: Extracting intent and relevant entities from user queries (e.g.,
“average sales in 2023”).
3. Why did you select a particular LLM (e.g., GPT, BERT) for your project?
While I didn’t use GPT or BERT directly, I relied on Rasa’s dialogue management capabilities,
which are suitable for domain-specific chatbots requiring minimal computational overhead.
4. How did you handle the limitations of using LLMs in your project?
By ensuring the chatbot’s scope was well-defined to prevent inaccuracies.
NLP Questions
Steps included tokenization, lemmatization, and cleaning of text data using libraries
like NLTK and SpaCy.
ML Questions
In the Customer Churn Prediction project, algorithms like Logistic Regression, Random
Forest, and Gradient Boosting were implemented.
For feature selection, I used Recursive Feature Elimination (RFE) and analyzed feature
importance scores.
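A brief sketch of RFE with scikit-learn (synthetic data stands in for the churn dataset):
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the churn data (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Keep the 5 strongest features by recursively eliminating the weakest
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)   # boolean mask of selected features
print(selector.ranking_)   # rank 1 = selected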
1. Can you write a Python script to preprocess a dataset for NLP tasks?
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# First run may require: nltk.download('punkt'); nltk.download('wordnet')

# Load Dataset
data = pd.read_csv('dataset.csv')

def preprocess_text(text):
    # Tokenize and lowercase
    tokens = word_tokenize(text.lower())
    # Lemmatize, keeping alphabetic tokens only
    lemmatizer = WordNetLemmatizer()
    return ' '.join(lemmatizer.lemmatize(t) for t in tokens if t.isalpha())

# Apply to Dataset
data['processed_text'] = data['text_column'].apply(preprocess_text)
print(data.head())
2. How would you implement a basic neural network using Python and a deep learning
library (e.g., TensorFlow, PyTorch)?
import tensorflow as tf

# Minimal feed-forward network; the input size (10 features) is illustrative
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # binary output
])
model.compile(optimizer='adam', loss='binary_crossentropy')