
PROJECT STAGE-1 REPORT

ON

GenAI-ModelHub
(SDG 9: Industry, Innovation and Infrastructure)

SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENT FOR THE DEGREE
OF
BACHELOR OF ENGINEERING IN
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

By

Ajinkya Temgire Exam. No: B1902502121
Digvijay Mane Exam. No: B1902502067
Priyanka Patil Exam. No: B1902502085
Arpita Pol Exam. No: B1902502097

Under the Guidance of

Mr. S. D. Kale

Submitted to
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
AISSMS INSTITUTE OF INFORMATION TECHNOLOGY (IOIT),
PUNE - 411001
Academic Year (2024-2025)
GenAI-ModelHub: Generative AI powered Data Science Automation platform

CERTIFICATE

This is to certify that the Project Stage-1 Report entitled

GenAI-ModelHub
Submitted By

Ajinkya Temgire Exam. No: B1902502121
Digvijay Mane Exam. No: B1902502067
Priyanka Patil Exam. No: B1902502085
Arpita Pol Exam. No: B1902502097

is a bona fide work carried out by them under the supervision of Mr. S. D. Kale, and it is approved
for the partial fulfilment of the requirement of Savitribai Phule Pune University for the Project
Stage-1 in the Final Year of Artificial Intelligence and Data Science.

Mr. S. D. Kale, Guide, Dept. of AI&DS
Dr. R.A. Jamadar, H.O.D, Dept. of AI&DS
Dr. P. B. Mane, Principal, AISSMS IOIT, Pune

Place: Pune

Date: / /2024

External Examiner


ACKNOWLEDGEMENT

We would like to take this opportunity to thank all the people who were part of this project in
numerous ways and who gave unending support right from the initial stage.
In particular, we wish to thank our internal project guide, Mr. S. D. Kale, for their timely cooperation
and valuable guidance, without which this project would not have been a success. We thank them for
reviewing the entire project with painstaking effort and for their unfailing ability to spot mistakes.
We would like to thank our H.O.D, Dr. R.A. Jamadar, for his continuous encouragement, support and
guidance at each and every stage of the project.
Last but not least, we would like to thank all our friends who were associated with us and helped us
in preparing this project. The project named “GenAI-ModelHub” would not have been possible without
the extensive support of the people who were directly or indirectly involved in its successful execution.

Project Group Members:

Ajinkya Temgire Exam. No: B1902502121
Digvijay Mane Exam. No: B1902502067
Priyanka Patil Exam. No: B1902502085
Arpita Pol Exam. No: B1902502097


DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE


ABSTRACT

Data science and machine learning are fields characterized by rapid growth and increasing complexity,
posing a range of challenges for practitioners. From inefficiencies in querying databases with SQL and
Pandas to the complexities of model optimization and hyperparameter tuning, these tasks demand
considerable time and expertise. ML Model HUB is designed to streamline these processes by offering an
integrated platform that utilizes advanced technologies like Large Language Models (LLMs) and Retrieval-
Augmented Generation (RAG) concepts. By bridging the gap between SQL and Pandas for efficient data
querying, generating robust baseline models, and optimizing deep learning architectures, ML Model HUB
seeks to simplify and enhance the data science workflow. The platform's contribution lies in reducing the
technical barriers associated with machine learning and data science tasks, empowering users to build
optimized models more efficiently. This project aims to provide a practical solution to ongoing issues in
model development and data processing, thereby enhancing productivity and innovation in the field.


TABLE OF CONTENTS

Acknowledgement
Abstract
Table of Contents
Chapter 1 Introduction
  1.1 Introduction
  1.2 Motivation
  1.3 Problem Statement
  1.4 Objectives of the proposed work
Chapter 2 Literature Review
  2.1 Literature Review
  2.2 Literature Summary Table
Chapter 3 Requirements and Analysis
  3.1 Requirement Analysis
  3.2 Problem Analysis
  3.3 Software Requirements
  3.4 Hardware Requirements
Chapter 4 Design
  4.1 Architecture
  4.2 Project Modules
  4.3 Algorithm Strategy
  4.4 Mathematical Model
Chapter 5 Conclusion
  5.1 Conclusion
Chapter 6 Project Stage II Plan
  6.1 Project Stage II Plan
Chapter 7 References
  7.1 References
Appendix
  1 Project Title Mapping with PO’s and PSO’s (With Justification)
  2 Critical Thinking Questionnaire
  3 Poster Presentation Participation Certificate


CHAPTER 1
INTRODUCTION


1.1 Introduction
In the rapidly evolving fields of data science and machine learning, practitioners encounter numerous
challenges that hinder efficient workflow and optimal model performance. A significant issue lies in the
complexity and inefficiency of data querying, where transitioning between SQL databases and Pandas-
based data frames often leads to redundancies and errors. This inefficiency can slow down the entire data
processing pipeline, which is fundamental to model development. Furthermore, generating robust baseline
models, fine-tuning hyperparameters, and optimizing deep learning architectures are tasks that are not only
time-consuming but also demand a high level of expertise. Each of these tasks involves intricate knowledge
of various tools and techniques, from parameter optimization to model architecture design, which can
overwhelm even experienced data scientists.

The need for a new solution that can streamline these processes has become critical, particularly given the
growing demand for faster, more accurate model deployments. Existing platforms primarily focus on
specific aspects of model development, lacking comprehensive support that covers the full data science
workflow, from data querying to model optimization. ML Model HUB is proposed as an integrated platform
to address this gap, leveraging Large Language Models (LLMs) and Retrieval-Augmented Generation
(RAG) to assist users across the entire data science pipeline. These technologies aim to make data querying
more efficient, help in generating robust baseline models, and facilitate fine-tuning and optimization in deep
learning models.

This report will discuss the detailed problem definition, justifying the need for an advanced platform like
ML Model HUB. It will compare existing systems, highlighting how ML Model HUB's innovative use of
LLMs and RAG sets it apart. The report will also outline the organization of ML Model HUB’s features
and the benefits of its comprehensive approach, ultimately demonstrating how this platform can enhance
productivity and innovation in the data science community.


1.2 Motivation

The motivation for Gen-AI Model HUB stems from the persistent challenges that data scientists and
machine learning practitioners face in managing complex workflows. In the data science pipeline, efficient
data querying and transformation are essential yet time-consuming tasks, especially when transitioning
between SQL-based databases and Pandas data frames for analysis. These transitions often require
duplicative efforts and a deep understanding of both environments, making it difficult to move quickly from
data retrieval to model building. Additionally, building robust baseline models and fine-tuning
hyperparameters involve intricate processes that require significant expertise. Deep learning models, while
powerful, further complicate the workflow with their demands on computational resources and knowledge
of architecture optimization. While individual tools exist for querying, model selection, and tuning, we are
not aware of an end-to-end platform that seamlessly integrates all of these functionalities.

ML Model HUB aims to address this gap by leveraging the latest advancements in Large Language
Models (LLMs) and Retrieval-Augmented Generation (RAG) to create a unified platform that supports
every stage of the data science process. By automating and simplifying data querying, ML Model HUB
facilitates smoother transitions between SQL and Pandas, minimizing the time spent on redundant tasks.
The integration of LLMs allows the platform to assist in generating baseline models, offering intelligent
suggestions for hyperparameter tuning, and optimizing deep learning architectures, thus making advanced
techniques accessible to practitioners of all expertise levels. This comprehensive approach not only
improves productivity but also empowers data scientists to focus on deriving insights and creating
innovative solutions, freeing them from the complexities of managing fragmented tools and processes.

1.3 Problem Statement

Data scientists and analysts frequently encounter challenges in efficiently transitioning between SQL and
Pandas for data manipulation and querying, leading to a workflow that is both cumbersome and prone to
errors. Moreover, the tendency to bypass the establishment of robust baseline models in favor of more
complex approaches often results in suboptimal outcomes and overlooked insights. Hyperparameter
tuning, a crucial yet complex process, demands expert knowledge and iterative experimentation, further
complicating the model development process. Additionally, the determination of optimal configurations
for convolutional layers, pooling layers, strides, and filters in deep learning models remains a significant
challenge, often relying on trial and error. This project addresses these critical challenges by developing
an integrated platform that seamlessly bridges the gap between SQL and Pandas, automates the creation
of baseline models, streamlines hyperparameter tuning, and optimizes deep learning architectures, all
while providing comprehensive guidance and support through advanced AI-driven tools.


1.4 Objectives

The primary objectives of this research are:

1. To convert user queries into SQL and Pandas queries using LLMs with RAG, supported by an
interactive chatbot for real-time assistance.

2. To efficiently generate baseline models with automated preprocessing, training, and accuracy
evaluation, complemented by data visualizations and detailed statistical insights.

3. To enable users to fine-tune model hyperparameters through an intuitive UI, providing explanations
on bias-variance trade-offs and recommending optimal settings.

4. To identify optimal deep learning parameters (e.g., convolution layers, pooling layers) for image
data through automated analysis and a user-friendly experimentation UI.


CHAPTER 2
LITERATURE REVIEW


2.1 Literature Review

The paper by Rutuja Nikum, Vaishnavi Shinde, and Vijay Khadse focuses on neural machine translation
(NMT) for translating textual queries into Python source code using
transformer models. They developed a system employing a self-attention-based encoder-decoder
transformer architecture to translate English queries into Python code, achieving a BLEU score of 0.78. The
model was retrained with merged datasets to improve accuracy, and the resulting system includes a Flask-
based UI, enhancing user interaction and accessibility. This work demonstrates the potential of transformer-
based models in automated code generation for query-based programming assistance. [11]

Mohammad Latif Siddiq, Shafayat H. Mujumder, Maisha R. Mim, Sourav Jajodia, and Joanna C.S. Santos
conducted an empirical study investigating the presence of code smells and security vulnerabilities in
transformer-based code generation techniques, specifically in models like GPT-Neo and GitHub Copilot.
Utilizing tools such as Pylint and Bandit, they discovered that code generated by these models often contains
issues rooted in the training datasets, which may have inadvertently included faulty code. Their findings
highlight a significant concern regarding the reliability of AI-generated code and underscore the need for
research on improving code quality in transformer-based generation tools. [10]

The research by M.R. Aadhil Rushdy and Uthayasanker Thayasivam proposes a Seq2Seq-based transformer
model with enhanced encoding and decoding techniques specifically for Text-to-SQL generation. Their
model achieved high performance on the Spider dataset, with an Exact Match accuracy of 72.7% and
Execution accuracy of 80.2%, illustrating its efficiency in converting natural language to SQL queries. This
study emphasizes the value of advanced encoding-decoding strategies in improving transformer-based
Text-to-SQL performance, a promising development for query translation applications in data science and
database management. [9]

In a comprehensive survey, Monica and Parul Agrawal review hyperparameter optimization methods
essential for enhancing machine learning model performance and generalization. Their study focuses on
techniques such as grid search, random search, and Bayesian optimization, as well as meta-heuristic
approaches like genetic and evolutionary algorithms. They emphasize the efficacy of Bayesian optimization
for complex tuning and propose that future work involving advanced meta-heuristic algorithms could
further streamline hyperparameter tuning. This survey offers valuable insights into the role of optimized
hyperparameters in achieving robust machine learning models and sets a foundation for exploring
innovative optimization strategies.


2.2 Literature Summary Table:

1. Application of Noise Filter Mechanism for T5-Based Text-to-SQL Generation
   Year of Publication & Authors: 2023; M.R. Aadhil Rushdy, Uthayasanker Thayasivam
   Methodology Adapted: Seq2Seq-based Transformer Model with Enhanced Encoding and Decoding Techniques.
   Major Findings: The research introduces a Seq2Seq-based Text-to-SQL model that enhances performance
   on the Spider dataset, achieving 72.7% Exact Match and 80.2% Execution accuracy.

2. A Survey on Hyperparameter Optimization of Machine Learning Models
   Year of Publication & Authors: 2024; Monica, Parul Agrawal
   Methodology Adapted: Grid Search, Random Search, Bayesian Optimization, Genetic Algorithm,
   Evolutionary Algorithm.
   Major Findings: The study underscores the importance of hyperparameter optimization in boosting the
   performance and generalization of machine learning models. It highlights the effectiveness of
   techniques like grid search and Bayesian optimization, and suggests that advanced meta-heuristic
   algorithms could further enhance future hyperparameter tuning.

3. Textual Query Translation into Python Source Code using Transformers
   Year of Publication & Authors: 2022; Rutuja Nikum, Vaishnavi Shinde, Vijay Khadse
   Methodology Adapted: Neural Machine Translation (NMT).
   Major Findings: The research developed a system using a transformer architecture for translating
   English queries into Python code, achieving a BLEU score of 0.78. The model uses a self-attention-based
   encoder-decoder and was retrained with merged datasets to improve accuracy, with a Flask-based UI for
   improved user experience.

4. An Empirical Study of Code Smells in Transformers-based Code Generation Techniques
   Year of Publication & Authors: 2022; Mohammad Latif Siddiq, Shafayat H. Mujumder, Maisha R. Mim,
   Sourav Jajodia, Joanna C.S. Santos
   Methodology Adapted: Transformer-based code generation techniques, analyzed with the tools Pylint
   and Bandit.
   Major Findings: The research found that transformer-based code generation models, such as GPT-Neo and
   GitHub Copilot, often generate source code containing code smells and security flaws, likely due to
   the presence of such issues in the training datasets. These findings underscore the need for caution
   when using automatically generated code and emphasize the importance of further research to improve
   the quality of generated code.

5. On Optimization Methods for Deep Learning
   Authors: Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, Andrew Y. Ng
   Methodology Adapted: Comparison of Stochastic Gradient Descent (SGD) with alternative optimization
   methods such as Limited-memory BFGS (L-BFGS) and Conjugate Gradient (CG) with line search. Emphasis on
   algorithmic extensions like sparsity regularization and hardware extensions for distributed
   optimization.
   Major Findings: The study finds that L-BFGS and CG outperform SGD in specific scenarios, such as
   low-dimensional problems (e.g., convolutional neural networks) and tasks involving sparsity (e.g.,
   sparse autoencoders). L-BFGS achieves a state-of-the-art 0.69% error rate on the MNIST dataset without
   distortions or pretraining. Different optimization methods are shown to excel in different problem
   contexts, challenging the widespread preference for SGD.

6. SQL-PaLM: Improved large language model adaptation for Text-to-SQL (extended)
   Authors: Ruoxi Sun, Sercan Ö. Arik, Alex Muzio, Lesly Miculicich, Satya Gundabathula, Pengcheng Yin,
   Hanjun Dai, Hootan Nakhost, Rajarishi Sinha, Zifeng Wang, Tomas Pfister
   Methodology Adapted: SQL-PaLM uses the PaLM-2 LLM with few-shot prompting, instruction fine-tuning,
   and synthetic data augmentation. Techniques like execution-based error filtering, selective column
   encoding, and test-time refinement are applied to enhance Text-to-SQL accuracy on complex databases.
   Major Findings: SQL-PaLM improves Text-to-SQL accuracy on Spider and BIRD by expanding training data
   diversity, integrating relevant database content, and optimizing input size via column selection.
   Test-time refinement further boosts accuracy, offering insights into model strengths and real-world
   scaling.

7. Hyperparameter Optimization to Improve Bug Prediction Accuracy
   Authors: Haidar Osman, Mohammad Ghafari, Oscar Nierstrasz
   Methodology Adapted: The study investigates the effects of hyperparameter optimization on bug
   prediction accuracy using two machine learning models: k-nearest neighbors (IBK) and support vector
   machines (SVM). Experiments are conducted on five open-source Java projects to assess model
   performance with optimized vs. default hyperparameters.
   Major Findings: Hyperparameter tuning significantly improves the prediction accuracy of IBK and either
   improves or maintains accuracy for SVM. The study shows that default hyperparameters are often
   suboptimal, and recommends tuning as a critical step in machine learning-based bug prediction.

8. A Combinatorial Approach to Hyperparameter Optimization
   Authors: Krishna Khadka, Jaganmohan Chandrasekaran, Yu Lei, Raghu N. Kacker, D. Richard Kuhn
   Methodology Adapted: The study introduces a novel hyperparameter optimization (HPO) method using t-way
   testing, a combinatorial approach that selects 't' out of 'n' hyperparameters to systematically cover
   parameter interactions. This method aims to narrow down the search space and reduce computational
   demands while tuning models across diverse datasets.
   Major Findings: T-way testing significantly reduces the number of necessary evaluations compared to
   traditional methods (Grid Search, Random Search, Bayesian Optimization), achieving similar or better
   performance with fewer runs. This efficient approach proves advantageous for large datasets and
   complex models, offering a promising alternative to resource-intensive HPO techniques.


CHAPTER 3
REQUIREMENTS & ANALYSIS


3.1 Requirement Analysis:

The requirement analysis for ML Model HUB involves identifying and outlining the technical, functional,
and operational needs necessary to create a robust platform that integrates data science and machine learning
workflows. This section organizes the requirements into steps, with specific points under each step, to
ensure a structured approach to development.

Step 1: Functional Requirements

1. Data Querying and Integration
o Ability to handle SQL and Pandas queries seamlessly to minimize redundant data processing.
o Capability to convert SQL queries into Pandas and vice versa for efficient data handling.
o Support for various data sources, including databases and file types (e.g., CSV, JSON).
2. Baseline Model Generation
o Provide options to generate initial models (e.g., regression, classification) based on user input data.
o Capability to recommend algorithms based on data characteristics (e.g., regression for continuous
variables).
o Incorporate basic model evaluation metrics for immediate feedback on baseline performance.
3. Hyperparameter Optimization
o Support for hyperparameter tuning techniques such as grid search, random search, and Bayesian
optimization.
o Option to select and configure optimization techniques for model improvement.
4. Deep Learning Model Optimization
o Provide pre-built architectures (e.g., CNN, RNN) for common deep learning tasks.
o Support for model customization to adjust layers, units, and activations as per user needs.
o Incorporate optimization strategies to improve performance, such as learning rate tuning and dropout
regularization.
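The Baseline Model Generation requirements ask the platform to recommend algorithms from the characteristics of the data. A minimal sketch of such a rule is shown below, assuming the target column is available as a plain Python list: it guesses whether the task is classification or regression and returns candidate baselines. The threshold of 20 distinct integer values and the `BASELINES` table are illustrative assumptions; the entries are scikit-learn estimator class names, but scikit-learn itself is not called here.

```python
from numbers import Real

# Illustrative candidate baselines per task type (names follow scikit-learn class names).
BASELINES = {
    "classification": ["LogisticRegression", "DecisionTreeClassifier", "KNeighborsClassifier"],
    "regression": ["LinearRegression", "DecisionTreeRegressor", "KNeighborsRegressor"],
}


def infer_task_type(target, max_classes=20):
    """Guess whether a target column calls for classification or regression.

    Assumed heuristic: non-numeric targets (strings, booleans) are class labels;
    integer-valued targets with at most `max_classes` distinct values are also
    treated as labels; any other numeric target is treated as continuous.
    """
    values = [v for v in target if v is not None]  # ignore missing entries
    if not values:
        raise ValueError("target column has no non-missing values")
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if not numeric:
        return "classification"
    integral = all(float(v).is_integer() for v in values)
    if integral and len(set(values)) <= max_classes:
        return "classification"  # e.g. 0/1 labels or a handful of class codes
    return "regression"  # continuous or high-cardinality numeric target


def recommend_baselines(target):
    """Return the inferred task type and the candidate baseline models for it."""
    task = infer_task_type(target)
    return task, BASELINES[task]
```

A rule like this can only give a starting suggestion. Integer-coded counts, for example, may be better treated as a regression target, so the user or the LLM-assisted cleaning step would presumably be allowed to override the inferred task.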

Step 2: Technical Requirements

1. Large Language Model (LLM) Integration
o Integrate LLMs to assist with automated data querying and translation between SQL and Pandas.
o Implement Retrieval-Augmented Generation (RAG) to enhance LLM response relevance for specific
data queries.
2. Backend and Data Processing
o Utilize efficient backend support for data processing and model training (e.g., Python with
frameworks like Flask and PyTorch).
o Enable processing and storage of large datasets and ensure real-time responsiveness.
3. Model Management and Deployment
o Implement model versioning and tracking to manage and monitor different model iterations.
o Include options for model deployment and export, allowing users to download or deploy models
easily.
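The RAG requirement can be illustrated with a small retrieval step: past questions with verified SQL and Pandas answers are ranked by similarity to the new question, and the closest ones are placed in the prompt as worked examples. This is a sketch under stated assumptions: the example store is hypothetical, bag-of-words cosine similarity stands in for the embedding model and vector store a production RAG pipeline would use, and the actual LLM call is left out.

```python
import math
import re
from collections import Counter

# Hypothetical store of past questions paired with verified SQL and Pandas code.
EXAMPLES = [
    {"question": "average salary per department",
     "sql": "SELECT department, AVG(salary) FROM employees GROUP BY department",
     "pandas": "employees.groupby('department')['salary'].mean()"},
    {"question": "employees older than 30",
     "sql": "SELECT * FROM employees WHERE age > 30",
     "pandas": "employees[employees['age'] > 30]"},
    {"question": "count orders per customer",
     "sql": "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id",
     "pandas": "orders.groupby('customer_id').size()"},
]


def _vector(text):
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the k stored examples most similar to the user's question."""
    q = _vector(question)
    ranked = sorted(EXAMPLES, key=lambda ex: _cosine(q, _vector(ex["question"])), reverse=True)
    return ranked[:k]


def build_prompt(question, schema, k=2):
    """Assemble a few-shot prompt: schema, retrieved examples, then the new question."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nSQL: {ex['sql']}\nPandas: {ex['pandas']}"
        for ex in retrieve(question, k)
    )
    return f"Database schema:\n{schema}\n\n{shots}\n\nQuestion: {question}\nSQL:"
```

Questions whose generated answers the user later confirms could be appended to the store, so that retrieval has more relevant examples to draw on over time.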

Step 3: Performance Requirements

1. Scalability and Efficiency
o Design the system to handle a high volume of queries and model training requests simultaneously.
o Optimize system performance to reduce latency during data querying, model training, and
hyperparameter tuning.
2. Accuracy and Robustness
o Ensure high accuracy and reliable performance of baseline models, with mechanisms to improve
performance through tuning.

o Provide accurate data translation between SQL and Pandas without loss of data integrity or context.

3.2 Problem Analysis:

The requirement analysis for Gen-AI Model HUB focuses on creating a unified platform that simplifies
complex data science and machine learning workflows. A core requirement is the ability to support both
SQL and Pandas queries, enabling seamless data manipulation and integration from various sources without
redundant transformations. This dual-query capability will bridge the common gap between structured
database environments and Python’s data handling, saving practitioners time and effort. Another key
functional requirement is the generation of baseline machine learning models. By providing automated
options for creating initial models based on data characteristics, ML Model HUB will streamline model
development for users with varying levels of expertise. To maximize usability, the platform will feature a
user-friendly interface that integrates real-time feedback and visualizations, allowing users to track data
insights and model performance effortlessly.
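The dual-query capability depends on the LLM seeing the structure of the user's database. Assuming the uploaded file is a SQLite `.db` (the format named for the Query Translator module), the schema can be read with Python's built-in `sqlite3` module and placed in a prompt that asks for SQL, Pandas code and an explanation together. The prompt wording, the JSON keys `sql`, `pandas` and `description`, and all helper names here are hypothetical.

```python
import json
import sqlite3


def describe_schema(conn):
    """Return one 'table(column TYPE, ...)' line per user table in a SQLite database."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape embedded quotes in the identifier
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        desc = ", ".join(f"{name} {col_type}".strip() for _, name, col_type, *_ in columns)
        lines.append(f"{table}({desc})")
    return "\n".join(lines)


def build_translation_prompt(conn, question):
    """Hypothetical prompt asking for SQL, equivalent Pandas code and a description."""
    return (
        "Translate the question into SQLite SQL and equivalent Pandas code.\n"
        "Assume each table is loaded into a DataFrame with the same name.\n"
        f"Schema:\n{describe_schema(conn)}\n"
        'Reply only with JSON: {"sql": ..., "pandas": ..., "description": ...}\n'
        f"Question: {question}"
    )


def parse_translation(raw_reply):
    """Validate the model's JSON reply and return (sql, pandas_code, description)."""
    data = json.loads(raw_reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = {"sql", "pandas", "description"}.difference(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data["sql"], data["pandas"], data["description"]
```

Validating the reply before showing it lets the interface ask the model again, rather than displaying a half-formed answer, when a key is missing.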

Beyond the core functional requirements, Gen-AI Model HUB requires advanced technical capabilities,
particularly in hyperparameter optimization and deep learning model customization. Techniques such as
grid search, random search, and Bayesian optimization will be built into the platform, with options to apply
meta-heuristic algorithms for more effective tuning. The platform also needs robust backend processing for
handling data-intensive tasks and model training. Integrating Large Language Models (LLMs) and
Retrieval-Augmented Generation (RAG) technologies will enable automated translation of data queries and
boost user efficiency. Security and compliance are additional requirements, as the platform must safeguard
user data and restrict access to authorized users, ensuring that data and model integrity are maintained
throughout the workflow. Together, these requirements lay the foundation for a powerful, accessible tool
that addresses the pressing needs of modern data science and machine learning practitioners.
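For the data-integrity requirement, one concrete safeguard is to execute LLM-generated SQL through a connection that cannot write. Assuming SQLite again, the sketch below opens the user's file through a read-only URI (`mode=ro`) and additionally sets `PRAGMA query_only`, so a generated `DELETE` or `DROP` fails with an error instead of changing the data. The function name and the row cap are illustrative; restricting access to authorized users would sit in the web layer and is not shown.

```python
import pathlib
import sqlite3


def run_generated_sql(db_path, sql, max_rows=1000):
    """Execute LLM-generated SQL against a user database without allowing writes.

    The file is opened via a read-only URI and the connection is also switched
    to query_only mode, so INSERT/UPDATE/DELETE/DROP raise
    sqlite3.OperationalError instead of modifying the user's data.
    """
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.execute("PRAGMA query_only = ON")
        cursor = conn.execute(sql)
        return cursor.fetchmany(max_rows)  # cap result size sent back to the UI
    finally:
        conn.close()
```

Opening the file read-only protects the data even if a harmful statement slips past any text-level checks on the generated SQL.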

3.3 Software Requirements

Compiler/Editor: Jupyter Notebook, VS Code

GUI: HTML, CSS, JS

Libraries: scikit-learn, TensorFlow, LangChain, Google.genAI, Flask, Pandas, NumPy, PyTorch

Database: SQL

Platform: Windows

Language: Python

3.4 Hardware Requirements

Processor - PC with a minimum of 4 cores @ 3.1 GHz
RAM - 8 GB
Graphics Card - 2 GB graphics memory


CHAPTER 4
DESIGN


4.1 Architecture

[Architecture diagrams: 4.1.1 Query Translator; 4.1.2 Automated Baseline Model; 4.1.3 Advanced Optimisation; 4.1.4 CNN Optimisation; 4.1.5 Overall System]


4.2 Project Modules:

1. Query Translator:
The Query Translator module serves as a bridge between SQL and Pandas, enabling users to translate
their natural language queries into executable SQL and Pandas code.
The steps involved are:
A) Upload Database: Users upload a database file (e.g., .db).
B) Natural Language Input: Users input their queries in natural language.
C) LLM Processing: The system processes the query using an LLM to understand the intent and
generate the appropriate SQL and Pandas code.
D) Output Generation: The generated output includes the SQL query and Pandas code along with a
description explaining the operations performed.

2. Automated Baseline Model:
This component automates the creation of baseline models to provide a reference point for further
model improvements.
The steps involved are:
A) Upload Raw Data: Users upload raw data files (e.g., .csv, .json).
B) Data Cleaning: Basic data cleaning is performed using LLMs to ensure data quality.
C) Conversion and Encoding: Data is numerically converted and encoded as necessary.
D) Model Training: Multiple machine learning models are trained on the processed data.
E) Accuracy Presentation: The accuracy of each model is presented to the user.
F) Code Generation: The system generates the overall code blocks needed for data cleaning and
model training, and provides the cleaned data in a downloadable format. Additionally, statistical
inferences and visualizations (e.g., using Pandas profiling) are provided, with R code for the same.

3. Hyperparameter Tuning:
This module focuses on optimizing the hyperparameters of machine learning models to improve
performance.
The steps involved are:
A) Upload Cleaned Data: Users upload cleaned data files (e.g., .csv, .json).
B) Model Selection: Users select their preferred machine learning model.
C) Hyperparameter Tuning: Hyperparameters are tuned using techniques like Grid Search Cross-
Validation (CV) and Bayesian Optimization.
D) Accuracy Evaluation: The system provides accuracy metrics based on the tuned hyperparameters.
E) Code and Description: The system generates the overall code block and provides explanations on
how each hyperparameter affects the bias-variance trade-off.
F) Recommendation: The system recommends optimal hyperparameters for the user’s model.

4. Deep Learning (CNN) Optimization:
This component aims to determine the optimal configuration for deep learning models, specifically
Convolutional Neural Networks (CNNs).
The steps involved are:
A) Upload Data Link: Users provide a link to the dataset containing images for model training.
B) Data Processing: Data is processed using LLMs to ensure it is in an appropriate format for
training.
C) Baseline Model Generation: The system identifies the best baseline model and optimal parameters.
D) User-defined Settings: Users can experiment with different layer settings using the UI.
E) Code Generation: The system generates the overall code block necessary for model training with the optimal settings.
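Part of the search for good CNN settings in the Deep Learning (CNN) Optimization module can be done before any training: every combination of kernel size, stride, padding and pooling must still leave a feature map of at least 1x1. The sketch below applies the standard output-size formula, floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, s the stride and p the padding, to trace a candidate stack and count its convolution parameters, then enumerates a small grid of stacks. The search space, the "same"-style padding of k // 2 and the ranking by parameter count are assumptions for illustration; choosing the best configuration would still require training and validating the surviving candidates.

```python
import itertools


def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size, in_channels, layers):
    """Follow (size, channels) through a stack and count convolution parameters.

    `layers` holds dicts such as
    {"type": "conv", "filters": 32, "kernel": 3, "stride": 1, "padding": 1} or
    {"type": "pool", "kernel": 2, "stride": 2}.
    Returns (final_size, final_channels, n_params), or None if any layer would
    produce a feature map smaller than 1x1.
    """
    size, channels, params = input_size, in_channels, 0
    for layer in layers:
        size = conv_output_size(size, layer["kernel"], layer.get("stride", 1),
                                layer.get("padding", 0))
        if size < 1:
            return None
        if layer["type"] == "conv":
            # k * k * C_in weights per filter, plus one bias per filter
            params += (layer["kernel"] ** 2 * channels + 1) * layer["filters"]
            channels = layer["filters"]
    return size, channels, params


def candidate_stacks(input_size, in_channels, depths=(1, 2, 3), filters=(16, 32), kernels=(3, 5)):
    """Enumerate conv+pool stacks over a hypothetical search space and keep the valid ones."""
    results = []
    for depth, f, k in itertools.product(depths, filters, kernels):
        stack = []
        for _ in range(depth):
            stack.append({"type": "conv", "filters": f, "kernel": k, "stride": 1, "padding": k // 2})
            stack.append({"type": "pool", "kernel": 2, "stride": 2})
        shape = trace_shapes(input_size, in_channels, stack)
        if shape is not None:
            results.append(((depth, f, k), shape))
    # Smallest models first: a cheap starting point before any training run.
    return sorted(results, key=lambda item: item[1][2])
```

For a 28x28 single-channel image, one 3x3 convolution with 32 filters and padding 1 followed by 2x2 pooling gives a 14x14x32 output and 320 convolution parameters, which is the kind of figure the experimentation UI could show next to each user-defined setting.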


4.3 Algorithm Strategy

For the Gen-AI Model HUB project, the algorithm strategy involves combining a series of machine
learning and deep learning techniques to ensure an efficient, seamless process for data querying, model
generation, optimization, and deployment. Here's an outline of the algorithm strategy:

1. Data Querying and Integration
o Algorithm: Use a hybrid approach combining SQL parsing algorithms and Pandas
DataFrame manipulation techniques. The SQL queries will be transformed into equivalent
Pandas operations, and vice versa, using pre-defined mappings and machine learning models
(e.g., sequence-to-sequence models or transformers).
o Optimization: Implement caching and indexing algorithms for large datasets to reduce
query execution time and improve efficiency.
2. Baseline Model Generation
o Algorithm: A selection algorithm is implemented to automatically choose a suitable
machine learning model based on dataset characteristics. This may involve algorithms like
decision trees (for classification), linear regression (for regression tasks), and k-nearest
neighbors (for quick baseline predictions).
o Model Selection: Use clustering techniques (such as K-means or hierarchical clustering) to
analyze the dataset structure and recommend suitable models based on the clustering results.
3. Hyperparameter Optimization
o Algorithm: Implement traditional hyperparameter optimization algorithms like Grid Search
and Random Search. Additionally, employ more advanced techniques such as Bayesian
Optimization using Gaussian Processes to iteratively explore hyperparameter space and
improve model performance.
o Metaheuristic Algorithms: Integrate Genetic Algorithms and Simulated Annealing for
hyperparameter optimization, allowing the search to escape local optima and find better
solutions.
4. Deep Learning Architecture Optimization
o Algorithm: Use search-based optimization algorithms like Reinforcement Learning to
automatically adjust deep learning architecture parameters such as the number of layers,
neurons, activation functions, and learning rates.
o Transfer Learning: Incorporate transfer learning techniques where pre-trained models are
fine-tuned based on domain-specific data, reducing the need for extensive computational
resources.
5. Large Language Model (LLM) and RAG Integration
o Algorithm: Implement transformer-based architectures for LLMs (e.g., GPT, T5) to convert
SQL queries into Python code or Pandas DataFrame operations. The LLMs can be fine-tuned
on a dataset of typical user queries and their corresponding answers to improve performance.
o RAG Approach: Use a retrieval step in which relevant past queries and code examples are
retrieved and supplied to the LLM as additional context alongside the user's query, so that the
generated output is more accurate and context-aware.
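The rule-based side of the SQL-to-Pandas translation and the caching idea in item 1 can be illustrated with a minimal Python sketch. It assumes a tiny, hypothetical SQL subset (a column list, one table, and at most one WHERE comparison) and in-memory DataFrames keyed by table name. The names `parse_sql` and `run_sql_on_frames` and the operator mapping are illustrative only, not the project's actual implementation. Parsed queries are memoised with `functools.lru_cache`, so repeating a query skips the parsing step.

```python
import re
from functools import lru_cache

import pandas as pd

# Hypothetical pattern for a tiny SQL subset:
#   SELECT col1, col2 FROM table [WHERE col op value]
_SQL_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,*]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*(?P<op>>=|<=|!=|=|>|<)\s*"
    r"(?P<val>'[^']*'|-?\d+(?:\.\d+)?))?\s*;?\s*$",
    re.IGNORECASE,
)

# Pre-defined mapping from SQL comparison operators to pandas Series methods.
_OP_MAP = {"=": "eq", "!=": "ne", ">": "gt", ">=": "ge", "<": "lt", "<=": "le"}


@lru_cache(maxsize=256)
def parse_sql(query: str):
    """Parse a query once; identical repeated queries are served from the cache."""
    m = _SQL_PATTERN.match(query)
    if m is None:
        raise ValueError(f"Unsupported query: {query!r}")
    cols = tuple(c.strip() for c in m.group("cols").split(","))
    where = None
    if m.group("col"):
        raw = m.group("val")
        # Quoted literals stay strings; bare numbers become floats.
        val = raw[1:-1] if raw.startswith("'") else float(raw)
        where = (m.group("col"), _OP_MAP[m.group("op")], val)
    return m.group("table"), cols, where


def run_sql_on_frames(query: str, frames: dict) -> pd.DataFrame:
    """Execute the parsed query as the equivalent Pandas filter and projection."""
    table, cols, where = parse_sql(query)
    df = frames[table]
    if where is not None:
        col, method, val = where
        df = df[getattr(df[col], method)(val)]  # e.g. df[df["age"].gt(30.0)]
    if cols != ("*",):
        df = df[list(cols)]
    return df.reset_index(drop=True)
```

Queries outside this subset raise `ValueError`. In the platform described above, such queries would presumably be handed to the learned (sequence-to-sequence or transformer) translator rather than the fixed mapping.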

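The metaheuristic search in item 3 can be sketched with simulated annealing over a small discrete hyperparameter space. This is an illustration under stated assumptions: the search space, the cooling schedule, and `toy_validation_loss` (a stand-in for a real cross-validated loss) are all hypothetical. A worse candidate is accepted with probability exp(-delta / T). This rule lets the search leave local optima while the temperature T is still high, and it becomes almost greedy as T cools.

```python
import math
import random

# Hypothetical discrete search space for a model being tuned.
SPACE = {
    "max_depth": [2, 3, 4, 5, 6, 8, 10],
    "learning_rate": [0.001, 0.01, 0.05, 0.1, 0.2, 0.5],
}


def toy_validation_loss(params):
    """Stand-in for a real validation loss; its minimum is at depth 5, lr 0.1."""
    depth_term = (params["max_depth"] - 5) ** 2 / 10
    lr_term = (math.log10(params["learning_rate"]) + 1) ** 2
    return depth_term + lr_term


def neighbour(params, rng):
    """Move one randomly chosen hyperparameter one step along its value list."""
    key = rng.choice(list(SPACE))
    values = SPACE[key]
    i = values.index(params[key])
    j = min(max(i + rng.choice([-1, 1]), 0), len(values) - 1)  # clamp at the ends
    new = dict(params)
    new[key] = values[j]
    return new


def simulated_annealing(loss_fn, n_iter=200, t0=1.0, cooling=0.97, seed=0):
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in SPACE.items()}
    current_loss = loss_fn(current)
    best, best_loss = current, current_loss
    temp = t0
    for _ in range(n_iter):
        cand = neighbour(current, rng)
        cand_loss = loss_fn(cand)
        delta = cand_loss - current_loss
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which shrinks as the temperature decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = cand, cand_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
        temp *= cooling  # geometric cooling schedule
    return best, best_loss
```

Calling `simulated_annealing(toy_validation_loss)` returns the best configuration found and its loss. With a real model, each loss evaluation would involve training and validation, so the iteration budget would typically be far smaller than in this toy setting.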

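The retrieval step of the RAG approach in item 5 can be sketched as follows. This minimal version assumes a small, hypothetical store of past queries paired with Pandas code. It scores the stored queries with bag-of-words cosine similarity and prepends the top matches to the prompt. A production system would more likely use dense embeddings and a vector index, and the call to the fine-tuned LLM itself is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical store of past user queries and the Pandas code that answered them.
EXAMPLES = [
    ("average salary by department", "df.groupby('department')['salary'].mean()"),
    ("count rows where age is above 30", "(df['age'] > 30).sum()"),
    ("top 5 products by revenue", "df.nlargest(5, 'revenue')[['product', 'revenue']]"),
    ("drop rows with missing values", "df.dropna()"),
]


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the k stored examples most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(past))), past, code) for past, code in EXAMPLES]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


def build_prompt(query, k=2):
    """Augment the user's request with retrieved examples before generation."""
    lines = ["Translate the request into Pandas code. Similar solved examples:"]
    for _, past, code in retrieve(query, k):
        lines.append(f"# Request: {past}\n{code}")
    lines.append(f"# Request: {query}")
    return "\n".join(lines)
```

For example, `build_prompt("mean salary for each department", k=1)` places the stored groupby example ahead of the new request, giving the model a closely related solved case to condition on.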
4.4 Mathematical Model


CHAPTER 5
CONCLUSION


5.1 Conclusion

In conclusion, the GenAI-ModelHub project offers a powerful, integrated solution to the challenges faced
by data scientists and machine learning practitioners. By combining seamless data querying, automated
baseline model generation, and advanced hyperparameter optimization techniques, the platform simplifies
and accelerates the process of building and refining machine learning models. By leveraging technologies
such as Large Language Models and Retrieval-Augmented Generation, it improves both the efficiency and
the user experience of model development. With a user-friendly interface and robust backend support,
GenAI-ModelHub streamlines workflows, making complex data science tasks more accessible and less
time-consuming, and ultimately helps users build high-performance models more effectively.


CHAPTER 6
PROJECT STAGE-2 PLAN


6.1 Project Stage II Plan

1. Building the deep learning feature.
2. Creating our own data for the project description.
3. Building documentation of hyperparameters for more than 10 ML algorithms.
4. Creating personalised chatbots for different modules of the project.


CHAPTER 7
REFERENCES



7.1 References
[1] M. R. Aadhil Rushdy and Uthayasanker Thayasivam, "Application of Noise Filter Mechanism for T5-Based Text-to-SQL Generation," Department of Computer Science and Engineering, University of Moratuwa, Katubedda, Sri Lanka.
[2] Monica and Parul Agrawal, "A Survey on Hyperparameter Optimization of Machine Learning Models," Department of CSE&IT, Jaypee Institute of Information Technology, Noida, India.
[3] Rutuja Nikum, Vaishnavi Shinde, and Vijay Khadse, "Textual Query Translation into Python Source Code using Transformers," Computer Engineering, College of Engineering Pune, India.
[4] Mohammed Latif Siddiq, Shafayat H. Majumder, Maisha R. Mim, Sourov Jajodia, and Joanna C. S. Santos, "An Empirical Study of Code Smells in Transformer-based Code Generation Techniques," Department of Computer Science and Engineering, University of Notre Dame, USA; Department of Computer Science, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.
[5] Krishna Khadka, Jaganmohan Chandrasekaran, Yu Lei, Raghu N. Kacker, and D. Richard Kuhn, "A Combinatorial Approach to Hyperparameter Optimization," University of Texas at Arlington, Arlington, TX, USA; National Security Institute, Virginia Tech, Arlington, VA, USA; Information Technology Laboratory, National Institute of Standards and Technology.
[6] Haidar Osman, Mohammad Ghafari, and Oscar Nierstrasz, "Hyperparameter Optimization to Improve Bug Prediction Accuracy," Software Composition Group, University of Bern, Bern, Switzerland.
[7] Abhilasha Kate, Satish Kamble, Aishwarya Bodkhe, and Mrunal Joshi, "Conversion of Natural Language Query to SQL Query," Dept. of IT Engineering, PVG’s COET, Pune, India.
[8] Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, and Andrew Y. Ng, "On Optimization Methods for Deep Learning," Computer Science Department, Stanford University, Stanford, CA 94305, USA.


APPENDIX


1. Project Title Mapping with POs and PSOs (With Justification)

Gr. No | Project Title | Project Guide | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2
B9 | GenAI-ModelHub | Mr. Satish Kale | 3 | 2 | 1 | 3 | 3 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 2

The project title is mapped with the POs and PSOs on a scale of 3:
3 = Substantial mapping, 2 = Moderate mapping, 1 = Low mapping.

Detailed mapping:
POs:
1. Engineering Knowledge: The project applies advanced AI techniques such as LLMs and RAG to solve
complex data science problems, combining knowledge from computer science, data engineering, and AI to
develop robust solutions.
2. Problem Analysis: It addresses the need for a unified data science platform that simplifies the transition
between SQL and Pandas, providing substantiated solutions using data preprocessing and model generation.
3. Design/Development of Solutions: The platform is designed to meet user needs for efficient model
development and optimization, incorporating elements of usability, reliability, and adaptability.
4. Conduct Investigations of Complex Problems: By automating data processing and analysis, the project
uses a research-driven approach to streamline investigations and deliver actionable insights.
5. Modern Tool Usage: Leveraging modern AI tools and techniques, the platform facilitates complex tasks
such as data querying, model training, and tuning, making them accessible to users with diverse technical
skills.
7. Environment and Sustainability: The project promotes sustainable data science practices by reducing
redundant work and optimizing resource use, potentially lowering computational costs and environmental
impact.
8. Ethics: GenAI-ModelHub supports responsible data handling and transparency, ensuring AI-driven tasks
are ethically managed and users understand the impact of their models.
9. Individual and Team Work: The platform is designed to support both individual users and teams, enabling
collaborative and efficient workflows across different levels of expertise.
10. Communication: Through an intuitive chatbot interface, the project enhances communication with users
by explaining processes and providing real-time guidance, making data science more approachable.
11. Project Management and Finance: The phased development approach (research, design, development,
testing, deployment) reflects sound project management principles to deliver a high-quality platform
efficiently.
12. Life-long Learning: By providing an accessible and interactive data science platform, GenAI-ModelHub
fosters continued learning for users, especially those looking to improve their data science skills or explore
AI-driven tools.


PSOs:
1. Applying domain-specific knowledge:
o GenAI-ModelHub uses domain-specific AI technologies like Large Language Models (LLMs) and
Retrieval-Augmented Generation (RAG) to develop tools that enhance data science workflows.
o The platform is tailored to support data science applications, including model development,
optimization, and data querying, specific to electronics and telecommunication-related workflows in
data science.
2. Selecting and using software tools efficiently:
o The platform incorporates various software tools like SQL for data querying, Pandas for data
manipulation, and automated deep learning model generation tools, helping users efficiently manage
complex data science tasks.
o The intelligent chatbot and automation features simplify the use of multiple tools, making it easier
for users to navigate through data science workflows without needing in-depth expertise in each tool.

2. Critical Thinking Questionnaire:


1. Who benefits from the project?
=> Data scientists, machine learning engineers, analysts, developers, and businesses.
2. Who is this project harmful to?
=> No direct harm; it may reduce demand for manual SQL/Pandas coding.
3. What are the strengths and weaknesses of the project?
-> Strengths: automation, accuracy, ease of use, GenAI integration.
-> Weaknesses: dependence on AI accuracy, complexity of implementation.
4. Where would we see this project in the real world?
=> Business analytics platforms, educational tools, data science projects, and research institutions.
5. When would this project benefit our society?
=> During data analysis tasks, model development, and the training of machine learning models, and for
educational purposes.
6. Why is this project a problem/challenge?
=> Complex AI integration, the need for high-quality data, and model interpretability concerns.
7. How does this project benefit us/others?
=> It streamlines data processes, enhances model accuracy, saves time, and reduces manual coding.
8. Where are the areas for improvement?
=> User interface, AI model robustness, multi-language support, and expanded dataset compatibility.


3. Poster Presentation Participation Certificate

