0% found this document useful (0 votes)
17 views4 pages

Basic Details

Ved Thorat is a 19-year-old Computer Science student from India, currently in his second year of a Bachelor's program, with a focus on machine learning and API development. He is motivated to participate in Google Summer of Code (GSoC) to gain experience in open source contributions, particularly through a project involving gprMax, where he aims to integrate an open source LLM and develop a chatbot. Ved has prior experience in Python and various machine learning projects, and he plans to dedicate 25-30 hours per week to the GSoC project during the summer.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views4 pages

Basic Details

Ved Thorat is a 19-year-old Computer Science student from India, currently in his second year of a Bachelor's program, with a focus on machine learning and API development. He is motivated to participate in Google Summer of Code (GSoC) to gain experience in open source contributions, particularly through a project involving gprMax, where he aims to integrate an open source LLM and develop a chatbot. Ved has prior experience in Python and various machine learning projects, and he plans to dedicate 25-30 hours per week to the GSoC project during the summer.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Basic Details

●​ Name - Ved Thorat


●​ Location and Timezone - India, GMT+5:30
●​ Education- Currently in my second year of a 4 year Bachelor’s Degree in Computer
Science
●​ Email - vedthorat1029@gmail
●​ Github : https://github.com/i3hz
●​ www/blog?
●​ Skype Username - #Will add
●​ Biography-
○​ Hello , I am Ved , a 19 year old college student from India pursuing Computer
Science.I’ve been studying machine learning for the past few months
including the underlying theory , math concepts and practical implementations
. I am proficient in Python and possess experience in API development ,
mainly for deploying machine learning models . I have also worked with RAG
models in the past. I've used both types API based (GPT- 4,Gemini) and
locally hosted(Deepseek, Llama).I have also experience fine-tuning models to
better fit specific use cases.I also enjoy contributing to open source projects
wherever I can . I am excited about GSoC as an opportunity to collaborate
with developers , improve my technical skills and make meaningful
contributions while solving real world problems.

Motivation and Experience


●​ Describe your motivation for participating in Google Summer of Code?
○​ I want to participate in GSoC so that I can learn more about open source and
programming in general . I have been using several open source applications
and I would love to contribute to one.Through this opportunity I would also like
to gain relevant experience in my domain by working on a large scale project.I
would also receive guidance from professional mentors which will help me
make valuable open source contributions.
●​ Have you participated in Google Summer of Code in the past?
○​ No I have not been able to participate in GSoC in the past.
●​ Why did you choose gprMax?
○​ I mainly chose gprMax because I found the idea of simulating electromagnetic
waves very interesting .In high school Electromagnetism was one of my
favorite topics (Relativity being another one) . Now that I am focusing on
machine learning , gprMax gives me a great opportunity to contribute while
also learning the underlying physics being used.
●​ Why did you select this project idea?
○​ I chose this idea because it matches my skills and experience. As I
mentioned before I have developed and deployed chatbots based on API
based LLM’s(gpt-4,gemini) and open source ones(Llama,Deepseek).I did look
into the other projects as well but I found this one to be the best fit given my
background.
●​ What are your expectations from us during and after successful completion of the
program?
○​ I expect to receive guidance and advice from the mentors in understanding
the project structure . I also hope to receive feedback on my code
implementations so that I can improve or modify my approach if needed.I
would also appreciate mentor support if I were to run into some issues and
can’t resolve them on my own. And I’m also eager to learn how to make long
term contributions after GSoC , so I would appreciate mentor guidance on
how to stay involved with the community.
●​ What are you hoping to learn?
○​ I am hoping to improve my technical skills , mainly in machine learning and
python. I’m also looking forward to learn more about the techniques used in
large scale open source development .I would also gain valuable experience
writing and documenting my code . I also hope to improve my teamwork skills
by having regular interactions with the mentors and my fellow contributors.
●​ How much time will you be able to devote to the project? Are you doing any other
internship this summer?
○​ I don't have any internships this summer so I will be able to focus on the
project completely . Since I am a college student I can work 2-3 hours on
weekdays after my classes and 6-7 hours on weekends . That adds up to
around 25-30 hours every week . In summer break I’ll be able to dedicate
significantly more time to the project each week as I won’t have any classes.
●​ What kind of projects have you worked in the past? What technologies did you use?
○​ I have worked on several projects in the past , most of them being in python .
■​ RAG based chatbot (Langchain ,FAISS, Flask,Open AI)-This was part
of a freelance project I did where my client wanted a chatbot that
could answer questions based on their curriculum. I used a RAG
pipeline ensuring context based answers with both voice and text
queries. The base LLM I used was gpt-4 . I then optimized the
chatbot’s performance and responsiveness using techniques like
FAISS, multithreading and dynamic chunking. During my previous
internship, I developed a similar RAG model using Llama to answer
queries about an e-commerce website.
■​ Fine-Tuning Llama – This was the same e-commerce project I
mentioned before, where I fine-tuned LLama to generate more
accurate responses aligned with the company's data and use case.
■​ Disease Detection using machine learning -It was a hackathon project
where we used Pytorch for the models and Flask for deployment . We
used transfer learning to get high accuracy on the dataset . We
deployed the models via flask so that users could access them easily

●​ What is your experience with Python, C, CUDA?


○​ Python is my primary language and I’ve been using it for over a year and I
have worked with various popular libraries . I'm confident that I can quickly
pick up new frameworks if needed.
○​ I started my programming journey with C so I do have a good idea about the
core concepts but I switched to Python soon after.
○​ My experience with CUDA is very limited . I find it quite fascinating and I have
some experience in writing cuda kernels but I’m not experienced at it.

Project Information
●​ This is your chance to provide a description of the project idea. What are the major
parts of the project? Use flowcharts, diagrams and mockups as much as possible. If
your project involves writing APIs, what might those APIs look like?
○​ As mentioned in the ideas list , this project builds upon the work done last
year where an AI chatbot was developed capable of answering questions and
assisting users in building models with gprMax. Currently, the project utilizes
the ChatGPT API.However it incurs a significant cost and is closed source .
This project aims to integrate an open source LLM and remove the need of
buying tokens . The LLM should also be computationally cheap so that the
majority of people can use it. We can also use a service like Open Router
which has a generous free tier and would provide us with access to various
LLMs which would remove the need for the model to be lightweight.
○​ Also the previous chatbot was deployed using streamlit , I was thinking about
changing it to Flask/Fast-API as they provide better performance and
scalability compared to streamlit. The new deployment will feature a simple
user interface with just a few API endpoints for core functionality. We can
even add it to a website as a widget if needed.
○​ This project also includes fine tuning the LLM on the dataset of gprMax
commands so that it is capable of building models automatically based on
simple instructions.
○​ The Major Components include integrating an open source LLM (llama
,mistral or deepseek) , creating a backend api for deployment and fine
tuning. There will also be several optimization steps but these are the major
components for this project.
●​ Discuss your assumptions
○​ I am assuming that I’ll have access to computational resources (GPU’s)
required for fine tuning the model . Google Colab’s free tier does have a GPU
option but we might run into some limitations like GPU availability and vram
bottlenecks . There are some techniques we can use to reduce the amount of
vram used such as QLoRa and memory management(low batch sizes).
○​ I am also assuming that a basic UI with just a few api endpoints will be
sufficient for users . Additional features can be added if we find them
necessary .
●​ Mention your deliverables - break down the bigger picture into smaller tasks and
explain what these might be.
1.​ LLM Integration
a.​ Evaluate and choose an open source model that we can use for the
chat bot
b.​ Modify the previous implementation so that it uses the open source
model instead of using Chat GPT
c.​ Do some testing to ensure that everything works smoothly and resolve
bugs if needed.
2.​ Fine Tuning
a.​ Preprocess the prepared dataset of gprMax commands .
b.​ Fine Tune the selected model
c.​ Testing and validating the fine tuned model’s performance .
3.​ API Development
a.​ Create and implement a simple API structure with Flask or Fast API .
b.​ Work on a basic user interface for testing purposes.
c.​ Test everything so that it works as intended.
4.​ Deployment
a.​ Deploy the application in a testing environment and conduct tests
b.​ Fix bugs or make any changes if needed.
5.​ Optimization
a.​ Implement optimization techniques to improve performance and
reduce response time for the chatbot.
6.​ Documentation
a.​ Write proper documentation and integration processes for future
contributors/users
b.​ Include examples and a getting started guide in the documentation for
new users.
●​ If you have found existing work to build on, mention it, e.g. if you plan to use a
specific algorithm/layer/model/library/framework.
○​ Well I’ll be building the chatbot from the previous year’s project. I plan to use
several tools and frameworks for this project. For the model I plan to go with
Llama,mistral or deepseek. We can either provide the option to host the
models locally (which would force us to go with something lightweight) or go
with some service like Open Router

You might also like