Hugging Face Repo Project Report
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE & ENGINEERING
by
CHINMAYI H [4VV20CS024]
Dr. K. Paramesha, Professor, Dept. of CS & E, VVCE, Mysuru
Mr. Prakash M, CEO, NeuroFlares, Mysuru
2023-24
Vidyavardhaka College of Engineering,
Gokulam 3rd Stage, Mysuru – 570002,
Department of Computer Science and Engineering,
CERTIFICATE
This is to certify that the internship report entitled “Exploring the Depths: Diving into GPTQ Llama’s capabilities in Artificial Intelligence” has been successfully completed by Chinmayi H (4VV20CS024).
Signature of the internship guide
Signature of the external guide
Signature of the HoD
1)
2)
ACKNOWLEDGEMENT
The Internship would not have been possible without the guidance, assistance, and
suggestions of many individuals. I would like to express my deep sense of gratitude and
indebtedness to each one who has helped me to make this Internship a success.
I heartily thank my beloved Principal, Dr. B Sadashive Gowda, for his wholehearted support.
I thank the Head of the Department of Computer Science and Engineering, VVCE, for their constant encouragement and support.
I sincerely thank my internship guide, Dr. K. Paramesha, Professor, Department of Computer Science and Engineering, VVCE, for his encouragement and advice.
I gratefully thank my external guide, Mr. Prakash M, CEO, NeuroFlares Pvt Ltd, for his guidance and support throughout the internship.
In the end, I extend my gratitude towards my family members and friends for their constant support and encouragement.
CHINMAYI H (4VV20CS024)
ABSTRACT
In this internship we worked on a Hugging Face repository named TheBloke, and the project we explored is Vicuna 7B quantized using GPTQ to 4 bits with a group size of 128. This project aims to run these GPTQ models in text-generation-webui. The quantization process reduces the precision of the model's parameters while minimizing the loss of performance. The quantized model is provided in two files, with “vicuna-7B-GPTQ-4bit-128g.safetensors” being the recommended choice. These files were created using the latest GPTQ code and require the latest GPTQ-for-LLaMa to be integrated into text-generation-webui for usage. To utilize the quantized model for text generation tasks, we cloned the GPTQ-for-LLaMa and text-generation-webui repositories, created symbolic links between them, and installed the quantized model into the web UI's models directory.
The next project was to learn how to train a new language model for Esperanto from scratch using Transformers and Tokenizers. This project entails training a "small" language model, comprising 84 million parameters, on the constructed language Esperanto, with a focus on fine-tuning it for part-of-speech tagging.
LIST OF FIGURES
Figure 3.1 The chats with the Vicuna model quantized using GPTQ-for-LLaMa
Figure 3.4 The model could also analyse conversations between two or more people
Chapter 1
ABOUT THE COMPANY
Mission: Our mission is to provide quality assurance to clients, with maximum effort driven towards customer satisfaction, and to create more employment opportunities.
Vision: Our vision is to innovate and automate industrial systems for better-quality products, increased productivity, and efficient use of materials. We believe in providing unique solutions in the most efficient way, with a robust and structured methodology and a gradual evolution from a hard-work to a smart-work culture.
NeuroFlares has a dream of evolving into a global IT company, ensuring that the solutions being delivered include best practice in IT within the chosen area of technology. NeuroFlares India has utilized its expertise and skills to keep pace with the surging need for technological breakthroughs in society, and has accomplished this with absolute dedication and perseverance.
It has provided solutions in the private and public sectors, ranging from small-scale industries to large businesses, and has also provided solutions to banks, manufacturing companies, entertainment industries, etc. The company is known for automation of applications, quality of software, and on-time delivery.
NeuroFlares was started with the aim of helping customers and businesses to provide unique and improved services without impacting quality, at a cost-effective price. It is a one-point engineering consulting company that can act as a guide in any project, with a focus and aim of cost saving without compromising quality. Its services include:
• Web services
• Gaming applications
• Native Android applications
• Native desktop applications
• VR/MR applications
• 3D modelling and FEM automation
• UI/UX design
• Artificial Intelligence with Image Processing
Web Services
NeuroFlares offers web solutions and services to help customers reach a wider customer base. The web is a new and different medium for communication and requires a different viewpoint and skill set to use it in the most effective way. Clients need web consulting to get more return on the investment in their websites, and the company helps them arrive at the most effective solution through:
• Website Development
• Web Multimedia
• Web Promotion
• Web hosting
• E-commerce
Gaming Applications
NeuroFlares offers game development for PC, Android, and the web, including background music composition and 2D/3D asset creation. Its Game Corner will entertain users with the best gaming experience possible, providing arcade games, shooting games, strategy games, sports games, adventure games, etc. The company builds everything from simple 2D games to complex multiplayer applications, targeting not only children but also youths, adults, and anyone interested in leisure and games.
Native Android and Desktop Applications
The company develops native Android apps and native desktop apps based on client needs, for PCs, mobiles, and tablets, with Artificial Intelligence support. Its prior work includes, but is not limited to, automatic app updating and downloading from the server to maintain app privacy.
The company has offered many kinds of apps: map-related apps, finance apps, e-commerce and shopping-cart apps, retail and fashion apps, education apps, travel apps, food and restaurant apps, real-estate and home-automation apps, and many more, and has delivered more than 30 projects.
Chapter 2
TRAINING PROGRAM
The internship duration was six months, from 16 August 2023 to 16 February 2024. In these six months we explored three major projects, all from Hugging Face repositories.
The first and major one is the quantization of the Vicuna 7B model using GPTQ-for-LLaMa. The second one was about exploring and learning how to train a model from scratch; the model was an Esperanto model built using tokenizers and transformers. Drawing on similarities with that work, we also tried to fine-tune a model. That is the third project, where we tried to understand how to fine-tune an already pretrained model.
Vicuna 7B
To utilize the quantized model for text generation tasks, integration with the text-
generation-webui is necessary. The process involves cloning the GPTQ-for-LLaMa and
text-generation-webui repositories, creating symbolic links between them, and installing
the quantized model into the web UI's models directory. Additionally, the dependencies for
both repositories must be installed to ensure seamless operation.
The quantized Vicuna 7B model offers a more resource-efficient alternative to the original
model, suitable for deployment in environments with limited computational resources. By
following the provided instructions and integrating the model into the text-generation-
webui, users can leverage advanced AI capabilities for text generation tasks while
optimizing resource utilization. This project contributes to making sophisticated AI
technology more accessible and applicable in real-world scenarios.
EsperBERTo
In this second project we are training a "small" language model, comprising 84 million
parameters, on the constructed language Esperanto, with a focus on fine-tuning it for part-
of-speech tagging. The process begins with acquiring a corpus of Esperanto text, combining
portions of the OSCAR corpus from INRIA with the Leipzig Corpora Collection to create
a training dataset of 3 GB. Subsequently, a byte-level Byte-pair encoding (BPE) tokenizer,
akin to GPT-2, is trained with a vocabulary size of 52,000, featuring special tokens similar
to RoBERTa to facilitate effective language modeling tasks. The language model is then
trained from scratch on a masked language modeling (MLM) task using the transformers
library, with custom hyperparameters optimized for training efficiency. Evaluation of the
model's performance involves utilizing the FillMaskPipeline to assess its ability to predict
masked tokens, including more complex prompts to gauge its semantic understanding.
Following successful training, the model undergoes fine-tuning for part-of-speech tagging
using annotated Esperanto POS tags in the CoNLL-2003 format. Finally, the trained model
is shared with the community, accompanied by a comprehensive README.md model card
detailing its description, training parameters, evaluation results, intended uses, and
limitations, thereby contributing to the broader NLP community and showcasing the
versatility of advanced language modeling techniques.
Fine-Tuning a Pretrained Model
The project aims to demonstrate the process of fine-tuning pretrained language models using the Transformers library, focusing on three approaches: PyTorch with the Trainer API, TensorFlow with Keras, and a native PyTorch training loop. It begins by emphasizing the
advantages of using pretrained models, such as reducing computation costs and carbon
footprint, before delving into the fine-tuning process. The tutorial walks through the steps
of preparing a dataset, specifically the Yelp Reviews dataset, for training. It then proceeds
to explain how to fine-tune a pretrained model using each of the mentioned frameworks.
For PyTorch, it showcases the use of the Trainer class provided by Transformers, which
streamlines the training process with various options for hyperparameters and training
features.
In TensorFlow with Keras, it demonstrates how to load, compile, and fit a model using the
Keras API, as well as how to use the prepare_tf_dataset method to convert datasets into a
format compatible with Keras. Lastly, for native PyTorch, it outlines how to manually post-
process tokenized datasets, create DataLoaders, set up optimizer and learning rate
scheduler, and implement the training loop. Throughout the tutorial, the focus remains on
fine-tuning pretrained models for sequence classification tasks, offering insights into best
practices and optimizations for each framework.
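As a minimal sketch of the Trainer-based approach described above, following the steps in the Hugging Face fine-tuning tutorial (the model checkpoint, subset sizes, and output directory here are illustrative):

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Load the Yelp Reviews dataset and tokenize it
dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
# Use small subsets so the example runs quickly
small_train = tokenized["train"].shuffle(seed=42).select(range(1000))
small_eval = tokenized["test"].shuffle(seed=42).select(range(1000))

# Yelp Reviews has five star-rating classes
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

training_args = TrainingArguments(output_dir="test_trainer", num_train_epochs=1)
trainer = Trainer(model=model, args=training_args,
                  train_dataset=small_train, eval_dataset=small_eval)
trainer.train()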
Chapter 3
LEARNING EXPERIENCES
Vicuna 7B
Later on, we were introduced to the models in Generative AI, one of which is the Vicuna 7B model. It is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
The primary use of Vicuna is research on large language models and chatbots. The primary
intended users of the model are researchers and hobbyists in natural language processing,
machine learning, and artificial intelligence. Vicuna v0 is fine-tuned from LLaMA with
supervised instruction fine-tuning. The training data is around 70K conversations collected
from ShareGPT. Vicuna is evaluated with standard benchmarks, human preference, and
LLM-as-a-judge.
Llama (Large Language Model Meta AI) is a family of autoregressive large language
models (LLMs), released by Meta AI starting in February 2023.
Four model sizes were trained for the first version of LLaMA: 7, 13, 33, and 65 billion
parameters. LLaMA's developers reported that the 13B parameter model's performance on
most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters)
and that the largest model was competitive with state of the art models such
as PaLM and Chinchilla. Whereas the most powerful LLMs had generally been accessible only through limited APIs (if at all), Meta released LLaMA's model weights to the research community under a non-commercial license. Within a week of LLaMA's release, its weights were leaked to the public on 4chan via BitTorrent.
In July 2023, Meta released several models as Llama 2, with 7, 13, and 70 billion parameters. Llama 2 is a suite of pretrained language models, while Llama 2-Chat is a fine-tuned chatbot trained using reinforcement learning from human feedback.
GPTQ for LLaMa: Quantization reduces the precision of a model's weights, for example from 16-bit floating point to 4-bit integers, so that the model occupies far less memory. "GPTQ for LLaMa" is about applying this quantization process specifically to LLaMA-family models. It is a way of making these large language models more lightweight and easier to work with for researchers and developers within the LLaMa community, while still maintaining their ability to understand and generate human-like text. Through these six months we were introduced to many models, and we quantized the Vicuna model using GPTQ-for-LLaMa.
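As a rough illustration of what 4-bit quantization with a group size of 128 means for a weight matrix, the following is a simplified round-to-nearest sketch in Python; it is not the actual GPTQ algorithm, which additionally compensates the quantization error layer by layer, and the function and array names are illustrative.

import numpy as np

def quantize_rtn_4bit(weights, group_size=128):
    # Toy round-to-nearest 4-bit quantization with one scale per group of weights.
    # Assumes the number of weights is divisible by group_size.
    flat = weights.astype(np.float32).reshape(-1, group_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # map each group to the signed 4-bit range [-8, 7]
    scales[scales == 0] = 1.0                               # avoid division by zero for all-zero groups
    q = np.clip(np.round(flat / scales), -8, 7)             # 4-bit integer codes (what would be stored)
    return (q * scales).reshape(weights.shape)              # dequantized weights used at inference time

w = np.random.randn(256, 128).astype(np.float32)
w_q = quantize_rtn_4bit(w)
print("mean absolute quantization error:", float(np.abs(w - w_q).mean()))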
The Esperanto model is a language model trained specifically on the constructed language
Esperanto. It is designed to understand and generate text in Esperanto, utilizing
advanced natural language processing techniques. The model is typically fine-tuned for
specific tasks such as text generation, classification, or translation within the context of
Esperanto language data. By training on Esperanto text corpora and incorporating linguistic
features specific to Esperanto, such as its regular grammar and vocabulary, the model
becomes proficient in processing and generating Esperanto text, contributing to various
NLP applications within the Esperanto-speaking community.
Fine Tuning
In fine-tuning, the parameters of a pretrained model are adjusted for a specific task while the knowledge and representations learned during the initial pretraining phase are retained. This process allows the model to learn task-specific patterns and features from the new dataset, improving its performance on the target task; here the Transformers library is used for it. By leveraging pretrained models trained on vast amounts
of text data, researchers can significantly reduce the computational resources required to
train models from scratch while achieving state-of-the-art performance. The project
showcases three different frameworks—PyTorch, TensorFlow with Keras, and native
PyTorch—and provides step-by-step guidance on preparing datasets, fine-tuning models,
and evaluating their performance. Fine-tuning pretrained models enables researchers to
adapt them to specific tasks or domains, making them more efficient and effective for real-
world applications.
3. Install Dependencies:
Ensure that you have all the necessary dependencies installed for both GPTQ-for-LLaMa
and text-generation-webui. The instructions to install dependencies are:
2. Install Required Python Packages: Run the following command to install the
required Python packages specified in the requirements.txt file:
➢ pip install -r requirements.txt
5. Launch the User Interface (UI): Run the following command to start the UI:
➢ cd text-generation-webui
➢ python server.py --model vicuna-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128
6. Interact with the UI: Once the server is running, you can access the UI by opening a web browser and navigating to the address printed in the terminal (typically a local Gradio address such as http://localhost:7860). Use the UI to input text prompts and generate responses using the quantized Vicuna 7B model.
Model Files: Two model files are provided, one of which is vicuna-7B-GPTQ-4bit-
128g.safetensors, representing the quantized Vicuna 7B model in a newer safetensors
format with improved file security.
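As a small illustration of how such a safetensors checkpoint can be inspected, the following sketch lists the tensors it contains (it assumes the safetensors and torch packages are installed and that the file has been downloaded into the current directory):

from safetensors import safe_open

# Open the quantized checkpoint and list the stored tensors without loading the whole file at once
with safe_open("vicuna-7B-GPTQ-4bit-128g.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)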
Triton and CUDA Branches: Depending on the operating system and requirements, users
can choose to use either the Triton or CUDA branch of GPTQ-for-LLaMa.
RESULTS
Figure 3.1 The chats with the Vicuna model quantized using GPTQ-for-LLaMa
Figure 3.2 The GPTQ model's capacity to provide long answers
Figure 3.4 The model could also analyse conversations between two or more people
The aim of this project is to train a language model specifically for Esperanto, a constructed
language designed to be easy to learn. This model, named EsperBERTo, will be trained
from scratch using a dataset of Esperanto text.
Step 1: Dataset Collection: Gather a large corpus of text written in Esperanto from various sources, including news articles, literature, and Wikipedia. Concatenate multiple datasets to create a comprehensive training corpus.
Step 2: Tokenization: Before training the model, the text needs to be converted into a format the model can understand. A technique called byte-level Byte-pair encoding (BPE) is used to tokenize the text into smaller units called tokens. The tokenizers library is used to train the tokenizer with a vocabulary size of 52,000 and special tokens similar to RoBERTa, as in the snippet below.
from pathlib import Path
from tokenizers import ByteLevelBPETokenizer

# Collect the Esperanto text files (adjust the path to the corpus location)
paths = [str(x) for x in Path("./eo_data/").glob("**/*.txt")]
# Initialize a tokenizer
tokenizer = ByteLevelBPETokenizer()
# Customize training
tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])
# Save the vocabulary and merges files to disk
tokenizer.save_model(".", "esperberto")
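Once trained, the tokenizer can be reloaded from the two files it saved (esperberto-vocab.json and esperberto-merges.txt) and used to encode Esperanto text, for example:

from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer("./esperberto-vocab.json", "./esperberto-merges.txt")
# "Mi estas studento." = "I am a student."
encoded = tokenizer.encode("Mi estas studento.")
print(encoded.tokens)
print(encoded.ids)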
Step 3: Language Model Training: Implement a subclass of Dataset to load data from the tokenized text files. Train the language model using the run_language_modeling.py script from the transformers library, using a RoBERTa-like model architecture and training on a Masked Language Modeling (MLM) task, as sketched below.
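The equivalent training step expressed with the Trainer API is sketched below; the file paths, tokenizer files, and hyperparameters are illustrative (the project itself used the run_language_modeling.py script with comparable settings):

from transformers import (RobertaConfig, RobertaTokenizerFast, RobertaForMaskedLM,
                          LineByLineTextDataset, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# A small RoBERTa-like configuration (roughly 84M parameters) with the 52,000-token vocabulary
config = RobertaConfig(vocab_size=52_000, max_position_embeddings=514,
                       num_attention_heads=12, num_hidden_layers=6, type_vocab_size=1)
tokenizer = RobertaTokenizerFast(vocab_file="./esperberto-vocab.json",
                                 merges_file="./esperberto-merges.txt",
                                 model_max_length=512)
model = RobertaForMaskedLM(config=config)

# One Esperanto sentence per line; the collator applies dynamic masking for the MLM objective
dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="./oscar.eo.txt", block_size=128)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

training_args = TrainingArguments(output_dir="./EsperBERTo", num_train_epochs=1,
                                  per_device_train_batch_size=64, save_steps=10_000)
trainer = Trainer(model=model, args=training_args,
                  data_collator=data_collator, train_dataset=dataset)
trainer.train()
trainer.save_model("./EsperBERTo")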
Step 4: Model Evaluation: Use the trained model to fill in masked words in sentences and check the quality of its predictions, for example with the fill-mask pipeline shown below.
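A minimal check with the fill-mask pipeline, assuming the trained model and tokenizer files have been saved under the illustrative ./EsperBERTo directory used above:

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="./EsperBERTo", tokenizer="./EsperBERTo")
# "La suno <mask>." = "The sun <mask>." -- the top predictions should be plausible Esperanto words
for prediction in fill_mask("La suno <mask>."):
    print(prediction["token_str"], prediction["score"])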
Step 5: Fine-Tuning for Part-of-Speech Tagging: Fine-tune the pretrained model on annotated Esperanto POS tags in the CoNLL-2003 format.
Step 6: Sharing the Model: Upload the trained model to the Hugging Face model hub for sharing with the community.
• Python Programming: You'll use Python to execute the code snippets provided
in the repository and interact with various libraries and frameworks.
• Training Language Models: You'll learn how to train a language model from
scratch using frameworks like Transformers and tokenizers. This includes
understanding the concepts of tokenization, model architecture, hyperparameter
tuning, and training pipelines.
• Data Preprocessing: Preprocessing text data involves tasks like cleaning,
tokenization, and formatting. You'll gain experience in preparing datasets for
training language models.
• Model Evaluation: You'll evaluate the trained models using metrics like loss
values, performance on masked token prediction tasks, and downstream task
performance (e.g., part-of-speech tagging).
• Hyperparameter Tuning: Experimenting with different sets of hyperparameters
allows you to understand their impact on model performance and training dynamics.
• Tensorboard Usage: Monitoring training progress and visualizing model
performance using Tensorboard helps in gaining insights into the training process.
• Version Control: You'll learn how to use Git for version control, including
cloning repositories and managing branches.
• Installation and Dependency Management: You'll gain experience in installing
and managing dependencies for Python packages and libraries required for running
GPTQ-for-LLaMa and text-generation-webui.
• Model Management: You'll learn how to manage model files and directories,
including linking models to the text-generation-webui repository.
• Command-Line Interface: You'll use the command line to execute commands
for launching the text-generation web UI and specifying model configurations.
• Problem-Solving: You may encounter challenges during the installation or setup
process, requiring problem-solving skills to troubleshoot and resolve issues.
• Understanding Model Formats: You'll gain an understanding of model file
formats like safetensors and how they are used in GPTQ-for-LLaMa.
CUDA Installation (for the CUDA branch): When using the CUDA branch of GPTQ-for-LLaMa, setting up CUDA and ensuring compatibility with the GPU was complex.
Issues: Compatibility issues arose between CUDA versions and GPU drivers, as well as dependencies on specific CUDA versions.
Solution: I followed the installation instructions provided in the repository and ensured that I had the correct CUDA version and compatible GPU drivers installed.
The commands used to clone GPTQ-for-LLaMa, clone text-generation-webui, and install GPTQ into the UI are outlined below. On Windows the Triton branch of GPTQ-for-LLaMa cannot be used, so we used the CUDA branch:
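A typical sequence, assuming the usual repository locations and that the CUDA toolkit is already installed (the exact commands used during the internship may have differed slightly):
➢ git clone https://github.com/oobabooga/text-generation-webui
➢ mkdir text-generation-webui/repositories
➢ cd text-generation-webui/repositories
➢ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
➢ cd GPTQ-for-LLaMa
➢ python setup_cuda.py install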
Chapter 4
CONCLUSION
GPTQ-for-LLaMa: This repository focuses on quantizing language models using the
GPTQ framework. It provides tools and utilities for quantization, enabling users to
convert large language models into more efficient versions suitable for deployment on
resource-constrained devices. The repository includes detailed documentation and
examples for quantizing models, along with instructions for integration into text-
generation-webui.
text-generation-webui: This repository hosts a user interface for text generation, allowing
users to interact with language models in a web-based environment. It provides a platform
for deploying and utilizing quantized language models generated using the GPTQ-for-
LLaMa framework. The repository includes features for model management, input/output
customization, and real-time text generation.
EsperBERTo: This repository demonstrates the process of training a language model from
scratch for the Esperanto language. It outlines the steps involved in dataset selection,
tokenizer training, language model training, and fine-tuning for downstream tasks such as
Part-of-Speech tagging. The repository includes code snippets, configuration files, and
instructions for training an Esperanto-specific language model using the Hugging Face
transformers library.
OSCAR Corpus: This repository contains the Esperanto portion of the OSCAR corpus
from INRIA. The OSCAR corpus is a large multilingual dataset obtained from Common
Crawl dumps of the web. The Esperanto subset of this corpus serves as a valuable
resource for training language models and conducting NLP research in the Esperanto
language.
Each of these repositories plays a crucial role in the process of language model
development, training, and deployment, contributing to advancements in natural language
processing and facilitating research in linguistic diversity and accessibility.
REFERENCES
1. NeuroFlares. www.neuroflares.com
2. The Vicuna Team, "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality", LMSYS Org, 30 March 2023. https://lmsys.org/blog/2023-03-30-vicuna/
3. "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI, 24 February 2023.
4. Vincent, James (7 November 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge. Archived from the original on 11 June 2020. Retrieved 19 December 2020.
5. Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (1 September 2014). "Neural Machine Translation by Jointly Learning to Align and Translate".
6. Vincent, James (14 February 2019). "OpenAI's new multitalented AI writes, translates, and slanders". The Verge. Archived from the original on 18 December 2020. Retrieved 19 December 2020.
7. Chaumond, Julien, "How to train a new language model from scratch using Transformers and Tokenizers", Hugging Face blog, 14 February 2020. https://huggingface.co/blog/how-to-train
8. TheBloke, vicuna-7B-v0-GPTQ. https://huggingface.co/TheBloke/vicuna-7B-v0-GPTQ
9. "Fine-tune a pretrained model", Hugging Face Transformers documentation. https://huggingface.co/docs/transformers/training