
VIDYAVARDHAKA COLLEGE OF ENGINEERING

GOKULAM III STAGE, MYSURU-570 002

Accredited by NAAC with ‘A’ Grade, an autonomous institution affiliated to


Visvesvaraya Technological University, Belagavi

Exploring the Depths: Diving into GPTQ Llama's Capabilities in Artificial Intelligence

An Internship Report submitted in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE & ENGINEERING
by
CHINMAYI H [4VV20CS024]

Internship Carried Out


at
NeuroFlares, Mysore
Under the Guidance of

Internal Guide: Dr. K. Paramesha, Professor, Dept. of CS&E, VVCE, Mysuru
External Guide: Mr. Prakash M, CEO, NeuroFlares, Mysuru

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


Accredited by NBA, New Delhi

2023-24
Vidyavardhaka College of Engineering,
Gokulam 3rd Stage, Mysuru – 570002,
Department of Computer Science and Engineering,

CERTIFICATE
This is to certify that the internship report entitled “Exploring the Depths: Diving into GPTQ
Llama’s Capabilities in Artificial Intelligence” has been successfully completed by Chinmayi H,
student of 8th semester, Computer Science and Engineering, Vidyavardhaka College of
Engineering, Mysuru, in partial fulfilment for the award of the degree of Bachelor of
Engineering in Computer Science and Engineering of the Visvesvaraya Technological
University, Belagavi, during the academic year 2023 – 2024. The report has been approved as
it satisfies the requirements in respect of the internship prescribed for the said degree.

Signature of the internship guide: Dr. K. Paramesha
Signature of the external guide: Mr. Prakash M
Signature of the HoD: Dr. Pooja M R

Name of the Examiners Signature with Date

1)

2)
ACKNOWLEDGEMENT

The Internship would not have been possible without the guidance, assistance, and

suggestions of many individuals. I would like to express my deep sense of gratitude and

indebtedness to each one who has helped me to make this Internship a success.

I heartily thank my beloved Principal, Dr. B Sadashive Gowda for his wholehearted

support and for his kind permission to undergo the Internship.

I wish to express my deepest gratitude to Dr. Pooja M R, Head of Department,

Computer Science and Engineering, VVCE for their constant encouragement and

inspiration in taking up this Internship.

I gratefully thank my internship guide, Dr. K Paramesha, Professor, Department of

Computer Science and Engineering, VVCE for their encouragement and advice

throughout the course of the internship.

I gratefully thank my external guide, Mr. Prakash M, CEO, NeuroFlares Pvt Ltd for his

encouragement and advice throughout the course of the internship.

In the end, I extend my gratitude towards my family members and friends for their

valuable suggestions and encouragement.

CHINMAYI H (4VV20CS024)
ABSTRACT

In this internship we worked on a Hugging Face repository maintained by TheBloke, and the
project we explored is Vicuna 7B quantized using GPTQ at 4-bit precision with group size 128.
This project aims to run these GPTQ models in text-generation-webui. The quantization process
reduces the precision of the model's parameters while minimizing the loss of performance. The
quantized model is provided in two files, with “vicuna-7B-GPTQ-4bit-128g.safetensors”
being the recommended choice. These files were created using the latest GPTQ code and
require the latest GPTQ-for-LLaMa to be integrated into text-generation-webui for usage.
To utilize the quantized model for text generation tasks, we cloned the GPTQ-for-LLaMa and
text-generation-webui repositories, created symbolic links between them, and installed the
quantized model into the web UI's models directory.
The next project was to learn how to train a new language model (for Esperanto) from
scratch using Transformers and Tokenizers. This project entails training a "small"
language model, comprising 84 million parameters, on the constructed language
Esperanto, with a focus on fine-tuning it for part-of-speech tagging.

Keywords: Vicuna 7B model, Quantization, GPTQ-for-LLaMa, Model files, Text-generation-webui, Esperanto, Part-of-speech tagging, Fine-tuning.
TABLE OF CONTENTS

TITLE PAGE NO.


1. OVERVIEW OF THE ORGANIZATION 2

1.1 About Company 2

1.2 Company History 3

1.3 Software Solution 3

1.4 Services Provided 4-5

2. TRAINING PROGRAM 6-8

3. LEARNING EXPERIENCES 9-22

3.1 Knowledge Acquired 9-11

3.2 Project Execution 12-19

3.3 Skills Learned 20

3.4 Observed Attitude and Gained Values 21

3.5 Most Challenging Task Performed 22

4. CONCLUSION 23

5. REFERENCES 24
LIST OF FIGURES

NAME PAGE NO.


Figure 1.1 List of services provided by NeuroFlares 4

Figure 3.1 The chats with the Vicuna model quantized using GPTQ-for-LLaMa 14

Figure 3.2 The GPTQ’s capacity to provide long answers 14

Figure 3.3 Code generated in interface mode 15

Figure 3.4 The model could also analyse conversations between two or more people 15

Figure 3.5 Capability to generate complex code 16



Chapter 1

OVERVIEW OF THE ORGANIZATION

1.1 About Company


NeuroFlares is a service-based information technology start-up that provides software services
and unique business solutions across the world. NeuroFlares assists companies around the world
to design, develop, localize, and publish their applications across various platforms, and it brings
a dedicated team to bear on every project it takes on. It has provided services in regions such as
Israel, China, the US, and India. The quality of service it provides, a unique solution with a
strategic approach for every problem, and application optimization give it the ability to
strengthen existing relationships with clients, build new ones, and enter new fields.

Mission: Our mission is to provide quality assurance to clients with maximum effort driven
towards customer satisfaction, and to create more employment opportunities.

Vision: Our vision is to innovate and automate industrial systems for better-quality products,
increased productivity, and efficient use of materials. We believe in providing unique solutions
in the most efficient way with a robust and structured methodology, with a gradual evolution
from a hard-work to a smart-work culture.

Principles: Automation |Quality | Innovation | Technology | Customer Satisfaction

NeuroFlares aims to evolve into a global IT company, ensuring that the solutions it delivers
embody IT best practices in its chosen areas of technology.

• They operate with complete focus on maximizing customer satisfaction.
• They develop and encourage an environment of mutual respect within the company and extend it to clients.
• They encourage commitment and personal learning among the workforce.
• The organization is built on the strong pillars of integrity, honesty, and self-respect.


1.2 Company History


Established in 2015 in Mysore, India, the company started as a team of two, as an AI-based
startup providing solutions for intrusion detection. The state-of-the-art development centre is
now equipped with the latest technology, such as HoloLens, Oculus, high-end graphics cards,
and a networked environment with multi-layered controls for enhanced data security. We
deliver end-to-end solutions through valuable strategy development backed by highly skilled
engineering execution.

NeuroFlares India has utilized its expertise and skills to keep pace with the surging need for
technological breakthroughs in society, and has accomplished this with absolute dedication
and perseverance.

It has provided solutions in the private and public sectors, ranging from small-scale industries
to large businesses, including banks, manufacturing companies, and entertainment industries.
The company is known for automation of applications, software quality, and on-time delivery.

1.3 Software Solutions


The company has developed a number of products for its clients in this service. Its software
development processes, with unique and expert solutions combined with excellent
infrastructure, have significantly increased the “on-time and on-budget” delivery of software.
Its services span analysis, design, development, testing, and implementation through to
maintenance. The applications come in all sizes, be it a one-table database or a massive
client-server application. Complete automation of applications is a major field of
specialization, and the company currently offers an automated software solution for the
manufacturing sector.

NeuroFlares was started with the aim of helping customers and businesses provide unique
and improved services, without impacting quality, at a cost-effective price. It is a one-point
engineering consulting company that can act as a guide on any project, with a focus on cost
saving without compromising quality.


1.4 Services Provided

Figure 1.1 List of services provided by NeuroFlares

• Web services
• Gaming applications
• Native Android applications
• Native desktop applications
• VR/MR applications
• 3D modelling and FEM automation
• UI/UX design
• Artificial Intelligence with Image Processing

Web Services

NeuroFlares offers web solutions and services to help customers reach a wider audience.
The web is a new and different medium for communication and requires a different
viewpoint and skill set to use it in the most effective way. Clients need web consulting to get
more return on the investment in their websites. The company helps them reach the most
effective solution through:

• Website Development


• Web Multimedia
• Web Promotion
• Web hosting
• E-commerce

Gaming Applications

NeuroFlares offers game development for PC, Android, and the web, including background
music composition and 2D/3D asset creation. Game Corner entertains users with the best
gaming experience possible. They provide arcade games, shooting games, strategy games,
sports games, adventure games, and more, building everything from simple 2D games to
complex multiplayer applications that target not only children but also youths, adults, and
anyone interested in leisure and games.

Android and Desktop Application with Machine Learning

They develop native Android apps and native desktop apps, based on client needs, for PC,
mobiles, and tablets with artificial intelligence support. Their prior work includes, but is not
limited to, automatic app updating and downloads from the server to maintain app privacy.

They use the following to achieve this:

• RESTful APIs to communicate with web endpoints
• Git and Bitbucket for code management
• Image, video, and file upload/download
• Database handling (SQLite)
• Firebase
• Google Maps integration
• Neuromorphic design
• SDK integration

The company has offered many apps, such as map-related apps, finance apps, e-commerce
and shopping-cart apps, retail and fashion apps, education apps, travel apps, food and
restaurant apps, real-estate and home-automation apps, and many more, and has completed
over 30 projects.


Chapter 2
TRAINING PROGRAM
The internship duration was six months, from 16 August 2023 to 16 February 2024. In these
six months we explored three major projects, all from Hugging Face repositories.

The first and major one is quantization of the Vicuna 7B model using GPTQ-for-LLaMa.
The second was about exploring and learning how to train a model from scratch; the model
was an Esperanto model built using tokenizers and transformers. Drawing on similarities
from that, we also tried to fine-tune a model: that is the third project, where we tried to
understand how to fine-tune an already pretrained model.

Vicuna 7B

The Vicuna 7B model is a state-of-the-art language model with 7 billion parameters,


capable of understanding and generating human-like text. However, its large size and
computational requirements limit its applicability in certain contexts, such as edge devices
or low-power systems. To address this limitation, quantization techniques are employed to
reduce the model's size and computational overhead while preserving its functionality.

To utilize the quantized model for text generation tasks, integration with the text-
generation-webui is necessary. The process involves cloning the GPTQ-for-LLaMa and
text-generation-webui repositories, creating symbolic links between them, and installing
the quantized model into the web UI's models directory. Additionally, the dependencies for
both repositories must be installed to ensure seamless operation.

The quantized Vicuna 7B model offers a more resource-efficient alternative to the original
model, suitable for deployment in environments with limited computational resources. By
following the provided instructions and integrating the model into the text-generation-
webui, users can leverage advanced AI capabilities for text generation tasks while
optimizing resource utilization. This project contributes to making sophisticated AI
technology more accessible and applicable in real-world scenarios.


Training a language model

In this second project we are training a "small" language model, comprising 84 million
parameters, on the constructed language Esperanto, with a focus on fine-tuning it for part-
of-speech tagging. The process begins with acquiring a corpus of Esperanto text, combining
portions of the OSCAR corpus from INRIA with the Leipzig Corpora Collection to create
a training dataset of 3 GB. Subsequently, a byte-level Byte-pair encoding (BPE) tokenizer,
akin to GPT-2, is trained with a vocabulary size of 52,000, featuring special tokens similar
to RoBERTa to facilitate effective language modeling tasks. The language model is then
trained from scratch on a masked language modeling (MLM) task using the transformers
library, with custom hyperparameters optimized for training efficiency. Evaluation of the
model's performance involves utilizing the FillMaskPipeline to assess its ability to predict
masked tokens, including more complex prompts to gauge its semantic understanding.
Following successful training, the model undergoes fine-tuning for part-of-speech tagging
using annotated Esperanto POS tags in the CoNLL-2003 format. Finally, the trained model
is shared with the community, accompanied by a comprehensive README.md model card
detailing its description, training parameters, evaluation results, intended uses, and
limitations, thereby contributing to the broader NLP community and showcasing the
versatility of advanced language modeling techniques.

Fine Tuning Pretrained model

The project aims to demonstrate the process of fine-tuning pretrained language models
using the Transformers library, focusing on three approaches: the Trainer API in PyTorch,
TensorFlow with Keras, and a native PyTorch training loop. It begins by emphasizing the
advantages of using pretrained models, such as reducing computation costs and carbon
footprint, before delving into the fine-tuning process. The tutorial walks through the steps
of preparing a dataset, specifically the Yelp Reviews dataset, for training. It then proceeds
to explain how to fine-tune a pretrained model using each of the mentioned frameworks.
For PyTorch, it showcases the use of the Trainer class provided by Transformers, which
streamlines the training process with various options for hyperparameters and training
features.
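
As a concrete illustration of this approach, here is a minimal sketch (not the exact code used during the internship) of fine-tuning a pretrained checkpoint on the Yelp Reviews dataset with the Trainer class; the checkpoint name, subset sizes, and hyperparameters are assumptions chosen for brevity.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

# Load and tokenize the Yelp Reviews dataset (five star-rating classes)
dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
# Small subsets keep the sketch quick to run; the full splits would be used in practice
small_train = tokenized["train"].shuffle(seed=42).select(range(1000))
small_eval = tokenized["test"].shuffle(seed=42).select(range(1000))

# Pretrained encoder with a fresh five-class classification head
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

training_args = TrainingArguments(
    output_dir="yelp_finetune",          # where checkpoints are written
    evaluation_strategy="epoch",
    per_device_train_batch_size=8,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train,
    eval_dataset=small_eval,
)
trainer.train()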


In TensorFlow with Keras, it demonstrates how to load, compile, and fit a model using the
Keras API, as well as how to use the prepare_tf_dataset method to convert datasets into a
format compatible with Keras. Lastly, for native PyTorch, it outlines how to manually post-
process tokenized datasets, create DataLoaders, set up optimizer and learning rate
scheduler, and implement the training loop. Throughout the tutorial, the focus remains on
fine-tuning pretrained models for sequence classification tasks, offering insights into best
practices and optimizations for each framework.
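
For the native PyTorch path described above, a condensed, self-contained sketch of the manual loop might look as follows; again, the checkpoint name, subset size, and hyperparameters are illustrative assumptions rather than the report's exact configuration.

import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, get_scheduler

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
dataset = load_dataset("yelp_review_full", split="train").shuffle(seed=42).select(range(1000))

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

# Post-process: drop the raw text column, rename the label column, return PyTorch tensors
dataset = dataset.map(tokenize, batched=True)
dataset = dataset.remove_columns(["text"]).rename_column("label", "labels")
dataset.set_format("torch")
train_loader = DataLoader(dataset, shuffle=True, batch_size=8)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
num_steps = 3 * len(train_loader)
lr_scheduler = get_scheduler("linear", optimizer=optimizer,
                             num_warmup_steps=0, num_training_steps=num_steps)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()
for epoch in range(3):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)          # loss is computed because "labels" are provided
        outputs.loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()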


Chapter 3

LEARNING EXPERIENCES

3.1 Knowledge Acquired


Our internship sessions started with a detailed introduction to Hugging Face which is a
company and open-source community that focuses on natural language processing (NLP)
technologies, particularly in the domain of deep learning and transformers. The company
provides a variety of tools, libraries, and pretrained models for tasks such as text
classification, language translation, text generation, and more. Their most notable offering
is the Transformers library, which is an open-source library built on top of PyTorch and
TensorFlow for working with transformer-based models, such as BERT, GPT, and T5. In
addition to the Transformers library, Hugging Face also maintains a model hub where users
can access pretrained models and fine-tune them for their specific tasks, as well as a
community platform for sharing models, datasets, and research in the field of NLP. Hugging
Face is similar to GitHub, especially in terms of its role as a platform for collaboration,
sharing, and version control, but with a specific focus on natural language processing (NLP)
models and tools.

Vicuna 7B

Later on, we were introduced to the models in generative AI, one of which is the Vicuna 7B
model; it is a chat assistant trained by fine-tuning LLaMA on user-shared conversations
collected from ShareGPT.

• Developed by: LMSYS


• Model type: An auto-regressive language model based on the transformer
architecture.
• License: Non-commercial license
• Fine-tuned from model: LLaMA


The primary use of Vicuna is research on large language models and chatbots. The primary
intended users of the model are researchers and hobbyists in natural language processing,
machine learning, and artificial intelligence. Vicuna v0 is fine-tuned from LLaMA with
supervised instruction fine-tuning. The training data is around 70K conversations collected
from ShareGPT. Vicuna is evaluated with standard benchmarks, human preference, and
LLM-as-a-judge.

GPTQ and LLaMa

Llama (Large Language Model Meta AI) is a family of autoregressive large language
models (LLMs), released by Meta AI starting in February 2023.

Four model sizes were trained for the first version of LLaMA: 7, 13, 33, and 65 billion
parameters. LLaMA's developers reported that the 13B parameter model's performance on
most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters)
and that the largest model was competitive with state of the art models such
as PaLM and Chinchilla. Whereas the most powerful LLMs have generally been accessible
only through limited APIs (if at all), Meta released LLaMA's model weights to the research
community under a non-commercial license. Within a week of LLaMA's release, its weights
were leaked to the public on 4chan via BitTorrent.

In July 2023, Meta released several models as Llama 2, with 7, 13, and 70 billion parameters.
Llama 2 is a suite of pretrained language models, while Llama 2-Chat is a fine-tuned chat
model that uses reinforcement learning from human feedback.

GPT: GPT (Generative Pre-trained Transformer) is a type of artificial intelligence model


that's really good at understanding and generating human-like text. It's been trained on
massive amounts of text data to learn patterns in language.
Quantization: Quantization is a process of making things simpler. When we apply
quantization to a model like GPT, we're essentially making it smaller and more efficient
while trying to keep its performance as close to the original as possible. This makes it easier
to run the model on devices with limited resources, like smartphones or tablets.
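
To make the idea concrete, the toy sketch below applies simple round-to-nearest 4-bit quantization to a weight matrix with a group size of 128 (the same group size used for the Vicuna files). It only illustrates group-wise quantization in general; the actual GPTQ algorithm is more sophisticated and chooses the rounding using second-order information about each layer's inputs.

import torch

def quantize_4bit_groups(weight: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point per group of columns."""
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)

    # Per-group asymmetric range mapped onto 16 levels (4 bits)
    w_min = w.min(dim=-1, keepdim=True).values
    w_max = w.max(dim=-1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0
    zero = torch.round(-w_min / scale)

    q = torch.clamp(torch.round(w / scale) + zero, 0, 15)   # the 4-bit integers that get stored
    dequant = (q - zero) * scale                             # what the model sees at run time
    return q.reshape_as(weight), dequant.reshape_as(weight)

w = torch.randn(4096, 4096)                  # a hypothetical weight matrix
q, w_hat = quantize_4bit_groups(w)
print("mean absolute error:", (w - w_hat).abs().mean().item())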


GPTQ for LLaMa: So, "GPTQ for LLaMa" is about applying this quantization process
specifically to GPT models within the LLaMa community. It's a way of making these large
language models more lightweight and easier to work with for researchers and developers
within the LLaMa community, while still maintaining their ability to understand and
generate human-like text. Through these six months we were introduced to many models,
and we quantized the Vicuna model using GPTQ-for-LLaMa.

Training a language model with a RoBERTa-style tokenizer

The Esperanto model is a language model trained specifically on the constructed language
Esperanto. It is designed to understand and generate text in Esperanto, utilizing advanced
natural language processing techniques. The model is typically fine-tuned for specific tasks
such as text generation, classification, or translation within the context of Esperanto-language
data. By training on Esperanto text corpora and incorporating linguistic features specific to
Esperanto, such as its regular grammar and vocabulary, the model becomes proficient in
processing and generating Esperanto text, contributing to various NLP applications within
the Esperanto-speaking community.

Fine Tuning

Fine-tuning is the process of adjusting the parameters of a pretrained model for a specific
task while still retaining the knowledge and representations learned during the initial
pretraining phase. This process allows the model to learn task-specific patterns and features
from the new dataset, improving its performance on the target task using the Transformers
library. By leveraging pretrained models trained on vast amounts
of text data, researchers can significantly reduce the computational resources required to
train models from scratch while achieving state-of-the-art performance. The project
showcases three different frameworks—PyTorch, TensorFlow with Keras, and native
PyTorch—and provides step-by-step guidance on preparing datasets, fine-tuning models,
and evaluating their performance. Fine-tuning pretrained models enables researchers to
adapt them to specific tasks or domains, making them more efficient and effective for real-
world applications.


3.2 Project Execution

Vicuna 7B for text-generation-webui


The repository we work on contains a version of the Vicuna 7B model that has been
quantized (simplified) using a technique called GPTQ-for-LLaMa. This specific version of
the model has been quantized to 4-bit precision and grouped in a way that makes it suitable
for deployment on devices with limited computational resources.

1. Clone the Repositories:

1. Open your command line interface (e.g., Terminal on macOS/Linux, Command


Prompt on Windows).
2. Run the following commands to clone the required repositories:
➢ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
➢ git clone https://github.com/oobabooga/text-generation-webui

2. Create a Symbolic Link:

1. Navigate to the text-generation-webui directory:


➢ cd text-generation-webui
2. Create a symbolic link to the GPTQ-for-LLaMa repository inside the repositories
directory:
➢ mkdir -p repositories
➢ ln -s ../GPTQ-for-LLaMa repositories/GPTQ-for-LLaMa

3. Install Dependencies:

Ensure that you have all the necessary dependencies installed for both GPTQ-for-LLaMa
and text-generation-webui. The instructions to install dependencies are:

1. Navigate to the GPTQ-for-LLaMa Repository: Open a command line interface


(e.g., Terminal on macOS/Linux, Command Prompt on Windows) and navigate to
the directory where you cloned the GPTQ-for-LLaMa repository. Ensure that
Python is installed on your system.


2. Install Required Python Packages: Run the following command to install the
required Python packages specified in the requirements.txt file:
➢ pip install -r requirements.txt

Installing Dependencies for text-generation-webui:

1. Navigate to the text-generation-webui Repository: Open a command line interface and
navigate to the directory where you cloned the text-generation-webui repository.
2. Install Required Python Packages: text-generation-webui is a Python (Gradio-based)
application, so install the packages specified in its requirements.txt file:
➢ pip install -r requirements.txt

4. Install the Model: Place the model file vicuna-7B-GPTQ-4bit-128g.safetensors inside


the models directory within the text-generation-webui directory.

5. Launch the User Interface (UI): Run the following command to start the UI:

➢ cd text-generation-webui
➢ python server.py --model vicuna-7B-GPTQ-4bit-128g --wbits 4 --groupsize 128

6. Interact with the UI: Once the server is running, you can access the UI by opening a
web browser and navigating to the specified address (usually http://localhost:8000 by
default). Use the UI to input text prompts and generate responses using the quantized
Vicuna 7B model.

Model Files: Two model files are provided, one of which is vicuna-7B-GPTQ-4bit-
128g.safetensors, representing the quantized Vicuna 7B model in a newer safetensors
format with improved file security.
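
As a small illustration of why the safetensors format is considered safer and how such a file can be inspected, the sketch below loads the tensors with the safetensors library rather than torch.load (which unpickles arbitrary Python objects); the file path is an assumption based on the models directory described above.

from safetensors.torch import load_file

# Loads only raw tensors; no Python objects are unpickled, unlike torch.load on a .pt/.bin file
state_dict = load_file("models/vicuna-7B-GPTQ-4bit-128g.safetensors")

for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)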

Triton and CUDA Branches: Depending on the operating system and requirements, users
can choose to use either the Triton or CUDA branch of GPTQ-for-LLaMa.


RESULTS

Figure 3.1 The chats with the Vicuna model quantized using GPTQ-for-LLaMa
Figure 3.2 The GPTQ's capacity to provide long answers


Figure 3.3 Code generated in interface mode

Figure 3.4 The model could also analyse conversations between two or more people


Figure 3.5 Capability to generate complex code


Train a new language model from scratch using Transformers and Tokenizers

The aim of this project is to train a language model specifically for Esperanto, a constructed
language designed to be easy to learn. This model, named EsperBERTo, will be trained
from scratch using a dataset of Esperanto text.

Step 1: Dataset Collection: Gather a large corpus of text written in Esperanto from various
sources, including news articles, literature, and Wikipedia. Concatenate multiple datasets to
create a comprehensive training corpus.
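
A simple way to combine the downloaded Esperanto text files into a single training corpus is sketched below; the file names are placeholders, not the actual paths used in the project.

from pathlib import Path

# Hypothetical raw text files downloaded from OSCAR and the Leipzig Corpora Collection
sources = ["oscar_eo.txt", "leipzig_eo.txt"]

with open("esperanto_corpus.txt", "w", encoding="utf-8") as out:
    for src in sources:
        out.write(Path(src).read_text(encoding="utf-8"))
        out.write("\n")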

Step 2: Tokenization: Before training the model, they need to convert the text into a format
that the model can understand. They use a technique called byte-level Byte-Pair Encoding
(BPE) to tokenize the text into smaller units called tokens. The tokenizers library is used to
train the tokenizer with a vocabulary size of 52,000 and special tokens similar to RoBERTa.

from tokenizers import ByteLevelBPETokenizer

# Paths to the Esperanto text files that make up the training corpus (placeholder name)
paths = ["esperanto_corpus.txt"]

# Initialize a byte-level BPE tokenizer
tokenizer = ByteLevelBPETokenizer()

# Customize training: 52,000-token vocabulary with RoBERTa-style special tokens
tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])


# Save the tokenizer files (esperberto-vocab.json and esperberto-merges.txt) to the current directory
tokenizer.save_model(".", "esperberto")

Step 3: Language Model Training: Implement a subclass of Dataset to load data from the
tokenized text files. Train the language model using the run_language_modeling.py script
from the transformers library. Use a RoBERTa-like model architecture and train on a task
of Masked Language Modeling (MLM). The snippet below sketches an equivalent setup with
the Trainer API, initializing the model from scratch with the tokenizer trained above; the
corpus file name and hyperparameters are placeholders.

from transformers import (RobertaConfig, RobertaForMaskedLM, RobertaTokenizerFast,
                          DataCollatorForLanguageModeling, LineByLineTextDataset,
                          Trainer, TrainingArguments)

# Load the byte-level BPE tokenizer trained in the previous step
tokenizer = RobertaTokenizerFast(vocab_file="esperberto-vocab.json",
                                 merges_file="esperberto-merges.txt",
                                 model_max_length=512)

# A small RoBERTa-like model (~84M parameters), initialized from scratch
config = RobertaConfig(vocab_size=52_000, max_position_embeddings=514,
                       num_attention_heads=12, num_hidden_layers=6, type_vocab_size=1)
model = RobertaForMaskedLM(config=config)

# Each line of the Esperanto corpus (placeholder file name) becomes one example;
# the collator masks 15% of the tokens at random for the MLM objective
train_dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                      file_path="esperanto_corpus.txt", block_size=128)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                mlm=True, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="./EsperBERTo",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_dir='./logs',
)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()

# Save the final model and tokenizer so they can be fine-tuned for POS tagging later
trainer.save_model("./EsperBERTo")
tokenizer.save_pretrained("./EsperBERTo")


Step 4: Model Evaluation: Use the trained model to fill in masked words in sentences and
check the quality of predictions.

from transformers import pipeline

# Load the trained model and tokenizer into a fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model="path_to_trained_model",
    tokenizer="path_to_trained_tokenizer",
)

# The pipeline predicts the most likely Esperanto tokens for the <mask> position
result = fill_mask("La suno <mask>.")
print(result)

Step 5: Fine-tuning for a Downstream Task: Fine-tune the trained language model on a
downstream task of Part-of-Speech (POS) tagging using the run_ner.py script from
transformers. Use a dataset of annotated Esperanto POS tags formatted in the CoNLL-2003
format. The snippet below shows the corresponding Trainer setup; the token-classification
model and the POS datasets are assumed to have been prepared as described above.

from transformers import Trainer, TrainingArguments

# "model" is assumed to be a token-classification model (e.g. RobertaForTokenClassification
# loaded from the ./EsperBERTo checkpoint), and train_dataset / eval_dataset are assumed to
# hold the tokenized CoNLL-2003-formatted Esperanto POS examples
training_args = TrainingArguments(
    output_dir="./EsperBERTo-pos",   # placeholder output directory
    per_device_train_batch_size=4,
    num_train_epochs=3,
    logging_dir='./logs',
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()

Step 6: Sharing the Model: Upload the trained model to the Hugging Face model hub for
sharing with the community.
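
A minimal sketch of this upload step, assuming the user has already authenticated with huggingface-cli login and using EsperBERTo as a placeholder repository name:

from transformers import RobertaForMaskedLM, RobertaTokenizerFast

model = RobertaForMaskedLM.from_pretrained("./EsperBERTo")
tokenizer = RobertaTokenizerFast.from_pretrained("./EsperBERTo")

# Pushes the weights, config, and tokenizer files to the Hugging Face model hub
model.push_to_hub("EsperBERTo")
tokenizer.push_to_hub("EsperBERTo")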


3.3 Skills Learned

• Python Programming: You'll use Python to execute the code snippets provided
in the repository and interact with various libraries and frameworks.
• Training Language Models: You'll learn how to train a language model from
scratch using frameworks like Transformers and tokenizers. This includes
understanding the concepts of tokenization, model architecture, hyperparameter
tuning, and training pipelines.
• Data Preprocessing: Preprocessing text data involves tasks like cleaning,
tokenization, and formatting. You'll gain experience in preparing datasets for
training language models.
• Model Evaluation: You'll evaluate the trained models using metrics like loss
values, performance on masked token prediction tasks, and downstream task
performance (e.g., part-of-speech tagging).
• Hyperparameter Tuning: Experimenting with different sets of hyperparameters
allows you to understand their impact on model performance and training dynamics.
• TensorBoard Usage: Monitoring training progress and visualizing model
performance using TensorBoard helps in gaining insights into the training process.
• Version Control: You'll learn how to use Git for version control, including
cloning repositories and managing branches.
• Installation and Dependency Management: You'll gain experience in installing
and managing dependencies for Python packages and libraries required for running
GPTQ-for-LLaMa and text-generation-webui.
• Model Management: You'll learn how to manage model files and directories,
including linking models to the text-generation-webui repository.
• Command-Line Interface: You'll use the command line to execute commands
for launching the text-generation web UI and specifying model configurations.
• Problem-Solving: You may encounter challenges during the installation or setup
process, requiring problem-solving skills to troubleshoot and resolve issues.
• Understanding Model Formats: You'll gain an understanding of model file
formats like safetensors and how they are used in GPTQ-for-LLaMa.


3.4 Observed Attitudes and Gained Values

• Curiosity: Exploring new methodologies and techniques for training language


models demonstrates a curiosity for learning and experimentation.
• Persistence: Training language models from scratch can be time-consuming and
requires patience and persistence to overcome challenges encountered during the
process.
• Openness to Collaboration: Sharing models and contributing to the community
by uploading models to platforms like Hugging Face demonstrates a willingness to
collaborate and contribute to the collective learning of the community.
• Attention to Detail: Following the provided instructions carefully and accurately
demonstrates attention to detail, which is important for successful installation and
setup.
• Resourcefulness: Finding solutions to problems encountered during the setup
process fosters a sense of resourcefulness and adaptability.
• Collaboration: Leveraging community resources, such as documentation and
online forums, for troubleshooting demonstrates the value of collaboration and
knowledge sharing within the developer community.
• Continuous Improvement: Iteratively improving the setup process based on
feedback and experience reflects a commitment to continuous improvement and
optimization.


3.5 Most Challenging Task Performed

CUDA Installation (for the CUDA branch): When using the CUDA branch of GPTQ-for-LLaMa,
setting up CUDA and ensuring compatibility with the GPU was complex.

Issues: Compatibility issues arose between CUDA versions and GPU drivers, as well as
dependencies on specific CUDA versions.

Solution: I followed the installation instructions provided in the repository and ensured that
I had the correct CUDA version installed and compatible GPU drivers.

The commands I used to clone the Triton branch of GPTQ-for-LLaMa, clone text-
generation-webui, and install GPTQ into the UI:

➢ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
➢ git clone https://github.com/oobabooga/text-generation-webui
➢ mkdir -p text-generation-webui/repositories
➢ ln -s GPTQ-for-LLaMa text-generation-webui/repositories/GPTQ-for-LLaMa

On Windows we cannot use the Triton branch of GPTQ, so we used the CUDA branch:

➢ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
➢ cd GPTQ-for-LLaMa
➢ python setup_cuda.py install


Chapter 4
CONCLUSION
GPTQ-for-LLaMa: This repository focuses on quantizing language models using the
GPTQ framework. It provides tools and utilities for quantization, enabling users to
convert large language models into more efficient versions suitable for deployment on
resource-constrained devices. The repository includes detailed documentation and
examples for quantizing models, along with instructions for integration into text-
generation-webui.

text-generation-webui: This repository hosts a user interface for text generation, allowing
users to interact with language models in a web-based environment. It provides a platform
for deploying and utilizing quantized language models generated using the GPTQ-for-
LLaMa framework. The repository includes features for model management, input/output
customization, and real-time text generation.

EsperBERTo: This repository demonstrates the process of training a language model from
scratch for the Esperanto language. It outlines the steps involved in dataset selection,
tokenizer training, language model training, and fine-tuning for downstream tasks such as
Part-of-Speech tagging. The repository includes code snippets, configuration files, and
instructions for training an Esperanto-specific language model using the Hugging Face
transformers library.

OSCAR Corpus: This repository contains the Esperanto portion of the OSCAR corpus
from INRIA. The OSCAR corpus is a large multilingual dataset obtained from Common
Crawl dumps of the web. The Esperanto subset of this corpus serves as a valuable
resource for training language models and conducting NLP research in the Esperanto
language.

Each of these repositories plays a crucial role in the process of language model
development, training, and deployment, contributing to advancements in natural language
processing and facilitating research in linguistic diversity and accessibility.


REFERENCES

1. NeuroFlares, www.neuroflares.com
2. LMSYS Org, "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT
Quality", by The Vicuna Team, Mar 30, 2023. https://lmsys.org/blog/2023-03-30-vicuna/
3. "Introducing LLaMA: A foundational, 65-billion-parameter large language model".
Meta AI. 24 February 2023.
4. Vincent, James (7 November 2019). "OpenAI has published the text-generating AI it
said was too dangerous to share". The Verge. Archived from the original on 11 June 2020.
Retrieved 19 December 2020.
5. Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (1 September 2014). "Neural
Machine Translation by Jointly Learning to Align and Translate".
6. Vincent, James (14 February 2019). "OpenAI's new multitalented AI writes, translates,
and slanders". The Verge. Archived from the original on 18 December 2020. Retrieved
19 December 2020.
7. Hugging Face blog, "How to train a new language model from scratch using Transformers
and Tokenizers", by Julien Chaumond, February 14, 2020. https://huggingface.co/blog/how-to-train
8. TheBloke / vicuna-7B-v0-GPTQ. https://huggingface.co/TheBloke/vicuna-7B-v0-GPTQ
9. "Fine-tune a pretrained model". https://huggingface.co/docs/transformers/training
