
SCUT Future Tech

《Large Language Models and Artificial Intelligence Engineering Design》
Final Project Report

Project (Work) Title: Large Language Model Deployment and Fine-Tuning

Student Name: 俞铭一

Student ID: 202130192499

Class Name: AI Class 2 (人工智能 2 班)

Instructor Name: 刘晔

Grade:

Instructor's Signature:


1 Background Explanation

1.1 Background

Large language model deployment and fine-tuning are crucial aspects of leveraging advanced natural language processing (NLP) models like ChatGLM2-6B, as well as other similar models.
1.Large Language Model Deployment:
(1)Enhanced Natural Language Understanding: Large language
models like GPT-3.5 have the capacity to understand and generate
human-like text, making them invaluable for a wide range of
applications including chatbots, content generation, translation, and
more.
(2)Versatility Across Domains: They can be applied to a broad
spectrum of tasks without the need for specialized models. This includes
tasks like drafting emails, writing code, generating creative content,
summarizing text, and much more.
(3)Adaptability to New Tasks: Deploying a large language model
means you can potentially use it for tasks it wasn't explicitly trained for.
It can generate text for various prompts, even if they weren't present in
its original training data.
(4)Efficient Resource Utilization: One powerful language model can replace the need for multiple specialized models, making it more resource-efficient.
(5)Reduced Development Time: Building a custom model for each specific NLP task can be time-consuming. Deploying a pre-trained large language model accelerates development.
2.Fine-Tuning:
(1)Task Customization: While pre-trained models like ChatGLM2-6B
have remarkable capabilities, fine-tuning allows you to adapt the model
to specific tasks or domains. This is done by training it on a smaller,
domain-specific dataset.
(2)Improved Task Performance: Fine-tuning enhances the model's
performance on particular tasks.
(3)Data Efficiency: Fine-tuning can adapt a large model with fewer
domain-specific data points compared to training a model from scratch.
(4)Tailored User Experience: Fine-tuning allows you to craft a user
experience that aligns more closely with the specific needs and
expectations of your target audience or application.
(5)Rapid Iteration: Fine-tuning provides a quicker path to
customizing a model compared to training a new model from scratch,
saving time and resources.
In practice, the combination of deploying a large language model
and fine-tuning it for specific tasks can lead to highly efficient, versatile,
and powerful NLP applications that can be applied across a wide range
of industries and use cases.

1.2 Analysis of the research status

China:
(1)Research and Development:
China has emerged as a significant player in the field of artificial intelligence. Major Chinese tech companies like Baidu, Alibaba, Tencent, and Huawei have established prominent research labs and made substantial investments in developing large language models.
(2)Domestic Models:
Chinese companies have created their own large language models
tailored for the Chinese language and market. Baidu's ERNIE (Enhanced Representation through kNowledge IntEgration) is a notable example, and benchmarks such as CLUE (Chinese Language Understanding Evaluation) are used to evaluate such Chinese-language models.
(3)Language Diversity:
China is home to a wide array of languages and dialects.
Consequently, there is a strong emphasis on developing models capable
of handling diverse linguistic nuances, including regional dialects and
minority languages.
(4)Government Support:
The Chinese government has prioritized artificial intelligence as a
strategic area of development. Initiatives like "Made in China 2025" and
the "New Generation Artificial Intelligence Development Plan" reflect this
commitment. These initiatives provide funding, policy support, and
infrastructure for AI research and development.
(5)Ethical and Regulatory Considerations:
China, like many other nations, recognizes the importance of ethical
AI development. There is a focus on issues such as data privacy,
security, and ensuring that AI technologies are deployed responsibly and
in compliance with relevant regulations.

International Context:
(1)Global Competition:
The development of large language models is a highly competitive
field with contributions coming from various countries around the world.
Besides China, the United States, Canada, European countries, and other Asian nations have made significant strides in this domain.


(2)Open Source Collaboration:
The AI research community is known for its collaborative nature.
Researchers and organizations from different countries often publish
their work and share models, contributing to the global advancement of
AI technologies.
(3)Ethical and Societal Concerns:
There is a growing recognition of the ethical implications of large
language models. This includes addressing issues of bias, fairness,
transparency, and ensuring that AI technologies are deployed in ways
that benefit society as a whole.
(4)Regulation and Policy:
Countries and regions have been actively formulating policies and
regulations to govern the development and deployment of AI. This
includes considerations related to data protection, privacy, and
accountability in AI systems.
(5)Language Adaptation:
The international context emphasizes the need for models that can
handle a wide range of languages. Efforts are being made to develop
models that are multilingual or capable of adapting to different linguistic
contexts to cater to the global user base.

1.3 Objectives

Task Objectives Summary:


Part 1: Large Language Model Deployment
(1)Deploy the ChatGLM2-6B large language model
Utilize the Baidu AI platform or an alternative platform to deploy the
ChatGLM2-6B large language model.
Part 2: Large Language Model Fine-Tuning


(1)Fine-tune the ChatGLM2-6B large language model


Employ the Baidu AI platform or another platform for fine-tuning the
ChatGLM2-6B large language model.
(2)Apply techniques like P-tuning-v2 or LoRA for fine-tuning and
evaluation
Implement advanced techniques such as P-tuning-v2 or LoRA for
fine-tuning and evaluation of the model.
(3)Utilize the provided dataset for fine-tuning and evaluation
Utilize the provided dataset to carry out the fine-tuning process and
subsequent evaluation.
(4)The goal of fine-tuning is to reduce the hallucination of large
models, using BLEU and ROUGE as evaluation metrics
The objective of fine-tuning is to minimize the generation of
fabricated information by large models. Evaluation will be based on
metrics like BLEU and ROUGE.
Task Details:
Part 1: Large Language Model Deployment
(1)Implement the deployment of ChatGLM2-6B on the designated
platform.
(2)Ensure the model can effectively respond to and process specific
natural language processing tasks.
Part 2: Large Language Model Fine-Tuning
(1)Fine-tune the ChatGLM2-6B on the specified platform.
(2)Utilize advanced techniques such as P-tuning-v2 or LoRA to
enhance the model's performance.
(3)Employ the provided dataset for training and evaluation, ensuring
the model performs well on the designated tasks.
(4)Evaluate the fine-tuned model using metrics like BLEU and
ROUGE to ensure a reduction in hallucinated information.
Task Challenges:

(1)Ensuring the efficiency and stability of the model during deployment.
(2)Balancing optimization for specific tasks during fine-tuning
without overfitting.
(3)Employing advanced fine-tuning techniques may require
additional resources and technical expertise.
(4)Proper selection and interpretation of evaluation metrics will
influence the success of the fine-tuning process.
Notes:
(1)Adhere to all regulations and ethical guidelines throughout the
deployment and fine-tuning processes.
(2)Monitor the model's performance regularly for adjustments or
optimizations as needed.

2 Project Process

2.1 Deployment of Large Language Models

(1)Install Conda software:

cd /root/
wget -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2023.07-1-Linux-x86_64.sh --no-check-certificate  # download the installer
bash Anaconda3-2023.07-1-Linux-x86_64.sh  # run the installer
conda -V  # open a new terminal and check that the version is 23.5.2
conda info  # verify the installation succeeded

(2)Create a Conda virtual environment:

conda create -n glm2 python=3.8  # create the glm2 virtual environment in a new terminal
conda activate glm2  # activate the virtual environment


(3)Upload and Unzip ChatGLM2-6B.zip to the Root Directory

cd /root/
unzip ChatGLM2-6B.zip  # unzip the archive

(4)Download Model Files

cd /root/ChatGLM2-6B/chatglm2-6b  # enter the model directory
# Tsinghua mirror: https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/?p=%2Fchatglm2-6b&mode=list
# Run the following command to download the 7 pytorch_model-*.bin files
wget --no-check-certificate <download link>

(5)Install Third-Party Dependencies


After completing the unzipping process, install the dependencies
specified in the requirements.txt file. If the dependency installation is
slow, use a Chinese source for pip install:

cd /root/ChatGLM2-6B
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple  # install using the Tsinghua mirror

(6)Launch web_demo.py
Before launching web_demo.py, make the following code modification. The value "THUDM/chatglm2-6b" in the code is the path from which the model is loaded; change it to your own model file path (/root/ChatGLM2-6B/chatglm2-6b).

# Change this to your own model file path
tokenizer = AutoTokenizer.from_pretrained("/root/ChatGLM2-6B/chatglm2-6b", trust_remote_code=True)
# Quantize the model to INT4 and move it to the GPU
model = AutoModel.from_pretrained("/root/ChatGLM2-6B/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()


(7)After completing the modifications, launch the application:

cd /root/ChatGLM2-6B
python web_demo.py  # launch the web demo

The successful deployment process of the large model is shown in Figure 2-1.

Figure 2-1: Successful deployment


The successful initiation of web_demo.py is shown in Figure 2-2.


Figure 2-2: Successful initiation


2.2 Fine-Tuning of Large Language Models

(1)Switch to the Anaconda virtual environment

conda env list  # list the available virtual environments
conda activate glm2  # activate the glm2 environment

(2)Place the extracted dataSet directory into the ptuning directory

Download the processed dataSet archive from QQ, and put the unzipped dataSet directory into the ptuning directory.

(3)Install the ChatGLM2-6B dependencies

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple  # install using the Tsinghua mirror

Running fine-tuning requires the following dependencies in addition to the base ChatGLM2-6B requirements.

# Install the libraries required for fine-tuning
pip install rouge_chinese nltk jieba datasets transformers[torch] -i https://pypi.douban.com/simple/

(4)Go to the ptuning directory and check the dataset

cd /root/ChatGLM2-6B/ptuning/
ls -alh dataSet  # check the dataset

(5)Parameter tuning and optimization


PRE_SEQ_LEN and LR are the soft prompt length and the training learning rate, respectively, and can be adjusted for best results.
The P-Tuning-v2 method freezes all of the original model's parameters. The quantization_bit option sets the quantization level at which the original model is loaded; without this option, the model is loaded in FP16 precision.
Under the default configuration, the original model is quantized to INT4 and frozen, and one training iteration accumulates gradients over 16 forward and backward passes with a batch size of 1, which is equivalent to a total batch size of 16 and requires a minimum of roughly 6.7 GB of GPU memory. To improve training efficiency at the same total batch size, increase per_device_train_batch_size while keeping the product of the two values unchanged, at the cost of higher GPU memory consumption.
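
The relationship between these two settings can be made concrete with a quick calculation using the default values quoted above (a sketch for illustration only):

# Effective (total) batch size under gradient accumulation, using the defaults above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 16

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16; raising the per-device batch size while lowering the
                             # accumulation steps keeps this product, and hence the
                             # optimization behaviour, unchanged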

Figure 2-3: train.sh

train.sh after adjusting parameters:


PRE_SEQ_LEN=128
LR=2e-2
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_train \
    --train_file dataSet/chinese_simplified_train.json \
    --validation_file dataSet/chinese_simplified_val.json \
    --preprocessing_num_workers 10 \
    --prompt_column summary \
    --response_column title \
    --overwrite_cache \
    --model_name_or_path /root/ChatGLM2-6B/chatglm2-6b \
    --output_dir output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 256 \
    --max_target_length 128 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 1500 \
    --logging_steps 10 \
    --save_steps 500 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

Parameter description:

PRE_SEQ_LEN: the length of the soft prompt (prefix). It can be adjusted to suit your dataset; 128 is chosen here.

LR: the learning rate. Because this is fine-tuning, the learning rate does not need to be large; 2e-2 is chosen here.

CUDA_VISIBLE_DEVICES: sets which GPU device to use.

--train_file and --validation_file: set the paths to your own dataset files.

--model_name_or_path: sets the path to the chatglm2-6b model.


--output_dir: Sets the output path of the fine-tuned model and logs.

--max_source_length: specifies the maximum length of the input text (the prompt). If an input exceeds this length, it will be truncated; if it is shorter, it will be padded at the end.

--max_target_length: specifies the maximum length of the response text generated by the model. If a generated response exceeds this length, it will be truncated; if it is shorter, it will be padded at the end. The values of these two parameters therefore need to be chosen according to your hardware and the characteristics of your data.
Selection of training parameters:
Figure 2-4 below shows the distribution of the lengths of the various parts of the articles in the dataset.

Figure 2-4: Length distribution chart

Looking at the length distribution of each part of the articles, we find that summary lengths are mainly distributed in the range 0-250 while title lengths are mainly distributed in the range 0-45, so max_source_length and max_target_length are set to 256 and 128 respectively.
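
The statistics behind Figure 2-4 can be reproduced with a short script. This is only a sketch: it assumes the dataset files are JSON lines containing the summary and title fields referenced by --prompt_column and --response_column, and it measures character lengths as a rough proxy for token lengths.

# Sketch: inspect the length distribution of the "summary" (input) and "title"
# (target) fields to guide the choice of max_source_length / max_target_length.
# Assumes JSON-lines files with those two fields; adjust if your copy differs.
import json

path = "/root/ChatGLM2-6B/ptuning/dataSet/chinese_simplified_train.json"
summary_lens, title_lens = [], []

with open(path, encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        summary_lens.append(len(record["summary"]))
        title_lens.append(len(record["title"]))

def report(name, lengths):
    lengths = sorted(lengths)
    p95 = lengths[int(0.95 * (len(lengths) - 1))]
    print(f"{name}: count={len(lengths)}, max={lengths[-1]}, 95th percentile={p95}")

report("summary", summary_lens)  # mostly 0-250 characters in this dataset
report("title", title_lens)      # mostly 0-45 characters in this dataset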

--gradient_accumulation_steps: sets the number of gradient accumulation steps; 16 is chosen here in order to use a larger effective batch size.

--per_device_train_batch_size and --per_device_eval_batch_size: can be increased to speed up training if memory allows; 1 is chosen here.

--max_steps: sets the maximum number of training steps and should be adjusted to the dataset size; because this dataset is small, 1500 is chosen here.

--logging_steps and --save_steps: set the logging interval and the model-saving interval; 10 and 500 are chosen here.

--quantization_bit: selects the precision of the frozen model parameters. In the P-Tuning method this means quantizing the original model's parameters from floating point (FP32) precision to a lower precision such as FP16 (half precision) or below, which greatly reduces the model's memory footprint and speeds up inference at the cost of some accuracy. Because the quantization level affects model accuracy, different levels should be tested on the validation set to find the configuration with the least loss. In other words, the P-Tuning method freezes the pre-trained model, quantizes it, and fine-tunes only the soft prompt. The quantization_bit parameter is set to 4 here to obtain a higher speedup while preserving accuracy as much as possible.

(6)Model training

bash /root/ChatGLM2-6B/ptuning/train.sh  # run training

The model training parameters after successful fine-tuning are shown in Figure 2-5.

Figure 2-5: Training parameter results


We can see that the model reaches a loss value of 2.4173 after 6.42 training epochs. The trained model fits the dataset well.

The successful fine-tuning training process of the large model is shown in Figure 2-6.

Figure 2-6: Fine-tuning training result

2.3 Fine-Tuning Evaluation of Large Models

(1)Inference with the fine-tuned model is an integral part of the machine learning development process; it helps determine the performance and reliability of the model and thus supports its practical application. (Create a new /root/ChatGLM2-6B/ptuning/predict.sh file configured as follows.)


Figure 2-7: predict.sh

predict.sh used for prediction:


PRE_SEQ_LEN=128
CHECKPOINT=my-chatglm2-6b-checkpoint
STEP=1500
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc_per_node=$NUM_GPUS /root/ChatGLM2-6B/ptuning/main.py \
    --do_predict=1 \
    --validation_file /root/ChatGLM2-6B/ptuning/dataSet/chinese_simplified_val.json \
    --test_file /root/ChatGLM2-6B/ptuning/dataSet/chinese_simplified_test.json \
    --overwrite_cache \
    --prompt_column summary \
    --response_column title \
    --model_name_or_path /root/ChatGLM2-6B/chatglm2-6b \
    --ptuning_checkpoint /root/ChatGLM2-6B/ptuning/output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir /root/ChatGLM2-6B/ptuning/output/predict-result \
    --overwrite_output_dir \
    --max_source_length 256 \
    --max_target_length 128 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4


(2)Run inference

bash /root/ChatGLM2-6B/ptuning/predict.sh  # run inference

The successful prediction process of the fine-tuned model is shown in Figure 2-8.

Figure 2-8: Prediction result
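
Besides batch prediction with predict.sh, the trained prefix can also be loaded for interactive use. The sketch below follows the checkpoint-loading pattern from the official ChatGLM2-6B P-tuning example; the checkpoint path is assumed to match the $CHECKPOINT/checkpoint-$STEP layout used in predict.sh, and the prompt is only illustrative.

# Sketch: load the base ChatGLM2-6B model plus the P-Tuning-v2 prefix weights
# saved by train.sh, following the official ChatGLM2-6B p-tuning example.
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/root/ChatGLM2-6B/chatglm2-6b"
# Assumed to match the $CHECKPOINT/checkpoint-$STEP path used in predict.sh.
CHECKPOINT_PATH = "/root/ChatGLM2-6B/ptuning/output/my-chatglm2-6b-checkpoint/checkpoint-1500"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)

# Only the prefix encoder was trained; extract its weights from the checkpoint.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {
    k[len("transformer.prefix_encoder."):]: v
    for k, v in prefix_state_dict.items()
    if k.startswith("transformer.prefix_encoder.")
}
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

model = model.quantize(4).cuda().eval()  # same INT4 setting used during training

# Illustrative prompt in the style of the fine-tuning task (summary -> title).
response, _ = model.chat(tokenizer, "Generate a title for this summary: <paste a summary here>", history=[])
print(response)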

(3)Model Evaluation
After inference is completed, we need to evaluate the model to understand its performance and accuracy. Evaluation is done by comparing the generated outputs with the labeled references and computing metrics; for this text-generation task, BLEU and ROUGE are used. The evaluation results help us understand the strengths and weaknesses of the model and guide further improvement and adjustment.
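
For reference, the BLEU-4 and ROUGE scores reported below are computed per example on word-segmented Chinese text. The following is a sketch in the same spirit as the p-tuning evaluation script, using the jieba, nltk, and rouge_chinese packages installed earlier; the reference and prediction strings are illustrative.

# Sketch: compute BLEU-4 and ROUGE-1/2/L for one (reference, prediction) pair
# on jieba-segmented Chinese text, using the packages installed for fine-tuning.
import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_chinese import Rouge

reference = "国务院发布新一代人工智能发展规划"   # illustrative ground-truth title
prediction = "国务院印发人工智能发展规划"        # illustrative model output

ref_tokens = list(jieba.cut(reference))
hyp_tokens = list(jieba.cut(prediction))

# BLEU-4 with smoothing, scaled to 0-100 like the scores in the table below.
bleu4 = sentence_bleu([ref_tokens], hyp_tokens,
                      smoothing_function=SmoothingFunction().method3) * 100

# rouge_chinese expects space-separated token strings; scores are F1 values.
rouge = Rouge()
scores = rouge.get_scores(" ".join(hyp_tokens), " ".join(ref_tokens))[0]

print(f"BLEU-4:  {bleu4:.4f}")
print(f"ROUGE-1: {scores['rouge-1']['f'] * 100:.4f}")
print(f"ROUGE-2: {scores['rouge-2']['f'] * 100:.4f}")
print(f"ROUGE-L: {scores['rouge-l']['f'] * 100:.4f}")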


Figure 2-9: evaluate.sh

evaluate.sh used for evaluation:


PRE_SEQ_LEN=128
CHECKPOINT=my-chatglm2-6b-checkpoint
STEP=1500
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS /root/ChatGLM2-6B/ptuning/main.py \
    --do_eval \
    --validation_file /root/ChatGLM2-6B/ptuning/dataSet/chinese_simplified_val.json \
    --test_file /root/ChatGLM2-6B/ptuning/dataSet/chinese_simplified_test.json \
    --overwrite_cache \
    --prompt_column summary \
    --response_column title \
    --model_name_or_path /root/ChatGLM2-6B/chatglm2-6b \
    --ptuning_checkpoint /root/ChatGLM2-6B/ptuning/output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir /root/ChatGLM2-6B/ptuning/output/evaluate-result \
    --overwrite_output_dir \
    --max_source_length 256 \
    --max_target_length 128 \
    --per_device_eval_batch_size 4 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 8


(4)Run the evaluation

bash /root/ChatGLM2-6B/ptuning/evaluate.sh  # run evaluation

The successful evaluation process of the fine-tuned model is shown in Figure 2-10.

Figure 2-10: Evaluation result


By evaluating the model, we obtained the BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L scores of the fine-tuned model and compared them with those of the original model.

Comparison of fine-tuning effects:

Metric                      BLEU-4     ROUGE-1    ROUGE-2    ROUGE-L
Original model              2.1368     14.6026    1.5682     11.5908
P-Tuning-v2 (fine-tuned)    18.871     33.5337    12.2603    30.6107


3 Summary

I find that the evaluation results are higher than the given fine-tuning evaluation baseline. After fine-tuning, the model outperforms the original model on all of the evaluation metrics (BLEU-4, ROUGE-1, ROUGE-2, ROUGE-L). This means that the fine-tuning helps to improve the performance of the model, making it more accurate and more in line with expectations when generating text or accomplishing other tasks.
