ChatGPT Pamphlet
A Quick Review
Alireza Akhavanpour
CLASS.VISION
Introduction and Overview
• What is ChatGPT?
• It is a chatbot
It uses deep learning to generate human-like text
• It acts as a personal teacher
It knows almost everything!
It can be an alternative to Google!
• Developed by OpenAI
OpenAI is an AI research organization founded in 2015
Goal: promoting and developing friendly AI
Signing Up
Human Feedback: GPT-3 → ChatGPT
The Misalignment Problem in GPT-3
➢ Lack of helpfulness:
it might not follow the user's instructions.
➢ Hallucinations:
the model might make up false facts.
➢ Lack of interpretability:
it is difficult for humans to understand how the model
arrived at a particular decision or prediction.
➢ Generating biased or toxic output:
it may reproduce biased/toxic outputs, even if it was
not explicitly instructed to do so.
The Reason for GPT-3's Misalignment Problem
Reinforcement Learning
From Human Feedback
Creating the SFT Model
SFT Model
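A minimal sketch of what this step involves: supervised fine-tuning of a pretrained causal language model on labeller-written prompt/demonstration pairs. It assumes the Hugging Face transformers library; the gpt2 backbone, the toy in-memory examples, and the hyperparameters are placeholders, not the actual setup used for ChatGPT.

```python
# Minimal SFT sketch: fine-tune a pretrained causal LM on prompt + demonstration text.
# Assumptions: Hugging Face transformers is installed; "gpt2" stands in for the real base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy demonstration data: labeller-written answers to prompts (placeholder examples).
demos = [
    ("Explain overfitting in one sentence.",
     "Overfitting is when a model memorizes training data and fails to generalize."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for prompt, answer in demos:
    batch = tokenizer(prompt + "\n" + answer, return_tensors="pt")
    # Simplified causal-LM loss: predict every next token of prompt + answer.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```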
Creating the Reward Model
...
Labellers' preferences
Step 2 - The Reward Model (RM)
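A minimal sketch of the reward-model idea: outputs for the same prompt are ranked by labellers, and the RM is trained with a pairwise loss so the preferred answer receives a higher scalar score than the rejected one. The gpt2 backbone, the single comparison pair, and the value head below are placeholder assumptions; only the -log sigmoid(r_chosen - r_rejected) ranking loss reflects the standard RLHF formulation.

```python
# Reward-model sketch: a scalar "preference score" head on top of a language model,
# trained so that score(chosen) > score(rejected) for labeller-ranked pairs.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
backbone = AutoModel.from_pretrained("gpt2")
value_head = torch.nn.Linear(backbone.config.hidden_size, 1)  # scalar reward

def reward(text: str) -> torch.Tensor:
    tokens = tokenizer(text, return_tensors="pt")
    hidden = backbone(**tokens).last_hidden_state   # (1, seq_len, hidden)
    return value_head(hidden[:, -1, :]).squeeze()   # score read from the last token

# One labeller comparison (placeholder): the same prompt with a preferred and a rejected answer.
prompt = "What causes rain?"
chosen = prompt + " Water vapour condenses in clouds and falls as droplets."
rejected = prompt + " Rain is caused by the moon pulling water out of the sky."

optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(value_head.parameters()), lr=1e-5
)
# Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected), as in RLHF reward modelling.
loss = -F.logsigmoid(reward(chosen) - reward(rejected))
loss.backward()
optimizer.step()
```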
Fine-tuning the Model with the Reward Model
Step 3 - Fine-tuning the SFT model with Proximal Policy Optimization (PPO)
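A heavily simplified sketch of what this step optimizes: the policy (initialized from the SFT model) samples a response, the reward model scores it, and a KL penalty against the frozen SFT model keeps the policy from drifting too far from it. A plain REINFORCE-style update stands in for the full clipped PPO algorithm, and reward_model_score is a hypothetical stand-in for the step-2 reward model.

```python
# Simplified RLHF update: maximize RM score minus a KL penalty to the frozen SFT model.
# Illustration of the objective only; real implementations use clipped PPO, a value
# function, and batched rollouts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")    # trainable, initialized from the SFT model
sft_ref = AutoModelForCausalLM.from_pretrained("gpt2")   # frozen reference (the SFT model)
sft_ref.eval()

def reward_model_score(text: str) -> torch.Tensor:
    return torch.tensor(1.0)  # hypothetical stand-in for the step-2 reward model

def sequence_logprob(model, ids):
    # Sum of log-probabilities of the sampled tokens under the given model.
    logits = model(ids).logits[:, :-1, :]
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1).sum()

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)
beta = 0.1  # KL penalty coefficient

prompt = tokenizer("Explain photosynthesis simply.", return_tensors="pt")
response_ids = policy.generate(**prompt, max_new_tokens=30, do_sample=True)

logp_policy = sequence_logprob(policy, response_ids)
with torch.no_grad():
    logp_ref = sequence_logprob(sft_ref, response_ids)

r = reward_model_score(tokenizer.decode(response_ids[0]))
kl = logp_policy - logp_ref                          # sequence-level KL estimate
objective = (r - beta * kl.detach()) * logp_policy   # REINFORCE-style surrogate
(-objective).backward()                              # gradient ascent on the objective
optimizer.step()
```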
Model Evaluation
Helpfulness:
judging the model's ability to follow user instructions.
Truthfulness:
judging the model's tendency for hallucinations on closed-domain tasks.
The model is evaluated on the TruthfulQA dataset.
Harmlessness:
appropriateness of the model's output, whether it denigrates a protected
class, or contains derogatory content.
The model is benchmarked on the RealToxicityPrompts and CrowS-Pairs datasets.
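As a rough illustration of the truthfulness check, the sketch below generates answers to TruthfulQA questions so they can be compared against reference answers. It assumes the Hugging Face datasets library and the publicly hosted truthful_qa dataset ("generation" config, "question" and "best_answer" fields); the gpt2 model and the print-and-inspect loop are placeholders, not the paper's actual judging protocol.

```python
# Evaluation sketch: generate answers to TruthfulQA questions and compare them by eye
# against the reference answers. Placeholder inspection only, not the official metric.
from datasets import load_dataset
from transformers import pipeline

questions = load_dataset("truthful_qa", "generation", split="validation")
generator = pipeline("text-generation", model="gpt2")

for row in questions.select(range(3)):  # just a few examples
    answer = generator(row["question"], max_new_tokens=40)[0]["generated_text"]
    print("Q:", row["question"])
    print("A:", answer)
    print("Reference:", row["best_answer"])
    print("-" * 40)
```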
Plus version
The non-free (paid) version
Used in Microsoft's Bing search!
Other Chatbots!
Google's chatbot
LaMDA
References
• https://github.com/f/awesome-chatgpt-prompts#act-as-an-english-translator-and-improver
• https://www.youtube.com/watch?v=x_bw_IHjCWU