ChatGPT Pamphlet


What is ChatGPT and how does it work?

A Quick Review
Alireza Akhavanpour
CLASS.VISION
Introduction and overview

• What is ChatGPT?

• It is a chatbot
It uses deep learning to generate human-like text
• It can act as a personal teacher
It can answer questions on almost any topic (though not always correctly)
For many questions it can be an alternative to Google
• Developed by OpenAI
OpenAI is an AI research organization founded in 2015
Goal: promoting and developing friendly AI
Sign-up

➢ Do not forget the sanctions ☺

➢ ChatGPT login page:
https://chat.openai.com/auth/login
➢ Use a virtual phone number!
https://numberland.ir?ref=204551
Use cases

Possible use cases:

➢ Movie recommendation
➢ Translation
➢ Writing a letter
➢ Rewriting a text (summarized, more scientific, or simpler)
➢ Programming, including converting code between languages
➢ Learning something new, such as English
➢ SEO
➢ Asking for advice
➢…
https://www.linkedin.com/posts/andrewyng_i-wish-schools-could-make-homework-so-joyful-activity-7029549078053064704-uZPT
How ChatGPT actually works
https://class.vision/deeplearning-learning-path/

Human Feedback

GPT-3 → (human feedback) → ChatGPT
The misalignment problem in GPT-3

➢ Lack of helpfulness:
the model might not follow the user's instructions.
➢ Hallucinations:
the model might make up facts that are wrong.
➢ Lack of interpretability:
it is difficult for humans to understand how the model
arrived at a particular decision or prediction.
➢ Generating biased or toxic output:
the model may reproduce biased or toxic content from its
training data, even if it was not explicitly instructed to do so.
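The cause of the misalignment problem in GPT-3

Language models are trained with objectives such as:

Next token prediction

Predicting the next word in a ------

Masked language modeling

Predicting the ------- that should go in the gap in a sentence.

GPT-3 itself is trained with next token prediction; masked language modeling is used by models such as BERT.

The Roman Empire [MASK] with the reign of Emperor Augustus.

Both "began" and "ended" are likely fill-ins, although the two sentences mean very different things:

The Roman Empire began with the reign of Emperor Augustus.

The Roman Empire ended with the reign of Emperor Augustus.

The training objective rewards statistically likely text, not the answer the user actually wants.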
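
As a rough illustration of the [MASK] example, the sketch below estimates which word fills a gap by counting occurrences in a tiny corpus. The corpus and the function name `fill_in_distribution` are made up for this illustration; GPT-3 uses a neural network over tokens rather than counts, so this only shows why a likelihood objective can rate "began" and "ended" equally.

```python
from collections import Counter

# Tiny made-up corpus (an assumption for illustration, not real training data).
CORPUS = [
    "the roman empire began with the reign of emperor augustus",
    "the roman empire ended with the fall of constantinople",
    "the roman empire began with augustus",
    "the roman empire ended with the reign of romulus augustulus",
]

def fill_in_distribution(left: str, right: str) -> dict[str, float]:
    """Estimate P(word | previous word, next word) from raw counts."""
    counts = Counter()
    for sentence in CORPUS:
        words = sentence.split()
        for i in range(1, len(words) - 1):
            if words[i - 1] == left and words[i + 1] == right:
                counts[words[i]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # the context never appears in the corpus
    return {word: n / total for word, n in counts.items()}

# Which word fits "empire [MASK] with"?
probs = fill_in_distribution("empire", "with")
print(probs)
```

In this toy corpus both candidates receive the same probability, so the objective alone gives no reason to prefer the completion the user has in mind.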


OpenAI's solution to the misalignment problem

Reinforcement Learning from Human Feedback (RLHF)
Building the SFT model

Step 1 - The Supervised Fine-Tuning (SFT) model

Human labelers write the desired response to each sampled prompt, and a pretrained GPT-3-family model is fine-tuned on these demonstrations. The result is the SFT model.
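
Supervised fine-tuning is ordinary maximum-likelihood training on labeler-written demonstrations. A minimal sketch of the loss follows; the function name `sft_loss`, the example numbers, and the choice to score only the response tokens (not the prompt tokens) are assumptions made for illustration, not details taken from the slides.

```python
import math

def sft_loss(token_probs: list[float], is_response: list[bool]) -> float:
    """Average negative log-likelihood over the response tokens only.

    token_probs[i] is the probability the model assigned to the i-th
    correct token; is_response[i] marks tokens of the labeler-written answer.
    """
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    if not losses:
        raise ValueError("at least one response token is required")
    return sum(losses) / len(losses)

# Hypothetical numbers: two prompt tokens followed by three response tokens.
probs = [0.9, 0.8, 0.5, 0.25, 0.5]
mask = [False, False, True, True, True]
loss = sft_loss(probs, mask)
print(round(loss, 4))
```

The loss falls as the model assigns higher probability to the tokens the labeler actually wrote.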
Building the reward model

Step 2 - The Reward Model (RM)

For each prompt, the SFT model generates several responses, and labelers rank them from best to worst. The reward model is trained on these labelers' preferences to output a single score for a prompt/response pair.
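
A reward model can be trained from rankings with a pairwise loss: for every pair of responses to the same prompt, the response the labelers preferred should receive the higher score. The sketch below is a plain-Python version of that idea; the function name `pairwise_ranking_loss` and the example scores are hypothetical, and a real reward model would produce the scores with a neural network.

```python
import math
from itertools import combinations

def pairwise_ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Mean of -log(sigmoid(r_better - r_worse)) over all pairs in one ranking.

    The input holds the reward model's scores for the responses to one
    prompt, ordered as the labelers ranked them (best first).
    """
    # combinations() keeps list order, so the first item of each pair is
    # always the response the labelers ranked higher.
    pairs = list(combinations(rewards_best_to_worst, 2))
    losses = [
        -math.log(1.0 / (1.0 + math.exp(-(better - worse))))
        for better, worse in pairs
    ]
    return sum(losses) / len(losses)

# Scores that agree with the human ranking give a lower loss than
# scores that contradict it.
agree = pairwise_ranking_loss([2.0, 1.0, 0.0])
disagree = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agree < disagree)
```

Minimizing this loss pushes the score of each preferred response above the scores of the responses ranked below it.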
Fine-tuning the model with the reward model

Step 3 - Fine-tuning the SFT model with Proximal Policy Optimization (PPO)

The SFT model generates a response to a prompt, the reward model scores it, and PPO uses that score to update the model.
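
Two ingredients of this step can be sketched in a few lines: a reward that is penalized when the updated model drifts away from the SFT model, and the clipped objective that gives PPO its name. The function names, the default `beta` and `eps` values, and the example numbers below are assumptions for illustration; this is not the training code behind ChatGPT.

```python
def penalized_reward(rm_score: float, logp_policy: float, logp_sft: float,
                     beta: float = 0.02) -> float:
    """Reward model score minus a penalty for drifting from the SFT model.

    logp_policy and logp_sft are the log-probabilities that the current
    model and the frozen SFT model assign to the same response.
    """
    return rm_score - beta * (logp_policy - logp_sft)

def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single sample.

    ratio is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# A response the reward model likes, but which the updated model makes
# much more likely than the SFT model did, has its reward reduced.
reward = penalized_reward(rm_score=1.0, logp_policy=-10.0, logp_sft=-12.0, beta=0.1)

# A large probability ratio is clipped, which limits the update size.
objective = ppo_clipped_objective(ratio=1.5, advantage=1.0)
print(reward, objective)
```

The penalty keeps the fine-tuned model close to the SFT model, and the clipping keeps each policy update small.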
Model evaluation

Helpfulness:
judging the model's ability to follow user instructions.

Truthfulness:
judging the model's tendency to hallucinate on closed-domain tasks.
The model is evaluated on the TruthfulQA dataset.

Harmlessness:
judging whether the model's output is appropriate, denigrates a protected
class, or contains derogatory content.
The model is benchmarked on the RealToxicityPrompts and CrowS-Pairs
datasets.
ChatGPT Plus: the paid version
Used in Microsoft's Bing search!
Other Chatbots!
Google's chatbot

LaMDA
Sources

• https://github.com/f/awesome-chatgpt-prompts#act-as-an-english-translator-and-improver
• https://www.youtube.com/watch?v=x_bw_IHjCWU
