Nightingale RLHF Code Onboarding WIP
Code Onboarding
Introduction
● Project assignment requirements
○ Attend at least one onboarding session
○ Engage and ask questions at the end of the presentation
○ Attend at least 2 daily webinars per week
Instructions: https://docs.google.com/document/u/1/d/e/2PACX-1vRNK1x15w0ZcsqhLCbxjtqSkKYnvPVquGPrTrRgLKEuu18MkQ_alVtC7q_hVNIDVUM3t9G6Djuljxnw/pub#h.65ikpbq61y4i
Resources
Access resources and the webinar link here.
Project Overview & Goal
Your work will help improve a cutting-edge language model to provide more helpful,
accurate, and concise coding responses. Specifically, we want the generated code to follow
the client instructions precisely.
In each task, you will:
1. Evaluate each response individually across the rating dimensions
2. Indicate which response is better with a side-by-side (SxS) rating and provide a written justification
How Does Feedback Work?
Training Tasks
● These are tasks that test your understanding of the instructions and Edge Cases without impacting production
● Failing training tasks results in EQ, so review each task thoroughly
● You cannot receive feedback on these tasks
● They look exactly like normal tasks
Normal Tasks
● You will be given a score from 1-5 and a message indicating areas of improvement
What do Tasks Look Like? (Prompt Evaluation)
● There will be 6-minute tasks used to evaluate the safety/validity of a prompt
● You do not evaluate the prompt and the model responses in the same task
What Do The Tasks Look Like?
The Prompt is the user’s request. The prompt might be a conversation with multiple turns, but the chatbot is responding to the final request.
[Task interface screenshot: the Prompt at the top, followed by Response 1 and Response 2, each with Correctness/Completeness and Coherence/Clarity ratings and a Save Changes button.]
What Do The Tasks Look Like?
SxS Rating:
You will provide a side-by-side score to specify which model is preferred based on the previously generated responses. The task asks "Which is the better response?" and has you rate your preference between the two responses on a scale from 1 to 6, where 1 means "Response 1 is much better than Response 2" and 6 means "Response 2 is much better than Response 1".

SxS Justification:
You will justify your answer and show that the score you have selected carries weight. The task prompt reads: "Explain how you chose this final comparison rating. Your justification should have a declaration in the beginning stating why one response is better than the other (or if they are the same) and an evidence-based reason for your decision that cites information directly from the prompts and responses."
Workflow
1. Understand what the user wants, putting yourself in the shoes of the user interacting with the chatbot.
Note: There may be previous conversations in your task. If so, focus only on the last prompt and model response pair; the rest should be used as context.
2. Evaluate each of the responses on the 5 dimensions.
a. Make sure to mark the appropriate checkboxes when a response has issues in a particular dimension.
b. Follow the Dimension Rating Rubric for a breakdown of how to analyze each response.
3. Choose the response that most correctly satisfies the requirements of the prompt.
4. Choose a score between 1 and 6 indicating which response is better, and by how much.
a. Consult the SxS Score Guide for a description of what each score means. Ensure that this score coincides with the response you selected in step 3.
Workflow
5. Write a justification.
b. Consult the Writing a Good Justification section for a guide on how to properly write your justification for choosing a particular response. For this project, refer to the model responses as @Response 1 and @Response 2.
c. Your SxS justification should typically be shorter than your helpfulness justification.
Code Response Analysis Guide - Correctness
Criteria: Correctness/Completeness

Description: The intent of Correctness/Completeness is to provide a response that is factual, accurate, and fully addresses all requirements in the prompt.
In addition to the rating, if you provide a score of 1 to 4, an "areas for improvement" box will appear; you MUST check all applicable options:
● Contains incorrect information
● Key information is missing
● Misses one or more specific prompt requirement(s)
● Contains unwarranted refusal
● Model response is outdated as of July 2024[1]
⚠ When assessing correctness, it’s important to populate a list of sources used to validate the accuracy of the response in the provided open text field. Sources must be URLs of publicly accessible, reliable, human-generated web pages or documents. For each source, copy and paste an excerpt (as concise as possible) containing the specific information from the source that was used to verify accuracy.[2]

Rubric:
5 - The response is completely correct and accurate to what is requested by the prompt, with no necessary details missing and without false, misleading, or hallucinated information. If the prompt asks the assistant to do a task, the task is completely done and addressed in the response (within the limits of the assistant’s capabilities and intended usage).
4 - The response is mostly accurate and correct with a small amount of missing information. It contains no misleading information or hallucinations. If the prompt asks the assistant to perform a task, the task is mostly successfully attempted.
3 - The response contains a mix of correct and incorrect information. The response may miss some details, contain misleading information, or contain minor hallucinations, but is more or less aligned with what the prompt asks for. If the prompt asks the assistant to perform a task, the task is attempted with moderate success but still has clear room for improvement.
2 - The response has some correct elements but is mostly wrong or incomplete. The response may contain multiple instances of hallucinated, false, and/or misleading information. If the prompt asks the assistant to do a task, the task was attempted with a small amount of success.
1 - The response is completely incorrect. All information provided is wrong, false, or hallucinated. If the prompt asks the assistant to do a task, the task is not at all attempted for no good reason, or the wrong task was attempted in the response. The response is completely irrelevant to the prompt.
Code Response Analysis Guide - Clarity
Criteria: Coherence/Clarity

Description: With this attribute we measure how lucid, cogent, and self-consistent the model’s response is. The Coherence/Clarity rating of the response should account for previous user and assistant turns in the conversation (so as to spot potential contradictions, repetitions, unwarranted style changes, etc.).
In addition to the rating, if you provide a score of 1 to 4, an "areas for improvement" box will appear; you MUST check all applicable options:
● Contains irrelevant information
● Contains repetitions
● Contains contradiction(s)
● Contains awkward phrasing/formatting issues[3]
● Contains style changes
● Should have addressed a false premise, mistake, or ambiguity in the prompt[4]

Rubric:
5 (Perfectly Coherent and Clear) - The response is perfectly clear and self-consistent throughout. There are no contradictory assertions or statements, the writing flows logically, and following the train of thought/story is not challenging.
4 (Mostly Coherent and Clear) - The response is mostly clear and coherent, but there may be one or two places where the wording is confusing, the flow of the response is a little hard to follow, or there is a small amount of repetition / irrelevant content. Overall, the response can mostly be followed with a little room for improvement.
3 (A Little Unclear and/or Incoherent) - The response is a little unclear. There are some inconsistencies or contradictions, run-on sentences, confusing statements, blatant repetitions, significant amounts of irrelevant content, or hard-to-follow sections of the response.
2 (Mostly Incoherent and/or Unclear) - The response is mostly hard to follow, with inconsistencies, contradictions, confusing logic flow, unclear language, constant repetitions, or mostly irrelevant content used throughout, but there are still some coherent/clear parts.
1 (Completely Incoherent and/or Unclear) - The response is completely incomprehensible or irrelevant, and no clear meaning or sensible message can be discerned from it.
Code Response Analysis Guide - Language
Criteria: Simple vs. Complex Language

Description: Rating of the response along a simple → complex spectrum: the response uses simple, easy-to-understand vocabulary and sentence structure that children can understand, vs. the model uses sophisticated language with elevated vocabulary that adults with advanced education or experts on the topic would use.
⚠ Make sure the rating aligns with the rubric. A 5 is not necessarily "better" for this metric. ⚠

Rubric:
5 (Expert) - Deep expertise in the field or area (typically associated with post-graduate education) is required to understand the response. It uses specific and technically relevant vocabulary, or elevated language that someone at the simple or basic level may not understand at all. The professional language of a lawyer, scientist, engineer, or doctor falls into this category.
4 (Advanced) - The response uses a fairly sophisticated vocabulary and terminology. Someone majoring in this subject at a university (post-18 education) would understand the response, while an average adult who does not work or study in this area would not.
3 (Intermediate) - People who have completed up through a high school education (up to age 18) will probably be able to understand the vocabulary and sentence structure used, but those at the basic level or children might struggle to understand the response.
2 (Simple) - The response uses relatively straightforward language and wording, but some schooling through elementary (age 7 to 12) or middle school (age 13 to 15) in the language might be required to understand the response.
1 (Basic) - The response uses very easy to understand language that is clear and completely interpretable by children under 6, adults, and anyone with a functional command of the language.
Code Response Analysis Guide - Language
Criteria: Succinct vs. Verbose Language

Description: The goal here is to place the response on a spectrum from the most short, crisp answers to the most lengthy, detailed, and/or wordy answers, under the context of the length expectations set by the prompt.
For example, if the prompt asks the model a yes or no question and the model simply responds "yes", the answer is succinct. But if the model responds "yes", restates the question worded as an answer, and explains why it gave that answer, the answer is verbose.
Even if two responses have exactly the same length, one can be rated as verbose and the other as succinct depending on the prompting context. This verbosity rating evaluates the response as a whole.
⚠ Make sure the rating aligns with the rubric. A 5 is not "better" for this metric. ⚠

Rubric:
5 (Verbose) - The response is particularly lengthy, wordy, and/or extensive with extra details given what the prompt requested from the assistant model. The response can be verbose regardless of whether the length is due to repetition and incoherency or due to rich and insightful detail.
4 (Moderately Long) - The response is on the longer side but could still have more added to it before it is considered fully detailed or rambling.
3 (Intermediate Length) - The response isn’t especially long or short given what the prompt is asking of the model. The length is adequate for conveying a full response but isn’t particularly wordy nor particularly concise.
2 (Pretty Short) - The response is on the shorter side but could still have words, details, and/or text removed before it’s at the bare minimum of what the response is trying to convey.
1 (Succinct) - The response is short, to the point, and the most concise it can be. No additional information is provided outside of what is requested by the prompt (regardless of whether the information or response itself is incorrect, hallucinated, or misleading: a response that gives an incorrect answer can still be succinct).
Code Response Analysis Guide - Helpfulness
Criteria: Helpfulness/Overall

Description: Overall quality rating summarizing how useful and helpful the response is.
⚠ For the Helpfulness/Overall rating, you must provide an explanation (50-250 words) of why you selected this rating. Be as detailed as possible, within the length bounds. Do not make references to the other response in this explanation. ⚠

Rubric:
5 - The response is perfectly helpful and completely aligned with the spirit of what the prompt was asking for. It acts on the user’s request accurately and to the point, without any unnecessary information. If a user request is not possible or not in line with desired model behavior, a helpful response provides useful context and rationale even if it does not act on the user’s request directly.
4 - The response is mostly helpful and mainly aligned with what the user was looking for, but there is still some room for improvement.
3 - The response is partially helpful but misses the overall goal of the user's query/input in some way. The response did not fully satisfy what the user was looking for.
2 - The response is slightly helpful and mostly does not capture what the user was looking for, but it is still usable and helpful in a small way.
1 - The response is not helpful. The response completely missed the essence of what the user wanted.
Fact Checking
● Do not use forums or blogs as sources (e.g., Quora, Stack Overflow, Wikipedia)
● Make sure to fact check any major factual claim made by the model that will not be verified by testing the code (see the sketch after the examples below)

Good Example
Source: https://www.python.org/doc/sunset-python-2/
Excerpt: “Python 2 was sunset Jan 1, 2020”
(Why is this good? It includes both the source link and the excerpt.)

Poor Example
Source: https://stackoverflow.com/questions/4836375/end-of-support-for-python-2-7
(Why is this poor? It does not include an excerpt, and it uses a crowd-sourced site rather than official documentation.)
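Claims about how code actually behaves often do not need a web source at all: they can be verified by running the code. The snippet below is a minimal, invented sketch of that kind of check (the claim and the test values are hypothetical, not taken from a real task):

```python
# Hypothetical claim to verify: "Python's sorted() is stable, i.e. items that
# compare equal keep their original relative order."
pairs = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]

# Sort only by the first element; the ties ("a" vs "a", "b" vs "b") keep their
# original order if and only if the sort is stable.
result = sorted(pairs, key=lambda p: p[0])

expected = [("a", 1), ("a", 2), ("b", 2), ("b", 1)]
assert result == expected, f"claim not verified: {result}"
print("Claim verified by execution; no web source needed.")
```

Claims that cannot be checked by execution (for example, a statement about a sunset date or a library's supported versions) still need the source-plus-excerpt format shown in the Good Example above.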
Fact Checking Example
Helpfulness Justification
● Provide a detailed explanation of why you gave the response that rating on helpfulness
● Give examples to justify your point if applicable
● Your helpfulness justification should not reference the other response
● Helpfulness justifications must be 50-250 words
● The helpfulness justification must begin with “The response is {not/slightly/partially/mostly/perfectly} helpful” based on the score from 1 to 5. Please don’t use synonyms. This should be a sentence (end with a full stop).
● Your justification should not be in first person or mention the number score you gave
Preference Justification
● Preference justifications should ideally be no more than 50 words
● The first sentence of the preference reasoning should start with “@Response {1/2} is {slightly/<blank>/much} better than @Response {1/2}” based on the scores
● Response nomenclature: please include the “@” symbol and 1 or 2 when referring to responses. This will help identify and distinguish which response you are referring to.
● Do not refer to responses as A/B
● If both responses have ratings of 1 or 2, select “neither response is valid” and open your justification with “@Response 1 is as unhelpful as @Response 2”
Selecting The Better Response
Slightly Better
Response 1 is slightly better than Response 2 – Score: 3
OR
Response 2 is slightly better than Response 1 – Score: 4
● To be used when the responses are similarly appropriate and the difference is minor or a matter of personal preference.
● The difference in Helpfulness/Overall between responses should be at most 1.
● Minor differences in clarity and formatting warrant this rating.
● When you consider the responses to be tied, you should slightly prefer the shorter one (in the unlikely circumstance that they are the same length, use your own judgment).

Better
Response 1 is better than Response 2 – Score: 2
OR
Response 2 is better than Response 1 – Score: 5
● To be used when one response is clearly better than the other but not by a very large margin (a difference in Helpfulness/Overall of 1 or 2 points).
● If the better response fails to follow some but not all instructions or is misleading, but the worse response does not follow instructions at all or is completely wrong, this should be selected.
● If both answers follow instructions and are correct, but one is significantly clearer and/or better formatted, this should be selected.

Much Better
Response 1 is much better than Response 2 – Score: 1
OR
Response 2 is much better than Response 1 – Score: 6
● To be used when there is a significant difference between the two responses (a difference in Helpfulness/Overall of at least 2 points).
● If one answer is entirely correct and the other contains a major mistake, this should be selected.
● If one answer follows all instructions and the other does not, this should be selected.
● If there are major differences in readability and formatting, this should be selected.
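Purely as an illustration of how the SxS and helpfulness scores line up with the required opening sentences (the function names below are hypothetical and not part of the task tooling), the rules above can be summarized in a short Python sketch:

```python
# Hypothetical helper functions, illustrative only; not part of the task interface.

HELPFULNESS_ADJECTIVE = {1: "not", 2: "slightly", 3: "partially", 4: "mostly", 5: "perfectly"}

def helpfulness_opener(score: int) -> str:
    """Required first sentence of a helpfulness justification (score 1-5)."""
    return f"The response is {HELPFULNESS_ADJECTIVE[score]} helpful."

def sxs_opener(score: int) -> str:
    """Required first sentence of an SxS (preference) justification (score 1-6).

    Note: the tie case where both responses are rated 1 or 2 should instead
    open with "@Response 1 is as unhelpful as @Response 2".
    """
    better, worse = (1, 2) if score <= 3 else (2, 1)
    margin = {1: "much ", 2: "", 3: "slightly ", 4: "slightly ", 5: "", 6: "much "}[score]
    return f"@Response {better} is {margin}better than @Response {worse}."

print(helpfulness_opener(4))  # The response is mostly helpful.
print(sxs_opener(1))          # @Response 1 is much better than @Response 2.
print(sxs_opener(5))          # @Response 2 is better than @Response 1.
```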
Common Errors
● Code does not compile
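As a minimal, invented illustration of this error class (the snippet is hypothetical, not taken from a real task): code that cannot even be parsed should be flagged under Correctness/Completeness no matter how good the surrounding explanation is. One quick way to confirm a compile failure in Python without executing the code is to try parsing it:

```python
import ast

# Hypothetical model response snippet (invented for illustration). The function
# header is missing its closing parenthesis and colon, so Python raises a
# SyntaxError before any of it can run.
snippet = """
def average(numbers
    return sum(numbers) / len(numbers)
"""

try:
    ast.parse(snippet)
    print("Snippet parses; check for runtime or logic errors instead.")
except SyntaxError as err:
    # This is the "code does not compile" case: penalize Correctness/Completeness.
    print(f"Does not compile: {err}")
```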