0% found this document useful (0 votes)

185 views14 pages

Mandolin Task ChatGPT Search

The document outlines a structured approach for evaluating AI-generated code responses based on a given prompt. It includes criteria for assessing accuracy, optimality, presentation, and up-to-date standards, while also providing a framework for scoring and justifying evaluations. Additionally, it emphasizes the importance of rewriting responses to meet stylistic and presentational standards.

Uploaded by

gucoding

Mandolin Parade

Write me a hard prompt related to the "Code editing/rewriting: Text to Code edits"
workflow for an AI model, as I have to make it fail in its response in the Python
language.

Write it without bullet points and phrase it as an ask, so that it begins with
"develop a....."

Write a draft prompt outline for the same prompt in 50 words

Can we expect an out-of-the-box code response (copy-paste and it will work fine)
from this prompt?

OK, so I am assigned a task where I have to evaluate the responses of an
AI based on a prompt. I will give you a prompt and 2 responses and ask
questions related to them one by one.

Response 1

<Response 1 paste here>

Response 2

<Response 2 paste here>

Question: How well did the two responses do? *
Reminder: if both responses are strong, please modify the prompt to make it more
challenging, and try again!

1. One is Good. One is Poor
2. Both Responses are Poor

Only answer the question and give the explanation in 1 sentence

How many times did you rewrite the prompt before you achieved at least one bad
response? Rate on a scale of 10

Select the steering constraints used in the system prompt and the prompt you
wrote

Label your response according to the questions below.

Steering constraints

Select a minimum of 1 choice; maximum of 11 choices

1. Content Style Instructions - What the content should sound like or how it
should generate outputs
2. Compositional Instruction Following - Understanding the relationship between
instructions
3. Instruction Compliance Policy - Correctly address all trade-offs between
safety, tone, up-to-date information
4. Multi-turn Instruction Following - Following instructions over the course of a
conversation
5. Instruction Source - Where the instructions are coming from
6. Implicit Instruction
7. Explicit Instruction
8. Format Content With Specific Format Types
9. Format Content in Specific Document Formats
10. Follow Formatting Style Guides
11. Follow Language-Specific Formatting

only answer the correct answers

So this is response 1

<Paste Response 1 here>

Now I will ask some questions related to it; answer them in simple points in
numbered-list format

Tell me all the points where the response strictly adheres and does not adhere to
the constraints in the prompt, in simple points

If the score is not a 5, write why you chose this rating (DO NOT simply write out
your calculation; instead write why you think each instruction was followed or not
followed) in simple points

Accuracy

Accuracy is a measurement of whether the code has bugs, whether the output is as
expected, and whether the code is executable.

In coding responses, accuracy involves considering the following aspects:

Correctness: The output matches the expected result, without errors or
discrepancies.
Precision: The output is precise, with no unnecessary or redundant information.
Relevance: The output is relevant to the input, requirements, or specifications.
Completeness: The output includes all required information, without omitting
essential details.
Case coverage: The code handles a wide range of possible input scenarios, edge
cases, and error conditions, ensuring that it behaves correctly and robustly.
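
To make the case-coverage aspect concrete, here is a minimal sketch of a function that handles an edge case and an error condition explicitly. The function name `average` and its exact behavior are hypothetical and chosen only for illustration; they are not part of the evaluation task.

```python
from typing import Iterable


def average(values: Iterable[float]) -> float:
    """Return the arithmetic mean of values (hypothetical example function)."""
    items = list(values)
    # Edge case: an empty input has no mean, so fail loudly instead of dividing by zero.
    if not items:
        raise ValueError("average() requires at least one value")
    # Error condition: reject non-numeric entries (bool is excluded on purpose).
    for item in items:
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise TypeError(f"non-numeric value: {item!r}")
    return sum(items) / len(items)


print(average([1, 2, 3]))  # 2.0
```

A reviewer checking case coverage at a function level would look for exactly these kinds of branches: what happens on empty input, and what happens on input of the wrong type.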

From the instruction-following requirements, list out the aspects that are
CORRECT and INCORRECT. They will fall into the following buckets:

Correctness: The output matches the expected result, without errors or
discrepancies.

Precision: The output is precise, with no unnecessary or redundant information.

Relevance: The output is relevant to the input, requirements, or specifications.

Completeness: The output includes all required information, without omitting
essential details.

Case coverage: The code handles a wide range of possible input scenarios, edge
cases, and error conditions, ensuring that it behaves correctly and robustly.

Please specify at a FUNCTION LEVEL

Just provide correct and incorrect points in short.

If the score is not a 5, write why you chose this rating (DO NOT simply write out
your calculation; instead write why you think each instruction was followed or not
followed) in short simple points

Optimality/Efficiency *

Optimality / Efficiency is a measurement of how optimal the code solution is and
whether it adheres to common coding practices (creating and using helper functions,
no redundant code, etc.)

For this specific criterion, please evaluate the time complexity of the solution. Is it
an optimal solution? E.g., instead of 3 loops, we can have a solution with 1 loop.
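
The "3 loops versus 1 loop" remark can be sketched as follows. Both function names and the task itself (finding the minimum, maximum, and sum of a list) are hypothetical examples, not taken from any real response under evaluation.

```python
from typing import Sequence, Tuple


def stats_three_loops(nums: Sequence[float]) -> Tuple[float, float, float]:
    """Three separate passes over the same data."""
    smallest = nums[0]
    for n in nums:
        if n < smallest:
            smallest = n
    largest = nums[0]
    for n in nums:
        if n > largest:
            largest = n
    total = 0
    for n in nums:
        total += n
    return smallest, largest, total


def stats_one_loop(nums: Sequence[float]) -> Tuple[float, float, float]:
    """A single pass that tracks all three results at once."""
    smallest = largest = nums[0]
    total = 0
    for n in nums:
        if n < smallest:
            smallest = n
        elif n > largest:
            largest = n
        total += n
    return smallest, largest, total


print(stats_one_loop([3, 1, 4, 1, 5]))  # (1, 5, 14)
```

Note that both versions are O(n); the single loop only reduces the constant factor. When justifying an optimality score, it helps to say whether the issue is redundant passes like these or a genuinely worse complexity class (for example, nested loops giving O(n^2)).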

From the instruction-following requirements, list out the aspects that are
OPTIMAL/EFFICIENT and SUBOPTIMAL/INEFFICIENT. They will fall into the following
buckets:

Adheres to common practices and standards:

Make use of reusable functions (no repetition):

Is the optimal approach in terms of complexity:

Is the optimal approach in terms of case coverage:

Only list the OPTIMAL/EFFICIENT and SUBOPTIMAL/INEFFICIENT points

If the score is not a 5, write why you chose this rating (DO NOT simply write out
your calculation; instead write why you think each instruction was followed or not
followed) in short simple points

Presentation Correct and Presentation Incorrect

Presentation is a measurement of whether or not a model’s response is clear,
well-organized, and well-documented.

NOTE: We are evaluating the presentation of both text AS WELL AS code output

Steps to Evaluate Presentation

Step 1: Identify all implicit formatting rules that this response must adhere to.
Step 2: Identify the number of rules from step 1 that the response failed to
follow.
Step 3: Compute the percentage of issues and assign a score based on the
scale above. For example, if there are 10 formatting rules that the response
needed to follow and the response failed to meet 2 of them, that is 20% issues, and
we can give this response a score of 4.
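
The three steps above can be sketched as a small calculation. The percentage formula follows directly from the text, but the score bands below are an assumption: the only data point given here is that 20% issues maps to a score of 4, so the remaining thresholds are hypothetical and should be replaced with the actual scale from the task instructions.

```python
def issue_percentage(total_rules: int, failed_rules: int) -> float:
    """Percentage of applicable formatting rules that the response failed."""
    if total_rules <= 0:
        raise ValueError("total_rules must be positive")
    if not 0 <= failed_rules <= total_rules:
        raise ValueError("failed_rules must be between 0 and total_rules")
    return 100 * failed_rules / total_rules


def presentation_score(total_rules: int, failed_rules: int) -> int:
    """Map the issue percentage to a 1-5 score.

    ASSUMPTION: the 20-point bands are hypothetical; only "20% -> 4"
    comes from the worked example in the text.
    """
    pct = issue_percentage(total_rules, failed_rules)
    if pct == 0:
        return 5
    if pct <= 20:
        return 4
    if pct <= 40:
        return 3
    if pct <= 60:
        return 2
    return 1


print(issue_percentage(10, 2))    # 20.0
print(presentation_score(10, 2))  # 4
```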

General Presentation Rules

1. Use a professional tone.
2. For prompts with simple answers, prefer a concise response and consider
bolding relevant information.
3. For prompts with complex answers, provide a well-formatted, detailed
response using markdown formatting with appropriate headers.
4. For prompts with multiple instructions, be sure to follow all of them, breaking
your response into separate sections as necessary.
5. Use numbered or bulleted lists instead of paragraphs.
6. For code blocks, always provide a language tag and use up-to-date libraries
and packages.
7. Avoid adding a title to the entire response unless it provides essential context
or clarity.
8. Consolidate code blocks to minimize the number of chunks, reducing the need
for multiple copy-paste actions. Group related code into fewer, cohesive blocks.
9. Ensure comments are informative and helpful, providing enough detail to be
useful without being overly verbose.
10. Provide detailed explanations of changes and improvements. Use bullet
points for clarity in post-code explanations.
11. Use header styles (e.g., ###) for section titles to improve readability and
consistency.
12. Remove unnecessary or redundant titles.
13. Use bullet points instead of unnecessary numbering in lists where order is
not important.
14. Simplify nested lists to avoid complexity.
15. Address any formatting errors, such as broken code blocks or unnecessary
spaces. Use consistent code comments to highlight changes or important
sections.
16. Remove unnecessary introductory titles and ensure the introduction is
concise. Consider adding a brief outro to summarize key points or next steps.
17. Ensure documentation sections are correctly formatted and do not cause
markdown issues.
18. Avoid lengthy or repetitive information.
19. Maintain consistency in style, using headers for section titles and avoiding
mixed styles.
20. Ensure the response is visually appealing and easy to parse by breaking it
into sections where necessary.
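
Some of these rules can be checked mechanically. As one illustration, rule 6 (always provide a language tag for code blocks) could be sketched as below. The function name `untagged_code_blocks` is hypothetical, and the check is deliberately simplified: it only recognizes triple-backtick fences and assumes fences are properly paired.

```python
from typing import List

FENCE = "`" * 3  # three backticks, built this way so the example stays fence-safe


def untagged_code_blocks(markdown: str) -> List[int]:
    """Return 1-based line numbers of opening fences that lack a language tag."""
    missing = []
    inside_block = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            inside_block = False  # this fence closes the current block
        else:
            inside_block = True  # this fence opens a new block
            if stripped[len(FENCE):].strip() == "":
                missing.append(number)
    return missing


sample = "\n".join([
    "Intro text",
    FENCE + "python",
    "print(1)",
    FENCE,
    FENCE,
    "x = 1",
    FENCE,
])
print(untagged_code_blocks(sample))  # [5]
```

A check like this only covers one rule; most of the list (tone, concision, helpful comments) still needs human judgment.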

Figure out which presentation aspects need to be present in order to have a
perfect response. Then, list all that are present in the response.
Note: This list is not exhaustive. Other guidelines may exist and not all of these
guidelines may apply to your prompt.

Just provide correct and incorrect points in short and simple points

If the score is not a 5, write why you chose this rating (DO NOT simply write out
your calculation; instead write why you think each instruction was followed or not
followed) in simple points

Up-to-date

Up-to-Date is a measurement of whether the code being output avoids deprecated
libraries / functions and uses the most current functions available to solve the
provided problem.
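
As an example of the kind of finding this criterion looks for: in Python, `datetime.utcnow()` has been deprecated since Python 3.12 in favor of timezone-aware calls. The helper name `current_utc_timestamp` below is hypothetical; the sketch only shows the current form that an up-to-date response would be expected to use.

```python
from datetime import datetime, timezone


def current_utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 string (hypothetical helper)."""
    # Deprecated since Python 3.12: datetime.utcnow(), which returns a naive datetime.
    # Current form: ask for a timezone-aware datetime in UTC.
    return datetime.now(timezone.utc).isoformat()


print(current_utc_timestamp())
```

In an evaluation, this would be reported at the function level, for example: "`datetime.utcnow()` is deprecated; use `datetime.now(timezone.utc)`".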

Write a numbered list of all the correct and incorrect up-to-date standards,
function usage, and imported packages identified. Note: do this at a
function/package level.

- Does the code adhere to common practices and standards?

- Is there repetition or does the model make reusable functions?

- Is it the optimal approach in terms of complexity, case coverage, etc.?

- Out-of-date libraries must trigger run-time or compile-time errors

Just provide correct and incorrect points in short and simple points

Does the model response edit or add any executable code? *

Mark “Yes” if there is any new or altered code present in the response, regardless
of whether it is an entire program or just a code snippet. Does not apply to new
comments!

1. Yes
2. No

Just answer

Rate the degree of execution of the code *

(Only if there is executable code in the response)

Template
Partial Update
Function Update
Out-of-the-Box
NA

What installation commands are necessary to test the code? *

Please separate each individual command with a comma (,) and write ‘N/A’ if
there are no extra install commands required.

ex: pip install tkinter, sudo apt get

What commands are necessary to run the code? Provide a comma separated list
if there are multiple commands. *

ex: uvicorn main:app --host 0.0.0.0 --port 8000, python app.py

Is the output of the code as expected? *

Yes
No

Did the code produce an error? *

Yes
No

Code Execution: Output

Run this code and show me the output only, in a terminal-style box, without
explanation

Response 2

<Paste Response 2 here>

Now I will ask questions related to Response 2

Tell me all the points where the response strictly adheres and does not adhere to
the constraints in the prompt, in simple points

If the score is not a 5, write why you chose this rating (DO NOT simply write out
your calculation; instead write why you think each instruction was followed or not
followed) in simple short points

Response 1

<Response 1 paste here>

Response 2

<Response 2 paste here>

Please select a score to indicate which response is better.

1. Response 1 is much better
2. Response 1 is better
3. Response 1 is slightly better
4. Response 1 is negligibly better
5. Response 2 is negligibly better
6. Response 2 is slightly better
7. Response 2 is better
8. Response 2 is much better

Just answer

Please provide a justification. Use "@Response 1" and "@Response 2" to refer to
the model responses. Keep it short, in 50 words.

Are there any minor or major stylistic or presentational issues in the response
you selected as the "better" one? *

You are REQUIRED to perform a rewrite for any stylistic or presentational issues!

Make sure that the response meets all stylistic standards, including:

- Language that is clear, concise, and free of unnecessary repetition.

- Use of bullet points when there are three or more key points for easy readability,
with paragraphs broken down logically.

- Sufficient comments in the code to explain complex logic or non-obvious steps.

- For trivial html/css code, the requirement for comments can be more lenient

[YES] - I will perform a rewrite and change the response to follow all stylistic and
presentational requirements
[NO] - I confirm that the "better" response meets all stylistic requirements

Just answer

Does the model response need a rewrite? *

Mark “No” if you believe the selected model response fully addresses the prompt
and follows all presentation and style requirements.

A rewrite is REQUIRED if the response can be improved in any way.

- This means that if an issue is marked in the preferred response, then a rewrite
has to be performed.

- Not performing a rewrite when one is required will heavily affect your score for
the task.

Yes
No - the selected response is perfect and does not have any issues

