
Coding Data Generation - Project Overview

This project focuses on generating an SFT dataset containing prompts plus gold-standard
responses, to help improve the performance of a customer’s model on coding-related tasks.

One of the biggest focuses is ensuring that we only provide data rows in the delivery where the
prompt is shown to elicit an erroneous / incorrect response from the customer model. To support
this, you will have the ability to iteratively query the customer model with your prompt, testing
until you get an incorrect response.

Once you have elicited an incorrect response, you will have the ability to rewrite it in order to
provide your own, correct (“gold standard”) response.

Task Details:

Full details are provided under the Coding Data Requirements section below; however, a
summary is provided here:

- The expectation is to provide a training dataset which covers several coding languages:
  - Python, JavaScript, TypeScript, SQL, PHP, Java, Ruby
  - It should also cover a range of different task types: from “Code Generation /
    Completion”, to “Code Fixing”, to “Test Case Generation / Output Generation”,
    and “Code Explanation”
- Before starting anything, you are expected to fully read the guidance below for your
  given task type, to understand the expected format of the input prompts and output
  responses, the areas of interest, and examples of what good looks like
- In addition to the general task areas, the customer has specified that they are interested
  in improving their model performance against the following benchmarks, so it is
  important that we align prompts with the sorts of questions included in these:
  - HumanEval-Python: Code completion prompts in docstring style (Python)
  - HumanEval-X: Code completion prompts in docstring style (PHP, Java, Ruby,
    JavaScript, TypeScript)
  - CodeForces: Global competitive programming platform for skill improvement and
    benchmarking problem-solving abilities through contests
  - BigCodeBench: Offers two coding-task variations: Code Completion using
    structured docstrings, and Code Generation based on natural-language
    instructions
  - LiveCodeBench: Contamination-free benchmark for evaluating LLMs on code,
    focusing on self-repair, execution, and test output prediction
  - HumanEvalPack: An extension of OpenAI's HumanEval to cover 6 total
    languages across 3 tasks (HumanEvalDescribe, HumanEvalFix, and
    HumanEvalSynthesize)
- The editor that is used to complete this task will be connected to the customer
  model, so you can iteratively query the model until you get an incorrect response.
  - One possible approach here, rather than continuing to write prompts from scratch
    if the model responds correctly to your first attempt, is to iterate on your
    original starter prompt, making it increasingly complex until you elicit an
    incorrect response
- Once you arrive at the prompt that results in an incorrect response, you should do the
  following:
  - Rewrite the response to provide a correct (“gold standard”) one - make sure
    to abide by all the formatting constraints outlined below
  - Quickly capture, in no more than 5 bullet points, the errors / loss patterns
    exhibited in the original response that justified your rewrite (keep this brief
    and don’t spend much time here; this is not provided to the customer in the
    first instance, but is instead used to A) build up a view of the trends in
    errors, and B) enable a faster review process, as the errors are already
    captured during labelling)
  - Provide a script that can be copied directly into the relevant IDE and
    executed, to test that the functionality of any code provided in your
    corrected/rewritten response is valid and runs (see the sketch below)
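
For illustration only, here is a minimal sketch of such a verification script. The function `running_total` and its behavior are hypothetical, standing in for whatever your rewritten response actually defines:

Python
# Hypothetical rewritten function (illustrative only; in practice this would
# be the code from your gold-standard response).
def running_total(numbers):
    total, out = 0, []
    for n in numbers:
        total += n
        out.append(total)
    return out

# Assert-based checks: the script fails loudly if the rewritten code misbehaves.
assert running_total([1, 2, 3]) == [1, 3, 6]
assert running_total([-1, 1]) == [-1, 0]
assert running_total([]) == []
print("All checks passed.")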

Key things to focus on:

1. The prompts we proceed to rewrite are only those that elicit an incorrect
   model response (you should reset any prompts that result in correct responses and
   only provide a response where the model response is incorrect)
2. The prompts and rewritten responses must fully align with the expected formats
   detailed below, including any expected components of the content - e.g. explanations of
   code, running code functions, examples / tests, example implementations. These differ
   by task type, so it is important to observe the relevant task section & examples below
3. The prompts must focus on the areas specified for each task type AND take into
   account the benchmark datasets the customer wishes to improve against
4. You should avoid, at all costs, including comments or text that you would not want to be
   presented to the customer - i.e. avoid PII, usernames (e.g. console paths) or passwords
   (e.g. API keys) - if you wish to discuss anything with the team or capture nuance, do so
   using the “Issues” feature in the Labelbox editor

Demo of Editor [EXTREMELY IMPORTANT TO WATCH]:


https://www.loom.com/share/ff5932b27f81495d800336f27995fdbc?sid=f437b5b5-c5e8-412b-9e17-bf9eae392ab5
Coding Data Requirements
This document presents the client requirements and instructions for collecting human
demonstration coding data.

Table of Contents

1. Targeted benchmarks
2. Data volume and distribution requirements
   2.1 Coding-task distribution, by subtask and programming language
3. Annotation instructions
   3.1 Instructions that apply for all tasks
      3.1.1 Guidelines for Crafting Complex Coding Prompts
   3.2 Coding-task instructions
      3.2.1 Code Generation/Completion
      3.2.2 Code Fixing
      3.2.3 Test-case generation / Output generation
      3.2.4 Code explanation
4. Complete examples
   4.1 Code Generation/Completion
      4.1.1 Code Generation
      4.1.2 Code Completion
   4.2 Code fixing
   4.3 Test-case generation / output generation
      4.3.1 Test-case generation
      4.3.2 Output Generation
   4.4 Code Explanation

1. Targeted benchmarks

Below are the target benchmarks that we aim to improve coding performance on:

● HumanEval-Python: Code completion prompts in docstring style (Python)
● HumanEval-X: Code completion prompts in docstring style (PHP, Java, Ruby,
  JavaScript, TypeScript)
● CodeForces: Global competitive programming platform for skill improvement and
  benchmarking problem-solving abilities through contests
● BigCodeBench: Offers two coding-task variations: Code Completion using structured
  docstrings, and Code Generation based on natural-language instructions
● LiveCodeBench: Contamination-free benchmark for evaluating LLMs on code, focusing
  on self-repair, execution, and test output prediction
● HumanEvalPack: An extension of OpenAI's HumanEval to cover 6 total languages
  across 3 tasks (HumanEvalDescribe, HumanEvalFix, and HumanEvalSynthesize)

2. Data volume and distribution requirements

2.1 Coding-task distribution, by subtask and programming language

| Coding Subtask | Python | PHP | Java | Ruby | JavaScript | TypeScript | SQL | Total |
|---|---|---|---|---|---|---|---|---|
| Code Generation / Completion | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |
| Code Fixing | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |
| Test Case Generation / Output Generation | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |
| Code Explanation | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |

Notes:

● These should all be single-turn demonstrations.

3. Annotation instructions

This section outlines the set of instructions that data collectors need to follow in order to curate
high-quality examples.

3.1 Instructions that apply for all tasks

● All code snippets in the output must be inside a markdown code block with the
  language tag added.
● The code solutions should not be more complex than necessary. For example, do not
  define classes if a function would suffice. However, we would like the annotators to have
  creative freedom. See Section 3.1.1 for guidelines on crafting coding prompts at the
  desired complexity.
● Generated code should respect any requirements/constraints set in the input request.
  These include, but are not limited to, names of functions and variables and any implicit
  or explicit data types. If the input request doesn’t specify the output format, you need to
  use the provided test cases (if any) to deduce the output format.
● Good code practices should be applied, such as using descriptive variable names and
  avoiding very long lines. For Python, follow PEP8 guidelines, except for the line limitation
  of 79 characters.
● We recommend that the programming language is explicitly mentioned in the prompt if it
  is not otherwise easy to deduce from the prompt.
● Responses should include adequate comments to make the code easily readable.
● For demonstrations that involve the use of external libraries, we expect annotators to
  adhere to the following requirements:
  ● Annotators should always use the latest stable release of any external library or
    framework at the time of creating the demonstration, unless explicitly instructed
    otherwise.
  ● Annotators should include the version of libraries and frameworks only if they are
    using libraries that are not commonly accepted as standard or widely used in the
    field. For example, there is no need to specify the version for libraries like numpy
    or pandas in Python, but version documentation is required for less common or
    specialized libraries.
  ● Code submissions should be verified for compatibility with the specified library
    versions to ensure reproducibility and accuracy.
● Prompts should be sourced based on use cases for which the current [client] models fail
  to produce high-quality responses. This requirement can be achieved most efficiently by
  placing [client] models “in the loop” (MiL) with the participants during their active work.
  Participants should be instructed to “probe” the models for areas of weakness before
  proceeding with further response rating/writing activities.
● Human demonstrations should be based on the [client] system prompt, not an
  arbitrary/alternative system prompt. Any MiL content generated during session creation
  should use the [client] system prompt in its inputs.
● Content produced in RAG fashion (i.e. injection of results of indexing/searching
  operations into the session context), if present in the session, should be placed into its
  own section of the delivery schema so it can be clearly isolated from other parts of the
  demonstration text.
● Human-written or MiL responses involving both code and text (explanations, concept
  definitions, documentation, etc.) should remain as faithful as possible to the current
  [client] model’s response style and presentation tendencies. Even if accurate and
  helpful, a demonstration can be counterproductive if it introduces contrary stylistic
  elements.

3.1.1 Guidelines for Crafting Complex Coding Prompts

1. Encourage Deep Logical Thinking: Create prompts that require models to engage in
   multi-step reasoning and problem decomposition. The tasks should not have
   straightforward or linear solutions and should involve understanding and applying
   intricate logic.
2. Incorporate Multi-Rule Logic: Prompts that include multiple interconnected rules that
   need to be followed will enhance the complexity of the prompts.
3. Include Constraints to Test Adaptability: Define constraints that limit straightforward
   implementations, forcing the model to consider alternative approaches. Constraints
   might include limits on repetition, mandatory use of specific constructs, or handling
   boundary conditions.
4. Test Handling of Edge Cases: Prompts should be designed to include potential edge
   cases that challenge the model to generalize well. These cases should be derived from
   the natural complexity of the task.
5. Emphasize Efficiency: Include requirements that consider time and space efficiency,
   ensuring the model optimizes for performance where applicable. A prompt combining
   several of these guidelines is sketched below.
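
For illustration only (this prompt is hypothetical, not taken from the client requirements), a single prompt might combine multi-rule logic, explicit constraints, and an efficiency requirement like so:

Python
def longest_valid_window(s: str, k: int) -> int:
    """
    Return the length of the longest substring of s that satisfies ALL of
    the following rules:
    1. It contains at most k distinct characters.
    2. No character occurs more than twice within it.
    3. It starts and ends with an ASCII letter.
    The solution must run in O(n) time over the length of s.

    >>> longest_valid_window("aabbccc", 2)
    4
    """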

3.2 Coding-task instructions

3.2.1 Code Generation/Completion

3.2.1.1 Code Generation

Prompt: {A natural language request that requires the creation of code and can optionally include helper
functions and empty function bodies.}

Response: {A natural language explanation describing the approach to solve the given problem. Code to
address the request. It is highly recommended to include a couple of examples to showcase the usage of
the generated code.}

3.2.1.2 Code Completion

Prompt: {Function body with the description of the problem in the docstring. Can optionally include a set
of test cases. These test cases can either be written in doctest format or descriptive format.}

Response: {Complete code aimed at solving the given problem which passes all provided test cases
along with an explanation/reasoning behind the approach.}

3.2.1.2.A Code Generation/Completion: Areas of emphasis


Place greater emphasis on the following types of examples:

1. Examples which require usage of well-known mathematical formulas and properties. For
example:

Python
def projectile_motion(v0: float, angle: float) -> dict:
    """
    Calculate the time of flight, maximum height, and range of a projectile.

    Args:
    - v0 (float): Initial velocity in m/s.
    - angle (float): Launch angle in degrees.

    Returns:
    - dict: A dictionary containing:
        - 'time_of_flight': Time of flight in seconds.
        - 'max_height': Maximum height in meters.
        - 'range': Range of the projectile in meters.

    Raises:
    - ValueError: If v0 or angle is negative.

    Example:
    >>> projectile_motion(20, 45)
    {'time_of_flight': 2.89, 'max_height': 10.20, 'range': 40.82}
    """
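A sketch of a matching solution, assuming g = 9.8 m/s² (the prompt leaves the gravitational constant unspecified; this value reproduces the doctest figures to two decimals):

Python
import math

def projectile_motion(v0: float, angle: float) -> dict:
    # Assumption: g = 9.8 m/s^2, which matches the doctest values above.
    if v0 < 0 or angle < 0:
        raise ValueError("v0 and angle must be non-negative.")
    g = 9.8
    theta = math.radians(angle)
    return {
        'time_of_flight': round(2 * v0 * math.sin(theta) / g, 2),
        'max_height': round((v0 * math.sin(theta)) ** 2 / (2 * g), 2),
        'range': round(v0 ** 2 * math.sin(2 * theta) / g, 2),
    }

# projectile_motion(20, 45) gives time_of_flight 2.89 and range 40.82;
# note that round() displays the max height as 10.2 rather than 10.20.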
2. Examples with more than two constraints/conditions in the problem. The complexity of the
logic required to solve a problem increases as the number of conditions or constraints
increases. For example:

Python
def unique_sorted_string(input_string: str) -> str:
"""
Returns a sorted version of the input string based on the following
constraints:
1. Numbers (0-9) appear first in the sorted output.
2. Uppercase letters (A-Z) are treated as smaller than lowercase
letters (a-z).
3. Duplicates are removed, but the first occurrence of each
character in the input retains its position in the final output.

Args:
input_string (str): The string to be sorted and processed.

Returns:
str: The sorted string following the constraints.

Examples:
        >>> unique_sorted_string("Hello123World!")
        '123HWdelo'
        >>> unique_sorted_string("aAbBcCdDeE1233")
        '123ABCDEabcde'
        >>> unique_sorted_string("zZz!")
        'Zz'
"""


3. Examples that require the model to deduce the correct return type of the output (e.g., int,
float, str, list, etc.) based on the format of the provided test cases, rather than explicitly
specifying the return type in the problem description. For example:

Python
def calculate_average(numbers: list[float]) -> float:
"""
Calculates the average of a list of numbers.

Args:
numbers (list[float]): A list of floating-point numbers.

Examples:
>>> calculate_average([1, 2, 3, 4, 5])
3.000
>>> calculate_average([10, 15, 20])
15.000
>>> calculate_average([1.1, 2.2, 3.3])
2.200
>>> calculate_average([5.555, 4.444, 3.333])
4.444
>>> calculate_average([0, 0.123, 0.456])
0.193
"""

In this example, the model must deduce from the test cases that the output should be a float
with three decimal places of precision.

4. Examples of prompts where the problem description is vague, and the logic or behavior
required must be inferred by analyzing the test cases. For example:

Python
def merge_tuples(tuples: list[tuple[int, int]]) -> list[tuple[int, int]]:
"""
Merges a list of tuples.

Args:
tuples (list[tuple[int, int]]): A list of tuples where each tuple
contains two integers.

Returns:
list[tuple[int, int]]: A list of merged tuples.

Examples:
>>> merge_tuples([(1, 3), (2, 4), (5, 7), (6, 8)])
[(1, 4), (5, 8)]
>>> merge_tuples([(10, 15), (12, 20), (25, 30)])
[(10, 20), (25, 30)]
>>> merge_tuples([(1, 2), (3, 4)])
[(1, 2), (3, 4)]
"""

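From the test cases alone, one can infer that the function merges overlapping intervals while leaving disjoint ones untouched. A minimal sketch of that inferred behavior (our reading of the examples, since the description is deliberately vague):

Python
def merge_tuples(tuples: list[tuple[int, int]]) -> list[tuple[int, int]]:
    # Sort by start so overlapping intervals become adjacent (the doctest
    # inputs happen to be sorted already; sorting unsorted input is an
    # assumption on our part).
    merged = []
    for start, end in sorted(tuples):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend its right endpoint.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
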
5. Examples where the test cases are written in a descriptive format, rather than being
structured in a doctest format. For example:
Python
def sum_of_squares(numbers: list[int]) -> int:
"""
Calculates the sum of squares of integers in a list.

Args:
numbers (list[int]): A list of integers.

Returns:
int: The sum of the squares of the integers.

    Test Cases:
    1. Input: [1, 2, 3, 4]
       Expected Output: 30
       Explanation: 1^2 + 2^2 + 3^2 + 4^2 = 30.

    2. Input: [-1, -2, -3]
       Expected Output: 14
       Explanation: Squares of negative numbers are positive: 1 + 4 + 9 = 14.

    3. Input: []
       Expected Output: 0
       Explanation: An empty list results in a sum of 0.

    4. Input: [0, 5, 10]
       Expected Output: 125
       Explanation: 0^2 + 5^2 + 10^2 = 125.
    """

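A solution consistent with these descriptive test cases is a one-liner, and converting the listed cases into asserts shows how descriptive tests map onto the verification-script requirement (a sketch):

Python
def sum_of_squares(numbers: list[int]) -> int:
    # Square each integer and sum the results; an empty list yields 0.
    return sum(n * n for n in numbers)

# The descriptive test cases above, rewritten as executable checks.
assert sum_of_squares([1, 2, 3, 4]) == 30
assert sum_of_squares([-1, -2, -3]) == 14
assert sum_of_squares([]) == 0
assert sum_of_squares([0, 5, 10]) == 125
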
3.2.2 Code Fixing

Prompt: {A request in natural language to fix the given erroneous code snippet.}

Response: {A natural language explanation describing the error in the given code snippet, followed by
the corrected code snippet along with an explanation of the corrections made.}

3.2.3 Test-case generation / Output generation

3.2.3.1 Test-case generation

Prompt: {A request in natural language to generate test cases for a given code snippet.}

Response: {A diverse set of test cases covering all edge cases necessary to test the correctness of the
given code. Please also include a concise summary of the code.}

3.2.3.2 Output generation

Prompt: {A request in natural language to predict the output for the given code snippet for a given input.}

Response: {The expected output for a given input, along with an explanation behind the generated
output.}

3.2.4 Code explanation

Prompt: {A request in natural language to generate an explanation for the given code snippet.}

Response: {A concise natural language explanation for the given code snippet.}

4. Complete examples

4.1 Code Generation/Completion

4.1.1 Code Generation

Prompt:
Description:
You are given a list of tasks and their dependencies. Each task is represented by a unique
character, and dependencies are given as pairs of tasks where the first task must be completed
before the second task can start. Your goal is to determine if it's possible to complete all tasks
and, if so, return one possible order in which the tasks can be completed.

Input:
A list of tasks represented by unique characters (e.g., ['A', 'B', 'C', ...]).
A list of dependency pairs where each pair is a tuple of two tasks (e.g., [('A', 'B'), ('B', 'C'), ...]).

Output:
If it's possible to complete all tasks, return one possible order in which the tasks can be
completed as a list of characters (e.g., ['A', 'B', 'C', ...]).
If it's not possible to complete all tasks due to circular dependencies, return None.

Example:
tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D')]

Output:
None # Circular dependency detected (A -> B -> C -> A)

tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'D')]

Output:
['A', 'B', 'C', 'D'] # One possible order

Constraints:
The number of tasks will be between 1 and 100.
The number of dependencies will be between 0 and 1000.

Response:
To determine if it's possible to complete all tasks and find one possible order, we can use a
topological sort algorithm.
Here's a step-by-step approach to solve this problem:

Build the Graph:
Create a dictionary to represent the graph where each task is a key and its value is a list of
tasks that depend on it.

Count Incoming Edges:
Create a dictionary to count the number of incoming edges (dependencies) for each task.

Initialize the Queue:
Create a queue to hold tasks with no incoming edges (i.e., tasks that can be started
immediately).

Topological Sort:
While the queue is not empty, dequeue a task, add it to the result list, and reduce the incoming
edge count for each of its dependent tasks. If a dependent task's incoming edge count reaches
zero, add it to the queue.

Check for Circular Dependencies:
If the result list does not contain all tasks, some tasks never reached an incoming edge count of
zero; this indicates a circular dependency, and it's not possible to complete all tasks.

Here's the implementation of the above approach:

Python
from collections import deque, defaultdict

def find_order(tasks, dependencies):
    # Step 1: Build the graph
    graph = defaultdict(list)
    in_degree = {task: 0 for task in tasks}

    for dependency in dependencies:
        graph[dependency[0]].append(dependency[1])
        in_degree[dependency[1]] += 1

    # Step 2: Initialize the queue with tasks having no incoming edges
    queue = deque([task for task in tasks if in_degree[task] == 0])

    # Step 3: Perform topological sort
    result = []
    while queue:
        task = queue.popleft()
        result.append(task)
        for dependent_task in graph[task]:
            in_degree[dependent_task] -= 1
            if in_degree[dependent_task] == 0:
                queue.append(dependent_task)

    # Step 4: Check if all tasks were processed
    if len(result) == len(tasks):
        return result
    else:
        return None

# Example usage:
tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D')]
print(find_order(tasks, dependencies))  # Output: None

tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(find_order(tasks, dependencies))  # Output: ['A', 'B', 'C', 'D']

This algorithm efficiently determines if a valid order exists and returns one possible order if it
does. If a circular dependency is detected, it returns None.

4.1.2 Code Completion

Prompt:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid. A
string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.

    Additionally, the function should handle nested parentheses of different
    types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
    matched and nested.
    The function should also handle additional characters that are not
    parentheses, ignoring them.

    >>> is_valid_parentheses("()")
    True
    >>> is_valid_parentheses("([])")
    True
    >>> is_valid_parentheses("({[}])")
    False
    >>> is_valid_parentheses("a(b)c")
    True
    >>> is_valid_parentheses(")(")
    False
    """

Response:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid. A
string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.

    Additionally, the function should handle nested parentheses of different
    types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
    matched and nested.
    The function should also handle additional characters that are not
    parentheses, ignoring them.

    >>> is_valid_parentheses("()")
    True
    >>> is_valid_parentheses("([])")
    True
    >>> is_valid_parentheses("({[}])")
    False
    >>> is_valid_parentheses("a(b)c")
    True
    >>> is_valid_parentheses(")(")
    False
    """
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}
    for char in s:
        if char in mapping:
            top_element = stack.pop() if stack else '#'
            if mapping[char] != top_element:
                return False
        elif char in mapping.values():
            stack.append(char)
    return not stack

Explanation:
The function uses a stack to keep track of the opening parentheses.
It iterates through each character in the string.
If the character is a closing parenthesis, it checks if the stack is empty or if the top element of
the stack does not match the corresponding opening parenthesis.
If either condition is true, the string is invalid.
If the character is an opening parenthesis, it is pushed onto the stack.
After iterating through the string, if the stack is empty, it means all parentheses were properly
matched and nested, so the string is valid. Otherwise, it is invalid.

Test Cases:
is_valid_parentheses("()") returns True
is_valid_parentheses("([])") returns True
is_valid_parentheses("({[}])") returns False
is_valid_parentheses("a(b)c") returns True
is_valid_parentheses(")(") returns False
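
Consistent with the project requirement to supply a runnable verification script, the listed test cases can be checked directly with asserts (a minimal sketch, assuming the completed function above is in scope):

Python
# Minimal verification script for the completed function above.
assert is_valid_parentheses("()") == True
assert is_valid_parentheses("([])") == True
assert is_valid_parentheses("({[}])") == False
assert is_valid_parentheses("a(b)c") == True
assert is_valid_parentheses(")(") == False
print("All test cases passed.")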

4.2 Code fixing

Prompt:
I'm working on writing a code snippet to check if a given stream of parentheses is balanced, but
it’s not working as expected. Can you help me fix it?

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop(0) if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Response:
Your code has an issue with how you handle the stack. Specifically, you're using `stack.pop(0)`,
which removes the first element from the stack, but stack operations should follow the
Last-In-First-Out (LIFO) principle, where you pop the last element.

Here's the corrected version of your function:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}
    for char in s:
        if char in mapping:
            # Pop the top of the stack
            top_element = stack.pop() if stack else '#'
            # Check if the mapping is correct
            if mapping[char] != top_element:
                return False
        elif char in mapping.values():
            # Push opening brackets onto the stack
            stack.append(char)
    return not stack

Key Changes:
1. Stack Pop:
   - Changed `stack.pop(0)` to `stack.pop()`. This ensures LIFO behavior, as the last added
     element is removed first.
2. Edge Case Handling:
   - Ensured the function gracefully handles an empty stack by using a placeholder (`'#'`).
3. Return Value:
   - The function now correctly checks if the stack is empty at the end to determine if all
     parentheses were properly closed.

Testing:

Python
print(is_valid_parentheses("()")) # True
print(is_valid_parentheses("([])")) # True
print(is_valid_parentheses("({[}])")) # False
print(is_valid_parentheses("a(b)c")) # True
print(is_valid_parentheses(")(")) # False
print(is_valid_parentheses("{[()]}")) # True
print(is_valid_parentheses("{[(])}")) # False

This corrected code should now work as expected.


4.3 Test-case generation / output generation

4.3.1 Test-case generation

Prompt:
I have written a code snippet to check whether a given stream of parentheses is balanced or
not. Can you help me generate a few test cases to validate the correctness of my code?

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop() if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Response:
Here are several test cases to validate the correctness of your function is_valid_parentheses.
These test cases cover various scenarios, including empty strings, nested structures, incorrect
nesting, and non-parenthesis characters:

Python
# Test case 1: Empty string
assert is_valid_parentheses("") == True  # An empty string is considered balanced.

# Test case 2: Simple balanced parentheses
assert is_valid_parentheses("()") == True
assert is_valid_parentheses("[]") == True
assert is_valid_parentheses("{}") == True

# Test case 3: Simple unbalanced parentheses
assert is_valid_parentheses("(") == False
assert is_valid_parentheses(")") == False
assert is_valid_parentheses("{[}") == False

# Test case 4: Nested balanced parentheses
assert is_valid_parentheses("([])") == True
assert is_valid_parentheses("{[()]}") == True
assert is_valid_parentheses("([{}])") == True

# Test case 5: Nested unbalanced parentheses
assert is_valid_parentheses("([)]") == False
assert is_valid_parentheses("{[(])}") == False
assert is_valid_parentheses("{[}") == False

# Test case 6: Additional characters in string
assert is_valid_parentheses("a(b)c") == True
assert is_valid_parentheses("a[b]c{d}") == True
assert is_valid_parentheses("a(b[c{d}e]f)g") == True
assert is_valid_parentheses("a(b[c{d)e]f}g") == False

# Test case 7: String with only non-parentheses characters
assert is_valid_parentheses("abcdef") == True  # No parentheses, so it's valid.

# Test case 8: Edge cases
assert is_valid_parentheses("(((((((") == False  # All opening, no closing.
assert is_valid_parentheses(")))))))") == False  # All closing, no opening.
assert is_valid_parentheses("((") == False  # Incomplete opening.
assert is_valid_parentheses("))") == False  # Incomplete closing.

# Test case 9: Mixed nested and unbalanced
assert is_valid_parentheses("[(])") == False  # Incorrectly nested.
assert is_valid_parentheses("{[}]") == False  # Incorrectly nested.

# Test case 10: Large balanced and unbalanced strings
assert is_valid_parentheses("()" * 1000) == True  # Large balanced.
assert is_valid_parentheses("(" * 1000 + ")" * 999) == False  # Missing one closing parenthesis.

You can use these test cases to ensure your function handles a wide range of input
scenarios effectively.

4.3.2 Output Generation

Prompt:
I have written a code snippet to check whether a given stream of parentheses is balanced or
not. Below is my code snippet:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop() if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Can you tell me what would be the output of this function if my input is '(())))'?

Response:
Let's analyze the code step-by-step for the input '(())))':

Initialize Variables:
stack = []
mapping = {')': '(', '}': '{', ']': '['}

Process Each Character:

For the first character '(':
'(' is not in mapping, but it is in mapping.values(), so it is pushed onto the stack.
stack = ['(']

For the second character '(':
Similarly, '(' is pushed onto the stack.
stack = ['(', '(']

For the third character ')':
')' is in mapping, so the code tries to pop the top element of the stack. The top
element is '(', which matches mapping[')'].
stack = ['(']

For the fourth character ')':
')' is in mapping, so the code tries to pop the top element of the stack. The top
element is '(', which matches mapping[')'].
stack = []

For the fifth character ')':
')' is in mapping, so the code tries to pop the top element of the stack. However,
the stack is empty (stack = []), so top_element = '#'.
Since mapping[')'] (which is '(') does not match top_element ('#'), the function
returns False.

Final Output:
The function returns False for the input '(())))', as there are unmatched closing
parentheses.
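
The prediction can be double-checked by simply executing the function on the given input:

Python
# Running the function on the given input confirms the walkthrough above.
print(is_valid_parentheses('(())))'))  # Prints: False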

4.4 Code Explanation

Prompt:
Can you explain what this code snippet is trying to achieve?

Python
def is_valid_parentheses(s):
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop() if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Response:
This code snippet is designed to determine if a string of parentheses is valid.
Here's a breakdown of its functionality:

1. **Initialization**:
   - A stack is initialized to keep track of opening parentheses.
   - A dictionary `mapping` is created to map closing parentheses to their corresponding
     opening parentheses.

2. **Iteration**:
   - The function iterates through each character in the string `s`.
   - If the character is a closing parenthesis (`)`, `}`, or `]`), it checks the top element of the
     stack:
     - If the stack is empty, it uses a placeholder (`'#'`) to indicate there’s no matching
       opening parenthesis.
     - If the top element of the stack does not match the corresponding opening
       parenthesis specified in the `mapping`, the function returns `False`.
   - If the character is an opening parenthesis (`(`, `{`, or `[`), it is pushed onto the stack.

3. **Final Check**:
   - After iterating through the string, the function returns `True` if the stack is empty
     (indicating all parentheses were properly matched and closed), otherwise `False`.

In summary, the function checks if the parentheses in the string are balanced and
correctly nested.
