
Coding Data Generation - Project Overview

This project focuses on generating an SFT dataset containing prompts plus gold-standard
responses, to help improve the performance of a customer’s model on coding-related tasks.

One of the biggest focuses is ensuring that we only provide data rows in the delivery where the
prompt is shown to elicit an erroneous / incorrect response from the customer model. To support
this, you will have the ability to iteratively query the customer model with your prompt, testing
until you get an incorrect response.

Once you have elicited an incorrect response, you will have the ability to rewrite it in order to
provide your own, correct (“gold standard”) response.

Task Details:

Full details are provided under the Coding Data Requirements section below; however, a
summary is provided here:

- The expectation is to provide a training dataset which covers several coding languages:
  - Python, JavaScript, TypeScript, SQL, PHP, Java, Ruby
  - It should also cover a range of different task types: from “Code Generation /
    Completion”, to “Code Fixing”, to “Test Case Generation / Output Generation”,
    and “Code Explanation”
- Before starting anything, you are expected to fully read the guidance below for your
  given task type, to understand the expected format of the input prompts and output
  responses, the areas of interest, and examples of what good looks like
- In addition to the general task areas, the customer has specified that they are interested
  in improving their model performance against the following benchmarks, so it is
  important that we align prompts with the sorts of questions included in these:
  - HumanEval-Python: Code completion prompts in docstring style (Python)
  - HumanEval-X: Code completion prompts in docstring style (PHP, Java, Ruby,
    JavaScript, TypeScript)
  - CodeForces: Global competitive programming platform for skill improvement and
    benchmarking problem-solving abilities through contests
  - BigCodeBench: Offers two coding-task variations: Code Completion using
    structured docstrings, and Code Generation based on natural-language
    instructions
  - LiveCodeBench: Contamination-free benchmark for evaluating LLMs on code,
    focusing on self-repair, execution, and test output prediction
  - HumanEvalPack: An extension of OpenAI's HumanEval to cover 6 total
    languages across 3 tasks (HumanEvalDescribe, HumanEvalFix, and
    HumanEvalSynthesize)
- The editor that is used to complete this task will be connected to the customer
  model, so you can iteratively query the model until you get an incorrect response.
  - One possible approach here, rather than continuing to write prompts from scratch
    if the model responds correctly to your first attempt, is to iterate on your
    original starter prompt, making it increasingly complex until you elicit an
    incorrect response
- Once you arrive at the prompt that results in an incorrect response, you should do the
  following:
  - Rewrite the response to provide a correct (“gold standard”) one - make sure
    to abide by all the formatting constraints outlined below
  - Quickly capture, in no more than 5 bullet points, the errors / loss patterns
    exhibited in the original response that justified your rewrite (keep this brief
    and don’t spend much time here; this is not provided to the customer in the
    first instance, but is instead used to A) build up a view of the trends in
    errors, and B) enable a faster review process, as the errors are already
    captured during labelling)
  - Provide a script that can be copied directly into the relevant IDE and
    executed, to test that the functionality of any code provided in your
    corrected/rewritten response is valid and runs (see the sketch below)
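
For illustration only, here is a minimal sketch of such a verification script. The function `running_total` and its behavior are hypothetical, standing in for whatever your rewritten response actually defines:

Python
# Hypothetical rewritten function (illustrative only; in practice this would
# be the code from your gold-standard response).
def running_total(numbers):
    total, out = 0, []
    for n in numbers:
        total += n
        out.append(total)
    return out

# Assert-based checks: the script fails loudly if the rewritten code misbehaves.
assert running_total([1, 2, 3]) == [1, 3, 6]
assert running_total([-1, 1]) == [-1, 0]
assert running_total([]) == []
print("All checks passed.")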

Key things to focus on:

1. The prompts we proceed to rewrite are only those that elicit an incorrect
   model response (you should reset any prompts that result in correct responses and
   only provide a response where the model response is incorrect)
2. The prompts and rewritten responses must fully align with the expected formats
   detailed below, including any expected components of the content - e.g. explanations of
   code, running code functions, examples / tests, example implementations. These differ
   by task type, so it is important to observe the relevant task section & examples below
3. The prompts must focus on the areas specified for each task type AND take into
   account the benchmark datasets the customer wishes to improve against
4. You should avoid, at all costs, including comments or text that you would not want to be
   presented to the customer - i.e. avoid PII, usernames (e.g. console paths) or passwords
   (e.g. API keys) - if you wish to discuss anything with the team or capture nuance, do so
   using the “Issues” feature in the Labelbox editor

Demo of Editor [EXTREMELY IMPORTANT TO WATCH]:


https://www.loom.com/share/ff5932b27f81495d800336f27995fdbc?sid=f437b5b5-c5e8-412b-9e17-bf9eae392ab5
Coding Data Requirements
This document presents the client requirements and instructions for collecting human
demonstration coding data.

Table of Contents

1. Targeted benchmarks
2. Data volume and distribution requirements
   2.1 Coding-task distribution, by subtask and programming language
3. Annotation instructions
   3.1 Instructions that apply for all tasks
      3.1.1 Guidelines for Crafting Complex Coding Prompts
   3.2 Coding-task instructions
      3.2.1 Code Generation/Completion
      3.2.2 Code Fixing
      3.2.3 Test-case generation / Output generation
      3.2.4 Code explanation
4. Complete examples
   4.1 Code Generation/Completion
      4.1.1 Code Generation
      4.1.2 Code Completion
   4.2 Code fixing
   4.3 Test-case generation / output generation
      4.3.1 Test-case generation
      4.3.2 Output Generation
   4.4 Code Explanation

1. Targeted benchmarks

Below are the target benchmarks that we aim to improve coding performance on:

● HumanEval-Python: Code completion prompts in docstring style (Python)
● HumanEval-X: Code completion prompts in docstring style (PHP, Java, Ruby,
  JavaScript, TypeScript)
● CodeForces: Global competitive programming platform for skill improvement and
  benchmarking problem-solving abilities through contests
● BigCodeBench: Offers two coding-task variations: Code Completion using structured
  docstrings, and Code Generation based on natural-language instructions
● LiveCodeBench: Contamination-free benchmark for evaluating LLMs on code, focusing
  on self-repair, execution, and test output prediction
● HumanEvalPack: An extension of OpenAI's HumanEval to cover 6 total languages
  across 3 tasks (HumanEvalDescribe, HumanEvalFix, and HumanEvalSynthesize)

2. Data volume and distribution requirements

2.1 Coding-task distribution, by subtask and programming language

| Coding Subtask | Python | PHP | Java | Ruby | JavaScript | TypeScript | SQL | Total |
|---|---|---|---|---|---|---|---|---|
| Code Generation / Completion | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |
| Code Fixing | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |
| Test Case Generation / Output Generation | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |
| Code Explanation | 100 | 200 | 100 | 100 | 100 | 200 | 200 | 1k |

Notes:

● These should all be single-turn demonstrations.

3. Annotation instructions

This section outlines the set of instructions that data collectors need to follow in order to curate
high-quality examples.

3.1 Instructions that apply for all tasks

● All code snippets in the output must be inside a markdown code block with the
  language tag added.
● The code solutions should not be more complex than necessary. For example, do not
  define classes if a function would suffice. However, we would like the annotators to have
  creative freedom. See Section 3.1.1 for guidelines on crafting coding prompts at the
  desired complexity.
● Generated code should respect any requirements/constraints set in the input request.
  These include, but are not limited to, names of functions and variables and any implicit
  or explicit data types. If the input request doesn’t specify the output format, you need to
  use the provided test cases (if any) to deduce the output format.
● Good code practices should be applied, such as using descriptive variable names and
  avoiding very long lines. For Python, follow PEP8 guidelines, except for the line limitation
  of 79 characters.
● We recommend that the programming language is explicitly mentioned in the prompt if it
  is not otherwise easy to deduce from the prompt.
● Responses should include adequate comments to make the code easily readable.
● For demonstrations that involve the use of external libraries, we expect annotators to
  adhere to the following requirements:
  ● Annotators should always use the latest stable release of any external library or
    framework at the time of creating the demonstration, unless explicitly instructed
    otherwise.
  ● Annotators should include the version of libraries and frameworks only if they are
    using libraries that are not commonly accepted as standard or widely used in the
    field. For example, there is no need to specify the version for libraries like numpy
    or pandas in Python, but version documentation is required for less common or
    specialized libraries.
  ● Code submissions should be verified for compatibility with the specified library
    versions to ensure reproducibility and accuracy.
● Prompts should be sourced based on use cases for which the current [client] models fail
  to produce high-quality responses. This requirement can be achieved most efficiently by
  placing [client] models “in the loop” (MiL) with the participants during their active work.
  Participants should be instructed to “probe” the models for areas of weakness before
  proceeding with further response rating/writing activities.
● Human demonstrations should be based on the [client] system prompt, not an
  arbitrary/alternative system prompt. Any MiL content generated during session creation
  should use the [client] system prompt in its inputs.
● Content produced in RAG fashion (i.e. injection of results of indexing/searching
  operations into the session context), if present in the session, should be placed into its
  own section of the delivery schema so it can be clearly isolated from other parts of the
  demonstration text.
● Human-written or MiL responses involving both code and text (explanations, concept
  definitions, documentation, etc.) should remain as faithful as possible to the current
  [client] model’s response style and presentation tendencies. Even if accurate and
  helpful, a demonstration can be counterproductive if it introduces contrary stylistic
  elements.

3.1.1 Guidelines for Crafting Complex Coding Prompts

1. Encourage Deep Logical Thinking: Create prompts that require models to engage in
   multi-step reasoning and problem decomposition. The tasks should not have
   straightforward or linear solutions and should involve understanding and applying
   intricate logic.
2. Incorporate Multi-Rule Logic: Prompts that include multiple interconnected rules that
   need to be followed will enhance the complexity of the prompts.
3. Include Constraints to Test Adaptability: Define constraints that limit straightforward
   implementations, forcing the model to consider alternative approaches. Constraints
   might include limits on repetition, mandatory use of specific constructs, or handling
   boundary conditions.
4. Test Handling of Edge Cases: Prompts should be designed to include potential edge
   cases that challenge the model to generalize well. These cases should be derived from
   the natural complexity of the task.
5. Emphasize Efficiency: Include requirements that consider time and space efficiency,
   ensuring the model optimizes for performance where applicable. A prompt combining
   several of these guidelines is sketched below.
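
For illustration only (this prompt is hypothetical, not taken from the client requirements), a single prompt might combine multi-rule logic, explicit constraints, and an efficiency requirement like so:

Python
def longest_valid_window(s: str, k: int) -> int:
    """
    Return the length of the longest substring of s that satisfies ALL of
    the following rules:
    1. It contains at most k distinct characters.
    2. No character occurs more than twice within it.
    3. It starts and ends with an ASCII letter.
    The solution must run in O(n) time over the length of s.

    >>> longest_valid_window("aabbccc", 2)
    4
    """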

3.2 Coding-task instructions

3.2.1 Code Generation/Completion

3.2.1.1 Code Generation

Prompt: {A natural language request that requires the creation of code and can optionally include helper
functions and empty function bodies.}

Response: {A natural language explanation describing the approach to solve the given problem. Code to
address the request. It is highly recommended to include a couple of examples to showcase the usage of
the generated code.}

3.2.1.2 Code Completion

Prompt: {Function body with the description of the problem in the docstring. Can optionally include a set
of test cases. These test cases can either be written in doctest format or descriptive format.}

Response: {Complete code aimed at solving the given problem which passes all provided test cases
along with an explanation/reasoning behind the approach.}

3.2.1.2.A Code Generation/Completion: Areas of emphasis


Place greater emphasis on the following types of examples:

1. Examples which require usage of well-known mathematical formulas and properties. For
example:

Python
def projectile_motion(v0: float, angle: float) -> dict:
    """
    Calculate the time of flight, maximum height, and range of a projectile.

    Args:
    - v0 (float): Initial velocity in m/s.
    - angle (float): Launch angle in degrees.

    Returns:
    - dict: A dictionary containing:
        - 'time_of_flight': Time of flight in seconds.
        - 'max_height': Maximum height in meters.
        - 'range': Range of the projectile in meters.

    Raises:
    - ValueError: If v0 or angle is negative.

    Example:
    >>> projectile_motion(20, 45)
    {'time_of_flight': 2.89, 'max_height': 10.20, 'range': 40.82}
    """
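A sketch of a matching solution, assuming g = 9.8 m/s² (the prompt leaves the gravitational constant unspecified; this value reproduces the doctest figures to two decimals):

Python
import math

def projectile_motion(v0: float, angle: float) -> dict:
    # Assumption: g = 9.8 m/s^2, which matches the doctest values above.
    if v0 < 0 or angle < 0:
        raise ValueError("v0 and angle must be non-negative.")
    g = 9.8
    theta = math.radians(angle)
    return {
        'time_of_flight': round(2 * v0 * math.sin(theta) / g, 2),
        'max_height': round((v0 * math.sin(theta)) ** 2 / (2 * g), 2),
        'range': round(v0 ** 2 * math.sin(2 * theta) / g, 2),
    }

# projectile_motion(20, 45) gives time_of_flight 2.89 and range 40.82;
# note that round() displays the max height as 10.2 rather than 10.20.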
2. Examples with more than two constraints/conditions in the problem. The complexity of the
logic required to solve a problem increases as the number of conditions or constraints
increases. For example:

Python
def unique_sorted_string(input_string: str) -> str:
"""
Returns a sorted version of the input string based on the following
constraints:
1. Numbers (0-9) appear first in the sorted output.
2. Uppercase letters (A-Z) are treated as smaller than lowercase
letters (a-z).
3. Duplicates are removed, but the first occurrence of each
character in the input retains its position in the final output.

Args:
input_string (str): The string to be sorted and processed.

Returns:
str: The sorted string following the constraints.

Examples:
        >>> unique_sorted_string("Hello123World!")
        '123HWdelo'
        >>> unique_sorted_string("aAbBcCdDeE1233")
        '123ABCDEabcde'
        >>> unique_sorted_string("zZz!")
        'Zz'
"""


3. Examples that require the model to deduce the correct return type of the output (e.g., int,
float, str, list, etc.) based on the format of the provided test cases, rather than explicitly
specifying the return type in the problem description. For example:

Python
def calculate_average(numbers: list[float]) -> float:
"""
Calculates the average of a list of numbers.

Args:
numbers (list[float]): A list of floating-point numbers.

Examples:
>>> calculate_average([1, 2, 3, 4, 5])
3.000
>>> calculate_average([10, 15, 20])
15.000
>>> calculate_average([1.1, 2.2, 3.3])
2.200
>>> calculate_average([5.555, 4.444, 3.333])
4.444
>>> calculate_average([0, 0.123, 0.456])
0.193
"""

In this example, the model must deduce from the test cases that the output should be a float
with three decimal places of precision.

4. Examples of prompts where the problem description is vague, and the logic or behavior
required must be inferred by analyzing the test cases. For example:

Python
def merge_tuples(tuples: list[tuple[int, int]]) -> list[tuple[int, int]]:
"""
Merges a list of tuples.

Args:
tuples (list[tuple[int, int]]): A list of tuples where each tuple
contains two integers.

Returns:
list[tuple[int, int]]: A list of merged tuples.

Examples:
>>> merge_tuples([(1, 3), (2, 4), (5, 7), (6, 8)])
[(1, 4), (5, 8)]
>>> merge_tuples([(10, 15), (12, 20), (25, 30)])
[(10, 20), (25, 30)]
>>> merge_tuples([(1, 2), (3, 4)])
[(1, 2), (3, 4)]
"""

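From the test cases alone, one can infer that the function merges overlapping intervals while leaving disjoint ones untouched. A minimal sketch of that inferred behavior (our reading of the examples, since the description is deliberately vague):

Python
def merge_tuples(tuples: list[tuple[int, int]]) -> list[tuple[int, int]]:
    # Sort by start so overlapping intervals become adjacent (the doctest
    # inputs happen to be sorted already; sorting unsorted input is an
    # assumption on our part).
    merged = []
    for start, end in sorted(tuples):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend its right endpoint.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
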
5. Examples where the test cases are written in a descriptive format, rather than being
structured in a doctest format. For example:
Python
def sum_of_squares(numbers: list[int]) -> int:
"""
Calculates the sum of squares of integers in a list.

Args:
numbers (list[int]): A list of integers.

Returns:
int: The sum of the squares of the integers.

    Test Cases:
    1. Input: [1, 2, 3, 4]
       Expected Output: 30
       Explanation: 1^2 + 2^2 + 3^2 + 4^2 = 30.

    2. Input: [-1, -2, -3]
       Expected Output: 14
       Explanation: Squares of negative numbers are positive: 1 + 4 + 9 = 14.

    3. Input: []
       Expected Output: 0
       Explanation: An empty list results in a sum of 0.

    4. Input: [0, 5, 10]
       Expected Output: 125
       Explanation: 0^2 + 5^2 + 10^2 = 125.
    """

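A solution consistent with these descriptive test cases is a one-liner, and converting the listed cases into asserts shows how descriptive tests map onto the verification-script requirement (a sketch):

Python
def sum_of_squares(numbers: list[int]) -> int:
    # Square each integer and sum the results; an empty list yields 0.
    return sum(n * n for n in numbers)

# The descriptive test cases above, rewritten as executable checks.
assert sum_of_squares([1, 2, 3, 4]) == 30
assert sum_of_squares([-1, -2, -3]) == 14
assert sum_of_squares([]) == 0
assert sum_of_squares([0, 5, 10]) == 125
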
3.2.2 Code Fixing

Prompt: {A request in natural language to fix the given erroneous code snippet.}

Response: {A natural language explanation describing the error in the given code snippet, followed by
the corrected code snippet along with an explanation of the corrections made.}

3.2.3 Test-case generation / Output generation

3.2.3.1 Test-case generation

Prompt: {A request in natural language to generate test cases for a given code snippet.}

Response: {A diverse set of test cases covering all edge cases necessary to test the correctness of the
given code. Please also include a concise summary of the code.}

3.2.3.2 Output generation

Prompt: {A request in natural language to predict the output for the given code snippet for a given input.}

Response: {The expected output for a given input, along with an explanation behind the generated
output.}

3.2.4 Code explanation

Prompt: {A request in natural language to generate an explanation for the given code snippet.}

Response: {A concise natural language explanation for the given code snippet.}

4. Complete examples

4.1 Code Generation/Completion

4.1.1 Code Generation

Prompt:
Description:
You are given a list of tasks and their dependencies. Each task is represented by a unique
character, and dependencies are given as pairs of tasks where the first task must be completed
before the second task can start. Your goal is to determine if it's possible to complete all tasks
and, if so, return one possible order in which the tasks can be completed.

Input:
A list of tasks represented by unique characters (e.g., ['A', 'B', 'C', ...]).
A list of dependency pairs where each pair is a tuple of two tasks (e.g., [('A', 'B'), ('B', 'C'), ...]).

Output:
If it's possible to complete all tasks, return one possible order in which the tasks can be
completed as a list of characters (e.g., ['A', 'B', 'C', ...]).
If it's not possible to complete all tasks due to circular dependencies, return None.

Example:
tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D')]

Output:
None # Circular dependency detected (A -> B -> C -> A)

tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'D')]

Output:
['A', 'B', 'C', 'D'] # One possible order

Constraints:
The number of tasks will be between 1 and 100.
The number of dependencies will be between 0 and 1000.

Response:
To determine if it's possible to complete all tasks and find one possible order, we can use a
topological sort algorithm.
Here's a step-by-step approach to solve this problem:

Build the Graph:
Create a dictionary to represent the graph where each task is a key and its value is a list of
tasks that depend on it.

Count Incoming Edges:
Create a dictionary to count the number of incoming edges (dependencies) for each task.

Initialize the Queue:
Create a queue to hold tasks with no incoming edges (i.e., tasks that can be started
immediately).

Topological Sort:
While the queue is not empty, dequeue a task, add it to the result list, and reduce the incoming
edge count for each of its dependent tasks. If a dependent task's incoming edge count reaches
zero, add it to the queue.

Check for Circular Dependencies:
If the result list does not contain all tasks, some tasks never reached an incoming edge count of
zero; this indicates a circular dependency, and it's not possible to complete all tasks.

Here's the implementation of the above approach:

Python
from collections import deque, defaultdict

def find_order(tasks, dependencies):
    # Step 1: Build the graph
    graph = defaultdict(list)
    in_degree = {task: 0 for task in tasks}

    for dependency in dependencies:
        graph[dependency[0]].append(dependency[1])
        in_degree[dependency[1]] += 1

    # Step 2: Initialize the queue with tasks having no incoming edges
    queue = deque([task for task in tasks if in_degree[task] == 0])

    # Step 3: Perform topological sort
    result = []
    while queue:
        task = queue.popleft()
        result.append(task)
        for dependent_task in graph[task]:
            in_degree[dependent_task] -= 1
            if in_degree[dependent_task] == 0:
                queue.append(dependent_task)

    # Step 4: Check if all tasks were processed
    if len(result) == len(tasks):
        return result
    else:
        return None

# Example usage:
tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D')]
print(find_order(tasks, dependencies))  # Output: None

tasks = ['A', 'B', 'C', 'D']
dependencies = [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(find_order(tasks, dependencies))  # Output: ['A', 'B', 'C', 'D']

This algorithm efficiently determines if a valid order exists and returns one possible order if it
does. If a circular dependency is detected, it returns None.

4.1.2 Code Completion

Prompt:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid. A
string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.

    Additionally, the function should handle nested parentheses of different
    types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
    matched and nested.
    The function should also handle additional characters that are not
    parentheses, ignoring them.

    >>> is_valid_parentheses("()")
    True
    >>> is_valid_parentheses("([])")
    True
    >>> is_valid_parentheses("({[}])")
    False
    >>> is_valid_parentheses("a(b)c")
    True
    >>> is_valid_parentheses(")(")
    False
    """

Response:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid. A
string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.

    Additionally, the function should handle nested parentheses of different
    types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
    matched and nested.
    The function should also handle additional characters that are not
    parentheses, ignoring them.

    >>> is_valid_parentheses("()")
    True
    >>> is_valid_parentheses("([])")
    True
    >>> is_valid_parentheses("({[}])")
    False
    >>> is_valid_parentheses("a(b)c")
    True
    >>> is_valid_parentheses(")(")
    False
    """
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}
    for char in s:
        if char in mapping:
            top_element = stack.pop() if stack else '#'
            if mapping[char] != top_element:
                return False
        elif char in mapping.values():
            stack.append(char)
    return not stack

Explanation:
The function uses a stack to keep track of the opening parentheses.
It iterates through each character in the string.
If the character is a closing parenthesis, it checks if the stack is empty or if the top element of
the stack does not match the corresponding opening parenthesis.
If either condition is true, the string is invalid.
If the character is an opening parenthesis, it is pushed onto the stack.
After iterating through the string, if the stack is empty, it means all parentheses were properly
matched and nested, so the string is valid. Otherwise, it is invalid.

Test Cases:
is_valid_parentheses("()") returns True
is_valid_parentheses("([])") returns True
is_valid_parentheses("({[}])") returns False
is_valid_parentheses("a(b)c") returns True
is_valid_parentheses(")(") returns False
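
Consistent with the project requirement to supply a runnable verification script, the listed test cases can be checked directly with asserts (a minimal sketch, assuming the completed function above is in scope):

Python
# Minimal verification script for the completed function above.
assert is_valid_parentheses("()") == True
assert is_valid_parentheses("([])") == True
assert is_valid_parentheses("({[}])") == False
assert is_valid_parentheses("a(b)c") == True
assert is_valid_parentheses(")(") == False
print("All test cases passed.")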

4.2 Code fixing

Prompt:
I'm working on writing a code snippet to check if a given stream of parentheses is balanced, but
it’s not working as expected. Can you help me fix it?

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop(0) if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Response:
Your code has an issue with how you handle the stack. Specifically, you're using `stack.pop(0)`,
which removes the first element from the stack, but stack operations should follow the
Last-In-First-Out (LIFO) principle, where you pop the last element.

Here's the corrected version of your function:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
    stack = []
    mapping = {')': '(', '}': '{', ']': '['}
    for char in s:
        if char in mapping:
            # Pop the top of the stack
            top_element = stack.pop() if stack else '#'
            # Check if the mapping is correct
            if mapping[char] != top_element:
                return False
        elif char in mapping.values():
            # Push opening brackets onto the stack
            stack.append(char)
    return not stack

Key Changes:
1. Stack Pop:
   - Changed `stack.pop(0)` to `stack.pop()`. This ensures LIFO behavior, as the last added
     element is removed first.
2. Edge Case Handling:
   - Ensured the function gracefully handles an empty stack by using a placeholder (`'#'`).
3. Return Value:
   - The function now correctly checks if the stack is empty at the end to determine if all
     parentheses were properly closed.

Testing:

Python
print(is_valid_parentheses("()")) # True
print(is_valid_parentheses("([])")) # True
print(is_valid_parentheses("({[}])")) # False
print(is_valid_parentheses("a(b)c")) # True
print(is_valid_parentheses(")(")) # False
print(is_valid_parentheses("{[()]}")) # True
print(is_valid_parentheses("{[(])}")) # False

This corrected code should now work as expected.


4.3 Test-case generation / output generation

4.3.1 Test-case generation

Prompt:
I have written a code snippet to check whether a given stream of parentheses is balanced or
not. Can you help me generate a few test cases to validate the correctness of my code?

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop() if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Response:
Here are several test cases to validate the correctness of your function is_valid_parentheses.
These test cases cover various scenarios, including empty strings, nested structures, incorrect
nesting, and non-parenthesis characters:

Python
# Test case 1: Empty string
assert is_valid_parentheses("") == True  # An empty string is considered balanced.

# Test case 2: Simple balanced parentheses
assert is_valid_parentheses("()") == True
assert is_valid_parentheses("[]") == True
assert is_valid_parentheses("{}") == True

# Test case 3: Simple unbalanced parentheses
assert is_valid_parentheses("(") == False
assert is_valid_parentheses(")") == False
assert is_valid_parentheses("{[}") == False

# Test case 4: Nested balanced parentheses
assert is_valid_parentheses("([])") == True
assert is_valid_parentheses("{[()]}") == True
assert is_valid_parentheses("([{}])") == True

# Test case 5: Nested unbalanced parentheses
assert is_valid_parentheses("([)]") == False
assert is_valid_parentheses("{[(])}") == False
assert is_valid_parentheses("{[}") == False

# Test case 6: Additional characters in string
assert is_valid_parentheses("a(b)c") == True
assert is_valid_parentheses("a[b]c{d}") == True
assert is_valid_parentheses("a(b[c{d}e]f)g") == True
assert is_valid_parentheses("a(b[c{d)e]f}g") == False

# Test case 7: String with only non-parentheses characters
assert is_valid_parentheses("abcdef") == True  # No parentheses, so it's valid.

# Test case 8: Edge cases
assert is_valid_parentheses("(((((((") == False  # All opening, no closing.
assert is_valid_parentheses(")))))))") == False  # All closing, no opening.
assert is_valid_parentheses("((") == False  # Incomplete opening.
assert is_valid_parentheses("))") == False  # Incomplete closing.

# Test case 9: Mixed nested and unbalanced
assert is_valid_parentheses("[(])") == False  # Incorrectly nested.
assert is_valid_parentheses("{[}]") == False  # Incorrectly nested.

# Test case 10: Large balanced and unbalanced strings
assert is_valid_parentheses("()" * 1000) == True  # Large balanced.
assert is_valid_parentheses("(" * 1000 + ")" * 999) == False  # Missing one closing parenthesis.

You can use these test cases to ensure your function handles a wide range of input
scenarios effectively.

4.3.2 Output Generation

Prompt:
I have written a code snippet to check whether a given stream of parentheses is balanced or
not. Below is my code snippet:

Python
def is_valid_parentheses(s):
"""
Write a function to check if the given string of parentheses is valid.
A string of parentheses is valid if every opening parenthesis has a
corresponding closing parenthesis in the correct order.
Additionally, the function should handle nested parentheses of different
types (e.g., '(', ')', '[', ']', '{', '}') and ensure they are properly
matched and nested.
The function should also handle additional characters that are not
parentheses, ignoring them.

>>> is_valid_parentheses("()")
True
>>> is_valid_parentheses("([])")
True
>>> is_valid_parentheses("({[}])")
False
>>> is_valid_parentheses("a(b)c")
True
>>> is_valid_parentheses(")(")
False
"""
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop() if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Can you tell me what would be the output of this function if my input is '(())))'?

Response:
Let's analyze the code step-by-step for the input '(())))':

Initialize Variables:
stack = []
mapping = {')': '(', '}': '{', ']': '['}

Process Each Character:

For the first character '(':
'(' is not in mapping, but it is in mapping.values(), so it is pushed onto the stack.
stack = ['(']

For the second character '(':
Similarly, '(' is pushed onto the stack.
stack = ['(', '(']

For the third character ')':
')' is in mapping, so the code tries to pop the top element of the stack. The top
element is '(', which matches mapping[')'].
stack = ['(']

For the fourth character ')':
')' is in mapping, so the code tries to pop the top element of the stack. The top
element is '(', which matches mapping[')'].
stack = []

For the fifth character ')':
')' is in mapping, so the code tries to pop the top element of the stack. However,
the stack is empty (stack = []), so top_element = '#'.
Since mapping[')'] (which is '(') does not match top_element ('#'), the function
returns False.

Final Output:
The function returns False for the input '(())))', as there are unmatched closing
parentheses.
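
The prediction can be double-checked by simply executing the function on the given input:

Python
# Running the function on the given input confirms the walkthrough above.
print(is_valid_parentheses('(())))'))  # Prints: False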

4.4 Code Explanation

Prompt:
Can you explain what this code snippet is trying to achieve?

Python
def is_valid_parentheses(s):
stack = []
mapping = {')': '(', '}': '{', ']': '['}
for char in s:
if char in mapping:
top_element = stack.pop() if stack else '#'
if mapping[char] != top_element:
return False
elif char in mapping.values():
stack.append(char)
return not stack

Response:
This code snippet is designed to determine if a string of parentheses is valid.
Here's a breakdown of its functionality:

1. **Initialization**:
   - A stack is initialized to keep track of opening parentheses.
   - A dictionary `mapping` is created to map closing parentheses to their corresponding
     opening parentheses.

2. **Iteration**:
   - The function iterates through each character in the string `s`.
   - If the character is a closing parenthesis (`)`, `}`, or `]`), it checks the top element of the
     stack:
     - If the stack is empty, it uses a placeholder (`'#'`) to indicate there’s no matching
       opening parenthesis.
     - If the top element of the stack does not match the corresponding opening
       parenthesis specified in the `mapping`, the function returns `False`.
   - If the character is an opening parenthesis (`(`, `{`, or `[`), it is pushed onto the stack.

3. **Final Check**:
   - After iterating through the string, the function returns `True` if the stack is empty
     (indicating all parentheses were properly matched and closed), otherwise `False`.

In summary, the function checks if the parentheses in the string are balanced and
correctly nested.
