"This Sounds Unclear": Evaluating ChatGPT Capability in Translating End-User Prompts into Ready-to-Deploy Python Code
ABSTRACT
In this paper, we present a study aimed at evaluating how ChatGPT-4 understands end-users' natural language instructions to express automation rules for smart home applications and how it translates them into Python code ready to be deployed. Our study used 34 natural language instructions written by end users who were asked to automate scenarios presented as visual animations. The results show that ChatGPT-4 can produce coherent and effective code even if the instructions present ambiguities or unclear elements, understanding the natural language instructions and autonomously resolving 94% of them. However, the generated code still contains numerous ambiguities that could potentially affect safety and security. Nevertheless, when appropriately prompted, ChatGPT-4 can subsequently identify those ambiguities. This prompts a discussion about prospective interaction paradigms that may significantly improve the immediate usability of the generated code.

CCS CONCEPTS
• Human-centered computing → Human computer interaction (HCI); HCI design and evaluation methods; User studies;

KEYWORDS
End-user development (EUD), Large language models (LLMs), ChatGPT-4, Task-automation systems

ACM Reference Format:
Margherita Andrao, Diego Morra, Teresa Paccosi, Maristella Matera, Barbara Treccani, and Massimo Zancanaro. 2024. "This Sounds Unclear": Evaluating ChatGPT Capability in Translating End-User Prompts into Ready-to-Deploy Python Code. In International Conference on Advanced Visual Interfaces 2024 (AVI 2024), June 03–07, 2024, Arenzano, Genoa, Italy. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3656650.3656693

1 INTRODUCTION
The AI-assisted code-generation capabilities of Large Language Models (LLMs) are paving the way for new possibilities in the future of software development. While platforms such as Stack Overflow have previously offered entry-level dedicated support to those with at least some programming knowledge, the capability of models such as ChatGPT-4, CoPilot, and other specialized LLMs to generate code from natural language prompts is becoming increasingly pervasive among both beginner and expert programmers [11]. However, the existing literature still assumes that prompts are produced by experts in software development who can clearly and unambiguously articulate their requirements. End-User Development (EUD), especially in the field of home automation, is one of the domains where this advancement has the potential to establish new standards. While the availability of user-friendly interfaces for commercial microcontrollers and sensors for the domotic Internet of Things (IoT) increases yearly, specific programming skills are still required to orchestrate and customize their operations. Research in EUD has proposed several approaches that enable naive users to define those operations themselves, without the need to acquire technical skills; using trigger-action rules has proven an effective approach [7]. Enabling the creation of ready-to-deploy trigger-action rules from user-generated, unconstrained natural language (NL) may significantly democratize the creation of home automation systems. However, there is still a need to explore the capability of LLMs to interpret instructions from naive users and generate ready-to-use code. The ability to interpret incorrect or ambiguous prompts that do not adhere to the structured format typical of trigger-action rules is crucial, especially for potential applications that aim to bridge non-expert user needs with rule-based systems.

This paper contributes to the ongoing discussion by presenting a study that explores how ChatGPT-4 understands ambiguous requests provided by end users and how it identifies and corrects ambiguities in the generated code.
2 RELATED WORK
Platforms such as IFTTT¹ assist users in the definition of trigger-action rules for task-automation systems [5]. Research has explored the effectiveness of composition paradigms, including innovative visual paradigms and conversational-based approaches, broadening the scope of user interaction with smart-home technologies [2, 6, 10]. Recent works have turned attention to the influence of LLMs in this domain, emphasizing that interacting with ChatGPT can be challenging for end users who are not expert programmers, due to NL ambiguities that hinder code generation [10, 13].

The quality and reliability of code generated by LLMs are increasing [9, 14, 15]. Some works have highlighted ChatGPT's ability to outperform other models in generating code and solving coding problems [1, 12]. Less has been done to assess how the interpretation of user prompts affects the production of reliable and functional code ready for deployment [13]. Further investigation is required to explore the reliability, cleanliness, and security of code generated from inadequate or incomplete prompts. These considerations are particularly relevant in the EUD of IoT systems, where non-expert users are tasked with programming systems and security is therefore crucial [4].
3 THE STUDY
Our study aimed to explore the capability of ChatGPT-4² to generate correct code from trigger-action rules for home automation described by users in NL. Additionally, we aimed to examine how ChatGPT-4 can assist users in recognizing ambiguities and detecting potential errors, so as to create clearer, more accurate, and safer trigger-action rules.
Participants. Sixteen (16) participants, eight females and eight males, aged between 24 and 60 (M = 32.81; SD = 11.44), were involved in the study. Two had no prior experience with programming languages or home automation tools; six had minimal experience with smart home environments but no programming experience. The remaining eight (M = 4; F = 4) were expert programmers. Five of them had experience with smart home environments. All participants were native Italian speakers, and the instructions were produced in Italian.
Methods and procedure. Participants were exposed to 12 scenarios of smart home automation presented as silent videos (to avoid language bias), and they were asked to write the rules to implement each automation. Each video lasted around 15 seconds and represented a combination of one state (e.g., "it's daytime"), one event (e.g., "the temperature rises"), and one action (e.g., "open the windows"). The initial segment of each video portrayed the smart home's response when the state is false (e.g., it's night, the temperature rises, and no action occurs), followed by the representation when the state is true. Each session was individual, and each participant had to write the instructions in natural language to explain to an "intelligent system" how to implement that automation.
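To make the study's target output concrete, the following is a minimal sketch of the kind of ready-to-deploy trigger-action code at stake, for the example scenario above (state: "it's daytime"; event: "the temperature rises"; action: "open the windows"). It is not code produced in the study: the sensor and actuator functions and the temperature threshold are hypothetical placeholders for a real smart-home API.

import time

TEMP_THRESHOLD = 25.0  # assumed value: the NL rule gives no number

def is_daytime() -> bool:
    # State check; hypothetical stand-in for a light sensor or clock service.
    return 7 <= time.localtime().tm_hour < 19

def read_temperature() -> float:
    # Event source; hypothetical stand-in for a temperature sensor.
    return 22.0

def open_windows() -> None:
    # Action; hypothetical stand-in for a window actuator.
    print("Opening the windows")

last_temp = read_temperature()
while True:
    current = read_temperature()
    # Fire only while the state holds and the event (a rise past the
    # threshold) occurs.
    if is_daytime() and current > last_temp and current >= TEMP_THRESHOLD:
        open_windows()
    last_temp = current
    time.sleep(60)  # poll once per minute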
Data analysis. In total, 199 instructions were collected (some participants wrote multiple independent/alternative rules to define the same scenario). Two researchers independently coded the 199 instructions as (i) non-ambiguous if the instruction provided clear references to conditions and actions, or ambiguous if the instruction contained unclear structure, terminology, or details that cast doubt on the outcome due to subjective interpretation; and as (ii) complete if the instruction included all the elements (state, event, action) presented in the scenario, or incomplete if one or more elements were left implicit.

Only the 34 instructions evaluated as both ambiguous and complete by both researchers were considered for further analyses. For each rule, we provided ChatGPT-4 with the instructions and the list of sensors and smart devices involved in the scenario (the context), and "questioned" it, asking it to: (i) generate Python code ready to deploy (the prompt was: "Given the following rule and the following context, make a Python code ready to be deployed. Just output the code without any further comments."); and (ii) identify errors and ambiguities in the rule written by the user (the prompt was: "Given the following rule and the context, identify if there are possible ambiguities or errors in the way the instructions were written.").
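The paper does not state whether ChatGPT-4 was queried through the web interface or programmatically; purely for illustration, the sketch below shows how the two study prompts could be issued with the OpenAI Python client (openai >= 1.0). The rule and context strings are hypothetical examples, not items from the study corpus.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

rule = "When it's daytime and the temperature rises, open the windows."
context = "Sensors/devices: light sensor, temperature sensor, motorized windows."

CODE_PROMPT = (
    "Given the following rule and the following context, make a Python "
    "code ready to be deployed. Just output the code without any further "
    f"comments.\nRule: {rule}\nContext: {context}"
)
AMBIGUITY_PROMPT = (
    "Given the following rule and the context, identify if there are "
    "possible ambiguities or errors in the way the instructions were "
    f"written.\nRule: {rule}\nContext: {context}"
)

for prompt in (CODE_PROMPT, AMBIGUITY_PROMPT):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)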
Results. The 34 Python code snippets generated by ChatGPT-4 were analyzed by two independent researchers to evaluate their correctness, resulting in an accuracy rate of 94%. Only two of the generated code snippets were deemed incorrect: the first lacked a condition explicitly stated in the prompt, while the second overlooked an adverb establishing the time interval needed for the rule to be correctly executed. When prompted to identify ambiguities in the 34 natural language instructions, ChatGPT-4 detected 237 ambiguities in total, from a minimum of 3 to a maximum of 10 per instruction (M = 6.97, SD = 1.45). Two researchers independently coded the descriptions of the ambiguities and then examined if and how ChatGPT-4 had autonomously resolved them in the respective ready-to-deploy Python code snippets. Four main themes emerged from the descriptions of the ambiguities.

Theme 1: Ambiguities related to the outcome of the rule and its variables (83 instances). Eleven (11) ambiguities involved uncertainty about the end of the action ("The rule specifies 'after 7 PM' but does not indicate until when this rule applies."). Nine (9) involved temporal aspects, as the rule did not clearly state the sequence or simultaneity of the occurrences that trigger the action, and two (2) involved suggestions for additional sensors or devices that, if integrated, could ensure the intended outcome. Another 39 involved possible ambiguities in error handling ("The rule does not account for what should happen if the door fails to open or close."), specifically in handling possible or imagined conflicts, false-negative or false-positive triggers, or potential safety hazards. Finally, 22 described possible undesirable outcomes (e.g., automatically turning on a fireplace based on temperature and motion) associated with safety risks, security risks, damage, and high or unusual energy consumption. Looking at the Python code snippets, 13 of these ambiguities were fully resolved by ChatGPT-4, four (4) were partially resolved, and 66 were not addressed in the generated code.
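The "after 7 PM, but until when?" example shows why such unresolved ambiguities matter: deployable code must pin an end time the user never stated. The sketch below is hypothetical, not study output; the actuator function and the chosen end time are assumptions made purely to illustrate the silent decision.

from datetime import datetime, time as dtime

START = dtime(19, 0)   # "after 7 PM" is explicit in the user's rule
END = dtime(23, 59)    # assumed end time: the rule never states one

def lights_on() -> None:
    # Hypothetical stand-in for the rule's action.
    print("Lights on")

now = datetime.now().time()
# Without END the rule would apply indefinitely; pinning END here is an
# unstated decision that the end user never reviewed.
if START <= now <= END:
    lights_on()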
Theme 2: Ambiguities related to the language (55 instances). Twenty-one (21) ambiguities involved word usage in formulating the instructions ("The rule mentions activating an evacuation plan but does not provide details on what the plan entails."), with some words being overly specific and others too generic or unconventional. Twenty-four (24) were related to the use of generic expressions, employing words instead of specific values ("The rule does not specify what constitutes 'day.' Is it based on specific hours?").

¹ https://ifttt.com/
² The latest available version at the time of the study (November 2023).
"This Sounds Unclear": Evaluating ChatGPT Capability in Translating End-User Prompts AVI 2024, June 03–07, 2024, Arenzano, Genoa, Italy