On ML-Based Program Translation: Perils and Promises
Abstract—With the advent of new and advanced programming languages, it becomes imperative to migrate legacy software to new programming languages. Unsupervised Machine Learning-based Program Translation could play an essential role in such migration, even without a sufficiently sizeable reliable corpus of parallel source code. However, these translators are far from perfect due to their statistical nature. This work investigates unsupervised program translators and where and why they fail. With in-depth error analysis of such failures, we have identified that the cases where such translators fail follow a few particular patterns. With this insight, we develop a rule-based program mutation engine, which pre-processes the input code if the input follows specific patterns and post-processes the output if the output follows certain patterns. We show that our code processing tool, in conjunction with the program translator, can form a hybrid program translator and significantly improve the state-of-the-art. In the future, we envision an end-to-end program translation tool where programming domain knowledge can be embedded into an ML-based translation pipeline using pre- and post-processing steps.

Index Terms—Code generation, code translation, program transformation

I. Introduction

In today's software development ecosystem, Programming Languages (PL) are evolving rapidly, either as new languages or as new features of existing languages. In the past few years, many languages such as Go, Rust, Swift, TypeScript, Python 3, etc. have become popular. It is often challenging to keep pace with such evolution—developers trained in one programming language find it hard to adapt to the new paradigm [1].

There exists a large body of legacy software written in old languages like COBOL, Fortran, etc. Maintaining them is challenging, as present-day developers would need a good understanding of these outdated languages [2]–[6]. Organizations have been investing heavily to migrate their legacy code to newer programming languages. For example, in 2012, the Commonwealth Bank of Australia spent 1 billion Australian dollars over the subsequent five years to migrate its core banking platform¹. The Swedish bank Nordea also started its migration in 2020. While such migrations to newer PLs eventually save money, the investment for the migration is potentially costly because the PLs adhere to completely different programming philosophies (e.g., object-oriented vs. functional).

¹ https://www.reuters.com/article/us-usa-banks-cobol/banks-scramble-to-fix-old-systems-as-it-cowboys-ride-into-sunset-idUSKBN17C0D8

To address these issues, researchers propose automated tools to convert programs written in one high-level language (e.g., Java) to another high-level language (e.g., Python), commonly known as Transpilers or Transcompilers [7], [8]. Traditionally, transpilers are rule-based translators [9]–[11]: a program written in the source language is represented as an abstract syntax tree, which is then translated into the target language by hand-written rules, a.k.a. templates. Such manual rule-driven translation is not scalable, especially in the presence of external libraries and APIs. Furthermore, when the two language structures are very different (e.g., the functional language Haskell and the procedural, object-oriented language Java), writing conversion rules may not always be possible. Finally, programs generated using such manual rules often lack readability.

To overcome these issues, researchers proposed Machine Learning (ML)-based transpilers, where ML models translate between two high-level programming languages by learning the statistical alignments between the two languages [12]–[15]. However, getting a meaningful, aligned language corpus is challenging [16], [17]. To this end, Roziere et al. [8] proposed an unsupervised learning-based approach, TransCoder, where alignments between PLs are learned through back-translation [18]. A program in the source language is first translated to the target language using a forward-direction translator. The generated target program is then translated back to the source language using a backward-direction translator. With joint optimization, these forward-backward translator pairs learn the alignments between the source and target languages in their respective directions without requiring an explicitly aligned corpus.
Fig. 1: Motivating examples (code snippets omitted). (a) Post-Processing: the TransCoder-generated code has an extra, incorrect x%10==0 condition; post-processing removes it. (b) Pre-Processing: TransCoder cannot translate a Python array parameter correctly; when pre-processing converts the arr variable to a list, TransCoder translates correctly.
It turns out that unsupervised learning can outperform all the previous approaches. However, since the TransCoder-based model is entirely driven by the statistical properties of the languages, it cannot guarantee the syntactic or semantic accuracy of the generated code. Figure 1 shows a motivating example. While the TransCoder model almost correctly translated the input code in Figure 1a, the translated Java method contains an additional conditional clause, x % 10 == 0. A knowledgeable developer can further mutate such almost-correctly translated code to obtain greater accuracy, especially if common patterns of mistakes the model makes can be identified.

Hypothesis 1. While "unsupervised" translators are not perfect, their results can be post-processed if we know the model's common patterns of mistakes (i.e., "blind spots").

In addition, since these models are trained in an ad hoc, unsupervised way, they do not explicitly learn the syntactic and semantic alignments across language components. For instance, the while loop is semantically equivalent in Java and Python. However, for loops in these two languages are semantically different—a Java for loop often contains an update expression for the loop control variable, whereas Python's for loop has only limited capacity for this. Thus, TransCoder often fails to translate a Java for loop to a Python one.

Hypothesis 2. Once we identify the model's inabilities, we can systematically mutate the input code to bypass the common error-producing patterns.

In this pilot study, we aim to understand the common pitfalls of TransCoder and how we can mitigate them. For this purpose, we chose a large open-source unsupervised program translation model, TransCoder, released by Facebook AI [8], which is trained on roughly 2.8 million GitHub repositories and has recently gained much attention. We then performed a rigorous manual study to find common areas where TransCoder fails to translate correctly. We categorize such failures into two distinct categories: (a) semantic errors and (b) syntactic errors. With further investigation into each of these categories, we observe that translations prone to semantic errors follow specific human-observable patterns and are amenable to easy post-processing, corroborating Hypothesis 1 (see Figure 1a as an example). In contrast, when the model makes a syntactically invalid translation, we observe that the inputs follow a few specific patterns and are fixable with input program transformation through pre-processing (Hypothesis 2). Figure 1b shows an example.

ML-based code translation models come with enormous promise. However, without syntactic or semantic guidance, we cannot exploit their full potential. As a proof-of-concept, we incorporate such guidance with a rule-based transformer that can pre-process and/or post-process the source code; these transformers can be coupled with TransCoder to build a hybrid program translator, a.k.a. transpiler. Our initial prototype can improve the vanilla ML-based TransCoder by 86% for Java to Python translation and 50% for Python to Java translation. This indicates that guiding the ML model with program-property-aware techniques has significant potential in program translation.

II. Study Design

TransCoder is a state-of-the-art and popular model that accomplishes programming language translation using unsupervised learning and is evaluated on an unlabelled GeeksforGeeks dataset. It is a gigantic transformer-based model trained on a public GitHub corpus of roughly 2.8 million open-source repositories. Yet, the reported accuracy is still suboptimal. TransCoder's performance is evaluated via a metric known as computational accuracy: the ability of a translated program to produce the same output as the source code when run. The computational accuracy of TransCoder's Python to Java translation is 68.7%, and 56.1% the other way around.
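Computational accuracy is straightforward to operationalize: run the reference and the translated program on the same test inputs and compare their outputs. Below is a minimal sketch; the command lists and test inputs are placeholders, and a real harness would also handle timeouts, crashes, and output normalization.

    import subprocess

    def computationally_accurate(reference_cmd, translated_cmd, test_inputs):
        """True iff the translated program prints the same output as the
        reference on every test input (minimal sketch; no timeout handling)."""
        for stdin_data in test_inputs:
            ref = subprocess.run(reference_cmd, input=stdin_data,
                                 capture_output=True, text=True)
            trans = subprocess.run(translated_cmd, input=stdin_data,
                                   capture_output=True, text=True)
            if ref.stdout != trans.stdout:
                return False
        return True

    # Hypothetical usage: compare a Python reference with its Java translation.
    # computationally_accurate(["python3", "f_gold.py"],
    #                          ["java", "F_GOLD"], ["3 5\n", "0 0\n"])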
To understand what kinds of errors TransCoder commonly makes, we dug deeper into the TransCoder-generated translations using 100 examples. Two of the authors went through the code examples and noted their findings, which were verified by another two authors. For each case, all of the authors reached a consensus about the type of potential error. To this end, we identified some common error patterns TransCoder makes. Leveraging these findings, we propose a hybrid technique combining machine learning and traditional rule-based solutions that can give an end-to-end solution to the code translation problem.

Dataset. Facebook AI's GitHub page [19] provides extensive testing data for the TransCoder model taken from the GeeksforGeeks dataset. The testing dataset comprises around 280 files each in Python, Java, and C++. Each file has a method, f_gold(), which is to be translated, along with a main method containing test cases. We randomly sampled 50 test cases each for Java to Python and Python to Java translation analysis. For each test case, we used TransCoder to translate each file, analyzed the result of each translation, and noted which errors recurred across multiple file translations along with potential solutions.

III. Preliminary Results

TABLE I: Common Error Patterns found in TransCoder

                             Java to Python    Python to Java
                                  (J2P)             (P2J)
    1. Additional Context          18%               38%
    2. Loop Conversion             12%                0%
    3. Type Sensitivity            38%                4%
    4. Extra Constraints            0%               50%
    5. Miscellaneous Errors        14%               16%
    (Mostly) Correct               22%               18%

Based on this study, we identify four main categories of errors, plus a miscellaneous category. Table I shows the distribution. Java to Python translation (J2P) has a slightly higher rate of success than Python to Java (P2J): 22% vs. 18%. This section discusses the common error patterns and potential ways to fix them using template-based pre-processing and post-processing approaches. The percentages are calculated as the share of the 50 J2P and 50 P2J test cases, respectively, that display the mentioned errors. Figure 2 illustrates the errors and plausible solutions; the errors are described in greater detail below, with illustrative sketches of the corresponding mutations after this section.

1. Additional Context. The goal of the model is to accurately translate one method, typically called the focal method. However, the focal method is often surrounded by a main method and test cases. We call these extra surroundings 'additional context'. TransCoder tends to get confused between arguments inside and outside the method and will sometimes translate the additional context as well, resulting in incorrect or unreadable code. 9 out of 50 Java to Python (J2P) and 19 out of 50 Python to Java (P2J) examples suffer from this problem.

Fixes. Once these focal methods are translated in isolation (without the additional context), TransCoder generates the correct output. Figure 2, Row 1 shows an example. While the focal method f_gold is called and the main method is still in the context, TransCoder could not generate any meaningful translation. However, when we remove the additional context, the translation accuracy significantly improves.

In the rest of the paper, we treat TransCoder as a function translator. The translation errors discussed henceforth are mainly errors that occurred when we translated the functions in isolation using TransCoder.

2. Loop Conversion. Vanilla TransCoder performs poorly while translating complex for loops, especially for Java to Python translation. As Java for loops generally allow more functionality than Python for loops (e.g., different increments of the loop variables, more variables, more conditions), the TransCoder model has difficulty translating complex for loops from Java to Python. Complex for loops appeared in 6 out of 50 samples; none of them produced correct outputs, and 4 out of the 6 produced garbage translations.

Fixes. We hypothesize that it is beneficial to convert the for loops to while loops before passing the input to TransCoder, as the while loop is syntactically equivalent in Python and Java. Thus, as a pre-processing step, we performed a semantics-preserving transformation to convert for to while. Such pre-processing significantly improved the translation of all 6 incorrect cases. Figure 2, second row shows an example.

3. Type Sensitivity. We find that TransCoder can be sensitive to certain types. For example, 19 out of 50 J2P examples contain an array as a parameter. TransCoder fails to translate all of these cases, as shown in the third row of Figure 2. For P2J as well (see Figure 1b), when the input focal method contains two or more parameters named arr, TransCoder fails to translate them. Note that, since Python is a dynamically typed language, we have to rely on the variable names to infer their types; the corresponding ground-truth Java code confirms that the intended type is indeed an array.

Fixes. We explore a pre-processing step where, without changing the code's semantics, we use equivalent types or classes. For instance, in the above case, we change all the array parameter references in the Java code to a List of the equivalent data type, as the Python translation of a Java array and of a Java List is identical. Note that we cannot use the exact same element type when converting an array to a List; instead, we must use the wrapper class data type (int to Integer, double to Double, etc.). Such type transformation in the pre-processing helped us improve TransCoder's performance across all 19 cases.
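The Additional Context mutation simply isolates f_gold before invoking the translator. For the Python side, this is a few lines with the standard ast module; a sketch of the pre-processing idea, not the exact prototype code:

    import ast

    def extract_focal_method(python_src: str, name: str = "f_gold") -> str:
        """Return only the focal method's source, dropping the surrounding
        main method and test harness that confuse the translator."""
        for node in ast.walk(ast.parse(python_src)):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                return ast.unparse(node)  # requires Python 3.9+
        raise ValueError(f"no function named {name!r}")

An analogous extraction on the Java side can be done with any off-the-shelf Java parser.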
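The Loop Conversion mutation rewrites for (init; cond; update) { body } into init; while (cond) { body; update; }. The sketch below handles only the simplest shape with a regular expression and assumes the loop body ends at the snippet's final brace (no continue, no comma-separated header parts); treat it purely as an illustration of the rewrite, not as our prototype's implementation:

    import re

    FOR_HEADER = re.compile(
        r"for\s*\(\s*(?P<init>[^;]*);\s*(?P<cond>[^;]*);\s*(?P<update>[^)]*)\)\s*\{")

    def for_to_while(java_snippet: str) -> str:
        """Semantics-preserving for->while rewrite for one simple Java loop."""
        m = FOR_HEADER.search(java_snippet)
        if m is None:
            return java_snippet
        init, cond, update = (m.group(k).strip() for k in ("init", "cond", "update"))
        body_end = java_snippet.rindex("}")
        body = java_snippet[m.end():body_end]
        return (java_snippet[:m.start()] + init + ";\nwhile (" + cond + ") {"
                + body + update + ";\n}" + java_snippet[body_end + 1:])

    print(for_to_while("for (int i = 0; i < n; i += 2) { sum += i; }"))
    # -> int i = 0;
    #    while (i < n) { sum += i; i += 2;
    #    }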
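The Type Sensitivity mutation turns primitive array parameters into List parameters of the corresponding wrapper class. A regex-based sketch over the method header follows; array reads such as arr[i] must separately become arr.get(i), and the helper and parameter names here are illustrative:

    import re

    # Java Lists cannot hold primitives, so element types map to wrappers.
    WRAPPER = {"int": "Integer", "double": "Double", "float": "Float",
               "long": "Long", "char": "Character", "boolean": "Boolean"}

    PARAM = re.compile(
        r"(?P<prim>int|double|float|long|char|boolean)\s*"
        r"(?:\[\]\s*(?P<a>\w+)|(?P<b>\w+)\s*\[\])")

    def array_params_to_list(java_header: str) -> str:
        """Rewrite 'int[] arr' or 'int arr[]' parameters as 'List<Integer> arr'."""
        def repl(m):
            name = m.group("a") or m.group("b")
            return f"List<{WRAPPER[m.group('prim')]}> {name}"
        return PARAM.sub(repl, java_header)

    print(array_params_to_list("int f_gold(int arr[], int n)"))
    # -> int f_gold(List<Integer> arr, int n)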
Fig. 2: Original TransCoder translations vs. updated translations with pre-/post-processing (code snippets omitted). Row 1: Additional Context; Row 2: Loop Conversion; Row 3: Type Sensitivity; Row 4: Extra Constraints.
4. Generating Extra Constraints. The most prominent issue for Python to Java translation is generating extraneous logical operators in if, else if, and while statements. Out of 50 examples, 25 had such issues. Although such additional logical operators are syntactically valid, they can potentially change the code's semantics. The last row of Figure 2 is an example.

Fixes. As a post-processing step, we discard all the logical constraints that do not appear in the source version. This is justified by the observation that although the model appends logical constraints, it never modifies the original conditions. A sketch of this step follows this section.

Overall Results. The performance of each mutation is measured by its rate of success. We classify "success" in two cases:

1. If a translated program does not compile, a success is when the translation of the program after applying mutations compiles.

2. If a translated program does compile, but with errors, a success is when the translation of the program after applying mutations runs more similarly to the original program. More specifically, if the translated code can be more easily interpreted to have the same functionality as the source code, we classify the mutation as a success.

To evaluate the effectiveness of each mutation, we first determined which sampled test cases each mutation was applicable to. After translating both the original source code and the mutated source code, we classified the mutation as a success or a failure for each test case. If multiple mutations were applicable to a test case, we applied all possible combinations of them. The rate of success of a specific mutation, or rule, is computed as the number of successes divided by the number of cases it was applicable to. Each of our identified mutations has a 100% success rate, though there are errors for which we have not yet discovered a viable mutation (Miscellaneous Errors in Table I).
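As an illustration of the Extra Constraints post-processing, the sketch below drops any &&-joined clause in a translated Java condition that has no textual counterpart in the Python source. It is deliberately simplified (top-level && only, whitespace-insensitive textual matching); our actual rule set may differ.

    import re

    COND = re.compile(r"(if|while)\s*\((?P<cond>.*)\)")

    def strip_extra_constraints(java_line: str, python_src: str) -> str:
        """Drop '&&'-appended clauses that never appear in the Python source.
        A simplified sketch assuming top-level '&&' only."""
        m = COND.search(java_line)
        if m is None:
            return java_line
        clauses = [c.strip(" ()") for c in m.group("cond").split("&&")]
        source_text = re.sub(r"\s+", "", python_src)
        kept = [c for c in clauses if re.sub(r"\s+", "", c) in source_text]
        kept = kept or clauses[:1]  # never empty the condition entirely
        return (java_line[:m.start("cond")]
                + " && ".join(f"({c})" for c in kept)
                + java_line[m.end("cond"):])

    python_src = "while x != 0:\n    ..."
    print(strip_extra_constraints("while((x!=0) && (x%10==0)){", python_src))
    # -> while((x!=0)){   (reproduces the fix in Figure 1a)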
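Putting the pieces together, the hybrid transpiler is just the ML translator wrapped by direction-specific mutations. A sketch composing the helpers sketched above; transcoder_translate is a hypothetical stand-in for the model, not a real API:

    def transcoder_translate(code: str, direction: str) -> str:
        """Hypothetical stand-in for the ML translator."""
        raise NotImplementedError

    def hybrid_java_to_python(java_fn: str) -> str:
        # J2P pre-processing mutations (Loop Conversion, Type Sensitivity).
        java_fn = for_to_while(java_fn)
        java_fn = array_params_to_list(java_fn)
        return transcoder_translate(java_fn, direction="java->python")

    def hybrid_python_to_java(python_fn: str) -> str:
        # P2J post-processing mutation (Extra Constraints).
        java = transcoder_translate(python_fn, direction="python->java")
        return "\n".join(strip_extra_constraints(line, python_fn)
                         for line in java.splitlines())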
IV. Related Works

Multiple previous studies have investigated the possibility of programming language translation through machine learning. However, almost all such studies rely on supervised learning [17], [20]–[24]. This approach, though accurate, is unrealistic, as it is difficult to accumulate large datasets of labeled, correctly translated programs [17].

While it is difficult to come across labeled datasets, some researchers have found it effective to train their models with a technique called back-translation [8], [16]. Being unsupervised, the capabilities of these models are not limited by the quantity of annotated parallel data, making them state-of-the-art for program translation. In this work, we case-study one such model, TransCoder [8].

Other research has delved into the application of SMT (statistical machine translation) models [12], [14], [25] to the translation of programming languages. These studies have reached conclusions similar to this project's: a majority of test cases have errors but need only small fixes to produce correct translations. These models can also be improved in a more program-analysis-oriented approach, as our techniques demonstrate as well [14].

Researchers have also proposed translation models for in-language code transformation for syntactic repair [26], [27], semantic program repair [28]–[30], refactoring [31], [32], etc. More recently, researchers have been proposing general-purpose code transformation models "pre-trained" on developer-written code transformations collected from GitHub [33], or rule-based transformations [22]. In the future, we aim to investigate both the syntactic and semantic repair models as our pre-processing and post-processing components.

V. Conclusion & Future Work

Paper Summary. In this paper, we discuss the pitfalls of unsupervised program translators and present the potential of program-property-aware rules that can guide the ML-based translation as pre-/post-processing steps. We developed a proof-of-concept in-language program transformer for pre-processing the input and post-processing the output of TransCoder. We show that a simple rule-based in-language program transformer can significantly improve program translation performance. Our preliminary results, along with detailed instructions to replicate each mutation, are publicly available at https://github.com/kzh23/Replication-Package-ICSE-NIER-2023-Unsupervised-ML. While the ML-based translator relies on statistical knowledge embedded in "big data", we propose to embed programming domain knowledge into the translation pipeline.

Future Work. This paper serves as an initial attempt toward combining ML-based program translation and program-analysis-based program mutation. We aim to build more sophisticated and automated techniques for program transformation in the future. As evidenced by our initial results, guiding ML-based tools with program-property-aware rules has immense potential in program translation. In the future, we will investigate how to smartly incorporate such guidance in ML pipelines. For instance, currently, the vanilla TransCoder can only translate methods in isolation. Such limitations will hinder the adoption of the proposed techniques in real life, where an entire project written in a legacy language needs to be translated. We will further study the applicability of the proposed technique in low-resource languages, where we will not get enough sample data for training the ML model; in such cases, the rule-based approach may need to provide more guidance.

To this end, we envision building a scalable, modular, end-to-end system combining pre-processing, translation, and post-processing steps. We also intend to investigate the usage of code editing models [22], [33] as pre-processing and program repair tools [29], [34], [35] as post-processing steps for better generalization.

Acknowledgement

This work is supported in part by NSF grants SHF-2107405, SHF-1845893, IIS-2040961, IBM, and VMWare. Any opinions, findings, conclusions, or recommendations expressed herein are those of the authors and do not necessarily reflect those of the US Government, NSF, IBM, or VMWare.

References

[1] L. A. Meyerovich and A. S. Rabkin, "Empirical analysis of programming language adoption," in Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013, pp. 1–18.
[2] R. J. Kizior, D. Carr, and P. Halpern, "Does COBOL have a future?" in Proc. Information Systems Education Conf., vol. 17, no. 126, 2000.
[3] N. Stern, COBOL for the 21st Century. John Wiley & Sons, Inc., 2007.
[4] H. M. Sneed, "Migrating from COBOL to Java," in 2010 IEEE International Conference on Software Maintenance. IEEE, 2010, pp. 1–7.
[5] J. Pu, Z. Zhang, J. Kang, Y. Xu, and H. Yang, "Using aspect orientation in understanding legacy COBOL code," in 31st Annual International Computer Software and Applications Conference (COMPSAC 2007), vol. 2. IEEE, 2007, pp. 385–390.
[6] N. Wilde, M. Buckellew, H. Page, and V. Rajlich, "A case study of feature location in unstructured legacy Fortran code," in Proceedings Fifth European Conference on Software Maintenance and Reengineering. IEEE, 2001, pp. 68–76.
[7] R. Kulkarni, A. Chavan, and A. Hardikar, "Transpiler and its advantages," International Journal of Computer Science and Information Technologies, vol. 6, no. 2, pp. 1629–1631, 2015.
[8] B. Roziere, M.-A. Lachaux, L. Chanussot, and G. Lample, "Unsupervised translation of programming languages," Advances in Neural Information Processing Systems, vol. 33, pp. 20601–20611, 2020.
[9] "Babel is a JavaScript compiler," https://babeljs.io/, accessed: 2010-10-12.
[10] K. Kimura, A. Sekiguchi, S. Choudhary, and T. Uehara, "A JavaScript transpiler for escaping from complicated usage of cloud services and APIs," in 2018 25th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 2018, pp. 69–78.
[11] "2to3 — automated Python 2 to 3 code translation," https://docs.python.org/3/library/2to3.html, accessed: 2010-10-12.
[12] K. Aggarwal, M. Salameh, and A. Hindle, "Using machine translation for converting Python 2 to Python 3 code," PeerJ PrePrints, Tech. Rep., 2015.
[13] G. Lample, M. Ott, A. Conneau, L. Denoyer, and M. Ranzato, "Phrase-based & neural unsupervised machine translation," arXiv preprint arXiv:1804.07755, 2018.
[14] A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen, "Lexical statistical machine translation for language migration," in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013, pp. 651–654.
[15] Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, and S. Nakamura, "Learning to generate pseudo-code from source code using statistical machine translation," in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2015, pp. 574–584.
[16] W. U. Ahmad, S. Chakraborty, B. Ray, and K.-W. Chang, "Summarize and generate to back-translate: Unsupervised translation of programming languages," arXiv preprint arXiv:2205.11116, 2022.
[17] X. Chen, C. Liu, and D. Song, "Tree-to-tree neural networks for program translation," Advances in Neural Information Processing Systems, vol. 31, 2018.
[18] S. Edunov, M. Ott, M. Auli, and D. Grangier, "Understanding back-translation at scale," arXiv preprint arXiv:1808.09381, 2018.
[19] Facebookresearch, "Facebookresearch/TransCoder: Public release of the TransCoder research project, https://arxiv.org/pdf/2006.03511.pdf." [Online]. Available: https://github.com/facebookresearch/TransCoder
[20] W. Ahmad, S. Chakraborty, B. Ray, and K.-W. Chang, "Unified pre-training for program understanding and generation," in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics, Jun. 2021, pp. 2655–2668. [Online]. Available: https://www.aclweb.org/anthology/2021.naacl-main.211
[21] S. Lu, D. Guo, S. Ren, J. Huang, A. Svyatkovskiy, A. Blanco, C. B. Clement, D. Drain, D. Jiang, D. Tang, G. Li, L. Zhou, L. Shou, L. Zhou, M. Tufano, M. Gong, M. Zhou, N. Duan, N. Sundaresan, S. K. Deng, S. Fu, and S. Liu, "CodeXGLUE: A machine learning benchmark dataset for code understanding and generation," CoRR, vol. abs/2102.04664, 2021.
[22] S. Chakraborty, T. Ahmed, Y. Ding, P. Devanbu, and B. Ray, "NatGen: Generative pre-training by 'naturalizing' source code," in 2022 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). ACM, 2022.
[23] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou, "CodeBERT: A pre-trained model for programming and natural languages," in Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics, Nov. 2020, pp. 1536–1547.
[24] D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu, M. Tufano, S. K. Deng, C. Clement, D. Drain, N. Sundaresan, J. Yin, D. Jiang, and M. Zhou, "GraphCodeBERT: Pre-training code representations with data flow," in International Conference on Learning Representations, 2021.
[25] S. Karaivanov, V. Raychev, and M. Vechev, "Phrase-based statistical translation of programming languages," in Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, 2014, pp. 173–184.
[26] T. Ahmed, N. R. Ledesma, and P. Devanbu, "SynFix: Automatically fixing syntax errors using compiler diagnostics," arXiv preprint arXiv:2104.14671, 2021.
[27] ——, "SynShine: Improved fixing of syntax errors," IEEE Transactions on Software Engineering, 2022.
[28] S. Chakraborty, Y. Ding, M. Allamanis, and B. Ray, "CODIT: Code editing with tree-based neural models," IEEE Transactions on Software Engineering, pp. 1–1, 2020.
[29] Z. Chen, S. Kommrusch, M. Tufano, L.-N. Pouchet, D. Poshyvanyk, and M. Monperrus, "SequenceR: Sequence-to-sequence learning for end-to-end program repair," IEEE Transactions on Software Engineering, vol. 47, no. 9, pp. 1943–1959, 2019. [Online]. Available: https://www.cs.wm.edu/~denys/pubs/seq2seq4repair_TSE_cameraready.pdf
[30] M. Tufano, C. Watson, G. Bavota, M. D. Penta, M. White, and D. Poshyvanyk, "An empirical study on learning bug-fixing patches in the wild via neural machine translation," ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 28, no. 4, pp. 1–29, 2019.
[31] M. Aniche, E. Maziero, R. Durelli, and V. Durelli, "The effectiveness of supervised machine learning algorithms in predicting software refactoring," IEEE Transactions on Software Engineering, 2020.
[32] A. M. Sheneamer, "An automatic advisor for refactoring software clones based on machine learning," IEEE Access, vol. 8, pp. 124978–124988, 2020.
[33] J. Zhang, S. Panthaplackel, P. Nie, J. J. Li, and M. Gligoric, "CoditT5: Pretraining for source code and natural language editing," arXiv preprint arXiv:2208.05446, 2022.
[34] H. Ye, M. Martinez, and M. Monperrus, "Neural program repair with execution-based backpropagation," in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 1506–1518.
[35] M. Yasunaga and P. Liang, "Break-it-fix-it: Unsupervised learning for program repair," in International Conference on Machine Learning. PMLR, 2021, pp. 11941–11952. [Online]. Available: https://arxiv.org/pdf/2106.06600.pdf