
Analysis of Code Submissions in Competitive Programming Contests

CS 229 Project, Autumn 2018


Wenli Looi ([email protected])
Source Code: https://github.com/looi/CS229

Abstract—Algorithmic programming contests provide an opportunity to gain insights into coding techniques. This paper analyzes contest submissions on Codeforces, a popular competitive programming website where participants solve about 5 to 10 algorithmic problems in a typical contest. We attempt to predict a user's rank and country based on a single C++ source code submission. Features were generated by running source code through the Clang C++ compiler and extracting bigrams from the tokens and traversal of the abstract syntax tree (AST). Out of several models, the neural network model achieved the highest accuracies: 77.2% in predicting rank (within one rank) and 72.5% in predicting country. Despite not achieving the highest accuracy, the GDA model was easier to interpret and allowed us to find specific differences in coding styles between programmers of different ranks and countries.

I. INTRODUCTION

Codeforces [1] is one of many competitive programming platforms that provide an opportunity to gain insights into coding techniques. In a contest, participants solve about 5 to 10 well-defined algorithmic programming problems by writing short stand-alone solutions in a programming language such as C++, Java, or Python. Each solution is generally written by only one person (user). It is known whether the solution passed or failed. In addition, the user's skill level (rating/rank) and declared country are known. Most problems on Codeforces have hundreds or thousands of passing submissions. Unlike many other platforms, all submissions on Codeforces are publicly viewable, making it an ideal candidate for analysis.

On Codeforces, users are assigned a numerical rating based on their performance in past contests. Users are then assigned one of ten ranks based on their rating, ranging from Newbie to Legendary Grandmaster. These ranks are shown in the Data Set section in Table II.

The goal of this project is to predict a user's rank (within one rank) and country based solely on a single passing source code submission. As well, some interpretation of the learned models is done to find differences in coding styles between skill levels and countries. Since only passing submissions are considered, predictions are based only on coding style and not on whether the code works (all code works).

Analysis from this project will not only highlight the coding techniques of competitive programmers, but may also be relevant for code written in industry or academia. While code written in programming contests differs from real-world code, some coding best practices may apply to both, and the analysis techniques used here may also be applicable to other code bases. Code from programming contests, however, is easier to analyze than most code for the reasons described above (e.g. written by exactly one person).

II. RELATED WORK

Allamanis et al. (2018) [2] is a survey of machine learning techniques used to analyze source code. There has been significant past work on applying techniques from natural language processing (NLP) and analyzing the abstract syntax tree (AST), both of which are techniques used in this project. Machine learning has been applied to many problems, such as autocompletion, inferring coding conventions, finding defects, translating code, and program synthesis.

Burrows et al. (2007) [3] attempted to determine the author of C source code by finding N-grams in the tokens, similar to what is done here. They classified a code sample by finding the closest code sample in the corpus as measured by some similarity measure. On a collection of 1640 documents written by 100 authors, they were able to identify the author with 67% accuracy using 6-grams. A disadvantage of this approach is that searching through the entire training corpus may have scalability issues as it grows.

A study by Ugurel et al. (2002) [4] attempted to classify C/C++ source code archives into various categories, like Database, Network, and Word Processor. For features, they used single tokens from the source code as well as bigrams and lexical phrases from README files and comments. A support vector machine (SVM) was then trained to perform the classification. They achieved an accuracy of around 40-70% depending on the features and data set.

More recently, recurrent neural networks (RNNs), such as long short-term memory (LSTM) based networks, have been used to classify source code. Alsulami et al. (2017) [5] used an LSTM-based network to determine the author of Python and C++ source code. They fed a traversal of the AST into the RNN, which inspired the traversal-based method used in this paper. The RNN included an embedding layer to convert AST nodes into a fixed-length vector. Their best-performing model, a bidirectional LSTM, achieved 85% accuracy on a C++ dataset with 10 authors and 88.86% accuracy on a Python dataset with 70 authors.

Techniques similar to doc2vec (Le et al., 2014) [6], where entire documents are converted to an embedding space, have also been used to classify programs. Piech et al. (2015) [7] encoded programs as a linear map between a precondition space and a postcondition space. They used the linear maps with an RNN to predict feedback that an educator would provide for a piece of code, achieving 90% precision with recall of 13-48% depending on the coding problem.

Recurrent neural networks may not be better than N-gram based methods, however. Hellendoorn et al. (2017) [8] found
that carefully tuned N-gram based models are more effective than RNN and LSTM-based networks in modeling source code. They had higher modeling performance (entropy) and were able to provide more accurate code suggestions.

This paper uses an approach based on N-grams and AST traversal with machine learning methods taught in CS 229.

III. DATA SET

The data set currently consists of 10 contests on Codeforces from Aug to Nov 2018. All of the contests are "combined division" contests, open to users of all ranks. Each contest has ~6k submissions for a total of ~60k, with the data format shown in Table I.

TABLE I
DATA FORMAT FOR EACH SUBMISSION

Country | Rating | Source Code
RU      | 2193   | #include <iostream>\n#include...
US      | 1747   | #include <bits/stdc++.h>\nusing...
...     | ...    | ...

Not all contest submissions are in the data set. Only contestants with a declared country and who participated in at least one previous contest are considered. For each problem, only the latest passing submission for each user is considered (if any). Only C++ solutions are considered. C++ is the most popular language, with about 90% of total submissions being C++ in the contests used here.

The number of submissions for each rank in the data set is shown in Table II. A user's rating may change based on their performance in the contest. We only consider the user's rating before the contest.

For the country analysis, we used a subset of the data consisting only of the users in the 10 most common countries. These countries cover about 70% of the full data set. A summary of this data set is shown in Table III.

Both data sets have a significant class imbalance. Various techniques were needed to handle this, as described later.

We implemented a custom scraper for Codeforces in Python using lxml [9] to parse the HTML.
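To give a sense of the scraping setup, the sketch below fetches a single submission page with requests and extracts the source with lxml. The URL pattern matches Codeforces' public submission pages, but the XPath query and the request pacing are illustrative assumptions on our part, not the project's exact code.

    # Sketch: fetch one Codeforces submission and extract its source.
    # The XPath selector and delay are assumptions for illustration.
    import time
    import requests
    from lxml import html

    def fetch_submission(contest_id, submission_id):
        url = ('https://codeforces.com/contest/%d/submission/%d'
               % (contest_id, submission_id))
        page = requests.get(url)
        tree = html.fromstring(page.content)
        # Assumed selector: the source is rendered inside a <pre> element.
        source = tree.xpath('//pre[@id="program-source-text"]/text()')
        time.sleep(2)  # be polite to the server
        return source[0] if source else None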
TABLE II
DATA SET USED FOR RANK PREDICTION

Codeforces Rank           | Rating Bounds | # in Data | % of Data
Legendary Grandmaster     | 3000+         | 378       | 0.63%
International Grandmaster | 2600-2999     | 1319      | 2.20%
Grandmaster               | 2400-2599     | 1781      | 2.97%
International Master      | 2300-2399     | 1733      | 2.89%
Master                    | 2100-2299     | 6375      | 10.64%
Candidate Master          | 1900-2099     | 8756      | 14.61%
Expert                    | 1600-1899     | 17824     | 29.75%
Specialist                | 1400-1599     | 11445     | 19.10%
Pupil                     | 1200-1399     | 7405      | 12.36%
Newbie                    | 0-1199        | 2896      | 4.83%
Total                     |               | 59912     | 100.00%

TABLE III
DATA SET USED FOR COUNTRY PREDICTION

Country       | # in Data | % of Data
India         | 12331     | 29.31%
China         | 8818      | 20.96%
Russia        | 6761      | 16.07%
Bangladesh    | 4536      | 10.78%
Vietnam       | 1753      | 4.17%
Ukraine       | 1694      | 4.03%
Poland        | 1664      | 3.95%
Egypt         | 1662      | 3.95%
United States | 1450      | 3.45%
Iran          | 1406      | 3.34%
Total         | 42075     | 100.00%
IV. PREPROCESSING

Before applying machine learning algorithms, the C++ source code is preprocessed as shown in Fig. 1.

Source code is first converted to a sequence of strings. It is run through the Clang C++ compiler to produce a list of tokens and an abstract syntax tree (AST). Comments are removed, as our focus is coding style. To help the learning algorithms generalize better, all tokens equal to a variable or function name in the AST, or representing a string or character literal, are replaced with the special tokens !!VAR, !!FUN, !!STR, and !!CHR respectively. The AST is converted to a list of strings as a pre-order traversal, where additionally a special token endblock is added when all of a node's children have been visited. To simplify later processing, the processed tokens are concatenated with the AST traversal to produce a single sequence of strings.

[Fig. 1. Data preprocessing pipeline. Source code (e.g. "int n; // my var\nint main() { scanf(\"%d\", &n); ...") is fed to the libclang C++ parser, producing raw tokens ("int", "n", ";", "int", "main", "(", ")", "{", "scanf", "(", "\"%d\"", ",", "&", "n", ")", ";", ...) and an AST (TranslationUnit; VarDecl; FunctionDecl main; CompoundStmt; CallExpr scanf; ...). These become processed tokens ("int", "!!VAR", ";", "int", "main", "(", ")", "{", "scanf", "(", "!!STR", ",", "&", "!!VAR", ")", ";", ...) and an AST traversal ("TranslationUnit", "VarDecl", "endblock", "FunctionDecl", "CompoundStmt", "CallExpr", ...), which are concatenated into a single sequence. Bigram extraction (min. 1% frequency) followed by normalization and scaling yields approx. 2k features.]

The sequence of strings is then processed further. Bigrams (and unigrams) with at least 1% frequency in the training set are counted to produce features. To help prevent the learning algorithms from favoring shorter or longer solutions, each count vector is normalized by the L2 norm. (TF-IDF [10] was briefly tested, but L2 normalization seemed to work better.) The features are then scaled to have zero mean and unit variance. Without the normalization and scaling, the GDA model had a much lower accuracy, and the logistic regression model failed to converge in training (probably because the gradients were ill-behaved).

In the data set, the average number of tokens in a program is 428, the average length of the AST traversal is 627, and the average length of the concatenated sequence is 1055.
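The two core preprocessing steps translate into little code. Below is a minimal sketch, assuming the libclang Python bindings (clang.cindex) and a precomputed vocabulary dict mapping each kept unigram/bigram to a feature index. Note that cursor.kind.name yields spellings such as VAR_DECL rather than the VarDecl form shown in Fig. 1, so a cosmetic renaming step is omitted here.

    # Sketch: pre-order AST traversal with 'endblock' markers (libclang
    # Python bindings), then bigram counting with L2 normalization.
    # 'vocabulary' (gram -> feature index) is an assumed input holding
    # the ~2k grams meeting the 1% training-set frequency threshold.
    import clang.cindex
    import numpy as np

    def traversal(cursor):
        yield cursor.kind.name                # e.g. 'VAR_DECL' (not 'VarDecl')
        for child in cursor.get_children():
            yield from traversal(child)
        yield 'endblock'                      # all children visited

    def count_vector(sequence, vocabulary):
        grams = list(sequence)
        grams += [a + ' ' + b for a, b in zip(sequence, sequence[1:])]
        counts = np.zeros(len(vocabulary))
        for g in grams:
            if g in vocabulary:
                counts[vocabulary[g]] += 1
        norm = np.linalg.norm(counts)         # L2 normalization
        return counts / norm if norm > 0 else counts

    index = clang.cindex.Index.create()
    tu = index.parse('submission.cpp', args=['-std=c++14'])
    sequence = list(traversal(tu.cursor))     # AST part of the sequence
    # Token processing and concatenation are omitted. Features would then
    # be standardized to zero mean and unit variance, e.g. with
    # sklearn.preprocessing.StandardScaler.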
V. METHODS

Several learning algorithms are used to predict a user's rating and country based on their code. Here, $m$ denotes the number of training examples, $x^{(i)}$ denotes the feature vector for example $i$, and $y^{(i)}$ denotes the label for example $i$.

A. Linear regression

Linear regression is used to predict the user's rating, and the rank is inferred from the rating using Table II. Due to class imbalance, we used a weighted least squares loss, where the weight $w^{(i)}$ is the inverse of the number of users in the training set with the same rank:

$$\min_\theta \frac{1}{2} \sum_{i=1}^{m} w^{(i)} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$$

All of the other methods are classification algorithms, rather than regression, as they seemed to work better (see results).
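The weighted least squares objective maps directly onto scikit-learn's sample_weight argument. A minimal sketch, assuming X (feature matrix), ratings (targets), and ranks (integer rank index per example) as NumPy arrays; the names are illustrative:

    # Sketch: weighted linear regression with inverse-class-size weights.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    counts = np.bincount(ranks)          # training examples per rank
    w = 1.0 / counts[ranks]              # w_i = 1 / (size of rank class)
    model = LinearRegression().fit(X, ratings, sample_weight=w)
    # Predicted ratings are then mapped to ranks via the bounds in Table II.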
B. Gaussian discriminant analysis (GDA)

The maximum likelihood estimators of each class mean $\mu_k$ and the covariance matrix $\Sigma$ are computed, where again the weight $w^{(i)}$ is the inverse of the class size:

$$\mu_k = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = k\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = k\}}$$

$$\Sigma = \frac{\sum_{i=1}^{m} w^{(i)} \left( x^{(i)} - \mu_{y^{(i)}} \right) \left( x^{(i)} - \mu_{y^{(i)}} \right)^T}{\sum_{i=1}^{m} w^{(i)}}$$

Prediction assumes a uniform prior due to class imbalance:

$$p(y = k) = \frac{1}{\#\text{classes}} \;\; \text{(forced uniform)}, \qquad p(x \mid y = k) \sim \mathcal{N}(\mu_k, \Sigma)$$
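A NumPy sketch of these estimators and the uniform-prior prediction rule follows. With a shared covariance and a uniform prior, prediction reduces to choosing the class whose mean is closest in Mahalanobis distance; the small ridge added to Sigma is a numerical-stability assumption on our part, not something stated above.

    # Sketch: weighted GDA with shared covariance and a forced uniform
    # prior. X is (m, d); y holds class indices 0..K-1.
    import numpy as np

    def fit_gda(X, y, K):
        counts = np.bincount(y, minlength=K)
        w = 1.0 / counts[y]                          # inverse class size
        mu = np.stack([X[y == k].mean(axis=0) for k in range(K)])
        centered = X - mu[y]
        Sigma = (centered * w[:, None]).T @ centered / w.sum()
        Sigma += 1e-6 * np.eye(X.shape[1])           # ridge (assumption)
        return mu, np.linalg.inv(Sigma)

    def predict_gda(X, mu, Sigma_inv):
        # With a uniform prior, the argmax of N(x; mu_k, Sigma) is the
        # class with the smallest Mahalanobis distance to its mean.
        diff = X[:, None, :] - mu[None, :, :]        # shape (m, K, d)
        d2 = np.einsum('mkd,de,mke->mk', diff, Sigma_inv, diff)
        return d2.argmin(axis=1)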
C. Logistic regression

For country prediction, we use softmax regression with a weighted cross-entropy loss. Again, the weight $w^{(i)}$ is the inverse of the class size. $\hat{y}_k^{(i)}$ denotes the predicted probability that example $i$ is in class $k$:

$$\hat{y}_k^{(i)} = p\left(y^{(i)} = k \mid x^{(i)}\right) = \frac{\exp\left(\theta_k^T x^{(i)}\right)}{\sum_j \exp\left(\theta_j^T x^{(i)}\right)}$$

$$\max_\theta \sum_{i=1}^{m} w^{(i)} \log \hat{y}_{y^{(i)}}^{(i)}$$

For rank prediction, since the goal is to predict within one rank of the actual rank, we trained a separate logistic regression model for each rank. Each training example of rank $r$ is considered to be a positive example in the models for ranks within 1 rank of $r$:

$$\hat{y}_k^{(i)} = \frac{1}{1 + \exp\left(-\theta_k^T x^{(i)}\right)}$$

$$\max_\theta \sum_{i=1}^{m} w^{(i)} \sum_{j=1}^{\#\text{ranks}} \left[ 1\{|j - y^{(i)}| \le 1\} \log \hat{y}_j^{(i)} + 1\{|j - y^{(i)}| > 1\} \log\left(1 - \hat{y}_j^{(i)}\right) \right]$$

All examples are considered positive examples in three ranks, except for examples of the lowest and highest ranks, which are only considered positive in two ranks. Therefore, their weight is multiplied by 3/2 here. Empirically, this results in a model where the classification accuracy for each rank is more even.
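The rank scheme amounts to building a multi-hot target matrix in which each example is positive for its own rank and the adjacent ones, with the 3/2 weight correction for edge ranks. A sketch with illustrative names:

    # Sketch: per-rank binary targets and example weights as described
    # above. 'ranks' holds true rank indices 0..9.
    import numpy as np

    NUM_RANKS = 10

    def rank_targets(ranks):
        j = np.arange(NUM_RANKS)
        # targets[i, j] = 1 if model j treats example i as positive
        targets = (np.abs(j[None, :] - ranks[:, None]) <= 1).astype(float)
        counts = np.bincount(ranks, minlength=NUM_RANKS)
        w = 1.0 / counts[ranks]
        # Edge ranks are positive in only two models instead of three,
        # so their weight is scaled by 3/2.
        w[(ranks == 0) | (ranks == NUM_RANKS - 1)] *= 1.5
        return targets, w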
D. Neural network

The neural network model is similar to logistic regression and uses the same loss functions. A single fully-connected rectified linear layer with 100 units is inserted between the input and output layers, as shown in Fig. 2. Adding even more hidden units seemed to increase the accuracy, but this was not fully tested due to time constraints.

[Fig. 2. Neural network architecture: Input (~2000 features) → fully-connected ReLU layer (100 units) → Output (sigmoids for rank / softmax for country).]
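For the country model, the Fig. 2 architecture is a few lines in modern tf.keras. The original project used TensorFlow's lower-level 2018-era API, so this is an equivalent sketch rather than the project's code; the dropout rate and Adam learning rate follow the Experiments section.

    # Sketch: Fig. 2 architecture for country prediction (tf.keras form).
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2000,)),                    # ~2k features
        tf.keras.layers.Dense(100, activation='relu'),    # hidden layer
        tf.keras.layers.Dropout(0.5),                     # 50% dropout
        tf.keras.layers.Dense(10, activation='softmax'),  # 10 countries
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss='sparse_categorical_crossentropy')
    # Per-example inverse-class-size weights go in via sample_weight:
    # model.fit(X, y, sample_weight=w, batch_size=len(X), epochs=...)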
VI. EXPERIMENTS

All experiments were conducted using 10-fold cross validation. For each type of model, we trained 10 models, where each model is trained on 9 contests (~54k examples) and tested on 1 contest (~6k examples). The values reported here are averages over the 10 models. With this methodology, the models are tested on problems never seen in training. This ensures that the models are not learning specific features about the problems in the training set.
Due to the class imbalance described before, accuracy is defined as the weighted accuracy, where the weight $w^{(i)}$ of each example is the inverse of the class size in the test set. For rank, we allow the predicted rank to be within one rank of the actual rank. If $y^{(i)}$ is the actual label and $\hat{y}^{(i)}$ is the predicted label for example $i$:

$$\text{Accuracy (Country)} = \frac{\sum_{i=1}^{m} w^{(i)} 1\{y^{(i)} = \hat{y}^{(i)}\}}{\sum_{i=1}^{m} w^{(i)}}$$

$$\text{Accuracy (Rank}\pm 1\text{)} = \frac{\sum_{i=1}^{m} w^{(i)} 1\{|y^{(i)} - \hat{y}^{(i)}| \le 1\}}{\sum_{i=1}^{m} w^{(i)}}$$
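Both metrics are a few lines of NumPy. A sketch, with y_true and y_pred as integer class arrays (names illustrative):

    # Sketch: weighted accuracy; weights are inverse class sizes in the
    # test set. rank_pm1=True allows predictions within one rank.
    import numpy as np

    def weighted_accuracy(y_true, y_pred, rank_pm1=False):
        w = 1.0 / np.bincount(y_true)[y_true]
        if rank_pm1:
            correct = np.abs(y_true - y_pred) <= 1
        else:
            correct = y_true == y_pred
        return (w * correct).sum() / w.sum()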
The weighted accuracy shows how well the model can predict all classes and not just the majority. A model that strongly favors larger classes would achieve a high unweighted accuracy but a low weighted accuracy.

For the linear regression model, we also report the weighted root mean-squared error (RMSE) for the predicted rating:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{m} w^{(i)} \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_{i=1}^{m} w^{(i)}}}$$

Scikit-learn [11] is used to train the linear regression and GDA models, while TensorFlow [12] is used to train the logistic regression and neural network models. Models were trained with the entire training set as a single batch. For logistic regression, we used gradient descent with a 0.1 learning rate, while for the neural network, we used the Adam algorithm [13] with a 0.0001 learning rate. These learning rates were experimentally found to converge. 50% dropout is used for the hidden layer, meaning that on every iteration, 50% of the hidden nodes are inactive. This helps prevent the network from overfitting and was found to increase the test accuracy.
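Putting the evaluation loop together, the 10-fold scheme above is a leave-one-contest-out split rather than a random shuffle. A sketch, assuming a contest_ids array aligned with the examples and any fit/predict model behind an illustrative make_model() helper:

    # Sketch: leave-one-contest-out cross validation over 10 contests.
    import numpy as np

    scores = []
    for held_out in np.unique(contest_ids):
        test = contest_ids == held_out          # one contest for testing
        model = make_model()                    # illustrative factory
        model.fit(X[~test], y[~test])           # train on the other nine
        scores.append(weighted_accuracy(y[test], model.predict(X[test])))
    print(np.mean(scores))                      # average over the 10 folds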
VII. RESULTS AND DISCUSSION

The accuracies obtained for each model are shown in Table IV. For reference, the accuracy of a model that outputs a random or constant output is shown in the first row. A model that outputs a constant or random rank, except for the highest and lowest ranks, would achieve 30% accuracy because there are 3 ranks within 1 rank of the chosen rank. For country, however, we require that the model classifies the exact country, and there are 10 countries in the data set.

TABLE IV
ACCURACY FOR EACH MODEL (10-FOLD CROSS VALIDATION)

Model               | Rank±1 Train | Rank±1 Test | Country Train | Country Test
Random/constant     | 30.0%        | 30.0%       | 10.0%         | 10.0%
Linear regression   | 69.6%        | 60.1%       | N/A           | N/A
GDA                 | 75.7%        | 67.2%       | 75.0%         | 65.0%
Logistic regression | 86.1%        | 71.6%       | 92.2%         | 68.4%
Neural network      | 94.4%        | 77.2%       | 97.0%         | 72.5%

Classification was found to work better than regression when predicting the rank. This may be because classification optimizes what we actually care about, which is predicting the correct rank, rather than the rating. The linear regression model had a weighted RMSE (as previously defined) of 545 when predicting a user's rating in the test set. Given that ranks have a rating range of ~200, this is a fairly large error.

GDA worked surprisingly well, achieving accuracies that are almost as high as logistic regression. While GDA assumes that p(x|y) is multivariate Gaussian, logistic regression does not make that assumption and is capable of modeling a large variety of other distributions. Since the accuracies are similar, this indicates that p(x|y) is Gaussian to some degree.

Out of all the algorithms, the neural network had the highest accuracies. The neural network was probably able to learn more complex relationships between the features compared to the other algorithms. Perhaps some combination of several bigrams is highly indicative of rank or country. Interpretation of the neural network is out of scope of this project, however.

The high training accuracies, compared to test accuracies, may indicate overfitting. In the neural network, dropout helped reduce overfitting (as described before), but no other regularization techniques were used. We briefly tried using principal component analysis (PCA) to reduce the number of features, and L2 regularization on the parameters, but these techniques decreased the test accuracy. More data helped reduce overfitting, as the accuracy values are about 5% higher than initial tests performed with 5 contests instead of 10.

For each actual rank and country, the neural network test accuracies are shown in Figs. 3 and 4. The model seems to be able to predict all ranks with similar accuracy. For country, the model is able to predict the more common countries with higher accuracy despite the weighted loss function used. This may be because there is significantly more training data for the more common countries.

[Fig. 3. Neural network test accuracy for rank (±1) by actual rank; one bar per rank from Newbie to Legendary Grandmaster, accuracy axis 0 to 1.]
[Fig. 4. Neural network test accuracy for country by actual country; one bar per country from Table III (India, China, Russia, Bangladesh, Vietnam, Ukraine, Poland, Egypt, United States, Iran), accuracy axis 0 to 1.]
VIII. INTERPRETATION OF THE GDA MODEL

While the GDA model did not achieve the highest accuracy, its simplicity makes it possible to interpret the learned model more easily. For this analysis, we randomly chose one of the models from the 10-fold cross validation. To determine the unigrams and bigrams that were the strongest indicators of high and low skill level, we compared the class means $\mu_k$ for the International Grandmaster and Pupil ranks and found the features where the difference in class means was the most positive and the most negative. These features are shown in Tables V and VI, ordered in decreasing strength from left to right and top to bottom.
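The comparison itself is a one-line difference of class means followed by a sort. A sketch using the mu array returned by the GDA fit sketched earlier; the class indices and the feature_names list are illustrative:

    # Sketch: rank features by the difference in GDA class means between
    # International Grandmaster and Pupil. Indices are illustrative.
    import numpy as np

    IGM, PUPIL = 1, 8                   # rank class indices (Table II order)
    diff = mu[IGM] - mu[PUPIL]          # > 0 => indicator of high skill
    order = np.argsort(diff)
    top_high_skill = [feature_names[i] for i in order[::-1][:16]]
    top_low_skill = [feature_names[i] for i in order[:16]]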

TABLE V
STRONGEST INDICATORS OF HIGH SKILL LEVEL

"ifdef"  "# ifdef"  "assert"  "endif"
"# endif"  "assert ("  "( ..."  "| ("
"FunctionTemplate"  "TemplateTypeParameter"  "| ,"
"__VA_ARGS__"  "FunctionTemplate ClassTemplate"
"ifdef LOCAL"  "LOCAL"  "endblock"

TABLE VI
STRONGEST INDICATORS OF LOW SKILL LEVEL

"cin >>"  "cin"  ">> !!VAR"  ">>"
"cout <<"  "cout"
"TranslationUnit InclusionDirective"
"TranslationUnit"  "std ;"
"IfStmt BinaryOperator"  "main"  "main ("

From this analysis, we can see that both tokens, like cin >>, and AST nodes, like FunctionTemplate, are important to the model. As well, both unigrams and bigrams are important, although they are often related.

High skilled competitors appear to use #ifdef significantly, perhaps to change the code's behavior at compile time by defining macros in the compiler flags. Also, they appear to use assertions and C++ function templates.

Low skilled programmers appear to use cin and cout for input. This makes sense since scanf and printf are faster input methods and often preferred by experienced competitors.

It is interesting to see TranslationUnit as a strong indicator of low skill level. TranslationUnit is the root of the AST and appears exactly once per program, but since the count is normalized by the L2 norm of the count vector, its value will be higher in shorter programs. Thus, it appears that GDA has learned to associate smaller programs with lower skill levels, despite having the L2 normalization to try to prevent this. It makes sense that a long program would likely indicate a hard problem and a highly skilled competitor.

Tables VII and VIII show the features with the highest class means for Chinese and American competitors respectively. It seems that Chinese competitors often use getchar to read single characters from standard input, and import C input libraries like cstdio. American competitors seem to often spell out std in their code (like std::cout << std::endl) instead of importing the entire namespace with using namespace std, and use ld, which is a commonly used alias for long double.

TABLE VII
STRONGEST INDICATORS OF A CHINESE COMPETITOR

"= getchar"  "getchar"  "getchar ("  "char !!VAR"
"; char"  "cstdio"  "cstdio >"  "< cstdio"
"cstring"  "cstring >"  "< cstring"  "> !!CHR"
"< !!CHR"  "{ scanf"
"UnexposedExpr CharacterLiteral"  ">= !!CHR"

TABLE VIII
STRONGEST INDICATORS OF AN AMERICAN COMPETITOR

"( std"  "<< std"  "ld"  "> >"  "struct"
"::"  "< ld"  "std ::"
"std struct"  "; template"
"ClassTemplate"  "os <<"

IX. CONCLUSION AND FUTURE WORK

In this paper, we studied the application of machine learning techniques to predicting the rank and country of a Codeforces competitor based on a single source code submission. The neural network model achieved the highest accuracies: 77.2% in predicting rank (within one rank) and 72.5% in predicting country. Despite not achieving the highest accuracy, the GDA model was easier to interpret, and we were able to find unigrams and bigrams that were the strongest indicators of certain skill levels and countries.

Future work may include testing RNN or LSTM based models, as discussed in Related Work. Acquiring more data may help reduce overfitting. Token processing could be improved, for example by replacing class and macro names with special tokens in addition to variable and function names. N-grams with N > 2 could be tested, as only unigrams and bigrams were considered here. More hidden units or layers could be added to the neural network. Interpretation of the logistic regression or neural network model could be attempted.
REFERENCES

[1] M. Mirzayanov. Codeforces. [Online]. Available: http://codeforces.com/
[2] M. Allamanis, E. T. Barr, P. Devanbu, and C. Sutton, "A survey of machine learning for big code and naturalness," ACM Comput. Surv., vol. 51, no. 4, pp. 81:1-81:37, Jul. 2018. [Online]. Available: http://doi.acm.org/10.1145/3212695
[3] S. Burrows and S. M. Tahaghoghi, "Source code authorship attribution using n-grams," in Proceedings of the Twelfth Australasian Document Computing Symposium, Melbourne, Australia, RMIT University. Citeseer, 2007, pp. 32-39.
[4] S. Ugurel, R. Krovetz, and C. L. Giles, "What's the code?: Automatic classification of source code archives," in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2002, pp. 632-638.
[5] B. Alsulami, E. Dauber, R. Harang, S. Mancoridis, and R. Greenstadt, "Source code authorship attribution using long short-term memory based networks," in European Symposium on Research in Computer Security. Springer, 2017, pp. 65-82.
[6] Q. Le and T. Mikolov, "Distributed representations of sentences and documents," in International Conference on Machine Learning, 2014, pp. 1188-1196.
[7] C. Piech, J. Huang, A. Nguyen, M. Phulsuksombati, M. Sahami, and L. Guibas, "Learning program embeddings to propagate feedback on student code," in Proceedings of the 32nd International Conference on Machine Learning, Volume 37, ser. ICML'15. JMLR.org, 2015, pp. 1093-1102. [Online]. Available: http://dl.acm.org/citation.cfm?id=3045118.3045235
[8] V. J. Hellendoorn and P. Devanbu, "Are deep neural networks the best choice for modeling source code?" in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2017. New York, NY, USA: ACM, 2017, pp. 763-773. [Online]. Available: http://doi.acm.org/10.1145/3106237.3106290
[9] S. Behnel, M. Faassen, and I. Bicking, "lxml: XML and HTML with Python," 2005.
[10] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, 1986.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[12] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
[13] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980
