Coding Contest RP

The document analyzes source code submissions from competitive programming contests to predict a user's rank and country based on their coding style. Key features are extracted from the source code, including n-grams from tokens and traversals of the abstract syntax tree. Machine learning models like neural networks and gradient boosted decision trees are trained and evaluated, with the neural network achieving the highest prediction accuracies of 77.2% for rank and 72.5% for country.

Uploaded by

gfhghjvb

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views

Coding Contest RP

Uploaded by

gfhghjvb

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

Analysis of Code Submissions in Competitive Programming Contests

CS 229 Project, Autumn 2018

Wenli Looi ([email protected])
Source Code: https://github.com/looi/CS229

Abstract—Algorithmic programming contests provide an op- II. R ELATED W ORK

portunity to gain insights into coding techiques. This paper an-
alyzes contest submissions on Codeforces, a popular competitive Allamanis et al. (2018) [2] is a survey of machine learning
programming website where participants solve about 5 to 10 techniques used to analyze source code. There has been
algorithmic problems in a typical contest. We attempt to predict significant past work on applying techniques from natural
a user’s rank and country based on a single C++ source code
submission. Features were generated by running source code language processing (NLP) and analyzing the abstract syntax
through the Clang C++ compiler and extracting bigrams from tree (AST), both of which are techniques used in this project.
the tokens and traversal of the abstract syntax tree (AST). Out Machine learning has been applied to many problems, such as
of several models, the neural network model achieved the highest autocompletion, inferring coding conventions, finding defects,
accuracies of 77.2% in predicting rank (within one rank) and translating code, and program synthesis.
72.5% in predicting country. Despite not achieving the highest
accuracy, the GDA model was easier to interpret and allowed us Burrows et al. (2007) [3] attempted to determine the author
to find specific differences in coding styles between programmers of C source code by finding N-grams in the tokens, similar to
of different ranks and countries. what is done here. They classified a code sample by finding
the closest code sample in the corpus as measured by some
I. I NTRODUCTION similarity measure. On a collection of 1640 documents written
Codeforces [1] is one of many competitive programming by 100 authors, they were able to identify the author with 67%
platforms that provide an opportunity to gain insights into accuracy using 6-grams. A disadvantage of this approach is
coding techniques. In a contest, participants solve about 5 to that searching through the entire training corpus may have
10 well-defined algorithmic programming problems by writing scalability issues as it grows.
short stand-alone solutions in a programming language such A study by Ugurel et al. (2002) [4] attempted to classify
as C++, Java, or Python. Each solution is generally written by C/C++ source code archives into various categories, like
only one person (user). It is known whether the solution passed Database, Network, and Word Processor. For features, they
or failed. In addition, the user’s skill level (rating/rank) and used single tokens from the source code as well as bigrams
declared country are known. Most problems on Codeforces and lexical phrases from README files and comments. A
have hundreds or thousands of passing submissions. Unlike support vector machine (SVM) was then trained to perform
many other platforms, all submissions on Codeforces are the classification. They achieved an accuracy of around 40-
publicly viewable, making it an ideal candidate for analysis. 70% depending on the features and data set.
On Codeforces, users are assigned a numerical rating based More recently, recurrent neural networks (RNNs), such as
on their performance in past contests. Users are then assigned long short-term memory (LSTM) based networks, have been
one of ten ranks based on their rating, ranging from Newbie used to classify source code. Alsulami et al. (2017) [5] used
to Legendary Grandmaster. These ranks are shown in the Data an LSTM-based network to determine the author of Python
Set section in Table II. and C++ source code. They fed a traversal of the AST into
The goal of this project is to predict a user’s rank (within the RNN, which inspired the traversal-based method used in
one rank) and country based solely on a single passing source this paper. The RNN included an embedding layer to convert
code submission. As well, some interpretation of the learned AST nodes into a fixed length vector. Their best-performing
models is done to find differences in coding styles between model, a bidirectional LSTM, achieved 85% accuracy on a
skill levels and countries. Since only passing submissions are C++ dataset with 10 authors and 88.86% accuracy on a Python
considered, predictions are based only on coding style and not dataset with 70 authors.
whether the code works or not (all code works). Techniques similar to doc2vec (Le et al., 2014) [6], where
Analysis from this project will not only highlight the coding entire documents are converted to an embedding space, have
techniques of competitive programmers, but may also be also been used to classify programs. Piech et al. (2015) [7]
relevant for code written in industry or academia. While code encoded programs as a linear map between a precondition
written in programming contests differs from real-world code, space and postcondition space. They used the linear maps with
some coding best practices may apply to both, and the analysis an RNN to predict feedback that an educator would provide
techniques used here may also be applicable to other code for a piece of code, achieving 90% precision with recall of
bases. Code from programming contests, however, is easier to 13-48% depending on the coding problem.
analyze than most code for the reasons described above (e.g. Recurrent neural networks may not be better than N-gram
written by exactly one person). based methods, however. Hellendoorn et al. (2017) [8] found
that carefully tuned N-gram based models are more effective TABLE III
than RNN and LSTM-based networks in modeling source DATA SET USED FOR COUNTRY PREDICTION
code. They had higher modeling performance (entropy) and Country # in % of
were able to provide more accurate code suggestions. Data Data
This paper uses an approach based on N-grams and AST India 12331 29.31%
China 8818 20.96%
traversal with machine learning methods taught in CS 229. Russia 6761 16.07%
Bangladesh 4536 10.78%
III. DATA S ET Vietnam 1753 4.17%
Ukraine 1694 4.03%
The data set currently consists of 10 contests on Codeforces Poland 1664 3.95%
Egypt 1662 3.95%
from Aug to Nov 2018. All of the contests are “combined United States 1450 3.45%
division” contests, open to users of all ranks. Each contest Iran 1406 3.34%
has ~6k submissions for a total of ~60k, with the data format Total 42075 100.00%
shown in Table I.

TABLE I IV. P REPROCESSING

DATA FORMAT FOR EACH SUBMISSION
Before applying machine learning algorithms, the C++
Country Rating Source Code source code is preprocessed as shown in Fig. 1.
RU 2193 #include <iostream>\n#include...
US 1747 #include <bits/stdc++.h>\nusing... Source code is first converted to a sequence of strings. It is
··· ··· ··· run through the Clang C++ compiler to produce a list of tokens
and an abstract syntax tree (AST). Comments are removed
as our focus is coding style. To help the learning algorithms
Not all contest submissions are in the data set. Only generalize better, all tokens equal to a variable or function
contestants with a declared country and who participated in at name in the AST, or representing a string or character literal,
least one previous contest are considered. For each problem, are replaced with special tokens !!VAR, !!FUN, !!STR,
only the latest passing submission for each user is considered and !!CHR respectively. The AST is converted to a list of
(if any). Only C++ solutions are considered. C++ is the most strings as a pre-order traversal, where additionally a special
popular language, with about 90% of total submissions being token endblock is added when all of a node’s children have
C++ in the contests used here. been visited. To simplify later processing, the processed tokens
The number of submissions for each rank in the data set is are concatenated with the AST traversal to produce a single
shown in Table II. A user’s rating may change based on their sequence of strings.
performance in the contest. We only consider the user’s rating The sequence of strings is then processed further. Bigrams
before the contest. (and unigrams) with at least 1% frequency in the training set
For the country analysis, we used a subset of the data are counted to produce features. To help prevent the learning
consisting only of the users in the 10 most common countries. algorithms from favoring shorter or longer solutions, each
These countries cover about 70% of the full data set. A count vector is normalized by the L2 norm. (TF-IDF [10] was
summary of this data set is shown in Table III. briefly tested, but L2 normalization seemed to work better.)
Both data sets have a significant class imbalance. Various The features are then scaled to have zero mean and unit
techniques were needed to handle this, as described later. variance. Without the normalization and scaling, the GDA
We implemented a custom scraper for Codeforces in Python model had a much lower accuracy, and the logistic regression
using lxml [9] to parse the HTML. model failed to converge in training (probably because the
gradients were ill-behaved).
TABLE II In the data set, the average number of tokens in a program
DATA SET USED FOR RANK PREDICTION is 428, the average length of the AST traversal is 627, and the
Codeforces Rank Rating # in % of average length of the concatenated sequence is 1055.
Bounds Data Data
Legendary Grandmaster 3000+ 378 0.63% V. M ETHODS
International Grandmaster 2600-2999 1319 2.20%
Grandmaster 2400-2599 1781 2.97% Several learning algorithms are used to predict a user’s
International Master 2300-2399 1733 2.89% rating and country based on their code. Here, m denotes the
Master 2100-2299 6375 10.64% number of training examples, x(i) denotes the feature vector
Candidate Master 1900-2099 8756 14.61%
Expert 1600-1899 17824 29.75% for example i, and y (i) denotes the label for example i.
Specialist 1400-1599 11445 19.10%
Pupil 1200-1399 7405 12.36% A. Linear regression
Newbie 0-1199 2896 4.83%
Total 59912 100.00% Linear regression is used to predict the user’s rating and
the rank is inferred from the rating using Table II. Due to
class imbalance, we used a weighted least squares loss where
Prediction assumes a uniform prior due to class imbalance:
Source code
int n; // my var
int main() { p(y = k) = 1/(# classes) (forced uniform)
scanf("%d", &n); p(x|y = k) ∼ N (µk , Σ)
...
C. Logistic regression
libclang C++ parser For country prediction, we use softmax regression with a
weighted cross-entropy loss. Again, the weight w(i) is the
(i)
Raw tokens Abstract syntax tree inverse of the class size. ŷk denotes the predicted probability
"int", "n", ";", "int", TranslationUnit that example i is in class k.
"main", "(",")", "{", VarDecl n
"scanf", "(", "\"%d\"", FunctionDecl main (i)

(i) (i)
exp θkT x(i)
",", "&", "n", ")", ";", CompoundStmt ŷk = p y = k|x =P T (i)

... CallExpr scanf ... j exp θj x

m
X
Processed tokens AST traversal max w(i) log ŷy(i)
"int", "!!VAR", ";", "TranslationUnit", θ
i=1
"int", "main", "(",")", "VarDecl", "endblock",
"{", "scanf", "(", "FunctionDecl", For rank prediction, since the goal is to predict within
"!!STR", ",", "&", "CompoundStmt",
"!!VAR", ")", ";", ... "CallExpr", ...
one rank of the actual rank, we trained a separate logistic
regression model for each rank. Each training example of rank
r is considered to be a positive example in the models for ranks
Concatenated sequence
within 1 rank of r.
"int", "!!VAR", ";", ..., "TranslationUnit",
"VarDecl", "endblock", ... (i) 1
ŷk =
1 + exp −θkT x(i)
Bigram extraction (min. 1% freq)
m #X
X ranks
max w(i) 1{|j − y (i) | ≤ 1} log ŷj
Normalization and scaling θ
i=1 j=1

Approx. 2k features +1{|j − y (i) | > 1} log(1 − ŷj )

All examples are considered positive examples in three ranks,

Fig. 1. Data preprocessing. except for examples of the lowest and highest ranks, which are
only considered positive in two ranks. Therefore, their weight
is multiplied by 3/2 here. Empirically, this results in a model
the weight w(i) is the inverse of the number of users in the where the classification accuracy for each rank is more even.
training set with the same rank:
m 2 D. Neural network
1 X (i) T (i)
min w θ x − y (i) The neural network model is similar to logistic regression
θ 2
i=1 and uses the same loss functions. A single fully-connected
All of the other methods are classification algorithms, rather rectified linear layer with 100 units is inserted between the
than regression, as they seemed to work better (see results). input and output layers as shown in Fig. 2. Adding even more
hidden units seemed to increase the accuracy but this was not
fully tested due to time constraints.
B. Gaussian discriminant analysis (GDA)
The maximum likelihood estimators of each class mean µk Input ReLU Output
and the covariance matrix Σ are computed, where again the (~2000) (100) Sigmoids/
weight w(i) is the inverse of the class size: Softmax

Pm (i)
i=1 1{y = k}x(i)
µk = P m (i)
i=1 1{y = k}
Pm (i)
T
i=1 w x − µy(i) x(i) − µy(i)
(i)
Fig. 2. Neural network architecture.
Σ= Pm (i)
i=1 w
VI. E XPERIMENTS TABLE IV
ACCURACY FOR EACH MODEL (10- FOLD CROSS VALIDATION )
All experiments were conducted using 10-fold cross valida-
tion. For each type of model, we trained 10 models where each Model Accuracy Accuracy
model is trained on 9 contests (~54k examples) and tested on 1 (Rank±1) (Country)
Train Test Train Test
contest (~6k examples). The values reported here are averages Random/constant 30.0% 30.0% 10.0% 10.0%
over the 10 models. With this methodology, the models are Linear regresion 69.6% 60.1% N/A N/A
tested on problems never seen in training. This ensures that the GDA 75.7% 67.2% 75.0% 65.0%
Logistic regression 86.1% 71.6% 92.2% 68.4%
models are not learning specific features about the problems
Neural network 94.4% 77.2% 97.0% 72.5%
in the training set.
Due to the class imbalance described before, accuracy is
defined as the weighted accuracy where the weight w(i) of
each example is the inverse of the class size in the test set. when predicting a user’s rating in the test set. Given that ranks
For rank, we allow the predicted rank to be within one rank have a rating range of ~200, this is a fairly large error.
of the actual rank. If y (i) is the actual label and ŷ (i) is the GDA worked surprisingly well, achieving accuracies that
predicted label for example i: are almost as high as logistic regression. While GDA assumes
Pm that p(x|y) is multivariate Gaussian, logistic regression does
w(i) 1{y (i) = ŷ (i) }
Accuracy (Country) = i=1 Pm (i)
not make that assumption and is capable of modeling a large
i=1 w variety of other distributions. Since the accuracies are similar,
Pm
w(i) 1{|y (i) − ŷ (i) | ≤ 1} this indicates that p(x|y) is Gaussian to some degree.
Accuracy (Rank±1) = i=1 Pm (i)
i=1 w Out of all the algorithms, the neural network had the highest
The weighted accuracy shows how well the model can accuracies. The neural network was probably able to learn
predict all classes and not just the majority. A model that more complex relationships between the features compared
strongly favors larger classes would achieve a high unweighted to the other algorithms. Perhaps some combination of several
accuracy but low weighted accuracy. bigrams is highly indicative of rank or country. Interpretation
For the linear regression model, we also report the weighted of the neural network is out of scope of this project, however.
root mean-squared error (RMSE) for the predicted rating: The high training accuracies, compared to test accura-
sP cies, may indicate overfitting. In the neural network, dropout
m (i) y (i) − ŷ (i) 2

i=1 w
helped reduce overfitting (as described before), but no other
RMSE = Pm (i) regularization techniques were used. We briefly tried using
i=1 w
principal component analysis (PCA) to reduce the number of
Scikit-learn [11] is used to train the linear regression and GDA features, and L2 regularization on the parameters, but these
models, while TensorFlow [12] is used to train the logistic re- techniques decreased the test accuracy. More data helped
gression and neural network models. Models were trained with reduce overfitting, as the accuracy values are about 5% higher
the entire training set as a single batch. For logistic regression, than initial tests performed with 5 contests instead of 10.
we used gradient descent with a 0.1 learning rate, while for For each actual rank and country, the neural network test
the neural network, we used the Adam algorithm [13] with a accuracies are shown in Fig. 3 and 4. The model seems to be
0.0001 learning rate. These learning rates were experimentally able to predict all ranks with similar accuracy. For country,
found to converge. 50% dropout is used for the hidden layer, the model is able to predict the more common countries with
meaning that on every iteration, 50% of the hidden nodes are higher accuracy despite the weighted loss function used. This
inactive. This helps prevent the network from overfitting and may be because there is significantly more training data for
was found to increase the test accuracy. the more common countries.
VII. R ESULTS AND D ISCUSSION
The accuracies obtained for each model are shown in Table Legendary Grandmaster
IV. For reference, the accuracy of a model that outputs a International Grandmaster
random or constant output is shown in the first row. A model Grandmaster
International Master
that outputs a constant or random rank, except for the highest
Master
and lowest rank, would achieve 30% accuracy because there Candidate Master
are 3 ranks within 1 rank of the chosen rank. For country, Expert
however, we require that the model classifies the exact country, Specialist
and there are 10 countries in the data set. Pupil
Newbie
Classification was found to work better than regression
0 0.25 0.5 0.75 1
when predicting the rank. This may be because classification
optimizes what we actually care about, which is predicting
the correct rank, rather than the rating. The linear regression Fig. 3. Neural network test accuracy for rank (±1) by actual rank
model had a weighted RMSE (as previously defined) of 545
of the AST and appears exactly once per per program, but
India
China
since the count is normalized by the L2 norm of the count
Russian vector, its value will be higher in shorter programs. Thus, it
Bangladesh appears that GDA has learned to associate smaller programs
Vietnam with lower skill levels, despite having the L2 normalization to
Ukraine try to prevent this. It makes sense that a long program would
Poland
likely indicate a hard problem and a high skilled competitor.
Egypt
United States Tables VII and VIII show the features with the highest class
Iran means for Chinese and American competitors respectively.
0 0.25 0.5 0.75 1 It seems that Chinese competitors often use getchar to
read single characters from standard input, and import C
Fig. 4. Neural network test accuracy for country by actual country input libraries like cstdio. American competitors seem to
often spell out std in their code (like std::cout <<
std::endl) instead of importing the entire namespace with
VIII. I NTERPRETATION OF THE GDA MODEL using namespace std, and use ld which is a commonly
While the GDA model did not achieve the highest accuracy, used alias for long double.
its simplicity makes it possible to interpret the learned model
more easily. For this analysis, we randomly chose one of the TABLE VII
models from the 10-fold cross validation. To determine the S TRONGEST INDICATORS OF A C HINESE COMPETITOR
unigrams and bigrams that were the strongest indicators of
= getchar getchar getchar ( char !!VAR
high and low skill level, we compared the class means µk ; char cstdio cstdio > < cstdio
for the International Grandmaster and Pupil ranks and found cstring cstring > < cstring > !!CHR
the features where the class means had the largest (positive) < !!CHR { scanf
and smallest (negative) absolute difference. These features UnexposedExpr CharacterLiteral >= !!CHR
are shown in Tables V and VI. The features are ordered in
decreasing strength from left to right and top to bottom.

TABLE V TABLE VIII

S TRONGEST INDICATORS OF HIGH SKILL LEVEL S TRONGEST INDICATORS OF AN A MERICAN COMPETITOR

ifdef # ifdef assert endif ( std << std ld > > struct
# endif assert ( ( ... | ( :: < ld std ::
FunctionTemplate TemplateTypeParameter | , std struct ; template
__VA_ARGS__ FunctionTemplate ClassTemplate
ifdef LOCAL LOCAL endblock ClassTemplate os <<

TABLE VI
S TRONGEST INDICATORS OF LOW SKILL LEVEL IX. C ONCLUSION AND F UTURE W ORK
cin >> cin >> !!VAR >> In this paper, we studied the application of machine learning
cout << cout
TranslationUnit InclusionDirective techniques in predicting the rank and country of a Codeforces
TranslationUnit std ; competitor based on a single source code submission. The
IfStmt BinaryOperator main main ( neural network model achieved the highest accuracy of 77.2%
accuracy in predicting rank (within one rank) and 72.5% in
From this analysis, we can see that both tokens, like predicting country. Despite not achieving the highest accuracy,
cin >>, and AST nodes, like FunctionTemplate, are the GDA model was easier to interpret and we were able to
important to the model. As well, both unigrams and bigrams find unigrams and bigrams that were the strongest indicators
are important, although they are often related. of certain skill levels and countries.
High skilled competitors appear to use #ifdef signifi- Future work may include testing RNN or LSTM based mod-
cantly, perhaps to change the code’s behavior at compile time els, as discussed in Related Work. Acquiring more data may
by defining macros in the compiler flags. Also, they appear to help reduce overfitting. Token processing could be improved,
use assertions and C++ function templates. for example by replacing class and macro names with special
Low skilled programmers appear to use cin and cout for tokens in addition to variable and function names. N-grams
input. This makes sense since scanf and printf are faster with N > 2 could be tested as only unigrams and bigrams
input methods and often preferred by experienced competitors. were considered here. More hidden units or layers could be
It is interesting to see TranslationUnit as a strong added to the neural network. Interpretation of the logistic
indicator of low skill level. TranslationUnit is the root regression or neural network model could be attempted.
R EFERENCES
[1] M. Mirzayanov. Codeforces. [Online]. Available: http://codeforces.com/
[2] M. Allamanis, E. T. Barr, P. Devanbu, and C. Sutton, “A survey
of machine learning for big code and naturalness,” ACM Comput.
Surv., vol. 51, no. 4, pp. 81:1–81:37, Jul. 2018. [Online]. Available:
http://doi.acm.org/10.1145/3212695
[3] S. Burrows and S. M. Tahaghoghi, “Source code authorship attribution
using n-grams,” in Proceedings of the Twelth Australasian Document
Computing Symposium, Melbourne, Australia, RMIT University. Cite-
seer, 2007, pp. 32–39.
[4] S. Ugurel, R. Krovetz, and C. L. Giles, “What’s the code?: automatic
classification of source code archives,” in Proceedings of the eighth ACM
SIGKDD international conference on Knowledge discovery and data
mining. ACM, 2002, pp. 632–638.
[5] B. Alsulami, E. Dauber, R. Harang, S. Mancoridis, and R. Greenstadt,
“Source code authorship attribution using long short-term memory based
networks,” in European Symposium on Research in Computer Security.
Springer, 2017, pp. 65–82.
[6] Q. Le and T. Mikolov, “Distributed representations of sentences and
documents,” in International Conference on Machine Learning, 2014,
pp. 1188–1196.
[7] C. Piech, J. Huang, A. Nguyen, M. Phulsuksombati, M. Sahami, and
L. Guibas, “Learning program embeddings to propagate feedback on
student code,” in Proceedings of the 32Nd International Conference
on International Conference on Machine Learning - Volume 37,
ser. ICML’15. JMLR.org, 2015, pp. 1093–1102. [Online]. Available:
http://dl.acm.org/citation.cfm?id=3045118.3045235
[8] V. J. Hellendoorn and P. Devanbu, “Are deep neural networks the best
choice for modeling source code?” in Proceedings of the 2017 11th
Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE
2017. New York, NY, USA: ACM, 2017, pp. 763–773. [Online].
Available: http://doi.acm.org/10.1145/3106237.3106290
[9] S. Behnel, M. Faassen, and I. Bicking, “lxml: Xml and html with
python,” 2005.
[10] G. Salton and M. J. McGill, “Introduction to modern information
retrieval,” 1986.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion,
O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vander-
plas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duch-
esnay, “Scikit-learn: Machine learning in Python,” Journal of Machine
Learning Research, vol. 12, pp. 2825–2830, 2011.
[12] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow,
A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser,
M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray,
C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar,
P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals,
P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng,
“TensorFlow: Large-scale machine learning on heterogeneous systems,”
2015, software available from tensorflow.org. [Online]. Available:
https://www.tensorflow.org/
[13] D. P. Kingma and J. Ba, “Adam: A method for stochastic
optimization,” CoRR, vol. abs/1412.6980, 2014. [Online]. Available:
http://arxiv.org/abs/1412.6980

Q Tips: Fast, Scalable, and Maintainable Kdb+
From Everand
Q Tips: Fast, Scalable, and Maintainable Kdb+
Nick Psaris
No ratings yet
Learning R Programming
From Everand
Learning R Programming
Kun Ren
5/5 (3)
Mastering C# and .NET Framework
From Everand
Mastering C# and .NET Framework
Marino Posadas
5/5 (6)
Cody's Data Cleaning Techniques Using SAS, Third Edition
From Everand
Cody's Data Cleaning Techniques Using SAS, Third Edition
Ron Cody
4.5/5 (3)
Practical Machine Learning: Learn how to build Machine Learning applications to solve real-world data analysis challenges with this Machine Learning book – packed with practical tutorials
From Everand
Practical Machine Learning: Learn how to build Machine Learning applications to solve real-world data analysis challenges with this Machine Learning book – packed with practical tutorials
Sunila Gollapudi
3/5 (2)
CASE STUDY - Political Globalization
No ratings yet
CASE STUDY - Political Globalization
5 pages
Automated Vulnerability Detectionin Source Code Using Deep Representation Learning
No ratings yet
Automated Vulnerability Detectionin Source Code Using Deep Representation Learning
7 pages
Codeforces
No ratings yet
Codeforces
3 pages
2022 - Multilingual Training For Software Engineering
No ratings yet
2022 - Multilingual Training For Software Engineering
13 pages
RJ 2020 017
No ratings yet
RJ 2020 017
19 pages
Personality Recognition Applying Machine Learning
No ratings yet
Personality Recognition Applying Machine Learning
5 pages
Plagiarism Detection in Programming Assignments Us
No ratings yet
Plagiarism Detection in Programming Assignments Us
9 pages
2504.15144v1
No ratings yet
2504.15144v1
10 pages
297 Proposal
No ratings yet
297 Proposal
3 pages
Crossproject Transfer Representation Learning For Vulnerable Fun 2018
No ratings yet
Crossproject Transfer Representation Learning For Vulnerable Fun 2018
9 pages
2308.10345v1
No ratings yet
2308.10345v1
18 pages
Shirley Yang Masc Thesis
No ratings yet
Shirley Yang Masc Thesis
65 pages
Source Code Analysis - An Overview: Radoslav Kirkov, Gennady Agre
No ratings yet
Source Code Analysis - An Overview: Radoslav Kirkov, Gennady Agre
18 pages
Bachelor Thesis: University of Marburg
No ratings yet
Bachelor Thesis: University of Marburg
56 pages
Unsupervised_Detection_of_Solving_Strate
No ratings yet
Unsupervised_Detection_of_Solving_Strate
9 pages
R Fast Track Guide - 86 Key Points Every Programmer from Other Languages Should Master
From Everand
R Fast Track Guide - 86 Key Points Every Programmer from Other Languages Should Master
Ginno
No ratings yet
software defect prediction ppr
No ratings yet
software defect prediction ppr
11 pages
Routing in Wireless Mesh Networks
From Everand
Routing in Wireless Mesh Networks
Raghav Kumar
No ratings yet
Software Vulnerability Prediction Using Text Analysis Techniques
No ratings yet
Software Vulnerability Prediction Using Text Analysis Techniques
3 pages
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
From Everand
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
Tenko
No ratings yet
Draft: An Empirical Comparison of C, C++, Java, Perl, Python, Rexx, and TCL For A Search/string-Processing Program
No ratings yet
Draft: An Empirical Comparison of C, C++, Java, Perl, Python, Rexx, and TCL For A Search/string-Processing Program
31 pages
AP Computer Science Principles: Student-Crafted Practice Tests For Excellence
From Everand
AP Computer Science Principles: Student-Crafted Practice Tests For Excellence
Sama Alshatali
No ratings yet
Pratten_Thesis
No ratings yet
Pratten_Thesis
33 pages
Are Scripting Languages Any Good? A Validation of Perl, Python, Rexx, and TCL Against C, C++, and Java
No ratings yet
Are Scripting Languages Any Good? A Validation of Perl, Python, Rexx, and TCL Against C, C++, and Java
62 pages
Source Code Plagiarism
No ratings yet
Source Code Plagiarism
41 pages
Math for Security: From Graphs and Geometry to Spatial Analysis
From Everand
Math for Security: From Graphs and Geometry to Spatial Analysis
Daniel Reilly
No ratings yet
2103.11614v1 - Unknown
No ratings yet
2103.11614v1 - Unknown
34 pages
Survay of Programing Languages
No ratings yet
Survay of Programing Languages
37 pages
Mastering Algorithms for Competitive Programming: Unlock the Secrets of Expert-Level Skills
From Everand
Mastering Algorithms for Competitive Programming: Unlock the Secrets of Expert-Level Skills
Larry Jones
No ratings yet
C++ Regular Expressions Simplified: A Practical Guide with Examples
From Everand
C++ Regular Expressions Simplified: A Practical Guide with Examples
William E. Clark
No ratings yet
25654-Article Text-29717-1-2-20230626
No ratings yet
25654-Article Text-29717-1-2-20230626
9 pages
Concurrency in C++: Writing High-Performance Multithreaded Code
From Everand
Concurrency in C++: Writing High-Performance Multithreaded Code
Robert Johnson
No ratings yet
Modeling How Students Learn To Program: Chris Piech, Mehran Sahami, Daphne Koller, Stephen Cooper, Paulo Blikstein
No ratings yet
Modeling How Students Learn To Program: Chris Piech, Mehran Sahami, Daphne Koller, Stephen Cooper, Paulo Blikstein
6 pages
Computer Programming Using C
From Everand
Computer Programming Using C
Ramkrishna Ghosh
No ratings yet
C# Algorithms for New Programmers: A Practical Guide with Examples
From Everand
C# Algorithms for New Programmers: A Practical Guide with Examples
William E. Clark
No ratings yet
Machine Learning in the AWS Cloud: Add Intelligence to Applications with Amazon SageMaker and Amazon Rekognition
From Everand
Machine Learning in the AWS Cloud: Add Intelligence to Applications with Amazon SageMaker and Amazon Rekognition
Abhishek Mishra
No ratings yet
Sat - 3.Pdf - Code Smell Detection Using Machine Learning
No ratings yet
Sat - 3.Pdf - Code Smell Detection Using Machine Learning
11 pages
Applying Machine Learning To Software Fault Prediction: Bartłomiej Wójcicki, Robert Dąbrowski
No ratings yet
Applying Machine Learning To Software Fault Prediction: Bartłomiej Wójcicki, Robert Dąbrowski
18 pages
Beginning R: The Statistical Programming Language
From Everand
Beginning R: The Statistical Programming Language
Mark Gardener
4.5/5 (4)
4. Studying the Quality of Source Code Generated by Different AI Generative Engines An Empirical Evaluation
No ratings yet
4. Studying the Quality of Source Code Generated by Different AI Generative Engines An Empirical Evaluation
19 pages
SO Snippet ENASE
No ratings yet
SO Snippet ENASE
10 pages
Oracle Certified Professional Java Programmer OCPJP 1Z0 809
From Everand
Oracle Certified Professional Java Programmer OCPJP 1Z0 809
Manish Soni
No ratings yet
A Systematic Literature Review On Fault Prediction Performance in Software Engineering PDF
No ratings yet
A Systematic Literature Review On Fault Prediction Performance in Software Engineering PDF
4 pages
Prediction of Cyber Attacks Using Data Science Technique
No ratings yet
Prediction of Cyber Attacks Using Data Science Technique
11 pages
C Programming Pocket Primer: An Essential Guide to C Programming Basics
From Everand
C Programming Pocket Primer: An Essential Guide to C Programming Basics
Mercury Learning and Information
No ratings yet
The Rust Programming Language, 2nd Edition
From Everand
The Rust Programming Language, 2nd Edition
Steve Klabnik
No ratings yet
A Novel Approach For Code Smells Detection Based On Deep Learning
No ratings yet
A Novel Approach For Code Smells Detection Based On Deep Learning
5 pages
Sodapdf Resized
No ratings yet
Sodapdf Resized
71 pages
Change of Problem Representation
No ratings yet
Change of Problem Representation
48 pages
20241121 Jype_an_education_oriented_integrated_pr
No ratings yet
20241121 Jype_an_education_oriented_integrated_pr
99 pages
C# Data Structures and Algorithms: Harness the power of C# to build a diverse range of efficient applications
From Everand
C# Data Structures and Algorithms: Harness the power of C# to build a diverse range of efficient applications
Marcin Jamro
No ratings yet
Final_project_milestone_submission
No ratings yet
Final_project_milestone_submission
4 pages
Mining Software Repositories With Topic Models: Stephen W. Thomas
No ratings yet
Mining Software Repositories With Topic Models: Stephen W. Thomas
43 pages
Hot Method Prediction Using Support Vector Machines: Ubiquitous Computing and Communication Journal
No ratings yet
Hot Method Prediction Using Support Vector Machines: Ubiquitous Computing and Communication Journal
7 pages
Predicting Bad Commits: Finding Bugs by Learning Their Socio-Organizational Patterns
No ratings yet
Predicting Bad Commits: Finding Bugs by Learning Their Socio-Organizational Patterns
8 pages
Optimized Computing in C++: Mastering Concurrency, Multithreading, and Parallel Programming
From Everand
Optimized Computing in C++: Mastering Concurrency, Multithreading, and Parallel Programming
Peter Jones
No ratings yet
Arthur Morgan's 30 Best Quotes in RDR2
No ratings yet
Arthur Morgan's 30 Best Quotes in RDR2
1 page
11.RIR Recording Dates Corrected
No ratings yet
11.RIR Recording Dates Corrected
11 pages
FEB 7 ASSIGNED CASES
No ratings yet
FEB 7 ASSIGNED CASES
16 pages
Bhavarlal Jain
No ratings yet
Bhavarlal Jain
3 pages
Pvsa Guide 22-23 2
No ratings yet
Pvsa Guide 22-23 2
5 pages
Corruption in India: Click To Edit Master Subtitle Style
No ratings yet
Corruption in India: Click To Edit Master Subtitle Style
10 pages
Data Collection
No ratings yet
Data Collection
47 pages
Moti Presentation1
0% (1)
Moti Presentation1
8 pages
Web Testing
100% (2)
Web Testing
37 pages
SAP System PDF
100% (1)
SAP System PDF
9 pages
Skincare 7 Sept
No ratings yet
Skincare 7 Sept
38 pages
Earthquake Simulations of Large Scale Structures Using Opensees Software On Grid and High Performance Computing in India
No ratings yet
Earthquake Simulations of Large Scale Structures Using Opensees Software On Grid and High Performance Computing in India
4 pages
ANH411DE01 - CLASSROOM BASED LANGUAGE ASSESSMENT - 2331 (điều chỉnh)
No ratings yet
ANH411DE01 - CLASSROOM BASED LANGUAGE ASSESSMENT - 2331 (điều chỉnh)
10 pages
Movement Competency
No ratings yet
Movement Competency
3 pages
Portfolio Equity Derivatives Cash: 7229.94 29th June 2017 120 867592.80 457.55 29th June 2017 1200 549060.00
No ratings yet
Portfolio Equity Derivatives Cash: 7229.94 29th June 2017 120 867592.80 457.55 29th June 2017 1200 549060.00
6 pages
Anil Bhasin Havells
No ratings yet
Anil Bhasin Havells
2 pages
128 Sid 232
67% (3)
128 Sid 232
1 page
Adobe Forms - Create Table (Using Subform)
No ratings yet
Adobe Forms - Create Table (Using Subform)
6 pages
Ir Midterm
No ratings yet
Ir Midterm
4 pages
E Series OMNI-BEAM™ Sensors: Features
No ratings yet
E Series OMNI-BEAM™ Sensors: Features
10 pages
Sustainable Local Economic Development Indicator Framework A Tool For Property Building Redevelopment Projects
No ratings yet
Sustainable Local Economic Development Indicator Framework A Tool For Property Building Redevelopment Projects
20 pages
Tips Protect Divers Ocean: For To The Planet
No ratings yet
Tips Protect Divers Ocean: For To The Planet
1 page
Business Plan Chicken Diner
No ratings yet
Business Plan Chicken Diner
32 pages
Chef Basics Favorite Recipes
100% (2)
Chef Basics Favorite Recipes
58 pages
Global Positioning System (GPS) and Its Applications
No ratings yet
Global Positioning System (GPS) and Its Applications
15 pages
Bookkeeping2 Week3
No ratings yet
Bookkeeping2 Week3
17 pages
Part 7 Emachines EEP 3243
No ratings yet
Part 7 Emachines EEP 3243
39 pages
Pic Favorite
No ratings yet
Pic Favorite
38 pages
MurderMysteryGame21Jumpstreet-1
No ratings yet
MurderMysteryGame21Jumpstreet-1
8 pages

Coding Contest RP

Uploaded by

Coding Contest RP

Uploaded by

Analysis of Code Submissions in Competitive Programming Contests

CS 229 Project, Autumn 2018

Abstract—Algorithmic programming contests provide an op- II. R ELATED W ORK

TABLE I IV. P REPROCESSING

All examples are considered positive examples in three ranks,

TABLE V TABLE VIII

You might also like