2016 Eled

The document discusses a hybrid method for automatically assessing programming assignments that combines dynamic and static analysis. Dynamic analysis involves running tests on the code, while static analysis examines the code structure without running it. The proposed method transforms programs into control flow graphs for static similarity analysis and uses a unit testing framework for dynamic analysis to provide reliable automated evaluation.


Hybrid assessment method for programming assignments
Soundous Zougari, Mariam Tanana, Abdelouahid Lyhyaoui
Laboratory of Innovative Technologies, ENSA of Tangier,
University Abdelmalek Essaadi,
Tetouan, Morocco

Abstract— In this work, we address the issue of automatic assessment for programming assignments. The objective is to provide immediate feedback to the learners and save teachers from manually managing all the students’ solutions. We present a method merging results from dynamic and static analysis to ensure a reliable and objective evaluation. While the dynamic analysis is based on a unit testing framework, the static analysis focuses on finding an adequate structural similarity measure after transforming the programs into control flow graphs.

Keywords—programming assessment; dynamic analysis; static analysis; graph representation; graph similarity

I. INTRODUCTION
Nowadays, computers have truly entered the mainstream and programming is no longer the closed industry it used to be. Apple’s co-founder, Steve Jobs, once said, “I think everybody… should learn how to program a computer because it teaches you how to think”. According to recent studies, learning to program brings enormous benefits for life [1]. Besides improving one’s problem-solving abilities, it helps in acquiring useful traits like perseverance, precision and focus, and last but not least, it transforms us from passive consumers of technology into active producers, which is incredibly empowering.

To learn and master a new programming language, students need to solve a large number of exercises in order to practice the new syntax and semantics of the language. The comments and feedback from teachers about the mistakes they made are crucial to improve their knowledge. However, it is difficult for teachers to manually manage all students’ solutions. Indeed, manually correcting programming exercises can involve a lot of work and consume much time. It is often a complex and daunting task since each program must be tested and its source code analyzed. In addition, the correction process is prone to errors or omissions due to the fatigue and the repetitive nature of the task [2].

Furthermore, the advantages of automatic assessment are especially appreciated in the context of e-learning [3]. Several universities worldwide offer numerous online courses. The number of students enrolled in these courses is on the order of thousands. In online courses, the teaching process is carried out via the computer, with minimal or no contact with the teacher. Therefore, fast and reliable automated assessments are particularly desirable. All these reasons and many more have motivated many researchers to take an interest in automating the process of assessing learners’ productions. The first reference comes from Hollingsworth, who published on the subject in 1960 [4]. The idea caught on quickly and several assessment systems have been developed [5]. Unfortunately, these systems are neither generic nor configurable and most of them are not available to the general public, which is why we seek to develop our own assessment system.

In this sense, this work presents a reliable and objective method of assessing learners’ productions that will not only reduce the workload for teachers but also provide useful feedback to students throughout their learning process.

Concerning the practical domain, we opted for introductory programming courses for several reasons. Besides the fact that these courses are the core of any engineer's training, this is a domain where assessment is of great complexity, mainly because it is characterized by the multitude of solutions to a given problem.

The remainder of the paper is organized as follows: Section 2 discusses the methods mainly adopted by program analysis systems, namely dynamic and static. Afterwards, we describe the proposed hybrid approach and address the program similarity issue in Section 3. Finally, Section 4 contains the conclusions of this paper and discusses future research.

II. PROGRAM ANALYSIS METHODS
The validation of computer programs is a crucial part of their development cycle. Two verification and validation techniques have stood out in recent years: dynamic analysis and static analysis. The main difference between these two approaches is that dynamic analysis requires the execution of the program to check its accuracy, unlike static analysis, which examines a program without executing it. This section presents them briefly.

A. Dynamic Analysis
Dynamic analysis involves running the code to verify its accuracy. This is achieved by using different and varied test cases allowing maximum path coverage of the programs. It is intended to detect errors by comparing the obtained results with those expected by the specification. This method is adopted by most of the automatic programming assessment systems such as Ceilidh [6], TRY [7], BAGS [8], Kassandra [9].

1) Advantages of the dynamic analysis:
• The dynamic analysis is easy to implement, and the tester can be non-technical.
• It makes it possible to evaluate the performance of any program in terms of the generated results compared with the expected outputs of the test case.
• It allows the analysis of applications for which we do not have access to the actual code.

2) Disadvantages of the dynamic analysis:
• Risks related to the execution of the source code, e.g. a buffer overflow that can cause a sudden breakdown of a server and put the data at risk.
• The major drawback of this method is that if the program does not compile, then it cannot be assessed.
• The generated feedback is limited to the expected outputs of the test case.
• We cannot check the conformity of the program with the requirements defined by the instructor.

3) Dynamic analysis tool: To perform dynamic analysis, we suggest the use of a dedicated framework to automate and conduct tests in a given language. This not only separates the test code from the code under test, which makes the tests easier to run and to reuse, but also removes the need for manual intervention and human interpretation. Finally, the analysis of the obtained results can be automated since each test result has a status, generally ok or error [10].

A number of dynamic analysis tools exist in the industry; we restrict ourselves to describing the xUnit framework since it is the one we are using.

xUnit is a family of similar unit testing frameworks. The JUnit framework was the first to be widely known, and others followed for different development platforms and programming languages, including NUnit (.NET), DUnit (Delphi), CppUnit (C++), ... In what follows, we explain some of the basic JUnit features and architecture components shared with other frameworks of the xUnit family.

JUnit offers:
• Assertions for testing expected results
• Test fixtures for sharing common test data
• Test suites for easily organizing and running tests
• Graphical and textual test runners

The tests to automate are expressed in classes in the form of test cases with their expected outcomes. JUnit runs these tests and compares their outcomes with the expected results. This is how the code of a class is separated from the code that tests it. To test a class, it is often tempting to create a main() method containing the tests. The downside is that this superfluous code is included in the class. Moreover, it must be executed manually.

B. Static Analysis
Static program analysis is a family of techniques that collect information about a program without having to run it, and therefore eliminate the risks associated with its execution. Tools that rely on static analysis include lint [11] and AutoLEP [12].

We can distinguish various methods within the static analysis approach: style analysis, metric analysis, keyword analysis, structural analysis, ...

1) Advantages of the static analysis:
• It can find weaknesses in the code at the exact location.
• It takes into account all possible execution paths.
• It can analyze the program even if the code contains errors, as opposed to dynamic analysis.

2) Disadvantages of the static analysis:
• Structural analysis is limited, since there is a variety of solutions for the same problem.
• It is intricate and difficult to apply to complex programs.
• Automated tools can produce false positives and false negatives.

Static analysis requires two steps: the transformation of both the student program and the model program into an intermediate representation, and the analysis after transformation. First, an intermediate representation form should be selected and generated from the source code. A category generally used as an intermediate representation for programs is the graph. Many kinds of graphs are available, such as the Control Flow Graph (CFG), the Program Dependence Graph (PDG) and the System Dependence Graph (SDG). The second step consists in analyzing the programs after transformation. Since we are representing the programs with graphs, the analysis turns into computing the similarity of graphs.

C. Graph similarity:
The problem of computing the similarity of graphs is also known as graph matching.

1) Graph matching: Graph matching is the process of finding a correspondence between the nodes and the edges of two graphs that satisfies some constraints ensuring that similar substructures in one graph are mapped to similar substructures in the other [13].

The matching process varies from exact to relative or inexact matching. The first category requires a strict correspondence between the two objects being matched, or at least between their subparts, whereas in the inexact methods the matching can occur even if the two graphs being compared are structurally different to some extent. Fig. 1 displays a generic classification of all graph matching types [14].
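To make the distinction between exact and inexact matching concrete, here is a minimal sketch for very small directed graphs. It is not the matching procedure of the paper: the function names, the brute-force search over node bijections and the Jaccard overlap used as an inexact score are all assumptions made for illustration, and the inexact variant is restricted to graphs with the same number of nodes.

```python
from itertools import permutations


def exact_match(nodes_a, edges_a, nodes_b, edges_b):
    """Return True if some node bijection preserves every directed edge."""
    edges_a, edges_b = set(edges_a), set(edges_b)
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Equal edge counts plus "every mapped edge exists" implies equal edge sets.
        if all((mapping[u], mapping[v]) in edges_b for u, v in edges_a):
            return True
    return False


def inexact_match(nodes_a, edges_a, nodes_b, edges_b):
    """Best Jaccard overlap of the edge sets over all node bijections (1.0 = exact)."""
    if len(nodes_a) != len(nodes_b):
        raise ValueError("this sketch only handles graphs with the same number of nodes")
    edges_a, edges_b = set(edges_a), set(edges_b)
    best = 0.0
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        mapped = {(mapping[u], mapping[v]) for u, v in edges_a}
        union = mapped | edges_b
        score = 1.0 if not union else len(mapped & edges_b) / len(union)
        best = max(best, score)
    return best


# Two graphs shaped like a while loop (hypothetical node names) and a plain chain.
loop_a = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 1), (1, 3)])
loop_b = (["s", "c", "b", "e"], [("s", "c"), ("c", "b"), ("b", "c"), ("c", "e")])
chain = (["p", "q", "r", "t"], [("p", "q"), ("q", "r"), ("r", "t")])

print(exact_match(*loop_a, *loop_b))
print(exact_match(*loop_a, *chain))
print(inexact_match(*loop_a, *chain))
```

The exhaustive search is factorial in the number of nodes, so it is only usable on toy graphs; it is shown here because it states the definition directly.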
A large number of graph matching applications in diverse fields have been described in the literature (such as social networks, image processing, biological networks, chemical compounds, and computer vision) [15], and therefore many algorithms and similarity measures have been suggested.

Fig. 1. Generic classification of graph matching types.

2) Graph algorithms: Algorithms for graph comparison can be classified into three main classes [16]:

a) Edit distance / graph isomorphism: Graph isomorphism is a bijective mapping from the nodes of graph G to the nodes of graph G’ that preserves all labels and the structure of the edges. It is a useful concept to find out if two objects are the same, up to invariance properties inherent to the underlying graph representation. Similarly, subgraph isomorphism can be used to find out if one object is part of another object, or if one object is present in a group of objects. Maximum common subgraph can be used to measure the similarity of objects even if there is no graph or subgraph isomorphism between the corresponding graphs. Clearly, the larger the maximum common subgraph of two graphs is, the greater is their similarity.

The graph edit distance is a generalization of the graph isomorphism problem, where the target is to transform one graph into the other by a number of operations (additions, deletions, substitutions of nodes or edges, and reversions of edges). This method associates each operation with a cost and attempts to find the sequence of operations that minimizes the cost of matching the two graphs. The edit distance of two graphs, G and G’, is defined by the shortest sequence of edit operations that transforms G into G’. The shorter this sequence is, the more similar the two graphs are. Thus edit distance is suitable to measure the similarity of graphs [17].

Generally, given a set of graph edit operations (also known as elementary graph operations), the graph edit distance between two graphs G1 and G2, written as GED(G1, G2), can be defined as [18]:

$$\mathrm{GED}(G_1, G_2) = \min_{(e_1, \ldots, e_k) \in P(G_1, G_2)} \sum_{i=1}^{k} c(e_i) \tag{1}$$

where P(G1, G2) denotes the set of edit paths transforming G1 into (a graph isomorphic to) G2 and c(e) ≥ 0 is the cost of each graph edit operation e.

The set of elementary graph edit operators typically includes:
• vertex insertion to introduce a single new labeled vertex to a graph.
• vertex deletion to remove a single (often disconnected) vertex from a graph.
• vertex substitution to change the label (or color) of a given vertex.
• edge insertion to introduce a new colored edge between a pair of vertices.
• edge deletion to remove a single edge between a pair of vertices.
• edge substitution to change the label (or color) of a given edge.

b) Feature extraction: the key idea behind these methods is that similar graphs probably share certain properties, such as degree distribution, diameter, eigenvalues. After extracting these features, a similarity measure is applied in order to assess the similarity between the aggregated statistics and, equivalently, the similarity between the graphs. These methods are powerful and scale well, as they map the graphs to several statistics that are much smaller in size than the graphs.

A feature that contains important information about a graph is the vector of its eigenvalues [15]. Let A1 and A2 be the adjacency matrices of graphs G1 and G2 respectively. Let also L1 = D1 − A1 and L2 = D2 − A2 be the Laplacians of the graphs, where D1 and D2 are the corresponding diagonal matrices of degrees. In this method, we find the eigenvalues of the Laplacians and we define the similarity between the graphs as:

$$d(G_1, G_2) = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2 \tag{2}$$

where λ1i and λ2i are the eigenvalues of L1 and L2, n is the number of eigenvalues, and k is chosen s.t.

$$\min_{j} \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$$

c) Iterative methods: the philosophy behind the iterative methods is that “two nodes are similar if their neighborhoods are also similar”. This idea naturally leads to iterative methods for computing similarity scores for the elements of these graphs, in which scores for similarity between elements propagate along to neighboring elements at each time step; this process ends when convergence is achieved.

These methods calculate the similarity of the nodes of two given graphs, G1 and G2, by repeatedly refining the initial estimate of similarity using some update rule of the form [19]:

(3)

Iterations are performed until some termination condition is met. At the end, the similarity matrix is produced.

(4)

Different rules for updating the similarity of two nodes have been proposed. They usually include summing all the similarities between the neighbors of the first node and the neighbors of the second node [20].
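As an illustration of the iterative class, the following sketch implements one well-known update rule of this kind, in the style of the vertex similarity measure of Blondel et al. [20]: the score of a node pair is the sum of the scores of their predecessor pairs and successor pairs, followed by a normalization. It is a sketch under assumptions, not the rule retained by the paper: nodes are assumed to be numbered 0..n-1, the function name and the fixed (even) number of iterations are illustrative, and no convergence test is performed.

```python
import math


def iterative_node_similarity(n1, edges1, n2, edges2, iterations=20):
    """Return an n1 x n2 matrix of node similarity scores between two directed graphs."""
    sim = [[1.0] * n2 for _ in range(n1)]  # initial estimate: all pairs equally similar
    for _ in range(iterations):
        new = [[0.0] * n2 for _ in range(n1)]
        for u1, v1 in edges1:
            for u2, v2 in edges2:
                new[v1][v2] += sim[u1][u2]  # similar predecessors raise the score
                new[u1][u2] += sim[v1][v2]  # similar successors raise the score
        norm = math.sqrt(sum(x * x for row in new for x in row))
        if norm == 0.0:
            return new  # at least one graph has no edges: nothing to propagate
        sim = [[x / norm for x in row] for row in new]
    return sim


# Self-similarity of a three-node path 0 -> 1 -> 2.
path_edges = [(0, 1), (1, 2)]
scores = iterative_node_similarity(3, path_edges, 3, path_edges)
for row in scores:
    print([round(x, 3) for x in row])
```

Each iteration touches every pair of edges, and the result is a matrix with one entry per pair of nodes, which is why the cost of these methods is tied to the product of the graph sizes.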

III. OUR PROPOSAL

The strengths and weaknesses of the dynamic and static approaches are complementary. We therefore propose an original combination of these two techniques. In this combination, the dynamic analysis reports errors at runtime, then the static analysis evaluates the structural properties of the programs. It must be emphasized that the student’s solution goes through the whole process even if it generates runtime errors, although it is penalized in the final score. Figure 2 summarizes the assessment approach.
To perform dynamic analysis, we run students’ programs on a set of test data, and then compare the output to the predefined answers. As mentioned in the previous section, we suggest the use of xUnit, a dedicated framework to automate and conduct tests in a given language.
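The dynamic step described above can be sketched with Python's unittest module, which belongs to the xUnit family. This is only an illustration under assumptions: the exercise (a factorial function), the names student_factorial, FactorialTests and dynamic_score, and the choice of the pass ratio as the score are hypothetical and do not come from the paper.

```python
import io
import unittest


def student_factorial(n):
    """Hypothetical student submission for an exercise asking for n!."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


class FactorialTests(unittest.TestCase):
    """Teacher-provided test cases: inputs paired with predefined answers."""

    def test_zero(self):
        self.assertEqual(student_factorial(0), 1)

    def test_small_value(self):
        self.assertEqual(student_factorial(5), 120)

    def test_larger_value(self):
        self.assertEqual(student_factorial(10), 3628800)


def dynamic_score(test_case):
    """Run a test case class silently and return the fraction of passed tests."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    failed = len(result.failures) + len(result.errors)
    return (result.testsRun - failed) / result.testsRun


print(dynamic_score(FactorialTests))
```

Because the runner records failures and runtime errors separately from the test count, a submission that raises an exception still yields a score instead of stopping the assessment, which matches the idea that a faulty solution is penalized but still goes on to the static step.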
On the other hand, to evaluate the structural properties of the programs (static analysis), we measure the degree of similarity by comparing the assessed program to the programs belonging to the solution space provided by the teacher or expert. A solution space is a set of paths representing the different possible approaches for the same exercise. It can contain correct solutions as well as incorrect ones. It is built by an expert and contains the approaches deemed pedagogically interesting. If a match is found, i.e. the similarity measure exceeds a threshold defined by the teacher, the student’s program is graded automatically; otherwise the program is submitted to the teacher for manual assessment. In the latter case, the student’s solution can be added to the solution space if it is judged pedagogically interesting or if it is a recurrent incorrect solution. This approach will gradually decrease human intervention.

Fig. 2. Proposed assessment approach.

As previously stated, this method requires two steps: the passage through the graphical representation of the compared programs, which is addressed in section A, and a similarity or matching process, which is detailed in section B.

A. Program’s graphical representation:
Each graphical method for program representation supports some unique features [21, 22]. Since our intention is to assess students’ productions in introductory programming courses, we choose to represent programs with control flow graphs.

The Control Flow Graph (CFG) is a directed graph where each node represents a basic block, i.e. a straight-line piece of code without any jumps or jump targets; jump targets start a block, and jumps end a block. Directed edges are used to represent jumps in the control flow. It highlights loops, conditional statements and branches. A path in this graph represents a program implementation scenario. The program illustrated in Fig. 3 is used to provide an example for control flow graphs. This is a simple program that initializes two variables x and y, and executes two statements repeatedly in the while loop until y is greater than or equal to 10.

We may note that in the control flow approach, the focus is on the sequencing of operations in a process. Control flow graphs are used as models to describe the structure of computer programs. They are used both for static analysis [22] and as a model for program coverage. Therefore, the CFG is a suitable representation for structural comparison. However, we do not definitively exclude other types of graphs because they may be useful for future modifications of our system.

#include <stdio.h>

void main() {
    int x = 0;
    int y = 1;
    while (y < 10) {
        y = y * 2;
        x = x + 1;
    }
    printf("%d", x);
    printf("%d", y);
}

Fig. 3. Program example.
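One plausible way to encode the control flow graph of the program of Fig. 3 is an adjacency list over its basic blocks, as in the sketch below. The decomposition into four blocks and the block names are assumptions made for illustration; the figure in the paper may group the statements differently. The two helper functions only check structural facts that the definition of a CFG makes visible: the presence of a loop and the maximum number of successors of a block.

```python
# Basic blocks of the Fig. 3 program (hypothetical names):
#   entry: int x = 0; int y = 1;
#   cond:  while (y < 10)
#   body:  y = y * 2; x = x + 1;
#   exit:  printf("%d", x); printf("%d", y);
cfg = {
    "entry": ["cond"],
    "cond": ["body", "exit"],  # true branch, false branch
    "body": ["cond"],          # back edge closing the loop
    "exit": [],
}


def has_loop(graph, start):
    """Depth-first search that reports whether a cycle is reachable from start."""
    on_stack, done = set(), set()

    def visit(node):
        on_stack.add(node)
        for succ in graph[node]:
            if succ in on_stack:  # edge back to a block still being explored
                return True
            if succ not in done and visit(succ):
                return True
        on_stack.remove(node)
        done.add(node)
        return False

    return visit(start)


def max_out_degree(graph):
    """Largest number of successors of any basic block."""
    return max(len(successors) for successors in graph.values())


print(has_loop(cfg, "entry"))
print(max_out_degree(cfg))
```

A dictionary of successor lists keeps the sketch dependency-free; the same structure can be converted to an edge list when a graph comparison routine expects one.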
The CFG corresponding to this program is displayed in Fig. 4.

Fig. 4. Control Flow Graph.

B. Program’s graph matching:
Once the student’s and the expert-provided solutions are transformed into graphs, we compare them in order to identify similarities and/or differences. As outlined in section II, algorithms for graph comparison can be classified into three main classes. Reviewing the literature [16,17,19,20,23], we can deduce the following:

• The graph isomorphism problem and its generalizations, edit distance and maximum (minimum) common sub(super)graph, have received a considerable amount of academic research. However, the drawback of graph isomorphism is that the exact versions of the algorithms are exponential and, thus, not applicable to large graphs.

• Although the feature extraction category is a response to the emergence of very large graphs like the World Wide Web or social networks, it is not always accurate. In fact, depending on the statistics that are chosen, it is possible to get results that are incorrect.

• Regarding the iterative algorithms, they are random in the sense that they contain randomized operations where the pool of node to node correspondences is iteratively and randomly modified. Hence, the solution space is randomly explored. Also, an obvious limitation of these methods is that the computation time is linear with respect to the product of the graph sizes, due to the size of the similarity matrix.

In order to choose a category, we ought to analyze our needs, namely CFG similarity measures. While in principle a CFG can be arbitrarily complex, graphs found in real programs tend to have some very specific properties: the outdegree of a node is often upper bounded by two (exceptions include nodes that represent switch statements, exception handling, and computed gotos); CFGs often resemble series-parallel graphs; CFGs are often reducible; and basic blocks tend to be small, on the order of 4-7 instructions [23].

Although plenty of CFG-based code similarity algorithms have been proposed, there seems to have been little effort put into evaluating them relative to each other, or into determining whether a newly proposed algorithm improves on the state of the art. That is our concern at the moment.

IV. CONCLUSION
In this paper, we propose a hybrid assessment method for students’ productions aimed at introductory programming courses. The approach is based on merging information from two different evaluation methods, dynamic and static analysis. The first is carried out using the xUnit framework, which provides features that not only ease the dynamic analysis process but also make it flexible and reusable.

The second analysis is achieved by structurally comparing the student solution with a set of expert-provided programs. This process consists of two parts: the transformation of the programs into an intermediate representation and the examination of similarity measurements. The intermediate representation in our approach is based on Control Flow Graphs (CFG). Our current efforts focus on analyzing the advantages and disadvantages of CFG-based code similarity algorithms by comparing them experimentally and theoretically. This is an important task since it can seriously affect the accuracy and performance of our assessment system.

As soon as the new system is developed, we intend to test it with real users. We plan to design and implement an experiment in real learning environments to assess the usability and performance of the proposed system. This experiment will also allow us to evaluate its weaknesses and therefore improve it.

REFERENCES
[1] K. Heggart, “Coded for success: the benefits of programming among school students”, June 2014.
[2] Higgins, S., Hall, E., Baumfield, V., Moseley, D. (2005). A meta-analysis of the impact of the implementation of thinking skills approaches on pupils. In: Research Evidence in Education Library. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.
[3] Allen, I.E., & Seaman, J. (2010). Class Differences: Online Education in the United States, 2010. Newburyport, MA: Babson Survey Research Group and The Sloan Consortium Green.
[4] J. Hollingsworth. Automatic graders for programming classes. Communications of the ACM, 3:528–529, October 1960.
[5] C. Douce, D. Livingstone, and J. Orwell. 2005. Automatic test-based assessment of programming: A review. Journal on Educational Resources in Computing (JERIC) 5, 3 (2005), 4.
[6] S. Benford, E. Burke, E. Foxley, N. Gutteridge, and A. M. Zin. Experiences with the Ceilidh system. In Proceedings of the International Conference in Computer Based Learning in Science, Vienna, 1993.
[7] Kenneth A. Reek. The TRY system -or- how to avoid testing student programs. In Proceedings of the twentieth SIGCSE technical symposium on Computer science education, SIGCSE ’89, pages 112–116, New York, NY, USA, 1989. ACM.
[8] J. B. Hext and J. W. Winings. An automatic grading scheme for simple programming exercises. Commun. ACM, 12(5):272–275, May 1969.
[9] Urs Von Matt. Kassandra: the automatic grading system. SIGCUE Outlook, 22(1):26–40, January 1994.
[10] http://www.jmdoudoux.fr/java/dej/chap-frameworks-test.htm
[11] Ian F. Darwin, “Checking C Programs with Lint”, O'Reilly Media, October 1988.
[12] Wang Tiantian; Su Xiaohong; Ma Peijun; Wang Yuying; Wang Kuanquan, "AutoLEP: An Automated Learning and Examination System for Programming and its Application in Programming Course," in Education Technology and Computer Science, 2009. ETCS '09. First International Workshop on, vol. 1, pp. 43-46, 7-8 March 2009.
[13] Carlo Sansone, Mario Vento, “Thirty Years Of Graph Matching In Pattern Recognition”, 2004.
[14] Endika Bengoetxea, PhD Thesis, 2002.
[15] Danai Koutra et al., “Algorithms for Graph Similarity and Subgraph Matching”, 2011.
[16] Yannik Allard, “Investigate the Assessment of Information Structures using Graphs”, 2015.
[17] Xinbo Gao, Bing Xiao, Dacheng Tao, Xuelong Li, “A survey of graph edit distance”, 2010.
[18] https://en.wikipedia.org/wiki/Graph_edit_distance
[19] Mladen Nikolic, “Measuring Similarity of Graph Nodes by Neighbor Matching”.
[20] V. D. Blondel et al., “A measure of similarity between graph vertices: applications to synonym extraction and web searching”, SIAM Review 46 (2004) 647-666.
[21] Arora Vinay; Bhatia Rajesh Kumar; Singh Maninder, “Evaluation of Flow Graph and Dependence Graphs for Program Representation”, October 2012.
[22] S. Rao Kosaraju. Analysis of structured programs. J. Comput. Syst. Sci., 9(3):232-255, 1974.
[23] Patrick P.F. Chan and Christian Collberg, “A Method to Evaluate CFG Comparison Algorithms”.