Proposed Framework For Automatic Grading System of ER Diagram
Abstract—In this paper we present preliminary research on a proposed framework for an automatic grading system for Entity Relationship (ER) Diagrams. The proposed framework uses an ER Diagram in XML file format (an XMI file) as the input to the system. There are two proposed approaches. The first approach uses the Tree Edit Distance algorithm to assess ER Diagram similarity; its output is a similarity score and feedback for the submitted ER Diagram. The second approach uses machine learning algorithms to build a classifier that grades ER Diagrams automatically. These two approaches will be implemented and evaluated in the next phase of this research.

Keywords—Entity Relationship Diagram, similarity, grading system

I. INTRODUCTION

Automatic Grading Systems (AGS) are one of the most popular topics at the moment. Such a system can support the software development process in a company and the teaching of software development in universities, especially those applying e-learning. In the software development process, a developer can quickly determine the similarity of one module to others by using an AGS.

In universities, an AGS gives many advantages to lecturers, especially in programming courses. The lecturer can assess programming tasks faster: he or she only submits the student's answer to the AGS, and the AGS returns a grade directly to the user based on test cases. This reduces the lecturer's time and effort. In addition, an AGS can potentially give real-time feedback to students, especially in e-learning systems, so students do not have to wait for the availability of teaching assistants or faculty members to learn about the quality of their programs and improve themselves. The most important advantage is that an AGS provides a standardized way to assess computer programs. The standard is built on a rubric that maps a score (a quantitative measure) to the student's ability to solve a problem.

A database is the component of an information system that stores data. In the software development lifecycle, a database does not simply appear after the requirements gathering process. The database designer has to design a diagram, called an Entity Relationship (ER) Diagram, based on the user's business process. An ER Diagram describes the entities or objects related to the system. The ER Diagram is then translated into a Conceptual Data Model (CDM) and finally into a Physical Data Model (PDM), and its components are converted into tables, attributes, data types, primary keys, foreign keys, and constraints.

In learning database design, drawing the ER Diagram is a very important step toward producing a good information system. Every student must be able to design an ER Diagram correctly according to the user's business process, because a good ER Diagram produces a good database that records data accurately. However, given their limitations during the learning process, students often design ER Diagrams in the wrong way. Furthermore, they often produce ER Diagrams that are similar or identical to those of other students, without paying attention to the theory, concepts, or conventional rules for drawing ER Diagrams.

Moreover, lecturers or teaching assistants sometimes do not have enough time to assess and give feedback on the many ER Diagrams submitted by students. Even when they can, it is difficult for them to remember which diagrams are the same, and the same errors frequently appear across several student designs. Consequently, lecturers or teaching assistants give inconsistent grades (different grades for similar ER Diagrams).

Meanwhile, technology is developing very quickly, as can be seen from the XML data format that is widely used today. XML is used not only for information exchange but also in software development. Many software tools provide a feature to export a design to an XML-based format (an XMI file). For instance, Enterprise Architect provides functionality to export an ER Diagram design to an XMI file. XMI stands for "XML Metadata Interchange"; XMI files are XML files that usually contain metadata information and that may have been compressed with XMill. Because XML files can be very large, the XMill compression used for .XMI files can make them around half the size produced by other compression techniques, and they can be decompressed with XMill or similar compression software [1].

An example of an ER Diagram design is shown in Figure 1. Using the Enterprise Architect tool, the ER Diagram in Figure 1 can be exported to an XMI file. Figure 2 shows a summary of the resulting XMI file structure, which consists of XML tag elements.
It can be seen that each ER Diagram component can be represented in an XML document. An XML document is a representation of a tree with a root, nodes, edges, and leaves; each component is mapped to an element tag whose attribute value is the name of the ER Diagram component. Therefore, this paper proposes an Automatic Grading System framework for ER Diagrams that uses an XMI file as input. This approach is expected to provide accurate and consistent assessment and feedback for ER Diagrams.
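As an illustration of this mapping, the short sketch below parses a hypothetical XMI-like fragment with Python's standard xml.etree.ElementTree module and prints the element tree. The tag and attribute names (packagedElement, ownedAttribute, xmi:type, isID) are assumptions made only for illustration; the exact structure of an Enterprise Architect export is not reproduced in this paper.

    import xml.etree.ElementTree as ET

    # Hypothetical XMI-like fragment; a real Enterprise Architect export differs.
    XMI_FRAGMENT = """
    <xmi:XMI xmlns:xmi="http://schema.omg.org/spec/XMI/2.1">
      <packagedElement xmi:type="uml:Class" name="Student">
        <ownedAttribute name="student_id" isID="true"/>
        <ownedAttribute name="name"/>
      </packagedElement>
    </xmi:XMI>
    """

    def print_tree(element, depth=0):
        """Walk the XML tree and print each element as (tag, name attribute)."""
        print("  " * depth, element.tag, element.attrib.get("name", ""))
        for child in element:
            print_tree(child, depth + 1)

    root = ET.fromstring(XMI_FRAGMENT)
    print_tree(root)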
II. RELATED WORK

Automatic grading systems have already been implemented by several researchers. The study in [2] proposed an automatic grading system for computer programs using machine learning, and it proceeded in several steps. First, the system graded how close the program's logic was to the correct solution based on a rubric, using a novel machine learning approach built on highly informative features derived from abstract representations of a given program, which is the main subject of that work. Second, the system provided a score for the programming practices used in the program based on a rule-based system. Third, it automatically detected the complexity of the program empirically by running it on inputs of different sizes, recording the running time, and fitting a model to it. Based on these measurements, a comprehensive report was generated for each submitted program.

The most widespread approach currently used for the automatic assessment of programs is evaluating the number of test cases they pass [3]-[6]. Unfortunately, this approach is wrought with problems. Programs that pass a high number of test cases may not be efficient and may have been written with bad programming practices. On the other hand, programs that pass a low number of test cases are often quite close to the correct solution, with some unforced or inadvertent errors eventually making them fail the suite of test cases designed for the problem. Lastly, a score defined quantitatively as the number of test cases passed completely disregards the requirement that the score map to a human-intuitive rubric of program quality.
Another popular approach to the automated grading of programs measures the similarity between abstract representations (such as Control Flow Graphs and Program Dependence Graphs) of a candidate's program and representations of correct implementations of the problem [7], [8], [9]. Although promising, the theoretical elegance is undermined by the existence of multiple abstract representations of a correct solution to a given problem. In addition, there is no underlying rubric that guides the similarity metric, and approaches to map the metric to a rubric are not discussed. Apart from this, there have been publications on automatic correction of small programs [10] and on peer-based assessment of programs [11], neither of which directly addresses the problem of automatic grading of programs.

Related to ER Diagrams, some papers have investigated the automatic marking of ER Diagrams with a graph-based diagram approach [12], [13]. This research proposed diagrammatic reasoning focused on imprecise diagrams. There are five stages in these experiments: translating a raster-based image into a set of diagrammatic primitives such as boxes, lines, and text (segmentation and assimilation), using domain knowledge to identify minimal meaningful units (identification), combining MMUs into higher-level abstract features (aggregation), and looking for meaning in a diagram (interpretation) [12], [13].

In summary, most work on automatic grading systems has been done for computer programming. Grading systems for ER Diagrams are still few, and they focus on precise or imprecise diagram primitives (boxes, lines, etc.) extracted from an image of the ER Diagram. Therefore, this approach does not provide an optimal solution and needs more effort to test with complex ER Diagram images as test cases. Meanwhile, related studies on ER Diagram grading systems that take an XMI file as input do not yet exist. There is therefore a need to build an automatic grading system for ER Diagrams using XMI files to obtain better and more accurate results.

III. PROPOSED FRAMEWORK

This paper is preliminary research in which the implementation and experiments are still in progress. The proposed approach to grading ER Diagrams can be divided into two frameworks: assessment of ER Diagram similarity using Tree Edit Distance, and an ER Diagram grading system with a machine learning approach.

A. Assessment of Similarity of E-R Diagrams

The first proposed approach uses the Tree Edit Distance algorithm to calculate the similarity of ER Diagrams. The ER Diagram from the student and the ER Diagram solution from the lecturer must both be exported to XMI files. These two XML documents are the input to the algorithm, which assesses the similarity of the ER Diagrams based on their XML tag structure. The output of this approach is a similarity score and feedback about the student's ER Diagram relative to the solution. The details of this approach can be seen in Figure 3 below.
[Figure 3 (diagram): the student's ER Diagram and the lecturer's solution are exported to .xmi files; both are input to the Tree Edit Distance similarity calculation, which produces (1) a similarity score and (2) feedback.]

Fig. 3. Assessment of similarity of E-R Diagrams using the Tree Edit Distance algorithm
Our approach to detecting the similarity between two XMI files can be divided into two phases. In the first phase we detect the elements in the first XMI document that have a corresponding element in the second one. Subsequently, the similarity between the two documents can be deduced and the appropriate output can be created.

1) Data Model
To be independent of an actual existing meta model, e.g. the complex ER Diagram meta model, we decided to design a simpler data model for the elements of an XMI document, each of which represents a component of an ER Diagram. A side effect is that the algorithm should also be able to handle other ER Diagram models encoded in XMI. The data model is depicted in Figure 4. Basically it is a tree with typed elements that can be decorated with attributes. In addition to the tree-like elements, the data model may also have graph-like cross-references.
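A minimal Python sketch of such a data model is given below. The class and field names are illustrative assumptions rather than the paper's actual implementation: each node carries a type, a name, an attribute dictionary, ordered children (the tree part), and optional cross-references to other nodes (the graph-like part).

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ModelNode:
        """One element of an XMI document, i.e. one ER Diagram component."""
        node_type: str                      # e.g. "Entity", "Attribute", "Relationship"
        name: str                           # component name taken from the XML attributes
        attributes: Dict[str, str] = field(default_factory=dict)
        children: List["ModelNode"] = field(default_factory=list)    # tree structure
        cross_refs: List["ModelNode"] = field(default_factory=list)  # graph-like references

    # Example: an entity with a primary-key attribute.
    student = ModelNode("Entity", "Student")
    student.children.append(ModelNode("Attribute", "student_id", {"isPrimaryKey": "true"}))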
2) Tree Edit Distance Algorithm
The tree edit distance between ordered labeled trees is the minimal-cost sequence of node edit operations that transforms one tree into another [14]. There are three operations in the tree edit distance algorithm:
- Delete a node and connect its children to its parent, maintaining the order.
- Insert a node between an existing node and a subsequence of consecutive children of this node.
- Rename the label of a node.
The minimal-cost sequence of node edit operations that transforms one tree into another is found with the recurrence E[i,j] = min(E[i-1,j] + 1, E[i,j-1] + 1, E[i-1,j-1] + P). The variable P is the cost of transforming a node: if node A in Tree 1 equals node B in Tree 2, then P = 0, because no transformation is needed; otherwise, if node A in Tree 1 differs from node B in Tree 2, then P = 1. The algorithm for the tree edit distance can be seen in Algorithm 1 below.
function TreeEditDistance (Input m: integer, n: integer) (Output retval: integer)
Variables:
    A, B : Tree                      {trees with m and n nodes, already defined}
    E : Array [0..m, 0..n] of Integer
    insert, delete, rename : integer
Algorithm:
    For i ← 0 to m do
        E[i,0] ← i                   {cost of deleting the first i nodes of A}
    endFor;
    For j ← 1 to n do
        E[0,j] ← j                   {cost of inserting the first j nodes of B}
    endFor;
    For i ← 1 to m do
        For j ← 1 to n do
            insert ← E[i-1,j] + 1
            delete ← E[i,j-1] + 1
            rename ← E[i-1,j-1]
            if A[i] ≠ B[j] then
                rename ← rename + 1  {P = 1 when the node labels differ}
            endIf
            E[i,j] ← min (insert, delete, rename)
        endFor
    endFor;
    retval ← E[m,n]
    return retval

Algorithm 1. Edit Distance Algorithm
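For illustration, the same dynamic program can be written compactly in Python. This is a sketch that, like Algorithm 1, operates on two ordered sequences of node labels (for example, the tag or component names collected from the two XMI trees); it is not the authors' implementation.

    def tree_edit_distance(a, b):
        """Minimum number of insert/delete/rename operations turning the
        node-label sequence a into b (Algorithm 1 rewritten as a Python sketch)."""
        m, n = len(a), len(b)
        # E[i][j] = distance between the first i labels of a and the first j labels of b.
        E = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            E[i][0] = i
        for j in range(n + 1):
            E[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                insert = E[i - 1][j] + 1
                delete = E[i][j - 1] + 1
                rename = E[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
                E[i][j] = min(insert, delete, rename)
        return E[m][n]

    # e.g. tree_edit_distance(["Entity", "Attribute"], ["Entity", "Attribute", "Relationship"]) == 1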
The output of this algorithm is editDist(A,B): the minimum number of insert, delete, and rename operations needed to transform one tree into the other.

3) Similarity Function
This approach grades the student's ER Diagram by calculating the similarity between the student's ER Diagram and the lecturer's ER Diagram solution. Based on [15], the maximum similarity between two objects is reached when they are identical, no matter how much commonality they share. The similarity between the student's ER Diagram and the solution is computed from editDist(A,B), obtained with the Tree Edit Distance algorithm, as illustrated below.
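The original paper presents the similarity formula as a figure that is not reproduced in this text. As a hedged illustration only, one common way to normalize an edit distance into a similarity score in [0, 1] is sketched below, where size_a and size_b stand for the number of nodes in the two trees; the exact formula used by the authors may differ.

    def similarity_score(edit_dist, size_a, size_b):
        """Illustrative normalization: identical trees give 1.0, and the score
        decreases as more edit operations are needed (assumed, not the paper's formula)."""
        if size_a + size_b == 0:
            return 1.0
        return 1.0 - edit_dist / float(size_a + size_b)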
4) Output of the algorithm
The result of the first approach is a correspondence table consisting of all matched element pairs. To represent the similarity, we create a unified document that contains all elements of the two original documents, where the elements in the correspondence table are contained only once in the unified document. The similarity of the two documents can then be easily deduced from:
1. Structural difference (SD): elements that have no entry in the correspondence table are considered to be structurally different.
2. Attribute difference (AD): corresponding elements that differ in their attribute values get an attribute difference containing both the old and the new value.
3. Relationship difference (RD): corresponding elements that have a different relationship or cardinality.
4. Normalization difference (ND): the completeness of an element based on normal form rules.
5. Similarity score: the similarity value of the two documents.
Based on this information, the first approach gives feedback to the student about their ER Diagram: the feedback lists all the differing components and what the student needs to do to fix their ER Diagram according to the solution, as sketched below.
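A minimal sketch of how such feedback might be assembled from the difference categories is shown below; the field names and message texts are assumptions for illustration, not the system's actual output format.

    def build_feedback(structural, attribute, relationship, normalization, score):
        """Turn the difference categories (lists of element names) and the
        similarity score into human-readable feedback lines (illustrative only)."""
        lines = [f"Similarity score: {score:.2f}"]
        for name in structural:
            lines.append(f"Missing or extra component: '{name}' has no match in the solution.")
        for name in attribute:
            lines.append(f"Attribute mismatch on '{name}': check its attribute values.")
        for name in relationship:
            lines.append(f"Relationship or cardinality mismatch involving '{name}'.")
        for name in normalization:
            lines.append(f"'{name}' does not satisfy the expected normal form.")
        return "\n".join(lines)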
B. ER Diagram Grading System with a Machine Learning Approach

The second approach to the ER Diagram grading system uses machine learning algorithms. Machine learning is a set of methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty [16]. This approach has the capability to learn from experience, training, analytical observation, and other means, resulting in a system that can continuously improve itself and thereby exhibit efficiency and effectiveness. All steps of the second approach can be seen in Figure 5.
[Figure 5 (diagram): training data in .xmi files (TR1-TR4) pass through feature extraction (f1, f2, f3, ..., fn) and, together with the expert grades (EG_TR1-EG_TR4), are fed to machine learning algorithms (Ridge Regression, Bayesian Network, Support Vector Machines, and Random Forest) to build a classifier; the classifier grades test data (T1-T4, also .xmi files), and the classifier grades (CG_T1-CG_T4) are compared with the expert grades.]
ER Diagrams exported into XMI files are used as training data. The system performs feature extraction to find the feature values of the training data, and all feature values become input to the machine learning process. The machine learning process also receives the grade given by an expert for each training item; this grade is used as the class attribute in the machine learning process. Based on the training data and the expert grades, the machine learning process produces a classifier. As part of the learning process, the classifier is then tested on test data in XMI format for grading ER Diagrams. The test data can be students' ER Diagrams or ER Diagrams from other sources. The classifier's grades are compared to the expert's grades, and the rules in the classifier are updated to get the best result. The best classifier is then used to grade ER Diagrams automatically.

1) Feature Extraction for Grading ER Diagrams
Feature extraction is the process of extracting feature values from an ER Diagram by parsing the values of the XML tag elements that store the ER Diagram components. Based on these features, the rubric for the ER Diagram score is generated. The features extracted for ER Diagram grading are the following (a sketch of how they might be computed follows the list):
- List of entities: the main component of an ER Diagram is the entity, which depicts an object involved in the real world. A correct ER Diagram must have a valid number of entities with valid names. An entity can be a strong entity or a weak entity.
- List of entity attributes and relationship attributes: attributes represent the characteristics of an entity or relationship. An entity must have an identifier attribute called the primary key.
- Relationships between entities: a relationship connects one entity to other entities and has a cardinality. Cardinality notations define the properties of the relationship between the entities: they can denote that an entity is optional (for example, a sales representative could have no customers or could have many) or mandatory (for example, there must be at least one product listed in an order). The three main cardinalities are one-to-one, one-to-many, and many-to-many.
- Normal form type: the last feature is normalization, which depicts the structure of the ER Diagram. This research only considers three normalization types: 1NF, 2NF, and 3NF. A higher normal form has more entities.
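The sketch below illustrates how such feature values might be counted from a parsed XMI document with Python's xml.etree.ElementTree. The tag and attribute names used to recognize entities, attributes, and relationships are assumptions for illustration; a real Enterprise Architect export would need its actual tag names and namespaces.

    import xml.etree.ElementTree as ET

    def extract_features(xmi_path):
        """Count a few candidate features from an XMI file.
        The tag and attribute conventions below are illustrative assumptions."""
        root = ET.parse(xmi_path).getroot()
        features = {"entities": 0, "attributes": 0, "relationships": 0, "primary_keys": 0}
        for elem in root.iter():
            # xmi:type is usually stored with a resolved namespace, i.e. as '{...}type'.
            kind = next((v for k, v in elem.attrib.items()
                         if k.endswith("}type") or k == "xmi:type"), "")
            if kind == "uml:Class":            # assumed encoding of an entity
                features["entities"] += 1
            elif kind == "uml:Association":    # assumed encoding of a relationship
                features["relationships"] += 1
            if elem.tag.split("}")[-1] == "ownedAttribute":   # assumed encoding of an attribute
                features["attributes"] += 1
                if elem.attrib.get("isID") == "true":
                    features["primary_keys"] += 1
        return features

    # Example with a hypothetical file name: features = extract_features("student_diagram.xmi")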
The rubric for giving a score to an ER Diagram based on the extracted features can be seen in Table I below.

TABLE I. RUBRIC TO GRADE ER DIAGRAMS

Score    | Interpretation
80 - 100 | Completely correct and efficient: a perfect and complete ER Diagram with the correct components (names and number), using 3NF.
60 - 79  | Correct with silly errors: a complete ER Diagram with the correct components, but the relationships and cardinalities are not perfect; the normalization type is 2NF.
40 - 59  | Inconsistent: there are wrong entities, wrong attributes, or wrong relationships.
20 - 39  | Emerging basic structures: some entities, attributes, and relationships appear.
0 - 19   | Gibberish diagram: the diagram is not related to the problem at hand.
The intuition is that with the appearance of each set of features, a human evaluator would provide a higher score: the absence of entities would receive the lowest score; the presence of the right entities would be awarded some points; the presence of 3NF would be awarded some more; and the presence of correct relationships and cardinalities would be awarded a relatively high score.
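As an illustration of how the extracted features could be mapped onto the rubric in Table I, the sketch below assigns a score band from a few of the features above; the thresholds and conditions are assumptions for illustration, since the paper does not specify an explicit mapping rule.

    def rubric_band(features, solution_features, normal_form):
        """Map extracted features to one of the Table I score bands (illustrative only)."""
        if features["entities"] == 0:
            return "0-19"                      # gibberish: nothing recognizable
        complete = (features["entities"] == solution_features["entities"]
                    and features["attributes"] == solution_features["attributes"])
        relations_ok = features["relationships"] == solution_features["relationships"]
        if complete and relations_ok and normal_form == "3NF":
            return "80-100"
        if complete and normal_form == "2NF":
            return "60-79"
        if complete or relations_ok:
            return "40-59"
        return "20-39"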
2) Machine Learning Algorithm
Machine learning is widely used in computer science and other fields. Machine learning algorithms figure out how to perform important tasks by generalizing from examples. There are three components of a machine learning algorithm: representation, evaluation, and optimization [17]. Representation means that a classifier must be represented in some formal language that the computer can handle. The classifier is assessed by an evaluation function (an objective or scoring function) to produce better classifiers. Finally, the classifier is optimized by searching among the classifiers in the language for the highest-scoring one. The choice of optimization technique is key to the efficiency of the learner, and it also helps determine the classifier produced if the evaluation function has more than one optimum.

There are two main types of machine learning: predictive or supervised learning, and descriptive or unsupervised learning. The proposed framework in this paper focuses on the supervised learning approach. Based on our analysis, this approach will implement four machine learning techniques: Ridge Regression, Bayesian Networks, Support Vector Machines, and Random Forests, combined with different feature selection techniques, as sketched below.
supervised learning approach. Based on analysis, this approach marking of ER-Diagrams”, ITiCSE '05 Proceedings of the 10th annual
will implement four machine learning techniques: Ridge SIGCSE conference on Innovation and technology in computer science
Regression, Bayesian Network, Support Vector Machines, and education, New York, 2005, pp. 158-162.
Random Forest, combined with different feature selection [13] P. Thomas, K. Waugh, and N. Smith, “Computer Assisted Assessment
techniques. of Diagrams”, ITiCSE '07 Proceedings of the 12th annual SIGCSE
conference on Innovation and technology in computer science education,
New York, USA, 2007, pp. 68-72.
IV. CONCLUSION [14] J. Tekli, R. Chbeir, K. Yetongnon, “An overview on XML similarity:
background, current trends and future directions”, Elsevier Science,
The present work proposed two ways to build automatic 2002.
grading system of ER Diagram. Based on analysis, the [15] D. Lin, “An Information-Theoretic Definition of Similarity”, ICML '98
proposed framework can be implemented to generate optimal Proceedings of the Fifteenth International Conference on Machine
automatic grading system. XMI file make grading process Learning, San Francisco,USA, 1998, pp. 296-304.
easier than if we compare the imprecise diagram (picture). [16] K. P. Murphy, “Machine Learning: A Probabilistic Perspective”, The
Therefore, the proposed framework using XMI file as input and MIT Press Cambridge, Massachusetts London, England, August, 2012.
parse XML tag element for processing in similarity algorithm [17] P. Domingos, “A Few Useful Things to Know about Machine
and machine learning process. The proposed framework need Learning”, Communications of the ACM Magazine, 55(10), New York,
2014, pp. 78-87.