The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 1, No. 1, March-April 2013

Association Rule Generation for Student Performance Analysis using Apriori Algorithm
D. Magdalene Delighta Angeline*
*Assistant Professor, Department of Computer Science and Engineering, Dr.G.U.Pope College of Engineering, Thoothukudi, Tamilnadu,
INDIA. E-Mail: [email protected]

Abstract—An educational institution's objective of producing good results in academic examinations
can be supported by data mining techniques, which can be applied to predict the performance of
students and to impart quality education in the institution. Data mining is used to extract
meaningful information and to uncover relationships among variables stored in large data sets. In this
paper, the Apriori algorithm is used to extract a set of rules specific to each class and to analyze the
given data in order to classify students based on their academic performance. Students are classified
based on their involvement in assignments, internal assessment tests, attendance, etc., which helps to
predict the performance of a student from the patterns extracted from the educational database. This
helps to identify average and below-average students and to improve their performance so that the
institution obtains good results. The analysis further helps to match an organization's requirements
with students' profiles in order to provide placements for the students. Also, the interestingness of a
rule is measured using lift, both by itself and as a part of other formulae. The range of values that lift
may take is used to normalize lift so that it is more effective as a measure of interestingness. This
standardization is extended to account for minimum support and confidence thresholds.

Keywords—Apriori, Association Rules, Data mining, Knowledge Discovery, Rule Filtering

Abbreviations—Cost-Sensitive Learning (CSL), Data Mining Extensions (DMX), Knowledge Discovery in
Databases (KDD), Left-Hand Side (LHS), Right-Hand Side (RHS)

I. INTRODUCTION

EDUCATION is an essential element for the development of a country. Lack of knowledge in the higher educational system could prevent system management from achieving quality in education. Data mining methodology can help to bridge these knowledge gaps in the higher education system. A better student model yields better instruction, which leads to improved learning. More accurate skill diagnosis leads to better prediction of what a student knows, which provides better assessment. Better assessment leads to more efficient learning overall. The main objectives of data mining in practice tend to be prediction and description [Agrawal et al., 1994]. Predicting performance involves using variables in the student database, such as attendance, IAT marks and assignment grades, to predict the unknown values. Data mining is the core process of knowledge discovery in databases. It is the process of extracting useful patterns from a large database. In order to analyze large amounts of information, the area of Knowledge Discovery in Databases (KDD) provides techniques by which the interesting patterns are extracted. Therefore, KDD utilizes methods at the intersection of machine learning, statistics and database systems. Data mining is the application of efficient algorithms to detect the desired patterns contained within the given data.

Association rule mining is one of the data mining techniques that is expected to be very useful in applications. Association rules are required to satisfy a minimum support and a minimum confidence at the same time. Association rule generation consists of two steps: first, minimum support is applied to the given set of items to find the frequent itemsets; second, rules are formed from the frequent itemsets using minimum confidence. Association rules make it possible to find rules of the type "If A then B", where A and B can be particular items, values, words, etc. An association rule is composed of two itemsets:
1. Antecedent or Left-Hand Side (LHS)
2. Consequent or Right-Hand Side (RHS)
It describes the relationship between support, confidence and interestingness. Support and confidence are usually referred to as interestingness measures of an association rule. Association rule mining is the process of finding all the association rules that satisfy the conditions of minimum support and minimum confidence. Initially, the support and confidence values are computed for all the rules, and they are then compared with the threshold values to prune rules with a low value of support

ISSN: 2321 – 2381 © 2013 | Published by The Standard International Journals (The SIJ) 12

or confidence. Association rule mining was proposed by Agrawal. Many algorithms for generating association rules have been presented over time. Some of the popular algorithms are Apriori, Eclat and FP-Growth, which are used to mine frequent itemsets. The mining handles infrequent data by using high minimum support and high minimum confidence values. Still, it tends to produce an enormous number of rules.

This paper uses association rules to extract the student performance pattern with the Apriori algorithm, which will help the educational institution to analyze student performance and to improve it in order to provide good placements for the students. The teaching organization is responsible for producing better results and for the placement of students in industry for the internship program. But it experiences difficulty in analyzing students' performance at the initial stage, which could lead to poor results in the institution. Hence, staff will face problems in improving student results. On the other hand, some students may find it difficult to perform well in the examination. As a result, this study is conducted to enhance student results by analyzing the patterns extracted from the association rules.

Similarly, this paper also studies the mining of association rules to extract the placement pattern, which will be helpful for industry in placement. The educational institution also experiences difficulty in matching an organization's requirements with students' profiles, for several reasons. This situation could lead to a mismatch between the organization's requirements and the students' background. Hence, students will face problems in giving good service to the company. On the other hand, companies could also face difficulties in training the students and assigning them a project. The placement must be based on certain criteria in order to best serve the organization and the student. For example, a student who lives in Chennai should not be sent to an organization located in Bangalore. This is to avoid accommodation, financial and social problems. It has been decided that practicum students should match the organization's requirements. However, due to the large number of students registered every semester, matching organizations with students is a very tiresome process.

II. LITERATURE REVIEW

Data mining has been applied in various research works. One of the popular techniques used for mining data in KDD for pattern discovery is the association rule [Hipp et al., 2000]. According to Usama M. Fayyad & Gregory Piatetsky-Shapiro (1996), an association rule implies certain association relationships among a set of objects. It has attracted a lot of attention in current data mining research due to its capability of discovering useful patterns for decision support, selective marketing, financial prediction, medical analysis and many other applications. The association rule technique works by finding all rules in a database that satisfy the specified minimum support and minimum confidence [Bing Liu et al., 1998].

One algorithm for association rule induction is the Apriori algorithm, which has proved to be an accepted data mining technique for extracting association rules [Agrawal et al., 1994]. The Apriori algorithm has been implemented to mine single-dimensional Boolean association rules from transactional databases. The rules generated by the Apriori algorithm make it easier for the user to understand and further apply the result [Ma et al., 2000]. The association rule method, specifically the Apriori algorithm, has been employed to identify novel, unexpected and interesting patterns in hospital infection control. Another study employed the Apriori algorithm to generate the frequent itemsets and designed a model for economic forecasting, and other work presented methods for modeling and inferring users' intentions via data. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time.

In Kotsiantis et al. (2004), the Naïve Bayes algorithm is used to predict the performance of students, and the overall accuracy is found to be 72.48%. The relationship between students' university entrance examination results and their success is studied using the K-means clustering technique [Erdogan & Timor, 2005]. Secondary school students' performance is predicted by analyzing the results with data mining techniques such as Decision Trees, Random Forest, Neural Networks and Support Vector Machines. The obtained results reveal that it is possible to achieve a high predictive accuracy [Cortez & Silva, 2008]. In Alaa el-Halees (2009), the data mining technique is applied to discover association rules, and the discovered association rules are sorted according to the lift value. Then the EM clustering technique is applied, from which the outliers are detected and the performance is predicted. Recommender system techniques for educational data mining are used for predicting the performance of students. This technique mainly focuses on reducing information overload and acts as an information filter [Thai-Nghe et al., 2010]. In Thai-Nghe et al. (2011, 2011A), a recommender system is used to predict the performance of the student. The information of each individual student is used to forecast his/her own performance. The class imbalance in the data is addressed using both resampling and cost-sensitive learning (CSL) with support vector machines, by which the misclassification is reduced and the classification accuracy is improved [GB-Zadok et al., 2007; Thai-Nghe, 2010A].

The association rule technique of data mining is used in Magdalene Delighta Angeline & Samuel Peter James (2012), and that paper extracts useful information from a large set of data. Likewise, this technique is applied to students' data. In Magdalene Delighta Angeline & Samuel Peter James (2012), the techniques mentioned above are used for matching the organization with the students. This process is very demanding and involves a number of steps. In Magdalene Delighta Angeline & Samuel Peter James (2012), the association rule technique provides the extracted information. On the other hand, in this paper, the creation of Data Mining Extensions (DMX) queries and their application to the association rule model result in acquiring specific


information depending on what the teacher wants to know about students' behavior patterns. The Apriori Mend algorithm discussed there generates rules at 92.86%.

This study uses the Apriori algorithm and applies it to students' academic performance and also to placement. This technique is used to produce rules with 100% confidence.

III. APRIORI ALGORITHM

Table 1 gives the Apriori algorithm. The first pass of the algorithm simply counts item occurrences to determine the large 1-itemsets. A subsequent pass k consists of two phases. First, the large itemsets Lk-1 found in the (k-1)th pass are used to generate the candidate itemsets Ck. Then, the database is scanned and the support of the candidates in Ck is counted. For quick counting, it is necessary to determine efficiently the candidates in Ck that are contained in a given transaction t.

Table 1 – Apriori Algorithm
L1 = {frequent items};
for (k = 2; Lk-1 != ∅; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in database do
        increment the count of all candidates in Ck that are contained in t;
    Lk = candidates in Ck with min_sup;
end
return ∪k Lk;

IV. IMPLEMENTATION

In an educational institution, the overall performance of a student is determined by internal assessment as well as external assessment. Internal assessment is made on the basis of a student's assignment marks, class tests, lab work, attendance, previous semester grade and his/her involvement in extracurricular activities, while external assessment of a student is based on the marks scored in the final exam. The proposed model helps to classify students as poor, average or good, based on class performance as well as class attendance, from the generated rules.

4.1. Dataset
The data set used in this study was obtained from the Department of Computer Science, Dr.G.U.Pope College of Engineering, in 2011-12.

4.2. Data Mining Process
The steps of the data mining process are as follows:

4.2.1. Data Selection
The data have been generated from different reports of Internal Assessment Test (IAT), Assignment, and Personal Counseling. The initial data contain the details gathered from 21 students with 15 listed attributes, which include Register Number, Programme, Duration, Program Code, IAT grade, Gender, Address, City1, City2, Percentage, Assignment mark, Assignment submission, Correct Response, Self Confidence, Parental Education, Financial Lack, Interest and Degree aspiration. The data contain values of various types, either string or numeric. The target is represented as the analysis report. The analysis report was grouped into three categories (Good, Average and Poor). The selected attributes are IAT grade, Assignment submission, Assignment Grade, Correct Response, Self Confidence, Interest and Degree aspiration. The data were then processed for generating rules.

4.2.2. Data Transformation
Transformation has been applied to the attributes Correct Response and Assignment marks. The following rules are used to transform the assignment marks to string data:
• If the Assignment mark = 9 Till 10 THEN Replace Assignment Grade by A
• If the Assignment mark = 7 Till 8 THEN Replace Assignment Grade by B
• If the Assignment mark = 5 Till 6 THEN Replace Assignment Grade by C
• If the Assignment mark = 3 Till 4 THEN Replace Assignment Grade by D
• If the Assignment mark = 1 Till 2 THEN Replace Assignment Grade by E
Likewise, the following rules are used to transform the percentage to string data:
• If the Percentage = 81 Till 90 THEN Replace Percentage by S
• If the Percentage = 75 Till 80 THEN Replace Percentage by A
• If the Percentage = 70 Till 74 THEN Replace Percentage by B
• If the Percentage = 65 Till 69 THEN Replace Percentage by C
The data were then ready to be mined using association rules.

4.2.3. Rule Generation
The Apriori algorithm discussed in Section III was applied to generate the association rules.

V. RESULT DISCUSSION

From the experimental results it is found that the Apriori algorithm can be used to obtain minimal rules. From the extracted patterns, the Apriori algorithm is found to be effective in classifying students under three categories: good, average and poor. The number of transactions used in this experiment is 21. The parameters used for the Apriori algorithm are minimum support, minimum confidence, maximum rule length and lift filtering [Toscher & Jahrer, 2010]. The importance of a rule is measured using the lift value.
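The three measures named in this section (support, confidence and lift) can be illustrated with a minimal Python sketch. The four transactions and the item labels below are invented purely for illustration and are not the data set of this study; the function names are likewise assumptions of the sketch.

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(transactions, lhs, rhs):
    """support(LHS and RHS together) / support(LHS); fails if LHS never occurs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """confidence(LHS -> RHS) / support(RHS); values above 1 mean positive association."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

# Hypothetical toy transactions (NOT the 21 student records used in the paper).
transactions = [
    frozenset({"IAT=A", "Assignment=A", "Performance=Good"}),
    frozenset({"IAT=A", "Assignment=B", "Performance=Good"}),
    frozenset({"IAT=C", "Assignment=C", "Performance=Average"}),
    frozenset({"IAT=E", "Assignment=D", "Performance=Poor"}),
]
lhs, rhs = {"IAT=A"}, {"Performance=Good"}

print("support   ", support(transactions, lhs | rhs))
print("confidence", confidence(transactions, lhs, rhs))
print("lift      ", lift(transactions, lhs, rhs))
```

Exact fractions are used so that thresholds can be compared without rounding effects. As a piece of arithmetic on the scale of this study, an itemset contained in 8 of 21 transactions has support 8/21, which is about 38.095%.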


Table 2 – Apriori Parameters used in this System

Apriori Parameter       Value
Support minimum         0.33
Confidence minimum      0.75
Max rule length         4
Lift filtering          1.1
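To make the role of these parameters concrete, the following is a minimal Python sketch of the level-wise procedure of Section III followed by rule filtering with the four thresholds of Table 2. It is not the implementation used in this study: the six transactions are invented, the function names are hypothetical, and "max rule length" is assumed here to mean the total number of items in a rule.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search for frequent itemsets; returns {itemset: support}."""
    n = len(transactions)
    # Pass 1: count single items to obtain the frequent 1-itemsets (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Scan the database and count the candidates contained in each transaction.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def generate_rules(frequent, min_confidence, min_lift, max_len):
    """Split each frequent itemset into LHS -> RHS and keep rules passing the filters."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2 or len(itemset) > max_len:
            continue
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = sup / frequent[lhs]   # subsets of a frequent itemset are frequent
                lift = conf / frequent[rhs]
                if conf >= min_confidence and lift >= min_lift:
                    rules.append((lhs, rhs, sup, conf, lift))
    return rules

# Hypothetical toy transactions (NOT the student data of this study).
transactions = [
    frozenset({"IAT=A", "Assign=A", "Perf=Good"}),
    frozenset({"IAT=A", "Assign=A", "Perf=Good"}),
    frozenset({"IAT=A", "Assign=B", "Perf=Good"}),
    frozenset({"IAT=C", "Assign=C", "Perf=Average"}),
    frozenset({"IAT=C", "Assign=C", "Perf=Average"}),
    frozenset({"IAT=E", "Assign=D", "Perf=Poor"}),
]
# Thresholds taken from Table 2.
frequent = apriori(transactions, min_support=0.33)
rules = generate_rules(frequent, min_confidence=0.75, min_lift=1.1, max_len=4)
for lhs, rhs, sup, conf, lift in rules:
    print(sorted(lhs), "->", sorted(rhs), round(sup, 3), round(conf, 3), round(lift, 3))
```

The sketch rescans all transactions on every pass for clarity; a practical implementation would typically use a more efficient counting structure.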

Figure 2 – Counting Itemsets

Figure 3 – Rules Generated using Apriori Algorithm
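Before itemsets can be counted, numeric attributes have to be turned into categorical items. The mark and percentage rules of Section 4.2.2 could be sketched in Python as follows. The band limits are copied from that section; the record keys, the item labels and the handling of values that fall outside every listed band (mapped to None and left out) are assumptions of this sketch, since the paper does not specify them.

```python
# Grade bands copied from Section 4.2.2 as (low, high, grade), both ends inclusive.
ASSIGNMENT_BANDS = [(9, 10, "A"), (7, 8, "B"), (5, 6, "C"), (3, 4, "D"), (1, 2, "E")]
PERCENTAGE_BANDS = [(81, 90, "S"), (75, 80, "A"), (70, 74, "B"), (65, 69, "C")]

def to_grade(value, bands):
    """Return the grade of the band containing `value`, or None if no band matches."""
    for low, high, grade in bands:
        if low <= value <= high:
            return grade
    return None  # assumption: the paper does not say how such values are treated

def to_transaction(record):
    """Turn one student record (a dict with hypothetical keys) into a set of items."""
    items = set()
    grade = to_grade(record["assignment_mark"], ASSIGNMENT_BANDS)
    if grade is not None:
        items.add(f"Assignment Grade={grade}")
    pct = to_grade(record["percentage"], PERCENTAGE_BANDS)
    if pct is not None:
        items.add(f"Percentage={pct}")
    return frozenset(items)

student = {"assignment_mark": 9, "percentage": 78}  # made-up record
print(sorted(to_transaction(student)))
```

Each resulting frozenset plays the role of one transaction t in the counting pass of the Apriori algorithm.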


The output produced was evaluated in terms of accuracy. The accuracy of the rules was assessed according to the confidence value. The number of rules generated was 127, with confidence 100% and support 38.095%. Since the confidence is 100%, each such rule is an exact rule. The running time of the application using the Apriori algorithm is 15 ms. The generated rules were found to be more accurate. From the generated rules, the students were categorized into good, average and poor. Examples of the patterns extracted from the rules are:
• If student's percentage is between 80-95 THEN performance = 'Good'
• If student's percentage is between 60-79 THEN performance = 'Average'
• If student's percentage is below 60 THEN performance = 'Poor'
With the help of the performance report (Good, Average and Poor) of the student and the organization's criteria, the placement for the student is provided. By analysing the performance report of the student, similar training can be given. More training and attention should be given to poor students in order to help them pass the examinations. Similar training and coaching should be given to average students so that they perform better.

VI. CONCLUSION

The system has been developed to analyze the discovered rules against the user's knowledge. Discovered rules can be pruned to remove redundant and insignificant rules. The scope of the generated rules has been oriented towards simplifying the rule set and improving performance. The Apriori algorithm relies on the downward closure property to generate all frequent itemsets that have support above the minimum support, and then generates association rules that have confidence above the minimum confidence. The extracted rules help to predict the performance of students and to identify the average, below-average and good students. The performance report of a student also helps to improve the result of that student. This performance enhancement will also help all students to get placements in various industries according to the criteria. The educational institution benefits from the proposed system in the smooth and successful running of the institution. Future work can be carried out with other data mining algorithms, with attention to the time factor.

REFERENCES

[1] R. Agrawal, Christos Faloutsos & Arun N. Swami (1994), "Efficient Similarity Search in Sequence Databases", Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms, Pp. 69–84.
[2] Usama M. Fayyad & Gregory Piatetsky-Shapiro (1996), "Advances in Knowledge Discovery and Data Mining", Editors: Usama M. Fayyad & Gregory Piatetsky-Shapiro, Cambridge, AAAI/MIT Press, Pp. 1–625.
[3] Bing Liu, Wynne Hsu & Yiming Ma (1998), "Integrating Classification and Association Rule Mining", American Association for Artificial Intelligence, Pp. 1–7.
[4] Y. Ma, Bing Liu, CK Wong, Philip S. Yu & SM Lee (2000), "Targeting the Right Students using Data Mining", Proceedings of International Conference on Knowledge Discovery and Data Mining, Boston, USA, Pp. 457–464.
[5] J. Hipp, U. Guntzer & N. Gholamreza (2000), "Algorithm for Association Rule Mining: A General Survey and Comparison", ACM SIGKDD Explorations Newsletter, Vol. 2, No. 1, Pp. 58–64.
[6] S. Kotsiantis, C. Pierrakeas & P. Pintelas (2004), "Prediction of Student's Performance in Distance Learning using Machine Learning Techniques", Applied Artificial Intelligence, Vol. 18, No. 5, Pp. 411–426.
[7] Erdogan & Timor (2005), "A Data Mining Application in a Student Database", Journal of Aeronautic and Space Technologies, Vol. 2, No. 2, Pp. 53–57.
[8] GB-Zadok, A Hershkovitz, R Mintz & R Nachmias (2007), "Examining Online Learning Processes based on Log Files Analysis: A Case Study", Research, Reflections and Innovations in Integrating ICT in Education, Pp. 55–59.
[9] P. Cortez & A. Silva (2008), "Using Data Mining to Predict Secondary School Student Performance", In EUROSIS, Pp. 5–12.
[10] Alaa el-Halees (2009), "Mining Students Data to Analyze e-Learning Behavior: A Case Study", https://uqu.edu.sa/files2/tiny_mce/plugins/filemanager/files/30/papers/f158.pdf.
[11] A. Toscher & M. Jahrer (2010), "Collaborative Filtering Applied to Educational Data Mining", 16th ACM International Conference on Knowledge Discovery and Data Mining, Pp. 1–11.
[12] N. Thai-Nghe (2010), "Recommender System for Predicting Student Performance", Proceedings of the 1st Workshop on Recommender Systems for Technology Enhanced Learning, Vol. 1, Pp. 2811–2819.
[13] N. Thai-Nghe (2010A), "Cost-Sensitive Learning Methods for Imbalanced Data", Proceedings of the IEEE International Joint Conference on Neural Networks, Pp. 1–8.
[14] N. Thai-Nghe (2011), "Factorization Techniques for Predicting Student Performance", Educational Recommender Systems and Technologies: Practices and Challenges, Pp. 129–153.
[15] N. Thai-Nghe (2011A), "Personalized Forecasting Student Performance", Proceedings of the 11th IEEE International Conference on Advanced Learning Technologies, Pp. 412–414.
[16] D. Magdalene Delighta Angeline & I. Samuel Peter James (2012), "Association Rule Generation using Apriori Mend Algorithm for Student's Placement", International Journal of Emerging Sciences, Vol. 2, No. 1, Pp. 78–86.

D. Magdalene Delighta Angeline is Assistant Professor in the Department of Computer Science and Engineering in Dr.G.U.Pope College of Engineering, Sawyerpuram, Tamilnadu, India. She obtained her Bachelor degree in Information Technology from Anna University, Chennai in the year 2007 and she obtained her Master degree in Computer and Information Technology in Manonmaniam Sundaranar University, Tirunelveli. She has over 5.7 years of teaching experience and has published nine papers in national conferences, five papers in international conferences and seven papers in various international journals. She has also published three books. Her current areas of research include Image Processing, Neural Networks, and Data Mining.
