Volume 5, Issue 12, December – 2020 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Performance Analysis of Data Mining Classification Method Using Naïve Bayes Algorithm to Predict Student Graduation Timeliness

Nurul Abdillah¹, Syaiful Zuhri Harahap², Ade Parlaungan Nasution³
¹ Health Information Management, STIKES Syedza Saintika
² Information System, Faculty of Science & Technology, Labuhanbatu University
³ Management, Faculty of Economics & Business, Labuhanbatu University

Abstract:- Graduation rate is one of the parameters of the effectiveness of educational institutions, and a decrease in the student graduation rate affects a college's accreditation. University databases store students' administrative and academic data; if these data are explored appropriately using data mining techniques, patterns or knowledge can be discovered to support decision-making. This study applies the Naïve Bayes algorithm to the case of student graduation timeliness and measures its accuracy. The Naïve Bayes method is a probabilistic and statistical classifier that predicts future probabilities based on past experience. This research uses student data from the Informatics Engineering Education program of Padang State University, class of 2011. The variables used in this study were NIM (student identification number), name, gender, entry status, GPA, area of origin, and employment status. Measuring the performance of the method shows that Naïve Bayes achieves an accuracy of 93.48%. From this accuracy value, it can be concluded that the Naïve Bayes algorithm performs well in predicting the timeliness of student graduation.

I. INTRODUCTION

Timely graduation is an important matter that a college must handle carefully. Graduation rate is one of the parameters of the effectiveness of educational institutions, and a decrease in the student graduation rate will affect a university's accreditation. Therefore, it is necessary to monitor and evaluate whether students tend to graduate on time. University databases store students' administrative and academic data; if such data are explored appropriately using Data Mining techniques, patterns or knowledge can be discovered to support decision-making.

Using a Data Mining classification method, namely the Naïve Bayes algorithm, to predict whether students will graduate on time can show how accurately graduation timeliness can be predicted. Data Mining is the process of analyzing data and summarizing the results into useful information. Technically, Data Mining is a process of finding correlations among many fields in large datasets [1]. Data Mining has several methods, one of which is classification: a learning technique that assigns data items to predetermined class labels. Classification has several algorithms, one of which is Naïve Bayes.

The Naïve Bayes method is a simple probabilistic classifier that computes a set of probabilities by counting the frequencies and combinations of values in a given data set [3]. Data Mining methods are therefore used to obtain accurate classification results. Previous research compared the performance of several Data Mining classification methods, namely the Decision Tree and Naïve Bayes algorithms, in predicting which students drop out; in its accuracy tests, the Decision Tree algorithm achieved the highest accuracy.

Another study applied the J48, Naïve Bayes, Random Tree, and Decision Stump algorithms to identify students who are weak and likely to fail their exams. Its tests showed that J48 had the highest accuracy of the four algorithms used [3].

II. RESEARCH METHODS

Figure 3.1 Framework

IJISRT20DEC189 www.ijisrt.com 264


Based on the framework in Figure 3.1, each step can be described as follows:
1. Conducting a Field Survey: Before starting the research, a field survey was conducted to obtain a qualitative picture of student graduation timeliness at Padang State University.
2. Identifying Problems: At this stage, the research problem is formulated.
3. Conducting a Literature Study: To achieve the research objectives, the relevant literature is studied.
4. Collecting Data: Data were collected in several ways, namely:
a. Direct observation method
b. Interview method
c. Library study method
d. Browsing method
5. Processing and Data Transformation: At this stage, raw data are converted and combined into a uniform format so that they can be processed with Data Mining.
6. Implementing the Method: After the analysis, the testing stage is carried out, which requires computer hardware and software. At this stage, the proposed method is implemented and tested using RapidMiner software.
7. Calculating Accuracy and Error: At this stage, the accuracy and error values of the Naïve Bayes algorithm are calculated by comparing its predictions against the actual values (the values considered correct).
8. Making Results and Discussion: This stage provides an overview of the results obtained from this research.

III. LITERATURE

3.1 Classification
Classification is the process of finding a model or function that describes or distinguishes concepts or data classes, with the aim of estimating the class of an object whose label is unknown. It can also be described as learning a mapping that assigns each data item to one of several predefined classes [4].

Classification is a technique that examines the behavior and attributes of a defined group. It can classify new data by using existing, already-classified data to derive a set of rules, which are then applied to classify the new data [4].

3.2 Naïve Bayes
The Naïve Bayes classifier is one of the algorithms used in classification techniques. Naïve Bayes is a probabilistic and statistical classifier based on the theorem of the British scientist Thomas Bayes; it is a simple probabilistic classifier that computes a set of probabilities by counting the frequencies and combinations of values in a given data set. The algorithm uses Bayes' theorem and assumes that all attributes are independent given the value of the class variable [5].

Naïve Bayes combines Bayes' theorem with the "naive" assumption that the attributes are mutually independent. That is, the Naïve Bayes classifier assumes that, given the class, the presence or absence of a particular feature is unrelated to the presence or absence of any other feature. Naïve Bayes is thus a simplification of the Bayes method. Bayes' theorem is stated as:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

Where:
X : Data with an unknown class
H : Hypothesis that data X belongs to a specific class
P(H | X) : Probability of hypothesis H given condition X (posterior probability)
P(H) : Probability of hypothesis H (prior probability)
P(X | H) : Probability of X given hypothesis H
P(X) : Probability of X

The Naïve Bayes classifier can be described as a classification method based on probability theory and Bayes' theorem, which assumes that each variable or decision-making parameter is independent, that is, the presence of each variable is unrelated to the presence of the other attributes [6].

The flow of the Naïve Bayes method is as follows:
1. Calculate the probability of each attribute value of the new case for each existing class (label), $P(x_k \mid C_i)$.
2. Calculate the combined probability for each class, $P(X \mid C_i) = \prod_k P(x_k \mid C_i)$.
3. Calculate the value $P(X \mid C_i) \times P(C_i)$.
4. Assign the new case to the class with the largest value of $P(X \mid C_i) \times P(C_i)$.

IV. ANALYSIS AND DESIGN

4.1 Data Mining Analysis
Data Mining analysis is a series of processes that includes collecting and using historical data to find regularities, patterns, or relationships in large data sets.

4.2 Data Collection
The data used in this study are student data from the Informatics and Computer Engineering Education Study Program of Padang State University, classes of 2011 and 2012. The data comprise 46 records.

4.2.1 Variable Selection
From the student data, the decision (class) variable is graduation status: on time or late. The determining variables used to form the decision are gender, entry status, GPA, area of origin, and employment status.

4.2.2 Pre-Process
After the variables are selected, the data format is transformed based on the selected variables.

4.3 Classification Method
The classification results can provide information about the accuracy and error of predicting the graduation timeliness of Padang State University students. The use of the Naïve
Bayes algorithm involves several stages to obtain the desired information.

4.3.1 Classification Process Using Naive Bayes

Table 4.2 Results of Probability Calculation for On Time and Late

No   NIM       On time     Late     Prediction
1    1102628   0.016       0        On time
2    1102631   0.003       0        On time
3    1102632   0.004       0        On time
4    1102638   0.001       0.003    Late
5    1102644   0           0.014    Late
6    1102650   0.001       0.003    Late
7    1102651   0.004       0        On time
8    1102656   0.004       0        On time
9    1102663   0           0.014    Late
10   1102664   0           0.009    Late
11   1102668   0           0.014    Late
12   1102672   0.008       0        On time
13   1102675   0.008       0        On time
14   1102676   0           0.004    Late
15   1102678   0           0.004    Late
16   1102687   0           0.01     Late
17   1102688   0.01        0        On time
18   1102691   0           0.01     Late
19   1102692   0.01        0        On time
20   1102696   0.014       0.018    Late
21   1102697   0.017       0.007    On time
22   1102698   0.012       0        On time
23   1102703   0.002       0.004    Late
24   1102705   0           0.003    Late
25   1102707   0.002       0.004    Late
26   1106999   0           0.017    Late
27   1107001   0           0.005    Late
28   1107016   0.007       0        On time
29   1107017   0.002       0.004    Late
30   1107025   0.008       0.012    Late
31   1107033   0.003       0.006    Late
32   1202175   0.012       0.001    On time
33   1202183   0.012       0.001    On time
34   1202191   0.012       0        On time
35   1202196   0.002       0        On time
36   1202197   0.002       0        On time
37   1203244   0.015       0.002    On time
38   1203237   0.007       0.003    On time
39   1203238   0.003       0.001    On time
40   1203239   0.003       0.001    On time
41   1206507   0.016       0        On time
42   1206519   0.006       0        On time
43   1206520   0           0.005    Late
44   1206522   0           0.042    Late
45   1206538   0           0.008    Late
46   1206545   0.017       0.007    On time

4.4 Accuracy and Error Rate of Naive Bayes Algorithm
In the Naïve Bayes accuracy calculation, the algorithm obtained an accuracy of 93.48%.

Table 4.3 Comparison of Accuracy and Error of the C4.5 and Naive Bayes Algorithms

Algorithm     Accuracy   Error
Naive Bayes   93.48%     6.52%

In the Naïve Bayes error calculation, the algorithm obtained an error rate of 6.52%.

V. IMPLEMENTATION AND RESULTS

This section describes the implementation, that is, testing that compares the results of the manual calculations with the results produced by software supporting the Naïve Bayes algorithm. The aim is to check whether the data analyzed and processed are correct. The software used is RapidMiner Studio 7.5.3, an open-source Data Mining application. In the case of predicting student graduation timeliness, the data used in RapidMiner amounted to 92 records.

5.1 Naive Bayes Algorithm Accuracy and Error Rates

a. Naive Bayes
Naive Bayes Accuracy Rate: The Naïve Bayes accuracy calculation obtained an accuracy of 93.48% because 86 records were correctly classified.

Figure 5.1 Accuracy of Naive Bayes

Naive Bayes Error Rate: The Naïve Bayes error calculation obtained an error rate of 6.52% because 6 records were incorrectly classified.

Figure 5.2 Naive Bayes Error
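
The accuracy and error rates reported above follow from the RapidMiner counts: 86 correctly classified and 6 incorrectly classified records out of 92. A minimal Python sketch of this arithmetic, assuming accuracy is the share of correct predictions and the error rate is its complement (the function name `accuracy_and_error` is hypothetical and not part of RapidMiner):

```python
# Counts reported for the RapidMiner run in Section 5.1.
CORRECT = 86
TOTAL = 92


def accuracy_and_error(correct: int, total: int) -> tuple[float, float]:
    """Return (accuracy %, error rate %), each rounded to two decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy = 100.0 * correct / total
    error = 100.0 * (total - correct) / total
    return round(accuracy, 2), round(error, 2)


acc, err = accuracy_and_error(CORRECT, TOTAL)
print(f"accuracy = {acc}%, error = {err}%")  # accuracy = 93.48%, error = 6.52%
```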

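Each prediction in Table 4.2 is the class with the larger probability score, which corresponds to the final step of the Naïve Bayes flow in Section 3.2. The Python sketch below applies that rule to four rows copied from Table 4.2; the helper `predict` is hypothetical, and sending ties to "On time" is an arbitrary assumption because the paper does not discuss ties.

```python
# Four rows copied from Table 4.2: (NIM, score for "On time", score for "Late").
ROWS = [
    ("1102628", 0.016, 0.0),
    ("1102696", 0.014, 0.018),
    ("1102697", 0.017, 0.007),
    ("1107025", 0.008, 0.012),
]


def predict(on_time_score: float, late_score: float) -> str:
    """Return the class with the larger score; ties go to "On time" (an arbitrary choice)."""
    return "On time" if on_time_score >= late_score else "Late"


predictions = {nim: predict(on_time, late) for nim, on_time, late in ROWS}
for nim, label in predictions.items():
    print(nim, label)
```

For these four rows the rule reproduces the Prediction column of Table 4.2 (On time, Late, On time, Late).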

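The four-step Naïve Bayes flow in Section 3.2 can be sketched in Python for categorical attributes as follows. This is an illustrative sketch, not the authors' RapidMiner setup: the records in `TRAIN` are invented (only the attribute names echo the study's variables), the helpers `train` and `classify` are hypothetical, and no smoothing is applied, so an attribute value never seen with a class makes that class's score zero.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, NOT the study's data: (attribute values, class label).
TRAIN = [
    ({"gender": "F", "entry": "regular", "gpa": "high"}, "On time"),
    ({"gender": "M", "entry": "regular", "gpa": "high"}, "On time"),
    ({"gender": "F", "entry": "invited", "gpa": "high"}, "On time"),
    ({"gender": "M", "entry": "regular", "gpa": "low"}, "Late"),
    ({"gender": "M", "entry": "invited", "gpa": "low"}, "Late"),
]


def train(records):
    """Count how often each class occurs and how often each attribute value occurs per class."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of attribute values
    for attrs, label in records:
        for name, value in attrs.items():
            value_counts[(label, name)][value] += 1
    return class_counts, value_counts


def classify(attrs, class_counts, value_counts):
    """Score every class as P(X|Ci) * P(Ci) and return (best class, all scores)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        likelihood = 1.0
        for name, value in attrs.items():
            # Step 1: P(x_k | C_i) as a relative frequency within class C_i (no smoothing).
            # Step 2: multiplying these terms gives P(X | C_i) under the independence assumption.
            likelihood *= value_counts[(label, name)][value] / n_label
        # Step 3: multiply by the prior P(C_i).
        scores[label] = likelihood * n_label / total
    # Step 4: assign the class with the largest score.
    return max(scores, key=scores.get), scores


class_counts, value_counts = train(TRAIN)
label, scores = classify({"gender": "F", "entry": "regular", "gpa": "high"}, class_counts, value_counts)
print(label, scores)
```

For this query the score for "Late" is zero because no late record in the toy data has gender "F", so the case is assigned to "On time" with a score of (2/3)(2/3)(3/3)(3/5) = 4/15.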

VI. CONCLUSION

1. Measuring the performance of the Naïve Bayes classification method resulted in an accuracy value of 93.48%.
2. Measuring the error rate of the Naïve Bayes algorithm resulted in an error rate of 6.52%.
3. From the tests that have been conducted, the Naïve Bayes algorithm performs well because it has a high accuracy value: the higher the accuracy, the closer the classification is to the correct result. The Naïve Bayes algorithm also has a low error value: the lower the error, the closer the classification is to the true result.

REFERENCES

[1]. S. Z. Harahap and M. H. Dar, “APLIKASI DAN PERANCANGAN SISTEM INFORMASI PEMESANAN PADA UPI CONVENTION CENTER DENGAN MENGGUNAKAN BAHASA PEMROGRAMAN PHP DAN MYSQL,” INFORMATIKA, vol. 6, no. 3, pp. 24–27, 2018.
[2]. M. H. Dar and S. Z. Harahap, “IMPLEMENTASI SNORT INTRUSION DETECTION SYSTEM (IDS) PADA SISTEM JARINGAN KOMPUTER,” INFORMATIKA, vol. 6, no. 3, 2018.
[3]. M. Siddik and S. Z. Harahap, “APLIKASI PENDUKUNG KEPUTUSAN PUPUK NON SUBSISDI DENGAN METODE STRING MATCHING (STUDI KASUS CV. FAMILY GROUPS LABUHANBATU SELATAN),” U-NET J. Tek. Inform., vol. 3, no. 3, pp. 12–17, 2019.
[4]. A. Nastuti and S. Z. Harahap, “TEKNIK DATA MINING UNTUK PENENTUAN PAKET HEMAT SEMBAKO DAN KEBUTUHAN HARIAN DENGAN MENGGUNAKAN ALGORITMA FP-GROWTH (STUDI KASUS DI ULFAMART LUBUK ALUNG),” vol. 7, no. 3, pp. 111–119, 2019.
[5]. S. Samsir, D. Indra, G. Hts, and S. Z. Harahap, “SPK Untuk Pemilihan Kepala Sekolah Menggunakan Metode Saw dan Profile Matching,” U-NET J. Tek. Inform., vol. 4, no. 1, pp. 7–12, 2020.
[6]. S. Samsir and S. Z. Harahap, “Application Design Resume Medical By Using Microsoft Visual Basic .Net 2010 At The Health Center Appointments,” Int. J. Sci. Technol. Manag., vol. 1, no. 1, pp. 14–20, 2020.
[7]. R. Novita and S. Z. Harahap, “PENGEMBANGAN MEDIA PEMBELAJARAN INTERAKTIF PADA MATA PELAJARAN SISTEM KOMPUTER DI SMK,” INFORMATIKA, vol. 8, no. 1, 2020.
[8]. M. Nasution, S. Pohan, and S. Z. Harahap, “Implementasi Obrim (Option-Based Risk Management) Sebagai Framework Investasi Teknologi Informasi Perguruan Tinggi (Studi Kasus: Amik Labuhan Batu),” INFORMATIKA, vol. 8, no. 1, pp. 26–35, 2020.
[9]. P. Iwan, S. Z. Harahap, and A. A. Ritonga, “RANCANG BANGUN TEMPAT SAMPAH OTOMATIS PADA UNIVERSITAS LABUHANBATU,” INFORMATIKA, vol. 8, no. 2, pp. 1–5, 2020.
[10]. S. Z. Harahap and S. Samsir, “Application Design The Data Collection Features Of The Hotel Shades Of Rantauprapat Using VBNET,” Int. J. Sci. Technol. Manag., vol. 1, no. 1, pp. 1–6, 2020.
