Chapter 3
The methodology proposed in this study for analyzing student academic performance is
based on clustering algorithms, which are part of the data mining process. The process
comprises the following stages:
Data Collection
The data set used in this research was obtained from Ladoke Akintola University of Technology,
Ogbomoso, Oyo State, in the western part of Nigeria. It was sampled from the Department of
Information Systems in the Faculty of Computing and Informatics. The initial size of the data
set was 500. In this step, data stored in different tables were joined into a single table, and
errors were removed after the join.
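The joining step can be sketched as follows. This is only an illustration under stated assumptions: the table layouts, the field names (student_no, level, course, score), the sample values, and the function name join_tables are hypothetical and do not come from the university's records.

```python
# Hypothetical source tables: one row per student, one row per course score.
students = [
    {"student_no": "S001", "level": 200},
    {"student_no": "S002", "level": 200},
]
scores = [
    {"student_no": "S001", "course": "CSC201", "score": 70},
    {"student_no": "S001", "course": "MTH201", "score": 58},
    {"student_no": "S002", "course": "CSC201", "score": 64},
    {"student_no": "S999", "course": "CSC201", "score": 40},  # no matching student
]

def join_tables(students, scores):
    """Join the two tables on student_no into one row per student."""
    table = {s["student_no"]: dict(s) for s in students}
    for rec in scores:
        row = table.get(rec["student_no"])
        if row is None:
            # Assumption: a score without a matching student is treated as an error.
            continue
        row[rec["course"]] = rec["score"]
    return list(table.values())

joined = join_tables(students, scores)
```

Each resulting row holds one student's attributes and one column per course, which is the shape the later preprocessing steps operate on.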
Data Preprocessing
Data preprocessing is an important step in machine learning because the quality of the data, and
of the knowledge that can be extracted from it, directly affects a model's ability to learn. It
is therefore critical to preprocess the data before feeding it into the model. The aim is to
transform raw data into a format that mining algorithms can use. During this process, the
following tasks are completed.
1) DATA CLEANING
Data cleaning refers to the elimination of incomplete, missing, or duplicate data. There are many
ways to handle missing attribute values, such as ignoring the tuple, filling in a global constant,
or filling in the mean of the attribute. The general approach is to delete grade records that are
missing many course scores and to fill in grade records that are missing only a few. This project
follows these principles: delete score records with empty scores in more than two courses; if any
remaining student still has an empty course score, fill it with the average score for that course.
A course score of 0 is understood to mean that the student was absent from the exam, and the
corresponding student's score record is deleted.
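The cleaning rules above can be sketched in Python as follows. This is a minimal illustration, not the study's actual code: the student IDs, course codes, sample scores, and the function name clean_scores are hypothetical. It also assumes that records with a zero score are dropped before the course averages are computed, so that absences do not pull the averages down; the text does not specify this order.

```python
from statistics import mean

# Hypothetical records: student ID -> course scores (None marks a missing score).
records = {
    "S001": {"CSC201": 70, "CSC203": None, "MTH201": 58, "STA201": 64},
    "S002": {"CSC201": None, "CSC203": None, "MTH201": None, "STA201": 50},
    "S003": {"CSC201": 0, "CSC203": 45, "MTH201": 62, "STA201": 55},
    "S004": {"CSC201": 66, "CSC203": 52, "MTH201": 48, "STA201": 71},
}

def clean_scores(records, max_missing=2):
    """Apply the three cleaning rules and return the surviving records."""
    kept = {}
    for student, scores in records.items():
        missing = sum(1 for value in scores.values() if value is None)
        if missing > max_missing:
            continue  # rule 1: empty scores in more than two courses
        if any(value == 0 for value in scores.values()):
            continue  # rule 3: a score of 0 means absence from the exam
        kept[student] = dict(scores)

    # Rule 2: fill each remaining gap with the average score of that course.
    courses = sorted({course for scores in kept.values() for course in scores})
    for course in courses:
        observed = [s[course] for s in kept.values() if s.get(course) is not None]
        if not observed:
            continue  # nothing to average; leave the gap as it is
        course_mean = mean(observed)
        for scores in kept.values():
            if scores.get(course) is None:
                scores[course] = course_mean
    return kept

cleaned = clean_scores(records)
```

In this sample, S002 is removed for having three empty scores, S003 is removed for the zero score, and the missing CSC203 score of S001 is filled with the average of the remaining CSC203 scores.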
2) DATA INTEGRATION
To reduce data redundancy, related courses are merged. Some courses are split across several
semesters; merging them and taking the average of the semester scores as the course score
reduces the number of features in the subsequent analysis.
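The merging of multi-semester courses can be sketched as below. The course names, the COURSE_GROUPS mapping, and the function name merge_semesters are hypothetical; which courses the study actually merged is not stated in the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from per-semester course columns to one merged course.
COURSE_GROUPS = {
    "Programming I": "Programming",
    "Programming II": "Programming",
    "Statistics": "Statistics",
}

def merge_semesters(scores, groups=COURSE_GROUPS):
    """Average the semester scores that belong to the same course."""
    grouped = defaultdict(list)
    for column, value in scores.items():
        # Columns that are not listed in the mapping are kept under their own name.
        grouped[groups.get(column, column)].append(value)
    return {course: mean(values) for course, values in grouped.items()}

merged = merge_semesters(
    {"Programming I": 60, "Programming II": 70, "Statistics": 55}
)
```

Here three score columns are reduced to two features, with the two programming semesters replaced by their average.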
3) DATA TRANSFORMATION
The original score data are presented on a hundred-point (percentage) scale, so the attributes do
not differ in order of magnitude and no standardization is required. The K-means algorithm is
suitable only for numerical data. Therefore, when the data for analysis are combined into a
table, the student number is set to character type, the data type of each subject score is
converted to numeric type, and the number of decimal places is set to 0.
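The type conversion described above can be sketched as follows. The field name student_no, the sample values, and the function name to_analysis_row are hypothetical. Note that Python's built-in round() rounds exact halves to the nearest even integer, which may differ from the rounding used by the tool in the study.

```python
def to_analysis_row(student_no, scores):
    """Store the student number as text and each score as a whole number."""
    row = {"student_no": str(student_no)}
    for course, value in scores.items():
        # Convert to a number first (scores may arrive as text), then drop decimals.
        row[course] = int(round(float(value)))
    return row

row = to_analysis_row(190432, {"Programming": "64.6", "Statistics": 55.4})
```

After this step every score column is numeric, which is what K-means requires, while the student number stays a label rather than a feature.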