Chapter 3

Uploaded by

Tobi Odedele

Methodology

The methodology proposed in this study for analyzing student academic performance is based on
clustering algorithms, which form part of the data mining process. The stages of the process are
as follows:

Data Mining Process

In the present-day educational system, a student's performance is determined by the internal
assessment and the end-of-semester examination results. The end-of-semester examination result
is the mark obtained by the student in the examination held at the end of the semester. Each
student must obtain a minimum mark in both the internal assessment and the end-of-semester
examination to pass a semester course.

Data Collection

The data set used in this research was obtained from Ladoke Akintola University of Technology,
Ogbomoso, Oyo State, in the western part of Nigeria, by sampling from the Department of
Information Systems in the Faculty of Computing and Informatics. The initial size of the data set
was 500. In this step, data stored in different tables were joined into a single table; after the
join, errors were removed.
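
The joining step can be illustrated with a small sketch. It is written in Python for illustration only (the study itself used R), and the table layout, the field names `student_no`, `course` and `score`, and all values are hypothetical; the actual tables are not described in this chapter.

```python
# Hypothetical source tables; every name and value here is made up for illustration.
students = [
    {"student_no": "S001", "level": 300},
    {"student_no": "S002", "level": 300},
]
scores = [
    {"student_no": "S001", "course": "CSC301", "score": 64},
    {"student_no": "S001", "course": "CSC303", "score": 71},
    {"student_no": "S002", "course": "CSC301", "score": 55},
    {"student_no": "S999", "course": "CSC301", "score": 40},  # no matching student
]


def join_tables(students, scores):
    """Join the two tables into one row per student, one column per course."""
    table = {s["student_no"]: dict(s) for s in students}
    for row in scores:
        student = table.get(row["student_no"])
        if student is None:
            continue  # a score row without a matching student is skipped
        student[row["course"]] = row["score"]
    return list(table.values())


joined = join_tables(students, scores)
for row in joined:
    print(row)
```

Skipping unmatched score rows is one simple way of handling an inconsistent record; which errors were actually removed in the study is not specified.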

Data Preprocessing

Data preprocessing is an important step in machine learning because the quality of the data, and
of the knowledge that can be extracted from it, directly affects a model's ability to learn. It is
therefore critical to preprocess the data before feeding it into the model. The aim is to
transform raw data into a format that mining algorithms can use. During this process, the
following tasks are completed.

1) DATA CLEANING
Data cleaning refers to the elimination of incomplete, missing or duplicate data. There are many
ways to handle missing attribute values, such as ignoring the tuple, filling in a global constant,
or filling in the mean of the attribute. In general, grade records with many missing course scores
are deleted, and grade records with few missing course scores are filled in. This project follows
these rules: score records with empty scores in more than two courses are deleted, and any course
score that is still empty is filled with the average score of that course. A course score of 0 is
taken to mean that the student was absent from the exam, and the corresponding student's score
record is deleted.
2) DATA INTEGRATION
To reduce data redundancy, related courses are merged. Some courses are split across several
semesters; merging them and taking the average score over those semesters as the course score
reduces the number of features in the subsequent analysis.
3) DATA TRANSFORMATION
The original score data are all expressed on the same hundred-point (percentage) scale, so the
scores do not differ in order of magnitude and no standardization is required. The K-means
algorithm is only suitable for processing numerical data. When the data for analysis are combined
into a table, the student number is therefore set to character type, the data type of each subject
score is converted to numerical type, and the number of decimal places is set to 0.
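
The three preprocessing tasks above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's R code: the record layout, the course names, and the `merged` mapping are hypothetical, and the order of the cleaning rules (drop sparse and absent records first, then fill with the course mean) is one reasonable reading of the text.

```python
from statistics import mean


def clean(records, max_missing=2):
    """Drop sparse or absent records, then fill remaining gaps with the course mean."""
    kept = []
    for rec in records:
        values = list(rec["scores"].values())
        too_sparse = sum(v is None for v in values) > max_missing
        absent = any(v == 0 for v in values)  # a score of 0 is read as absence
        if not too_sparse and not absent:
            kept.append({"student_no": rec["student_no"], "scores": dict(rec["scores"])})
    courses = list(kept[0]["scores"]) if kept else []
    for course in courses:
        present = [r["scores"][course] for r in kept if r["scores"][course] is not None]
        if not present:
            continue  # nothing to average for this course
        course_mean = mean(present)
        for r in kept:
            if r["scores"][course] is None:
                r["scores"][course] = course_mean
    return kept


def integrate(records, merged):
    """Replace multi-semester courses by their average; `merged` maps new name -> parts."""
    out = []
    for r in records:
        scores = dict(r["scores"])
        for name, parts in merged.items():
            scores[name] = mean([scores.pop(p) for p in parts])
        out.append({"student_no": r["student_no"], "scores": scores})
    return out


def transform(records):
    """Student number as text, scores as whole numbers (0 decimal places)."""
    return [
        {"student_no": str(r["student_no"]),
         "scores": {c: int(round(v)) for c, v in r["scores"].items()}}
        for r in records
    ]


# Hypothetical raw records; None marks an empty score.
raw = [
    {"student_no": 1, "scores": {"MTH_1": 60, "MTH_2": 70, "CSC": 50}},
    {"student_no": 2, "scores": {"MTH_1": None, "MTH_2": 80, "CSC": 64}},
    {"student_no": 3, "scores": {"MTH_1": 0, "MTH_2": 55, "CSC": 61}},          # absent
    {"student_no": 4, "scores": {"MTH_1": None, "MTH_2": None, "CSC": None}},  # 3 empty
]
prepared = transform(integrate(clean(raw), {"MTH": ["MTH_1", "MTH_2"]}))
print(prepared)
```

Note that Python's `round` rounds exact halves to the nearest even integer, so a different rounding rule would need an explicit implementation.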

The Data Mining Tools


The experimental tool used was the R programming language. R is a powerful tool for data analysis,
statistical modeling, and visualization. Its extensive libraries and packages, together with
graphical user interfaces that give easy access to this functionality, make it a popular choice
among data scientists and researchers.

Proposed Clustering Methodology


In recent years, the effectiveness of clustering techniques in student performance analysis
studies has attracted the interest of many researchers. Clustering is a method of grouping
similar objects into one cluster while placing dissimilar objects in other clusters. It is
especially useful when label information for the students in the dataset is unknown. In addition,
dividing a large data set into small, logical clusters makes it easier for researchers to
examine and explain the meaning of the data.
K-means Algorithm. The main choice in this research is the k-means algorithm, a popular
clustering technique. It is popular because it is very simple to implement and its results are
easy to understand. The k-means algorithm groups objects around k centroids, assigning each
object to its nearest centroid. The elbow method is a popular way to determine the best number
of clusters. For each candidate number of clusters k, this approach calculates the total
within-cluster variance (the sum of squared distances to the cluster centers), also known as
inertia, and then plots this value against k. The best number of clusters is likely the k value at
the curve's first turning point.
The alternative technique is silhouette plot analysis, which calculates a coefficient for each
data point to measure how similar the point is to its own cluster compared with other clusters.
The value of the silhouette coefficient lies in the range [−1, 1], where a high value indicates
that the object is well matched to its cluster.
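
The silhouette coefficient described above can be computed directly from its standard definition: for each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the coefficient is (b − a) / max(a, b). The following Python sketch is illustrative only (the study used R); the sample points and labels are made up, and it assumes at least two clusters and Euclidean distance.

```python
import math


def silhouette(points, labels):
    """Return the silhouette coefficient of every point (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    coefficients = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            coefficients.append(0.0)  # usual convention for a single-point cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        coefficients.append((b - a) / max(a, b))
    return coefficients


# Hypothetical two-course scores for four students, already grouped into two clusters.
points = [(50, 55), (52, 58), (80, 85), (82, 83)]
labels = [0, 0, 1, 1]
coefficients = silhouette(points, labels)
print([round(s, 2) for s in coefficients])
```

Values close to 1 for every point, as in this well-separated example, indicate compact clusters that lie far from each other.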

The procedure of the K-means algorithm:
1) Arbitrarily select k samples from the n samples as the initial cluster centers; the initial
cluster centers are determined randomly.
2) Assign every other sample to its nearest cluster center, using Euclidean distance as the
distance measure.
3) Recalculate the center of each cluster as the mean of the samples assigned to it.
4) Repeat steps 2 and 3 until the cluster centers no longer change.
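
The procedure above, together with the inertia used by the elbow method, can be sketched as follows. This is a plain Python illustration under stated assumptions rather than the study's R implementation: the sample points are made up, the `seed` and `max_iter` parameters are introduced only for this example, and an empty cluster simply keeps its previous center.

```python
import math
import random


def assign(points, centers):
    """Assign each sample to the index of its nearest center (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centers, inertia) for a basic k-means run."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # randomly pick k samples as the initial centers
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # the new center is the coordinate-wise mean of the cluster members
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        if new_centers == centers:
            break  # centers no longer change, so the run has converged
        centers = new_centers
    labels = assign(points, centers)
    # inertia: total squared distance from each sample to its cluster center
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, inertia


# Hypothetical two-course scores for six students.
points = [(50, 55), (52, 58), (51, 53), (80, 85), (82, 83), (79, 88)]
labels, centers, _ = kmeans(points, k=2)

# Elbow method: inertia for several candidate values of k.
inertias = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

In this example the inertia drops sharply from k = 1 to k = 2 and changes little afterwards, which is the kind of turning point the elbow method looks for.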

Clustering Model Evaluation


In the clustering analysis phase, the quality of the clustering results is determined and
confirmed. This is an important measurement for determining which algorithm achieved the best
performance on the input data for the study. Clustering evaluation is a stand-alone process and
is not part of the clustering process itself. It is always carried out after the final output of
the clustering is produced. There are two methods for measuring the quality of clustering
results: internal validation and external validation.
Internal validation evaluates a clustering using only the clustering result itself, namely the
relationships within and between the clusters that have been formed. This is more realistic and
efficient for problems involving educational datasets whose size and dimensionality grow daily.
