Chapter 04
Research Methodology Followed
The proposed methodology has been carried out in phases comprising the following: a) study of
related literature, b) study of the functional requirements of education and training followed
in academic institutes, c) synthesis and analysis of the educational dataset, d) algorithmic study
of the data mining techniques used, e) implementation of the mining techniques on the dataset,
followed by extraction and evaluation of the results generated, and f) prediction of academic
trends using different data mining techniques.
Study of Related Literature: For the proposed study, we have reviewed literature covering
Knowledge Discovery in Databases (KDD) processes and data mining methods and techniques.
In addition, literature related to software and tools such as WEKA, RapidMiner, SPSS, Hadoop
and MATLAB has been studied.
1) The quality of academic institutions has been predicted taking into consideration
parameters categorized as: a) teaching skills, b) course content and c) infrastructure.
This quality assessment of an institution has been done using statistical regression
techniques, which is the first data mining approach followed.
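The regression step can be illustrated with a minimal sketch, assuming a single predictor and a handful of hypothetical feedback ratings. The data values and the name `fit_line` are illustrative only and are not taken from the study, which may have used a different regression model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: mean teaching-skill rating (1-5) against overall quality rating
teaching = [2.0, 3.0, 4.0, 5.0]
overall = [2.5, 3.0, 3.5, 4.0]

slope, intercept = fit_line(teaching, overall)
predicted = slope * 3.5 + intercept  # estimated quality for a teaching rating of 3.5
print(slope, intercept, predicted)
```

Course content and infrastructure ratings could be added as further predictors, which would turn this into a multiple regression.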
2) Then, classification of the educational dataset is performed using decision tree
classifiers. A decision tree classifier has been implemented in order to obtain the
following: a) prediction of the performance of students in a particular class, b)
identification of students whose attendance is short and who have performed poorly in
sessionals, and c) calculation of information gain, a metric that shows how well an
attribute classifies the training data.
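Information gain, as used by decision tree classifiers, can be computed as the entropy of the class labels minus the weighted entropy after splitting on an attribute. The following is a minimal sketch on a hypothetical four-student table; the attribute names and values are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical records, not from the study's dataset
students = [
    {"attendance": "short", "sessional": "poor", "result": "fail"},
    {"attendance": "short", "sessional": "good", "result": "fail"},
    {"attendance": "regular", "sessional": "poor", "result": "pass"},
    {"attendance": "regular", "sessional": "good", "result": "pass"},
]

gain_attendance = information_gain(students, "attendance", "result")
gain_sessional = information_gain(students, "sessional", "result")
print(gain_attendance, gain_sessional)
```

In this toy table attendance separates the classes perfectly while sessional performance carries no information, so a tree would split on attendance first.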
3) K-Means clustering has been applied to the educational dataset taking K clusters, which
helps instructors to achieve the following results: a) identify students who are short of
attendance, b) identify students who have performed poorly in sessionals, and c) cluster
students who need special attention. Apart from this, it is also concluded that on
increasing the value of K, the accuracy improves and K-Means finds a better grouping of
the data.
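The clustering step can be sketched as follows, assuming two-dimensional records of (attendance percentage, sessional marks). The data are hypothetical, and the centroids are initialized from the first K points purely to keep the example reproducible; practical implementations usually use random or k-means++ initialization.

```python
def kmeans(points, k, iterations=20):
    """Plain K-Means with squared Euclidean distance."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical (attendance %, sessional marks out of 30)
points = [(55, 8), (92, 24), (60, 10), (58, 9), (95, 26), (90, 25)]
centroids, assignment = kmeans(points, k=2)
print(centroids, assignment)
```

Here cluster 0 collects the students with low attendance and low sessional marks, which is the group an instructor would flag for special attention.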
4) Educational data analysis related to clustering of students has been performed using
neural network based classification and clustering techniques. A neural network is a
group of interconnected neurons which uses computational or mathematical models to
process information. A self-organizing map is a type of ANN (Artificial Neural
Network) in which each neuron is associated with a weight vector of the same dimension
as the input data vectors. It is an unsupervised neural network algorithm which projects
high-dimensional data onto a two-dimensional map. In this technique, similar data items
are mapped to nearby locations, which helps in pattern recognition. Neural network based
pattern recognition does the following: a) classifies inputs into a set of target
categories, b) helps to select data, c) creates and trains a network, and d) evaluates
its performance using cross-entropy and confusion matrices.
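The evaluation step named in d) can be sketched as follows. The class names, true labels and predicted probabilities are hypothetical stand-ins for the output of a trained network; no network is trained here.

```python
from math import log

def confusion_matrix(actual, predicted, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def cross_entropy(actual, probabilities, classes):
    """Mean negative log-probability assigned to the true class."""
    index = {c: i for i, c in enumerate(classes)}
    eps = 1e-12  # guards against log(0)
    total = sum(log(max(probs[index[a]], eps))
                for a, probs in zip(actual, probabilities))
    return -total / len(actual)

classes = ["needs_attention", "on_track"]
actual = ["needs_attention", "on_track", "on_track", "needs_attention"]
# Assumed network outputs: one probability per class for each student
probabilities = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.9, 0.1]]
predicted = [classes[max(range(len(classes)), key=lambda i: p[i])]
             for p in probabilities]

matrix = confusion_matrix(actual, predicted, classes)
loss = cross_entropy(actual, probabilities, classes)
print(matrix, loss)
```

The off-diagonal entry in the second row shows one on-track student wrongly flagged, and the cross-entropy penalizes that prediction in proportion to how confident it was.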
5) Another data mining technique followed in the research methodology is association rule
mining (ARM). Knowledge has been extracted from a semi-synthesized dataset specially
created for this purpose for students of engineering background. Using the ARM
technique, preferable courses for students to undergo industrial training have been
extracted from the dataset. ARM is used to find associations between frequently
occurring variables, and association rules are generated based on the frequent variables
in datasets. Apriori is the algorithm used for mining frequent patterns from the
transaction database. Through this methodology, rules have been discovered using the
Apriori algorithm which help instructors a) to find the interest of students in
industry-oriented courses in an e-learning environment, b) to enhance the effectiveness
of academic planning and decision-making, and c) to extract knowledge rules related to
courses in demand in industry which need to be introduced into syllabi.
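The level-wise search behind Apriori and the derivation of rules from frequent itemsets can be sketched as follows. The course names, transactions and thresholds are hypothetical, and the code is a compact illustration rather than the implementation used in the study.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Derive rules antecedent -> consequent with their confidence."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), confidence))
    return out

# Hypothetical course selections, one set per student
transactions = [
    {"Python", "Data Mining"},
    {"Python", "Data Mining", "Cloud"},
    {"Python", "Cloud"},
    {"Data Mining", "Cloud"},
    {"Python", "Data Mining"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
found = rules(frequent, min_confidence=0.7)
print(found)
```

With these thresholds only the pair Python and Data Mining is frequent, and it yields one rule in each direction.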
6) Support Vector Machines (SVM) are supervised learning methods which have been used in
our proposed methodology for both regression and classification. Using this data mining
technique on another educational dataset, SVM classifiers have predicted the placement
of students based on the following parameters: a) attendance, b) GPA, c) reasoning
skills, d) quantitative skills, e) communication skills, f) technical skills. It is also
concluded that, in many cases, students focus only on their regular curriculum rather
than also attaining those skills which are necessary for the overall development of
students and their placements.
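A linear SVM classifier of the kind described can be sketched by minimizing the regularized hinge loss with batch sub-gradient descent. This is an illustrative stand-in, assuming only two normalized features (attendance and GPA) and hypothetical records; the study's own SVM configuration, kernel and feature scaling are not specified here.

```python
def train_linear_svm(samples, labels, epochs=2000, lr=0.05, reg=0.01):
    """Batch sub-gradient descent on the L2-regularized hinge loss."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (placed) or -1 (not placed)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical (attendance fraction, GPA scaled to 0-1); +1 = placed, -1 = not placed
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.9), (0.2, 0.3), (0.3, 0.1), (0.1, 0.2)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print(predict(w, b, (0.85, 0.85)), predict(w, b, (0.15, 0.2)))
```

The remaining parameters listed above (reasoning, quantitative, communication and technical skills) would simply extend each sample tuple with further scaled features.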
7) Further, in this direction, the Naive Bayes data mining technique has been used to
predict the placement of students. The same dataset and attributes that were used for
support vector machines have been used for experimentation. The knowledge extracted
using this technique a) helps the management authorities of the institute to improve
student placements, and b) helps instructors to guide students to focus on improving
skills such as aptitude, reasoning and communication, apart from technical skills, in
order to get placed.
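A categorical Naive Bayes classifier for placement prediction can be sketched as follows, assuming two hypothetical attributes (communication and technical skill levels) and add-one smoothing. The records are invented for illustration and do not reproduce the study's dataset.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    feature_values = defaultdict(set)    # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, row):
    """Pick the class with the highest log posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(feature_values[i])
            score += log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical (communication, technical) skill levels
rows = [("good", "good"), ("good", "average"), ("average", "good"),
        ("poor", "average"), ("poor", "poor"), ("average", "poor")]
labels = ["placed", "placed", "placed", "not_placed", "not_placed", "not_placed"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("good", "good")))
print(predict_naive_bayes(model, ("poor", "poor")))
```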
8) In the proposed methodology, the next data mining technique with which academic data
has been analyzed is K-Nearest Neighbors. Using this technique, the nearest neighbor
classes have been predicted for the class performance attribute. In the K-Nearest
Neighbors technique, using the K value, the nearest class for the upcoming group of
fresh students is determined, which helps in: a) identifying the group of students who
have good practical as well as good overall performance in the class, b) strengthening
the decision-making approach of instructors to monitor the capabilities of the group,
c) helping the management of the institute to adopt new pedagogies to improve student
skills and placements, d) identifying learners who are showing meager performance in
class, and e) improving the quality of education. To get more accurate results, the
centroid value is increased in the K-Means technique, followed by nearest neighbor
search using distance metrics such as Minkowski, Chebyshev and Euclidean distance. On
increasing the value of K in K-Nearest Neighbors, more accuracy in the prediction of
each class is obtained. The majority class among the K nearest neighbors decides the
class of any point.
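The majority-vote rule and the distance metrics mentioned above can be sketched as follows. The training records of (practical marks, overall marks) and the class names are hypothetical.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan and p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def chebyshev(a, b):
    """Largest coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=3, distance=minkowski):
    """Label the query by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical (practical marks, overall marks) with class performance labels
train = [((85, 80), "good"), ((90, 75), "good"), ((78, 82), "good"),
         ((40, 45), "meager"), ((35, 50), "meager"), ((45, 38), "meager")]

print(knn_predict(train, (80, 78), k=3))
print(knn_predict(train, (42, 44), k=3, distance=chebyshev))
```

Choosing an odd K for a two-class problem avoids tied votes.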
MapReduce framework has been proposed. The Hadoop Distributed File System (HDFS) is used
to hold a large amount of data. The files are stored in a redundant fashion across
multiple machines, which ensures their resilience to failure and their availability to
parallel applications. Here, using HDFS, tasks run over MapReduce and the output is
obtained after aggregation of results. The knowledge extracted using this technique has
been used to obtain the following results: a) guiding students to choose and focus on the
right course(s) based on their personal preferences, b) blending the concepts of data
mining and classification with those of big data, c) deriving the right blend of courses
for students to pursue appropriate courses/trainings and to enhance their career
prospects.
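The map, shuffle and reduce stages can be imitated in memory with a short sketch. This is not Hadoop code and uses no Hadoop API; it only mirrors the data flow on hypothetical student course preferences.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (course, 1) for every course a student prefers."""
    student_id, courses = record
    return [(course, 1) for course in courses]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values for one key."""
    return key, sum(values)

# Hypothetical input records: (student id, preferred courses)
records = [
    ("s1", ["Python", "Hadoop"]),
    ("s2", ["Python"]),
    ("s3", ["Hadoop", "SPSS"]),
    ("s4", ["Python", "SPSS"]),
]

mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In an actual cluster the map calls would run in parallel on separate blocks of the input files, and the framework would perform the grouping before the reducers aggregate the results.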
Machine learning is the need of the hour, as it is one of the fastest growing and most
revolutionary parts of the IT industry. In machine learning, data analytics is done in a
way that supports coherent prototype building. Machine learning languages have built-in
packages and algorithms which learn from data to find unknown observations and
meaningful information. The Python programming language is a popular language for
machine learning for the following reasons: a) it offers a supportive trade-off between
versatility and performance, b) it is more intuitive than other languages, c) it has a
consistent structure and built-in libraries and packages which are very helpful in
working with machine learning systems, d) it solves complex sets of machine learning
tasks. In spite of all these powerful features, the Python programming language and its
contribution to educational data mining and the analytics of educational data are still
not fully explored and utilized for improving the educational sector and learning
analytics. In the proposed work, using Python, classification of an educational dataset
synthesized for experimentation has been performed by different classifiers. For that, a
validation dataset has been created, algorithms have been used to build the models and,
finally, evaluation of the data has been performed using these models. The model results
are obtained and compared on the basis of their accuracy measures for classifying the
data in order to identify the best model. Apart from this, the results obtained have
yielded the following predictions: a) students' overall performance in class, b)
aptitude skills of the class, c) students' attendance in class for a particular course.
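The workflow of holding out a validation set, building several models and comparing them by accuracy can be sketched as follows. The two classifiers shown (a majority-class baseline and a one-nearest-neighbor rule) and the records are assumptions chosen for brevity, not the classifiers or data used in the study.

```python
from collections import Counter

def accuracy(model, rows, labels):
    """Fraction of rows for which the model returns the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)

def fit_majority(rows, labels):
    """Baseline that always predicts the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority

def fit_one_nearest_neighbour(rows, labels):
    """Predict the label of the closest training row (squared Euclidean)."""
    def model(row):
        nearest = min(range(len(rows)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], row)))
        return labels[nearest]
    return model

# Hypothetical records: (attendance %, aptitude score) -> overall performance
data = [((90, 80), "good"), ((85, 75), "good"), ((88, 70), "good"),
        ((50, 40), "poor"), ((55, 35), "poor"),
        ((92, 85), "good"), ((45, 30), "poor"), ((80, 72), "good")]

# Fixed hold-out split for reproducibility: first five rows train, the rest validate
train, validation = data[:5], data[5:]
train_x, train_y = [x for x, _ in train], [y for _, y in train]
val_x, val_y = [x for x, _ in validation], [y for _, y in validation]

models = {
    "majority": fit_majority(train_x, train_y),
    "1-NN": fit_one_nearest_neighbour(train_x, train_y),
}
results = {name: accuracy(m, val_x, val_y) for name, m in models.items()}
best = max(results, key=results.get)
print(results, best)
```

Evaluating on rows that were not used for fitting is what makes the accuracy figures comparable across models.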
Web-Based Data Mining Tools for Performing Feedback Analysis and Association Rule
Mining: As a part of the proposed methodology, web-based tools have been developed using
ASP.NET and PHP. Using ASP.NET, a web-enabled tool based on the association rule mining
technique has been proposed which uses a SQL query mechanism for querying the discovered
knowledge in the form of association rules. The proposed web-based tool helps
universities/institutions provide students with appropriate guidance in opting for the right
course among the elective courses. This tool can be utilized a) to generate the combination of
elective courses most often opted for on the basis of feedback from students, b) to generate the
combination of elective courses best recommended on the basis of feedback from industry
experts, c) to help the university/institute to adopt courses which are considered to be both
interesting and beneficial for students. Another tool has been developed in PHP with MySQL
for feedback analysis. The parameters of feedback are categorized as: a) teaching skills, b)
course content and c) infrastructure quality. The feedback is gathered from students/corporate
employees and, through the proposed tool, results have been generated. The tool helps
management in: a) improving in-house training skills, b) improving course content designed
for trainings, c) improving pedagogies, d) improving infrastructure quality.
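Querying discovered association rules through SQL can be sketched as follows. The tools described above were built with ASP.NET and PHP with MySQL; this illustration instead uses Python with an in-memory SQLite database, and the table name, column names and rule values are all hypothetical.

```python
import sqlite3

# Assumed schema: one row per discovered rule "antecedent -> consequent"
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
)
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("Data Mining", "Machine Learning", 0.40, 0.80),
        ("Cloud Computing", "Big Data", 0.30, 0.75),
        ("Data Mining", "Big Data", 0.20, 0.50),
    ],
)

def recommend(conn, course, min_confidence=0.6):
    """Return electives associated with `course`, strongest rule first."""
    cursor = conn.execute(
        "SELECT consequent, confidence FROM rules "
        "WHERE antecedent = ? AND confidence >= ? ORDER BY confidence DESC",
        (course, min_confidence),
    )
    return cursor.fetchall()

recommendations = recommend(conn, "Data Mining")
print(recommendations)
```

Parameterized queries of this kind keep user-supplied course names out of the SQL text, which matters for a tool exposed on the web.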
4.2 Conclusion
This chapter has presented the research methodology followed. The methodology, divided into
five phases, has been described and each of these phases has been discussed.