Project Final
Problem Statement:
This project analyzes the learning achievements of university students
across various fields and maintains a database of their attendance,
academic and co-curricular activities. Results are currently published only
as a CGPA. As a result, a student's standing relative to their peers is not
known, nor are the fields in which the student is strong and those that
need more effort. This software therefore addresses the following
objectives:
Record the attendance, academic details and co-curricular
achievements of the students.
Analyze the performance of each student on a relative basis.
Generate a detailed report that describes the student's
performance in different fields.
Suggest new projects, trainings, etc. that the student could
undertake.
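One simple way to express performance "on a relative basis" is a percentile rank within the peer group. The sketch below is only an illustration under that assumption; the class name, method name and sample CGPA values are hypothetical and do not come from the project.

```java
import java.util.Arrays;

final class RelativeStanding {
    /** Percentage of peers whose score is strictly below the given score. */
    static double percentileRank(double[] peerScores, double score) {
        if (peerScores.length == 0) {
            throw new IllegalArgumentException("peerScores must not be empty");
        }
        long below = Arrays.stream(peerScores).filter(s -> s < score).count();
        return 100.0 * below / peerScores.length;
    }

    public static void main(String[] args) {
        // Made-up CGPA values for five students, used only for illustration.
        double[] cgpas = {6.1, 7.4, 8.0, 8.9, 9.3};
        System.out.println(percentileRank(cgpas, 8.0));
    }
}
```

The same function could be applied per attribute (attendance, projects and so on) to show where a student is strong or weak compared with peers.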
Solution:
Assessment, as a dynamic process, produces data from which
stakeholders derive reasonable conclusions for decisions that
affect students' learning outcomes. Data mining methodology, by
extracting useful and valid patterns from a higher education database
environment, contributes to proactively ensuring that students maximize
their academic output.
This project develops a web-based system for simple student
performance assessment and monitoring. It derives performance prediction
indicators within a teaching and learning environment, focusing mainly on
students' continuous assessments (tests), examination scores, and
involvement in various projects and co-curricular activities, in order to
predict their final achievement status upon graduation. Based on various
Data Mining Techniques (DMT), rules are derived that enable the
classification of students into their predicted classes. The deployment of
the prototyped solution integrates measuring, 'recycling' and reporting
procedures in the new system to optimize prediction accuracy.
Educational data mining involves a large amount of data that has to be
organized in a consistent manner. To organize, analyze and classify the
students' details, the K-means clustering algorithm is applied to the
academic and co-curricular records, forming three clusters based on
students' performance as follows:
Low performance student
Average student
Smart student
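The clustering step described above can be sketched in plain Java as follows. This is a minimal illustration of K-means (Lloyd's algorithm), not the project's implementation, which is stated to rely on Weka. The class and method names, the choice of the first k records as initial centroids, and the sample (CGPA, attendance %) values are all assumptions made for the example.

```java
import java.util.Arrays;

final class KMeansSketch {
    /** Squared Euclidean distance between two feature vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Alternates assignment and centroid update until assignments stop
     * changing or maxIterations is reached. Returns the cluster index of each point.
     * Assumption: the first k points serve as the initial centroids.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int c = nearest(points[p], centroids);
                if (c != assignment[p]) {
                    assignment[p] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up (CGPA, attendance %) pairs, for illustration only.
        double[][] students = {
            {5.5, 60}, {8.9, 92}, {7.2, 78}, {5.8, 64}, {9.1, 95}, {7.0, 75}
        };
        System.out.println(Arrays.toString(cluster(students, 3, 100)));
    }
}
```

In this sketch the attendance percentage dominates the distance because of its larger scale, so in practice the attributes would probably need to be normalized first. The cluster indices carry no meaning by themselves; mapping them to the three performance labels would require inspecting the centroids.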
The model is designed by collecting students' personal and academic
data from the senior students of the institution and then grouping
students' performance, based on certain conditions, as:
Best
Good
Average
Methodology:
The work follows a methodology made up of a series of stages. It
starts with problem definition, followed by data collection from a
questionnaire and the students' database, then attribute selection, nominal
conversion, file conversion and implementation in the WEKA tool. A
comparative analysis of classification algorithms is then carried out to
find an efficient one for predicting students' performance through the
creation of a student model.
Data Collection and Preparation:
In this process, a questionnaire form is used to collect real data
from the students that describes the relationship between their learning
behaviour and their academic performance. The variables used in the
questionnaire for judging students' learning and academic behaviour are
Attendance, CGPA and Projects. These data are then recorded in Excel
sheets for analysis.
Data Selection and Transformation:
Attribute selection is performed first on the data collected from the
feedback forms and the database. In this step, only those fields that are
required for data mining are selected. Some variables are extracted
directly from the database, while a few are derived. Attribute selection
identifies the most appropriate attributes for classifying the data
sets; based on this analysis, the higher-ranking attributes are used for
classifying the training dataset.
The attributes are:
CGPA
Attendance
Seminars
Paper Presentations
Projects
Internships
Technical Competitions
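Since the methodology mentions file conversion for the WEKA tool, the sketch below shows how records with the attributes listed above might be written out in Weka's ARFF text format. It is a hedged illustration only: the class name, the relation name, the underscore-style attribute identifiers, the choice to treat every attribute as numeric, and the sample row are assumptions rather than details taken from the project.

```java
final class ArffSketch {
    /** Builds ARFF text with one numeric attribute per column and one data line per row. */
    static String toArff(String relation, String[] attributes, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String attribute : attributes) {
            sb.append("@attribute ").append(attribute).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            if (row.length != attributes.length) {
                throw new IllegalArgumentException("row length must match attribute count");
            }
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical identifiers for the seven attributes listed above.
        String[] attributes = {
            "cgpa", "attendance", "seminars", "paper_presentations",
            "projects", "internships", "technical_competitions"
        };
        // One made-up student record, for illustration only.
        double[][] rows = {{8.4, 91.0, 2, 1, 3, 1, 0}};
        System.out.print(toArff("students", attributes, rows));
    }
}
```

Attribute names are kept free of spaces here because ARFF requires names containing spaces to be quoted.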
Technology Stack:
Client side
Server Side
The server side can be developed so that it works on the Windows 7,
Windows 8, Windows 8.1 and Windows 10 platforms. The back end will be
programmed in Java. The Oracle (PL/SQL) or MS Access database will be
accessed and modified through SQL queries issued from the Java code with
the help of the Sun or Oracle JDBC driver. For complex querying, Weka is
used along with Java, and RStudio is also used alongside Weka. The Spring
Web Model-View-Controller (MVC) framework, which is designed around a
dispatcher servlet, can be used to dispatch requests to handlers, with
configurable handler mappings, view resolution, theme resolution, locale
and time zone resolution, and support for uploading files.
The front end, i.e. the platform-independent graphical user interface on
the client side, can be developed using HTML5 and JavaScript and styled
using Cascading Style Sheets (CSS).
Use Cases:
Database Creation:
Updating and maintaining the database:
Viewing performance:
Functional Modules:
Database Management:
Collecting historic data of Alumni
Collecting data of current student community
Defining structure of data and identifying various attributes
Defining Constraints
Loading data
Removing redundancies and spurious tuples
Generating data cubes depending on materialization using RStudio
Analysis of Data:
Classification of data using the k-means clustering algorithm
Deriving patterns from the various clusters
Generating reports stating the areas in which the student should improve
their performance
Querying:
Presenting the personal data of a student on request
Presenting the overall performance of the student, including academics,
projects and presentations
Presenting the details of the student's performance in a
particular category.
Allowing authorized administrators and professors to update data.
Flow Chart:
Dependencies:
Several algorithms are to be used for predicting students'
performance. Some of them, and the reasons for using them, are:
Clustering:
Clustering is the identification of similar classes of objects. Using
clustering techniques, we can further identify dense and sparse regions
in the object space and discover the overall distribution pattern and
correlations among data attributes. It can also be an effective means of
distinguishing groups or classes of objects.
Decision Tree:
Decision tree models are easy to understand because their reasoning
process is explicit and can be directly converted into a set of IF-THEN rules.
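To illustrate what a set of IF-THEN rules read off a decision tree could look like, here is a small hand-written sketch. The thresholds, the choice of attributes and the class name are entirely hypothetical; a real tree would be learned from the training data (for example in Weka) rather than written by hand. The labels reuse the Best, Good and Average groups mentioned earlier.

```java
final class RuleSketch {
    /**
     * Each path from the root of the (hypothetical) tree to a leaf is one IF-THEN rule.
     * All thresholds are made up for illustration.
     */
    static String classify(double cgpa, double attendancePercent, int projects) {
        if (cgpa >= 8.5) {
            if (projects >= 2) {
                return "Best";   // IF cgpa >= 8.5 AND projects >= 2 THEN Best
            }
            return "Good";       // IF cgpa >= 8.5 AND projects < 2 THEN Good
        }
        if (cgpa >= 7.0 && attendancePercent >= 75.0) {
            return "Good";       // IF 7.0 <= cgpa < 8.5 AND attendance >= 75 THEN Good
        }
        return "Average";        // otherwise Average
    }

    public static void main(String[] args) {
        System.out.println(classify(9.0, 90.0, 3));
        System.out.println(classify(7.5, 60.0, 5));
    }
}
```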
Neural Network:
It can detect interactions between predictor variables, including
complex nonlinear relationships between dependent and independent
variables.
Naive Bayes:
It analyzes each attribute of the data individually to show its
importance, treating the attributes as independent of one another.
K - Nearest Neighbour:
This method takes very little time and classifies the students into slow,
good, average and excellent learners. It also gives good accuracy in
estimating the detailed pattern of learners' progression.
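A minimal sketch of k-nearest-neighbour classification is given below, assuming Euclidean distance and a simple majority vote. The class name, the (CGPA, attendance %) features, the labels attached to the sample records and the tie-breaking rule are assumptions made for the example, not the project's actual configuration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

final class KnnSketch {
    /** Euclidean distance between two feature vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the most common label among the k training records nearest to the query. */
    static String classify(double[][] features, String[] labels, double[] query, int k) {
        if (k < 1 || k > features.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of records");
        }
        // Sort record indices by distance to the query, nearest first.
        Integer[] order = new Integer[features.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> distance(features[i], query)));

        // Majority vote; on a tie, the label that reached the top count first wins.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int n = 0; n < k; n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up (CGPA, attendance %) records with made-up labels.
        double[][] features = {
            {5.5, 60}, {5.8, 64}, {7.0, 75}, {7.2, 78}, {8.9, 92}, {9.1, 95}
        };
        String[] labels = {"slow", "slow", "average", "average", "excellent", "excellent"};
        System.out.println(classify(features, labels, new double[]{7.1, 77}, 3));
    }
}
```

As with clustering, distance-based methods are sensitive to attribute scale, so normalizing the attributes beforehand would likely be needed in practice.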
Support Vector Machine:
This method is suitable for small datasets, has good generalization
ability, and is reported to give high accuracy and fast run times
relative to the other methods.