Student Grade Prediction Python Full Document
INTRODUCTION
STUDENT GRADE PREDICTION USING C4.5 ALGORITHM
ABSTRACT
A system is designed to predict the final grade of a student based on the grades
the student scored in previous courses and years. To make a prediction, the system
needs data to analyze. The input is the student's basic information and previous
academic record, from which the student's grade is predicted.
The system generates a report containing the grade predicted for the student
using the C4.5 algorithm. This system can be used in schools, colleges and other
educational institutes.
CHAPTER-2
SYSTEM ANALYSIS
EXISTING SYSTEM
The ability to assess student performance is essential in an educational
environment. Performance is influenced by many qualitative attributes, such as
Student Identity, Gender, Age, Specialty, Lower Class Grade, Higher Class Grade,
Extra Knowledge or Skill, Resource, Attendance, Time Spent Studying, Class Test
Grade (Internal), Seminar Performance, Lab Work, Quiz and Overall Previous
Semester Exam Marks, all of which are included in forming the data set.
The existing system cannot represent students' performance grade-wise.
DISADVANTAGES
PROPOSED SYSTEM
The proposed system overcomes the limitations of the existing system by
representing students' performance in terms of grades. In the proposed system,
the user collects data about each student.
In our project, we mainly consider the subject grades a student obtained in
previous semesters.
ADVANTAGES
User friendly.
MODULES DESCRIPTION
Admin
Grade and SGPA prediction
Graph
ADMIN
The admin collects each student's individual data based on two
parameters: internal marks and external marks.
Based on the data collected by the admin, the grade and SGPA of the student are
predicted.
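The report does not state how SGPA is calculated. As a minimal sketch, assuming the usual credit-weighted definition (the sum of credits times grade points, divided by total credits), the calculation could look as follows. The function name `sgpa` and the sample credits and grade points are hypothetical.

```python
def sgpa(courses):
    """Credit-weighted mean of grade points.

    `courses` is a list of (credits, grade_point) pairs. The credit-weighted
    formula is an assumption; the report does not define SGPA.
    """
    total_credits = sum(credits for credits, _ in courses)
    if total_credits == 0:
        raise ValueError("at least one credit is required")
    weighted = sum(credits * grade_point for credits, grade_point in courses)
    return weighted / total_credits


# Hypothetical semester: three subjects with their credits and grade points.
semester = [(4, 9), (3, 8), (3, 7)]
result = sgpa(semester)
```

A different institution may use another scale or weighting, so the formula should be checked against the local grading rules before use.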
GRAPH
After the admin has collected the data and the grade and SGPA have been predicted,
a graph is generated based on the grade.
ARCHITECTURE DIAGRAM
[Figure: architecture diagram. Recoverable label: Predict student SGPA]
C4.5 DECISION TREE
C4.5 is an algorithm used to generate a decision tree developed by Ross
Quinlan. It is an extension of Quinlan's earlier ID3 algorithm. The decision
trees generated by C4.5 can be used for classification, and for this reason, it
is often referred to as a statistical classifier.
Handles both continuous and discrete attributes - In order to handle
continuous attributes, C4.5 creates a threshold and then splits the list into
those whose attribute value is above the threshold and those that are less
than or equal to it.
Handles training data with missing attribute values - C4.5 allows attribute
values to be marked as ? for missing. Missing attribute values are simply not
used in gain and entropy calculations.
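The two properties above can be illustrated with a short sketch. It is a simplified illustration, not Quinlan's full C4.5: it only scores binary splits of one continuous attribute by gain ratio, uses the observed values as candidate thresholds, and skips rows marked `?`. The names `entropy`, `best_threshold` and the sample marks are hypothetical.

```python
import math
from collections import Counter

MISSING = "?"  # marker for a missing attribute value


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_threshold(values, labels):
    """Return (threshold, gain_ratio) for the best split of one attribute.

    Rows whose value is MISSING are left out of the entropy and gain
    calculations. Returns (None, 0.0) when no useful split exists.
    """
    known = [(float(v), y) for v, y in zip(values, labels) if v != MISSING]
    if not known:
        return None, 0.0
    base = entropy([y for _, y in known])
    best = (None, 0.0)
    # Candidate thresholds: every distinct value except the largest,
    # so both sides of the split are never empty.
    for t in sorted({v for v, _ in known})[:-1]:
        low = [y for v, y in known if v <= t]   # less than or equal to t
        high = [y for v, y in known if v > t]   # above t
        p_low = len(low) / len(known)
        p_high = len(high) / len(known)
        gain = base - p_low * entropy(low) - p_high * entropy(high)
        split_info = -(p_low * math.log2(p_low) + p_high * math.log2(p_high))
        ratio = gain / split_info
        if ratio > best[1]:
            best = (t, ratio)
    return best


# Hypothetical previous-semester marks with one missing entry.
marks = [35, 42, "?", 58, 61, 77]
result = ["Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]
threshold, ratio = best_threshold(marks, result)
```

With this sample data the split at 42 separates the two classes completely, so it receives the highest gain ratio.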
CHAPTER-3
REQUIREMENT
PRELIMINARY INVESTIGATION
The initial idea for the project was to design a mail-enabled platform for a small
firm that makes sending and receiving messages easy and convenient and that also
includes a search engine, an address book and some entertaining games. Once the
idea is approved by the organization and our project guide, the first activity,
i.e. preliminary investigation, begins. The activity has three parts:
Request Clarification
Feasibility Study
Request Approval
REQUEST CLARIFICATION
After the request is approved by the organization and the project guide, and with
an investigation being considered, the project request must be examined to determine
precisely what the system requires.
Our project is basically meant for users within the company whose systems
can be interconnected by a Local Area Network (LAN). In today's busy schedule,
people need everything to be provided in a ready-made manner. Taking into
consideration the widespread use of the net in day-to-day life, the corresponding
portal was developed.
FEASIBILITY ANALYSIS
Operational Feasibility
Economic Feasibility
Technical Feasibility
Operational Feasibility
Operational feasibility deals with the study of the prospects of the system to be
developed. This system eases the operational burden on the admin and
helps in effectively tracking the project progress. This kind of automation will
surely reduce the time and energy previously consumed by manual work.
Based on the study, the system is found to be operationally feasible.
Economic Feasibility
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
Processor - Core i3
RAM - 4 GB (min)
Hard Disk - 320 GB
SOFTWARE REQUIREMENTS
1) PIP
2) Json
3) Chefboost
4) Pandas
5) Numpy
CHAPTER-4
DESIGN
Introduction to UML
UML is a method for describing the system architecture in detail, in the manner of
a blueprint. UML represents a collection of best engineering practices that have
proven successful in the modeling of large and complex systems. UML is a very
important part of developing object-oriented software and of the software
development process. UML uses mostly graphical notations to express the
design of software projects. Using UML helps project teams
communicate, explore potential designs and validate the architectural design of the
software.
Use Case Diagram
A use case diagram represents the functionality of the system. Use cases focus on
the behavior of the system from an external point of view. Actors are external
entities that interact with the system.
Use cases:
Actors:
An actor is a person, organization, or external system that plays a role in one or
more interactions with the system.
System boundary boxes (optional):
A rectangle is drawn around the use cases, called the system boundary box, to
indicate the scope of system. Anything within the box represents functionality that
is in scope and anything outside the box is not.
The “user model view” encompasses a problem and solution from the
perspective of those individuals whose problem the solution addresses. The view
presents the goals and objectives of the problem owners and their requirements of
the solution. This view is composed of “use case diagrams”. These diagrams
describe the functionality provided by a system to external interactors. These
diagrams contain actors, use cases, and their relationships.
Class Diagram
Class-based Modeling, or more commonly class-orientation, refers to the style of
object-oriented programming in which inheritance is achieved by defining classes
of objects; as opposed to the objects themselves (compare Prototype-based
programming).
The most popular and developed model of OOP is a class-based model, as opposed
to an object-based model. In this model, objects are entities that combine state (i.e.,
data), behavior (i.e., procedures, or methods) and identity (unique existence among
all other objects). The structure and behavior of an object are defined by a class,
which is a definition, or blueprint, of all objects of a specific type. An object must
be explicitly created based on a class and an object thus created is considered to be
an instance of that class. An object is similar to a structure, with the addition of
method pointers, member access control, and an implicit data member which
locates instances of the class (i.e. actual objects of that class) in the class hierarchy
(essential for runtime inheritance features).
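The class-based model described above can be shown with a small sketch. The `Student` class, its attributes and the sample values are hypothetical and are not taken from the project's class diagram.

```python
class Student:
    """Blueprint for student objects: combines state and behavior."""

    def __init__(self, roll_no, name):
        self.roll_no = roll_no   # state (data)
        self.name = name
        self.marks = {}          # subject -> marks obtained

    def add_marks(self, subject, marks):
        """Behavior: record the marks for one subject."""
        self.marks[subject] = marks

    def average(self):
        """Behavior: mean of the recorded marks, or 0.0 when none exist."""
        if not self.marks:
            return 0.0
        return sum(self.marks.values()) / len(self.marks)


# Objects are created explicitly from the class; each one is a separate
# instance with its own identity, even when the data is the same.
a = Student(1, "Asha")
b = Student(1, "Asha")
a.add_marks("Maths", 80)
a.add_marks("Physics", 70)
```

Here `a` and `b` hold equal data but remain distinct objects, which illustrates identity as "unique existence among all other objects".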
Sequence Diagram
Objects calling methods on themselves use messages and add new activation
boxes on top of any others to indicate a further level of processing. When an object
is destroyed (removed from memory), an X is drawn on top of the lifeline, and the
dashed line ceases to be drawn below it (this is not the case in the first example
though). It should be the result of a message, either from the object itself, or
another.
COLLABORATION DIAGRAM:
A Sequence diagram is dynamic, and, more importantly, is time ordered. A
Collaboration diagram is very similar to a Sequence diagram in the purpose it
achieves; in other words, it shows the dynamic interaction of the objects in a
system. A distinguishing feature of a Collaboration diagram is that it shows the
objects and their association with other objects in the system apart from how they
interact with each other. The association between objects is not represented in a
Sequence diagram.
Activity Diagram
Objects have behaviors and states. The state of an object depends on its current
activity or condition. A state chart diagram shows the possible states of the object
and the transitions that cause a change in state. A state diagram, also called a state
machine diagram or state chart diagram, is an illustration of the states an object can
attain as well as the transitions between those states in the Unified Modeling
Language. A state diagram resembles a flowchart in which the initial state is
represented by a large black dot and subsequent states are portrayed as boxes with
rounded corners.
There may be one or two horizontal lines through a box, dividing it into stacked
sections. In that case, the upper section contains the name of the state, the middle
section (if any) contains the state variables and the lower section contains the
actions performed in that state. If there are no horizontal lines through a box, only
the name of the state is written inside it. External straight lines, each with an arrow
at one end, connect various pairs of boxes. These lines define the transitions
between states. The final state is portrayed as a large black dot with a circle around
it. Historical states are denoted as circles with the letter H inside.
Component Diagram
COMPONENT LEVEL CLASS DESIGN
What is a Component?
This section defines the term component and discusses the differences
between object oriented, traditional, and process related views of component level
design. Object Management Group OMG UML defines a component as “… a
modular, deployable, and replaceable part of a system that encapsulates
implementation and exposes a set of interfaces.”
Deployment Diagram:
Purpose:
UML is mainly designed to focus on software artifacts of a system. But these two
diagrams are special diagrams used to focus on software components and hardware
components.
So most of the UML diagrams are used to handle logical components but
deployment diagrams are made to focus on hardware topology of a system.
Deployment diagrams are used by the system engineers.
Performance
Scalability
Maintainability
Portability
Nodes
Monitor
Modem
Caching server
Server
So the following deployment diagram has been drawn considering all the points
mentioned above:
INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing specifications and procedures for data preparation, the
steps necessary to put transaction data into a usable form for processing. This
can be achieved by instructing the computer to read data from a written or printed
document, or by having people key the data directly into the system.
The design of input focuses on controlling the amount of input required,
controlling errors, avoiding delay, avoiding extra steps and keeping the process
simple. The input is designed so that it provides security and ease of
use while retaining privacy. Input design considered the following things:
OBJECTIVES
with the help of screens. Appropriate messages are provided as and when needed
so that the user will not be left in a maze at any instant. Thus the objective of
input design is to create an input layout that is easy to follow.
OUTPUT DESIGN
A quality output is one that meets the requirements of the end user and presents
the information clearly. In any system, the results of processing are communicated
to the users and to other systems through outputs. In output design, it is
determined how the information is to be displayed for immediate need and also as
hard copy output. It is the most important and direct source of information for
the user. Efficient and intelligent output design improves the system's
relationship with the user and helps decision-making.
Efficiency: Specifies how well the software utilizes scarce resources: CPU cycles,
disk space, memory, bandwidth, etc. These resources can be used effectively by
performing most of the validations at the client side and by reducing
the workload on the server through the use of JSP instead of CGI, which is being
implemented now.
Portability: Portability specifies the ease with which the software can be
installed on all necessary platforms, and the platforms on which it is expected to
run. By using appropriate server versions released for different platforms, our
project can be easily operated on any operating system and can hence be said to be
highly portable.
Scalability: Software that is scalable has the ability to handle a wide variety of
system configuration sizes. The nonfunctional requirements should specify the
ways in which the system may be expected to scale up (by increasing hardware
capacity, adding machines, etc.). Our system is easily expandable. Any
additional hardware or software that increases the
performance of the system can be easily added. An additional server would be
useful to speed up the application.
Integrity: Integrity requirements define the security attributes of the system,
restricting access to features or data to certain users and protecting the privacy
of data entered into the software. Access to certain features, such as adding the
details of files, searching, etc., which are the sole responsibility of the server,
must be disabled for normal users. Access can be restricted by providing users
with appropriate logins.
Usability: Ease-of-use requirements address the factors that constitute the capacity
of the software to be understood, learned, and used by its intended users.
Hyperlinks will be provided for each and every service the system provides, which
makes navigation easier. A system that has a high usability coefficient makes
the work of the user easier.
Python is an object-oriented language that allows users to manage and control data
structures or objects to create and run programs. Everything in Python is, in fact,
first class. All objects, data types, functions, methods, and classes take equal
position in Python. Programming languages are created to satisfy the needs of
programmers and users for an effective tool to develop applications that impact
lives, lifestyles, economy, and society. They help make lives better by increasing
productivity, enhancing communication, and improving efficiency. Languages die
and become obsolete when they fail to live up to expectations and are replaced and
superseded by languages that are more powerful. Python is a programming
language that has stood the test of time and has remained relevant across industries
and businesses and among programmers, and individual users. It is a living,
thriving, and highly useful language that is highly recommended as a first
programming language for those who want to dive into and experience
programming.
Advantages of Using Python
Here are reasons why you would prefer to learn and use Python over other
high-level languages:
Readability
Python programs use clear, simple, and concise instructions that are easy to read
even by those who have no substantial programming background. Programs
written in Python are, therefore, easier to maintain, debug, or enhance.
Higher productivity
Code written in Python is considerably shorter, simpler, and less verbose than
code in other high-level programming languages such as Java and C++. In addition,
Python has well-designed built-in features and a standard library, as well as
access to third-party modules and source libraries.
Python is relatively easy to learn. Many find Python a good first language for
learning programming because it uses simple syntax and shorter code. Python
works on Windows, Linux/UNIX, Mac OS X, other operating systems and small
form-factor devices. It also runs on microcontrollers used in appliances, toys,
remote controls, embedded devices, and other similar devices.
To install Python, you must first download the installation package of your
preferred version from this link: https://www.python.org/downloads/ On this page,
you will be asked to choose between the two latest versions for Python 2 and 3:
Python 3.5.1 and Python 2.7.11. Alternatively, if you are looking for a specific
release, you can scroll down the page to find download links for earlier versions.
You would normally opt to download the latest version, which is Python 3.5.1.
This was released on December 7, 2015. However, you may opt for the latest
version of Python 2, 2.7.11. Your preferences will usually depend on which
version will be most usable for your project. While Python 3 is the present and
future of the language, issues such as third-party utility or compatibility may
require you to download Python 2.
3.4.3 PyCharm
PyCharm is one of the most popular IDEs for Python and includes features such as
code completion and inspection, an advanced debugger, and support for
web programming and various frameworks. PyCharm is created by the Czech
company JetBrains, which focuses on creating integrated development
environments for various web development languages like JavaScript and PHP.
PyCharm offers some of the best features to its users and developers in the
following aspects:
• Advanced debugging.
• Support for web programming and frameworks such as Django and Flask.
Features of PyCharm
Besides, a developer will find PyCharm comfortable to work with because of the
features mentioned below −
Code Completion
SQLAlchemy as Debugger
You can set a breakpoint, pause in the debugger and can see the SQL
representation of the user expression for SQL Language code.
When coding in Python, queries are normal for a developer. You can check the last
commit easily in PyCharm as it has the blue sections that can define the difference
between the last commit and the current one.
Code Coverage in Editor
You can run .py files outside the PyCharm editor as well, with code coverage
details shown elsewhere in the project tree, in the summary section, etc.
Package Management
All the installed packages are displayed with proper visual representation. This
includes list of installed packages and the ability to search and add new packages.
Local History
Local History always keeps track of changes in a way that complements
version control tools like Git. Local History in PyCharm gives complete details of
what needs to be rolled back and what is to be added.
Refactoring
Refactoring is the process of restructuring code without changing its behavior,
for example renaming one or more files or symbols at a time. PyCharm
includes various shortcuts for a smooth refactoring process.
Installation on Windows
Double-click the executable file which is downloaded; the following window will
open. Select Customize installation and proceed.
Jupyter notebook
The Jupyter Notebook is an open source web application that you can use to create
and share documents that contain live code, equations, visualizations, and text.
Jupyter Notebook is maintained by the people at Project Jupyter.
Jupyter Notebooks are a spin-off project from the IPython project, which used to
have an IPython Notebook project itself. The name Jupyter comes from the core
programming languages that it supports: Julia, Python, and R. Jupyter
ships with the IPython kernel, which allows you to write your programs in Python,
but there are currently over 100 other kernels that you can also use.
The Jupyter Notebook is not included with Python, so if you want to try it out, you
will need to install Jupyter.
There are many distributions of the Python language. This article will focus on just
two of them for the purposes of installing Jupyter Notebook. The most popular is
CPython, which is the reference version of Python that you can get from
the Python website. It is also assumed that you are using Python 3.
Open the executable file and check the "Add Python 3.x to PATH" option. Then click
the "Install Now" button. It will show the installation progress.
Python 3.x is now installed. Open the command prompt and type python -V.
PIP
JSON
Python has a built-in package called json, which can be used to work with JSON
data. JSON is a syntax for storing and exchanging data. It is text, written in
JavaScript Object Notation. We can convert a JSON string to a Python object and
vice versa.
Ex: json.dumps()
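As a minimal sketch of the conversion in both directions, `json.dumps()` turns a Python object into a JSON string and `json.loads()` turns the string back into a Python object. The record below is made-up sample data, not part of the project's data set.

```python
import json

# Hypothetical student record used only for illustration.
record = {"roll_no": 17, "internal": 24, "external": 61, "passed": True}

# Python dict -> JSON string (keys sorted for a stable result).
text = json.dumps(record, sort_keys=True)

# JSON string -> Python dict.
restored = json.loads(text)
```

Note that Python's `True` is written as `true` in the JSON text and is converted back to `True` on loading.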
Chefboost
ChefBoost is a lightweight decision tree framework that supports the regular
C4.5, ID3, CART and regression tree algorithms with categorical feature support,
and it can also build gradient boosting and random forest models. It takes only a
few lines of code to build decision trees with ChefBoost.
Basically, we just need to pass the data set as a pandas data frame, together with
the tree configuration, after importing ChefBoost. We just need to put the target
label in the rightmost column. Besides, ChefBoost handles both numeric and nominal
features and target values, in contrast to its alternatives.
Pandas
Pandas is an open-source library that allows you to perform data manipulation in
Python. The Pandas library is built on top of NumPy, meaning Pandas needs NumPy to
operate. Pandas provides an easy way to create, manipulate and wrangle data. Pandas
is also an elegant solution for time series data.
NumPy
NumPy is a Python package whose name stands for ‘Numerical Python’. It is the core
library for scientific computing and provides a powerful n-dimensional array
object. A NumPy array can also be used as an efficient multi-dimensional container
for generic data.
1) Less memory
2) Fast
3) Convenient
CHAPTER-7
CODE
CHAPTER-8
TESTING
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way
to check the functionality of components, subassemblies, assemblies and/or a
finished product. It is the process of exercising software with the intent of
ensuring that the software system meets its requirements and user expectations and
does not fail in an unacceptable manner. There are various types of tests. Each
test type addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly and that program inputs produce valid
outputs. All decision branches and internal code flow should be validated. It is the
testing of individual software units of the application. It is done after the
completion of an individual unit and before integration. This is structural testing
that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at the component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of
a business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
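A unit test of this kind can be sketched with Python's built-in unittest module. The function `to_grade` and its grade boundaries are hypothetical stand-ins for a unit of the application, and they are not the project's actual grading rule. The test case covers every decision branch, the boundary values and the error path.

```python
import unittest


def to_grade(total):
    """Hypothetical mapping from a total out of 100 to a letter grade."""
    if not 0 <= total <= 100:
        raise ValueError("total must be between 0 and 100")
    if total >= 75:
        return "A"
    if total >= 50:
        return "B"
    return "F"


class ToGradeTest(unittest.TestCase):
    def test_each_branch(self):
        self.assertEqual(to_grade(90), "A")
        self.assertEqual(to_grade(60), "B")
        self.assertEqual(to_grade(10), "F")

    def test_boundaries(self):
        self.assertEqual(to_grade(75), "A")
        self.assertEqual(to_grade(50), "B")

    def test_invalid_input_is_rejected(self):
        with self.assertRaises(ValueError):
            to_grade(101)


def run_tests():
    """Run the test case and return True when every test passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToGradeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Calling `run_tests()` executes the three tests and reports whether the unit behaves as specified before it is integrated with other modules.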
Integration testing
Integration tests are designed to test integrated software components to determine
if they actually run as one program. Testing is event driven and is more concerned
with the basic outcome of screens or fields. Integration tests demonstrate that
although the components were individually satisfactory, as shown by successful
unit testing, the combination of components is correct and consistent. Integration
testing is specifically aimed at exposing the problems that arise from the
combination of components.
Functional test
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.
System Test
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration oriented system integration test.
System testing is based on process descriptions and flows, emphasizing pre-driven
process links and integration points.
Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test phase of
the software lifecycle, although it is not uncommon for coding and unit testing to
be conducted as two distinct phases.
Features to be tested
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
SYSTEM TESTING
TESTING METHODOLOGIES
The following are the Testing Methodologies:
o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.
Unit Testing
During this testing, each module is tested individually and the module
interfaces are verified for consistency with the design specification. All
important processing paths are tested for the expected results. All error handling
paths are also tested.
Integration Testing
Integration testing addresses the issues associated with the dual problems of
verification and program construction. After the software has been integrated, a
set of high-order tests are conducted. The main objective in this testing process
is to take unit-tested modules and build a program structure that has been dictated
by design.
2. Bottom-up Integration
This method begins the construction and testing with the modules at the
lowest level in the program structure. Since the modules are integrated from the
bottom up, processing required for modules subordinate to a given level is always
available and the need for stubs is eliminated. The bottom up integration strategy
may be implemented with the following steps:
The low-level modules are combined into clusters that
perform a specific software sub-function.
A driver (i.e. the control program for testing) is written to coordinate test
case input and output.
The cluster is tested.
Drivers are removed and clusters are combined moving upward in the
program structure.
The bottom-up approach tests each module individually, and then each module is
integrated with a main module and tested for functionality.
User Acceptance of a system is the key factor for the success of any system.
The system under consideration is tested for user acceptance by constantly keeping
in touch with the prospective system users at the time of developing and making
changes wherever required. The system developed provides a friendly user
interface that can easily be understood even by a person who is new to the system.
Output Testing
After performing the validation testing, the next step is output testing of the
proposed system, since no system could be useful if it does not produce the
required output in the specified format. The outputs generated or displayed by the
system under consideration are tested by asking the users about the format they
require. Hence the output format is considered in two ways: one is on screen and
the other is in printed format.
Validation Checking
Text Field:
The text field can contain only a number of characters less than or equal
to its size. The text fields are alphanumeric in some tables and alphabetic in other
tables. An incorrect entry always flashes an error message.
Numeric Field:
The numeric field can contain only numbers from 0 to 9. An entry of any other
character flashes an error message. The individual modules are checked for
accuracy and for what they have to perform. Each module is subjected to a test run
with sample data. The individually tested modules are integrated into a single
system. Testing involves executing the program with real data; the existence of any
program defect is inferred from the output. The
testing should be planned so that all the requirements are individually tested.
A successful test is one that brings out the defects for inappropriate data
and produces an output revealing the errors in the system.
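The text-field and numeric-field rules above can be sketched as two small checks. The function names, parameters and error messages are hypothetical, and the sketch assumes that "alphanumeric" means letters and digits only, with no spaces or punctuation.

```python
def validate_text_field(value, size, alphabetic_only=False):
    """Return an error message, or None when the text entry is valid."""
    if len(value) > size:
        return f"Entry exceeds the field size of {size} characters."
    if alphabetic_only and not value.isalpha():
        return "Only alphabetic characters are allowed."
    if not alphabetic_only and not value.isalnum():
        return "Only letters and digits are allowed."
    return None


def validate_numeric_field(value):
    """Return an error message, or None when the entry uses only digits 0-9."""
    if value and all(ch in "0123456789" for ch in value):
        return None
    return "Only the digits 0 to 9 are allowed."


# Sample entries: a valid course code, an over-long one, and a bad number.
ok = validate_text_field("CS101", 5)
too_long = validate_text_field("CS1010", 5)
bad_number = validate_numeric_field("8a")
```

Returning a message instead of raising an exception lets the calling screen decide how to flash the error to the user.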
The above testing is done using various kinds of test data. Preparation of test
data plays a vital role in system testing. After the test data is prepared, the
system under study is tested using that test data. While the system is being tested
with test data, errors are again uncovered and corrected using the above testing
steps, and the corrections are also noted for future use.
Live test data are those that are actually extracted from organization files.
Users may be asked to key in a set of data from their normal activities, which the
systems person then uses to test the system. In other instances, analysts extract a
set of live data from the files and enter the data themselves.
The most effective test programs use artificial test data generated by persons other
than those who wrote the programs. Often, an independent team of testers
formulates a testing plan, using the systems specifications.
The package “Virtual Private Network” has satisfied all the requirements specified
as per software requirement specification and was accepted.
USER TRAINING
MAINTENANCE
This covers a wide range of activities including correcting code and design errors.
To reduce the need for maintenance in the long run, we have more accurately
defined the user’s requirements during the process of system development.
Depending on the requirements, this system has been developed to satisfy the
needs to the largest possible extent. With development in technology, it may be
possible to add many more features based on the requirements in future. The
coding and designing is simple and easy to understand which will make
maintenance easier.
TESTING STRATEGY :
A strategy for system testing integrates system test cases and design techniques
into a well-planned series of steps that results in the successful construction of
software. The testing strategy must incorporate test planning, test case design,
test execution, and the resultant data collection and evaluation. A strategy for
software testing must accommodate low-level tests that are necessary to verify that
a small source code segment has been correctly implemented, as well as high-level
tests that validate major system functions against user requirements.
SYSTEM TESTING:
Software once validated must be combined with other system elements (e.g.
Hardware, people, database). System testing verifies that all the elements are
proper and that overall system function performance is achieved. It also tests to
find discrepancies between the system and its original objective, current
specifications and system documentation.
UNIT TESTING:
In unit testing, different modules are tested against the specifications produced
during the design of the modules. Unit testing is essential for verification of the
code produced during the coding phase, and hence the goal is to test the internal
logic of the modules. Using the detailed design description as a guide, important
control paths are tested to uncover errors within the boundary of the modules. This
testing is carried out during the programming stage itself. In this type of testing
step, each module was found to be working satisfactorily with regard to the expected
output from the module.