CAMPUS PLACEMENT PREDICTION USING SUPERVISED MACHINE LEARNING TECHNIQUES
A Project report submitted in fulfillment of the requirements for the award of the Degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
P. RENUKA (19NP5A0507)
A. SUPRIYA (18NP1A0551)
K. CHANDANA (18NP1A0578)
CH. DIVYANJALI (18NP1A0553)
CERTIFICATE
This is to certify that the dissertation entitled “CAMPUS PLACEMENT
PREDICTION USING SUPERVISED MACHINE LEARNING TECHNIQUES” is a
bonafide work done by P. RENUKA (19NP5A0507), A. SUPRIYA (18NP1A0551),
K. CHANDANA (18NP1A0578) and CH. DIVYANJALI (18NP1A0553) under my guidance
and supervision, and is submitted to JAWAHARLAL NEHRU TECHNOLOGICAL
UNIVERSITY, KAKINADA in fulfillment of the requirements for the award of the Degree
of “Bachelor of Technology” in CSE. It is a record of work carried out by them under my
guidance and supervision during the academic year 2021-2022, and it has been found
worthy of acceptance according to the requirements of the university.
ACKNOWLEDGEMENT
We wish to express our sincere thanks to various personalities who were responsible
for the successful completion of the main project.
We thank our chairman Sri. B. S. APPARAO and our secretary Sri B. SRI KRISHNA
for providing the necessary infrastructure required for our project.
We thank our principal Dr. CH. CHENCHAMMA for providing the necessary
infrastructure required for our project.
We are grateful to Dr. A. C. P. RANJANI, Head of the Department, for providing the
necessary facilities for completing the project in the specified time.
We express our deeply felt gratitude to Dr. A. C. P. RANJANI, whose valuable
guidance and unstinting encouragement enabled us to accomplish our project successfully
and on time.
Our special thanks to Mrs. K. NAGA BHAVANI, librarian and the entire library staff
of Vijaya Institute of Technology for Women for providing the necessary library facilities.
We also express our gratitude to the system administrator and other lab assistants for
their support in executing the project.
We express our earnest thanks to all other faculty members of CSE for extending their
helping hands and valuable suggestions when in need.
PROJECT MEMBERS:
P. RENUKA (19NP5A0507)
A. SUPRIYA (18NP1A0551)
K. CHANDANA (18NP1A0578)
CH. DIVYANJALI (18NP1A0553)
DECLARATION
PROJECT MEMBERS:
P. RENUKA (19NP5A0507)
A. SUPRIYA (18NP1A0551)
K. CHANDANA (18NP1A0578)
CH. DIVYANJALI (18NP1A0553)
ABSTRACT
In this study, the objective is to analyze previous years' student data and use it to
predict the placement chances of current students. A model is proposed, with an algorithm
to predict the placement chance of students, and suitable data pre-processing methods were
applied. The proposed model is compared with traditional classification algorithms
such as Decision Tree, Support Vector Machine and Random Forest with respect to accuracy.
From the results obtained, the proposed algorithm performs significantly better than the
other algorithms mentioned. For this comparison, we trained each of the algorithms on the
acquired data set and evaluated it on held-out test data to find the accuracy of each
algorithm. For each algorithm, we can easily obtain the True Positive (TP), True Negative
(TN), False Positive (FP) and False Negative (FN) counts; with these four values, computing
the accuracy is a matter of applying the accuracy equation.
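The accuracy equation referred to here is the standard one:

Accuracy = (TP + TN) / (TP + TN + FP + FN)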
TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
DECLARATION
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 Existing System
3.1.1 Limitations of Existing System
3.2 Proposed System
3.2.1 Advantages of Proposed System
3.3 Algorithms
3.4 Functional Requirements
3.5 Non-Functional Requirements
3.6 Software Requirements
3.7 Hardware Requirements
4. SYSTEM STUDY
4.1 Economical Feasibility
4.2 Technical Feasibility
4.3 Social Feasibility
5. SYSTEM DESIGN
5.1 System Architecture
5.2 UML Diagrams
5.2.1 Use Case Diagram
5.2.2 Sequence Diagram
5.2.3 Collaboration Diagram
5.2.4 Class Diagram
6. SOFTWARE ENVIRONMENT
6.1 Python Introduction
6.2 Anaconda
6.2.1 Overview
6.2.2 Anaconda Navigator
6.2.3 Anaconda Cloud
6.3 Jupyter Notebook
7. SYSTEM IMPLEMENTATION
7.1 Modules
7.1.1 Data Gathering
7.1.2 Pre-processing
7.1.3 Processing
7.1.4 Interpretation
7.1.5 Weak Student Analysis
7.1.6 Interface
7.2 Coding
8. TESTING
8.1 Testing Methodologies
8.2 Testing Activities
8.3 Types of Testing
8.3.1 Black Box Testing
8.3.2 White Box Testing
9. OUTPUT SCREENS
10. CONCLUSION
11. BIBLIOGRAPHY
LIST OF FIGURES
Fig 2: Architecture
1. INTRODUCTION
Placements are considered very important for each and every college. The
basic success of a college is measured by the campus placements of its students, and
students choose a college partly by looking at its placement percentage. Hence, this
work is about predicting and analyzing placement outcomes, which helps both colleges
and students to improve their placements. In Placement Prediction, the system predicts
the probability of an undergraduate student getting placed in a company by applying
classification algorithms such as Decision Tree, Support Vector Machine and Random
Forest. The main objective of this model is to predict whether a student will be placed
or not in campus recruitment drives. For this, the data considered is the academic
history of the student, such as overall percentage, CGPA, aptitude, communication and
technical skills. The algorithms are applied to the previous years' student data.
The Training and Placement activity in a college is one of the important
activities in the life of a student. Therefore, it is very important to make the process
hassle-free, so that students are able to get the required information as and when they
need it. Also, with the help of a good system, it is easy for the staff of the Training and
Placement cell to keep students updated, with less work. The “College Placement
Prediction using Machine Learning” system is developed to override the problems
prevailing in the existing manual practice. This software is intended to eliminate and, in
some cases, reduce the hardships faced under the existing system. Moreover, this system
is designed to meet a company's need to carry out operations in a smooth and effective
manner. The majority of companies focus on campus recruitment to fill up their positions;
they identify talented and qualified professionals before they have completed their
education. This method is the best way to engage the right resource at the right time,
letting students join good companies at the beginning of their careers. Every organization,
whether big or small, has challenges to overcome in managing information about
placements, training, placement cells and technical skills. Every training and placement
management system has different training needs; therefore, we design an exclusive
management system adapted to the institution's managerial requirements.
This system is designed to assist in strategic planning and will help ensure that the
organization is equipped with the right level of information and details about its future
goals. Also, for busy executives who are always on the go, such systems come with
remote access features, allowing the workforce to be managed anytime; ultimately, they
allow resources to be managed. Students in the final year of engineering focus on getting
employed in reputed companies. Predicting the placement status that B.E. students are
most likely to achieve will help students work harder and make appropriate progress.
It will also help the faculty as well as the placement cell in an institution to provide
proper care towards the improvement of students over the duration of the course. A high
placement rate is the key entity in building the reputation of an educational institution,
so this system has a significant place in the educational system of any higher learning
institution.
2. LITERATURE SURVEY
In the surveyed work, programming skills, communication skills, analytical skills and
teamwork are considered, as these are tested by companies during the recruitment
process. Data of the past two batches were taken for this system.
3. SYSTEM ANALYSIS
3.1 Existing System
Academic performance is not the only parameter for judging a student; other
parameters are also required. Updating records is another tedious task, and sorting and
searching problems arise.
The placement officer has to find the eligible students by looking through an Excel
sheet, checking the marks and eligibility of every student.
3.3 Algorithms
SVM: SVM stands for Support Vector Machine. It is a supervised machine learning
algorithm that can be used for both classification and regression problems; however, it is
mostly used for classification. Each data item is a point in n-dimensional space, where n
is the number of features and the value of each feature is the value of a particular
coordinate. After plotting the data items, we perform classification by finding the
hyperplane that best separates the two classes. The problem then lies in choosing the
right hyperplane. Scikit-learn is a Python library that implements various machine
learning algorithms, and SVM too can be used through scikit-learn.
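As a minimal sketch of this approach (the file name and column names below are illustrative assumptions modeled on the attributes used later in this report, not the project's exact files), an SVM classifier can be trained and scored with scikit-learn as follows:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 'placement.csv' and the column names below are illustrative assumptions
df = pd.read_csv('placement.csv')
X = df[['CGPA', 'Aptitude', 'Communication', 'Technical skills']]
y = df['Placed']  # 1 = placed, 0 = not placed

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An RBF-kernel SVM searches for a non-linear separating hyperplane
clf = SVC(kernel='rbf', gamma='scale')
clf.fit(X_train, y_train)
print('SVM accuracy:', accuracy_score(y_test, clf.predict(X_test)))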
RFA: We have a plethora of classification algorithms at our disposal, including, but not
limited to, SVM, logistic regression, decision trees and the Naive Bayes classifier. In the
hierarchy of classifiers, the Random Forest classifier sits near the top. A random forest is
a group of individual decision trees, so it helps to look into how decision trees work. In
a random forest, each individual tree, with its own properties and classification rules,
tries to find an appropriate class label for the problem, and each tree gives out its own
answer. Voting is then done within the random forest, and the class label that receives
the most votes is considered the final class label for the problem. This provides a more
accurate model for class label prediction. Random forests can balance errors in data sets
where classes are imbalanced, can handle large data sets with higher dimensionality and
thousands of input variables, and can identify the most significant variables; as such, a
random forest is also a good dimensionality reduction method.
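Continuing the illustrative sketch above (reusing the same assumed X_train/X_test split), the voting and variable-importance behavior described here looks roughly like this in scikit-learn:

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each fitted on a bootstrap sample with random feature subsets;
# predict() returns the majority vote of the individual trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print('Random Forest accuracy:', rf.score(X_test, y_test))

# feature_importances_ ranks the input variables, which is why the text
# describes the forest as a useful dimensionality-reduction aid
for name, importance in zip(X.columns, rf.feature_importances_):
    print(name, round(importance, 3))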
Decision Tree Classifier: A Decision Tree is a classifier that uses a tree-like structure
to learn classification rules. A decision node (non-leaf node) represents a certain test,
and the outcomes of that test are signified by the branches of the decision node. Each
target class is denoted by a leaf node of the tree. Starting from the root node, the tree
is traversed until a leaf node is reached; in this way the classification result of a
decision tree is obtained.
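A short sketch of this traversal (again on the assumed split above); export_text prints the test at each decision node and the class label at each leaf:

from sklearn.tree import DecisionTreeClassifier, export_text

dt = DecisionTreeClassifier(max_depth=3, random_state=42)  # shallow tree for readability
dt.fit(X_train, y_train)
print('Decision Tree accuracy:', dt.score(X_test, y_test))

# Each printed line is a decision-node test; leaves show the predicted class,
# mirroring the root-to-leaf traversal described above
print(export_text(dt, feature_names=list(X.columns)))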
3.4 Functional Requirements
Another specific functionality of the project is to provide information to the user.
The following are the functional requirements of our system:
1. Given a query, the system returns an efficient result.
2. The query search is based on the major intention of the user.
3. The system has an effective ranking methodology.
4. A novel framework exploits the user's social activities, such as annotations and
participation in interest groups, for personalized image search.
1. On-Demand Service
2. Pay-As-You-Use
3. VM Pricing Model
4. Elasticity
5. Flexibility
Pay-as-you-use has two aspects. First, according to the customer's resource
demand (CPU, memory, etc.), the physical machines are dynamically partitioned using
virtualization technologies and provided to customers in the form of virtual machines
(VMs), and customers pay according to the amount of resources they actually consume.
Second, the VMs can be dynamically allocated and deallocated at any time, and customers
pay based on how long the resources are actually used.
3.5 Non-Functional Requirements
Availability: Our system must always be available. If any updates are required, they
must be performed within a short interval of time without interrupting the normal
services made available to the users.
Efficiency: Specifies how well the software utilizes scarce resources: CPU cycles, disk
space, memory, bandwidth, etc. All of the above-mentioned resources can be used
effectively by performing most validations on the client side and reducing the workload
on the server by using JSP instead of the CGI which is implemented now.
Portability: Portability specifies the ease with which the software can be installed on all
necessary platforms, and the platforms on which it is expected to run. By using
appropriate server versions released for different platforms our project can be easily
operated on any operating system, hence can be said highly portable.
Scalability: Software that is scalable has the ability to handle a wide variety of system
configuration sizes. The nonfunctional requirements should specify the ways in which the
system may be expected to scale up (by increasing hardware capacity, adding machines
etc.). Our system can be easily expandable. Any additional requirements such as
hardware or software which increase the performance of the system can be easily added.
An additional server would be useful to speed up the application.
Integrity: Integrity requirements define the security attributes of the system, restricting
access to features or data to certain users and protecting the privacy of data entered into
the software. Certain features access must be disabled to normal users such as adding the
details of files, searching etc. which is the sole responsibility of the server. Access can be
disabled by providing appropriate logins to the users for only access.
Usability: Ease-of-use requirements address the factors that constitute the capacity of the
software to be understood, learned, and used by its intended users. Hyperlinks will be
provided for each and every service the system provides, through which navigation will
be easier. A system with a high usability coefficient makes the work of the user easier.
Making the application form filling process available online, and providing the
invigilation list information and examination hall list, are given high priority compared
to other services and can be identified as the critical aspects of the system.
1. Our system introduces user-specific search performance.
2. The query-related search is effective and returns results within a short period, so the
speed of the system is very high.
3. A ranking optimization scheme is available for the personalized image search system.
4. SYSTEM STUDY
What Is SDLC?
A software cycle deals with various parts and phases from planning to testing and
deploying software. All these activities are carried out in different ways, as per the needs.
Each way is known as a Software Development Lifecycle (SDLC) model. A software life
cycle model is either descriptive or prescriptive.
Spiral Model - We start from a smaller module and keep on building it like a spiral. It is
also called component-based development.
SDLC Methodology
AGILE MODEL:
The Agile SDLC model is a combination of iterative and incremental process models,
with a focus on process adaptability and customer satisfaction through rapid delivery of
working software. Agile methods break the product into small incremental builds, which
are provided in iterations. Each iteration typically lasts about one to three weeks. Every
iteration involves cross-functional teams working simultaneously on various areas such
as:
Planning
Requirements Analysis
Design
Coding
Unit Testing and
Acceptance Testing.
Feasibility Study
The feasibility of the project is analyzed in this phase, and a business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis, the feasibility study of the proposed system is carried out.
This is to ensure that the proposed system is not a burden to the company. For
feasibility analysis, some understanding of the major requirements for the system is
essential.
5. SYSTEM DESIGN
System Design
Use-case-oriented techniques are widely used in software requirement analysis and
design.
Use cases and usage scenarios facilitate system understanding and provide a
common language for communication. This section presents a scenario-based modeling
technique and discusses its applications. In this model, scenarios are organized
hierarchically and capture the system functionality at various abstraction levels,
including scenario groups, scenarios, and sub-scenarios. Combining scenarios or
sub-scenarios can form complex scenarios. Data are also separately identified, organized,
and attached to scenarios. This scenario model can be used to cross-check the UML
model. It can also direct systematic scenario-based testing, including test case generation,
test coverage analysis with respect to requirements, and functional regression testing.
5.1 ARCHITECTURE:
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or process
may also be added to, or associated with, UML.
The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted. Interaction
among actors is not shown on the use case diagram. If this interaction is essential to a
coherent description of the desired behavior, perhaps the system or use case boundaries
should be re-examined. Alternatively, interaction among actors can be part of the
assumptions used in the use case.
Use cases:
A use case describes a sequence of actions that provide something of measurable
value to an actor and is drawn as a horizontal ellipse.
Actors:
An actor is a person, organization, or external system that plays a role in one or
more interactions with the system.
Include:
In one form of interaction, a given use case may include another. "Include" is a
directed relationship between two use cases, implying that the behavior of the included
use case is inserted into the behavior of the including use case.
The first use case often depends on the outcome of the included use case. This is
useful for extracting truly common behaviors from multiple use cases into a single
description. The notation is a dashed arrow from the including to the included use case,
with the label "«include»". There are no parameters or return values. To specify the
location in a flow of events in which the base use case includes the behavior of another,
you simply write include followed by the name of the use case you want to include.
Extend:
In another form of interaction, a given use case (the extension) may extend
another. This relationship indicates that the behavior of the extension use case may be
inserted in the extended use case under some conditions. The notation is a dashed arrow
from the extension to the extended use case, with the label "«extend»". Modelers use the
«extend» relationship to indicate use cases that are "optional" to the base use case.
Generalization:
In the third form of relationship among use cases, a generalization/specialization
relationship exists. A given use case may have common behaviors, requirements,
constraints, and assumptions with a more general use case. In this case, describe them
once, and deal with it in the same way, describing any differences in the specialized
cases. The notation is a solid line ending in a hollow triangle drawn from the specialized
to the more general use case (following the standard generalization notation).
Associations:
Associations between actors and use cases are indicated in use case diagrams by
solid lines. An association exists whenever an actor is involved with an interaction
described by a use case. Associations are modeled as lines connecting use cases and
actors to one another, with an optional arrowhead on one end of the line. The arrowhead
is often used to indicate the direction of the initial invocation of the relationship or to
indicate the primary actor within the use case.
SEQUENCE DIAGRAMS:
A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what order.
It is a construct of a Message Sequence Chart.
Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different
processes or objects that live simultaneously, and, as horizontal arrows, the messages
exchanged between them, in the order in which they occur. This allows the specification
of simple runtime scenarios in a graphical manner. If the lifeline is that of an object, it
demonstrates a role. In order to display interaction, messages are used. These are
horizontal arrows with the message name written above them. Solid arrows with full
heads are synchronous calls, solid arrows with stick heads are asynchronous calls and
dashed arrows with stick heads are return messages. This definition holds as of UML 2,
which differs considerably from UML 1.x.
Objects calling methods on themselves use messages and add new activation
boxes on top of any others to indicate a further level of processing. When an object is
destroyed (removed from memory), an X is drawn on top of the lifeline, and the dashed
line ceases to be drawn below it. It should be the result of a message, either from the
object itself, or from another.
COLLABORATION DIAGRAMS:
A distinguishing feature of a Collaboration diagram is that it shows the objects and
their associations with other objects in the system, apart from how they interact with
each other. The association between objects is not represented in a Sequence diagram.
CLASS DIAGRAM:
In the object-oriented model, objects are entities that combine state (i.e., data),
behavior (i.e., procedures, or methods) and identity (unique existence among all other
objects). The structure and behavior of an object are defined by a class, which is a
definition, or blueprint, of all objects of a specific type. An object must be explicitly
created based on a class, and an object thus created is considered an instance of that
class. An object is similar to a structure, with the addition of method pointers, member
access control, and an implicit data member which locates instances of the class (i.e.,
actual objects of that class) in the class hierarchy (essential for runtime inheritance
features).
The class diagram is the main building block in object-oriented modeling. It is
used both for general conceptual modeling of the semantics of the application, and for
detailed modeling, translating the models into programming code. The classes in a class
diagram represent both the main objects and interactions in the application and the
objects to be programmed. In the class diagram these classes are represented with boxes
which contain three parts:
● The upper part holds the name of the class.
● The middle part contains the attributes of the class.
● The lower part contains the operations of the class.
ACTIVITY DIAGRAM:
Activity diagram is another important diagram in UML to describe the dynamic
aspects of the system. Activity diagram is basically a flowchart to represent the flow from
one activity to another activity. The activity can be described as an operation of the
system.
Arrows run from the start towards the end and represent the order in which
activities happen. However, the join and split symbols in activity diagrams only resolve
this for simple cases; the meaning of the model is not clear when they are arbitrarily
combined with the decisions or loops.
6. SOFTWARE ENVIRONMENT
6.1 Python
Python is a general-purpose, dynamic, high-level, interpreted programming
language. It supports an object-oriented programming approach for developing
applications. It is simple, easy to learn, and provides many high-level data structures.
Python is an easy-to-learn yet powerful and versatile scripting language, which
makes it attractive for application development.
Python's syntax and dynamic typing, together with its interpreted nature, make it
an ideal language for scripting and rapid application development. Python supports
multiple programming paradigms, including object-oriented, imperative, and functional
or procedural styles.
Python is not restricted to one special area such as web programming. That is
why it is known as a multipurpose language: it can be used for web, enterprise, 3D CAD,
etc. We don't need to declare data types for variables because Python is dynamically
typed: we can simply write a=10 to assign an integer value to a variable.
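For example, no type declaration is needed, and the same name can later refer to a value of another type:

a = 10            # 'a' is bound to an int without any declaration
print(type(a))    # <class 'int'>
a = "ten"         # the same name can be rebound to a str
print(type(a))    # <class 'str'>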
Features of Python
Easy to learn and use -
Python is easy to learn and use. It is a developer-friendly, high-level programming
language.
Expressive Language -
Python is expressive, meaning its code is more understandable and readable.
Interpreted Language -
Python is an interpreted language, i.e., the interpreter executes the code line by line.
This makes debugging easy, and thus Python is suitable for beginners.
Cross-platform Language -
Python runs equally well on different platforms such as Windows, Linux, Unix,
Macintosh, etc. So we can say that Python is a portable language.
Extensible -
Other languages such as C/C++ can be used to compile code, which can then be used
in our Python code.
Integrated -
Python can be easily integrated with languages like C, C++, Java, etc.
Python Applications -
Python is known for its general-purpose nature, which makes it applicable in almost
every domain of software development. Python as a whole can be used in any sphere
of development.
Double-click the executable file which is downloaded; the following window will
open. Select Customize installation and proceed.
The following window shows all the optional features. All the features need to be
installed and are checked by default; we need to click next to continue.
The following window shows a list of advanced options. Check all the options
which you want to install and click next. Here, we must notice that the first check-box
(install for all users) must be checked.
Now, try to run Python from the command prompt. Type the command python (or
python3 in the case of Python 3). It will show an error, as in the image below, because
we haven't set the path.
To set the path of Python, we need to right-click on "My Computer" and go to
Properties → Advanced → Environment Variables.
Type PATH as the variable name and set its value to the installation directory of
Python, as shown in the image below.
Now that the path is set, we are ready to run Python on our local system. Restart
the command prompt and type python again. It will open the Python interpreter shell,
where we can execute Python statements.
Web Applications -
We can use Python to develop web applications. It provides libraries to handle internet
data and protocols such as HTML, XML, JSON and email processing, along with
packages like requests, BeautifulSoup and Feedparser. It also provides frameworks such
as Django, Pyramid and Flask to design and develop web-based applications. Some
important developments are PythonWikiEngines, Pocoo, PythonBlogSoftware, etc.
Desktop Applications -
Python provides the Tk GUI library to develop user interfaces in Python-based
applications. Some other useful toolkits, such as wxWidgets, Kivy and PyQt, are usable
on several platforms. Kivy is popular for writing multitouch applications.
Software Development -
Python is helpful in the software development process. It works as a support language
and can be used for build control and management, testing, etc.
Business Applications -
Python is used to build business applications like ERP and e-commerce systems. Tryton
is a high-level application platform.
3D CAD Applications -
For creating CAD applications, Fandango is a real application which provides full
features of CAD.
Enterprise Applications -
Python can be used to create applications that can be used within an enterprise or an
organization. Some real-time applications are OpenERP, Tryton, Picalo, etc.
6.2.1 Overview
The Anaconda distribution comes with more than 1,500 packages as well as conda, the
package and virtual environment manager. It also includes Anaconda Navigator, a GUI
that serves as a graphical alternative to the command line interface (CLI).
The big difference between conda and the pip package manager is in how package
dependencies are managed, which is a significant challenge for Python data science and
the reason conda exists.
Custom packages can be made using the conda build command, and can be shared
with others by uploading them to Anaconda Cloud, PyPI or other repositories.
You can build new packages using the Anaconda Client command line interface
(CLI), then manually or automatically upload the packages to Anaconda Cloud.
As a web application in which you can create and share documents that contain
live code, equations, visualizations and text, the Jupyter Notebook is one of the ideal
tools to help you gain the data science skills you need.
The Jupyter Notebook App produces these documents. We'll talk about this in a
bit. For now, you should know that "Jupyter" is a loose acronym meaning Julia, Python,
and R. These programming languages were the first target languages of the Jupyter
application, but nowadays, the notebook technology also supports many other languages.
And there you have it: the Jupyter Notebook.
As you just saw, the main components of the whole environment are, on the one
hand, the notebooks themselves and the application. On the other hand, you also have a
notebook kernel and a notebook dashboard.
Let's look at these components in more detail.
The dashboard of the application not only shows you the notebook documents that
you have made and can reopen, but can also be used to manage the kernels: you can see
which ones are running and shut them down if necessary.
Let's back up briefly to the late 1980s. Guido van Rossum began to work on
Python at the National Research Institute for Mathematics and Computer Science in the
Netherlands.
Fast forward two years: the IPython team had kept on working, and in 2007, they
formulated another attempt at implementing a notebook-type system. By October 2010,
there was a prototype of a web notebook, and in the summer of 2011, this prototype was
incorporated, and it was released with IPython 0.12 on December 21, 2011. In subsequent
years, the team received awards, such as the Advancement of Free Software award for
Fernando Pérez on 23 March 2013 and the Jolt Productivity Award, and funding from
the Alfred P. Sloan Foundation, among others.
Lastly, in 2014, Project Jupyter started as a spin-off project from IPython. IPython
is now the name of the Python backend, which is also known as the kernel. Recently, the
next generation of Jupyter Notebooks has been introduced to the community. It's called
JupyterLab.
After all this, you might wonder where this idea of notebooks originated, or how
it came about to its creators.
A brief look into the history of these notebooks shows that Fernando Pérez and
Robert Kern were working on a notebook at the same time as the Sage notebook was
a work in progress. Since the layout of the Sage notebook was based on the layout of
Google notebooks, you can also conclude that Google used to have a notebook feature
around that time.
As for the idea of the notebook itself, it seems that Fernando Pérez, as well as
William Stein, one of the creators of the Sage notebook, have confirmed that they were
avid users of Mathematica notebooks and Maple worksheets. The Mathematica notebook
was created as a front end, or GUI, in 1988 by Theodore Gray.
The concept of a notebook, which contains ordinary text and calculation and/or
graphics, was definitely not new.
Also, the developers had close contact with one another, and this, together with
other failed attempts at GUIs for IPython and the use of AJAX web applications,
which didn't require users to refresh the whole page every time they did something,
were two other motivations for the team of William Stein to start developing the Sage
notebooks.
7. SYSTEM IMPLEMENTATION
7.1 Modules
7.1.1 Data Gathering
The sample data has been collected from our college placement department and
consists of all the records of previous years' students. The collected dataset consists of
over 1000 instances of students.
7.1.2 Pre-processing
Attribute selection: Some of the attributes in the initial dataset that were not
pertinent (relevant) to the experiment goal were ignored. The attributes name, roll
no and phone number are not used. The main attributes used for this study are
Technical skills, Communication, Aptitude and CGPA.
Cleaning missing values: In some cases, the dataset contains missing values, and we
need to be equipped to handle them when we come across them. Obviously, you
could remove the entire line of data, but what if you are inadvertently removing
crucial information? One of the most common ways to handle the problem is to
take the mean of all the values of the same column and use it to replace the missing
data. The library used for the task is scikit-learn's preprocessing module. It
contains a class called Imputer which will help us take care of the missing data.
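As a brief sketch (note that the Imputer class named above comes from older scikit-learn releases; since scikit-learn 0.20 it is called SimpleImputer, which is used here, and the column names are illustrative):

import numpy as np
from sklearn.impute import SimpleImputer

# Replace each missing value with the mean of its own column, as described above
cols = ['Aptitude', 'Communication', 'Technical skills', 'CGPA']  # assumed names
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
df[cols] = imputer.fit_transform(df[cols])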
Training and Test data: The next step is to split our dataset into two: a training set
and a test set. We will train our machine learning models on the training set, i.e.,
the models will try to understand any correlations in the training set, and then we
will test the models on the test set to examine how accurately they predict. A
general rule of thumb is to assign 80% of the dataset to the training set and the
remaining 20% to the test set.
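A minimal sketch of the 80/20 split, assuming a feature matrix X and label vector y have already been prepared:

from sklearn.model_selection import train_test_split

# 80% of the rows train the model; the held-out 20% measure its accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)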
Feature Scaling: The final step of data preprocessing is feature scaling. It is a method
used to standardize the range of independent variables or features of the data. Why
is it necessary? Many machine learning models are based on Euclidean distance.
If, for example, the values in one column (x) are much higher than the values in
another column (y), the squared difference (x2-x1)^2 will be far greater than
(y2-y1)^2, so one squared difference clearly dominates the other. In the machine
learning equations, the squared difference with the lower value will almost be
treated as if it does not exist. We do not want that to happen, which is why it is
necessary to transform all our variables onto the same scale.
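A small sketch of this standardization, assuming the split above; the scaler is fitted on the training data only and then applied to the test data:

from sklearn.preprocessing import StandardScaler

# Rescale every feature to zero mean and unit variance so that no single
# column dominates Euclidean-distance computations
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)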
7.1.3 Processing:
Classification of data is a two-phase process. In phase one, called the training
phase, a classifier is built using a training set of tuples. The second phase is the
classification phase, where the testing set of tuples is used to validate the model and
the performance of the model is analyzed.
7.1.4 Interpretation:
The data set is further split into two sets, with two thirds as the training set and
one third as the testing set. Of the algorithms applied, random forest showed the best
results. The efficiency of the approaches is compared in terms of accuracy, where the
accuracy of the prediction model/classifier is defined as the proportion of correctly
predicted/classified instances.
7.1.5 Weak Student Analysis:
Based on the interpretation data, weak student analysis will be processed based on
the attributes which we have considered during the analysis. Based on the attributes, we
will provide a detailed analysis on weak students on what skills they are good at and
where they have to improve their skill.
7.2 CODE
import pandas as pd
import numpy as np

# Load the placement dataset collected from the placement department
df = pd.read_excel('placement.xlsx')
df.head()       # preview the first five rows
df.shape        # (rows, columns)
df.info()       # column types and non-null counts
df.columns
df['Programming'].mean()

# Visualize the relationship between programming and technical-skill scores
x = df['Programming']
y = df['Technical skills']
import seaborn as sns
sns.jointplot(x=x, y=y, data=df)
# For each score column, compute every student's deviation from the column mean
# (the '_stddev' suffix in these column names actually holds deviation from the
# mean, not a standard deviation) and plot the distribution of deviations.
import matplotlib.pyplot as plt

df['Aptitude_stddev'] = [s - df['Aptitude'].mean() for s in df['Aptitude']]
df['Aptitude_stddev'].mean()
vals = df['Aptitude_stddev'].value_counts().keys().tolist()
counts = df['Aptitude_stddev'].value_counts().tolist()
plt.bar(vals, counts)
plt.title("Students' actual deviation from mean Aptitude score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()

df['Technicalskills_stddev'] = [s - df['Technical skills'].mean() for s in df['Technical skills']]
df['Technicalskills_stddev'].mean()
vals = df['Technicalskills_stddev'].value_counts().keys().tolist()
counts = df['Technicalskills_stddev'].value_counts().tolist()
plt.bar(vals, counts)
plt.title("Students' actual deviation from mean Technical skills score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()

df['Programming_stddev'] = [s - df['Programming'].mean() for s in df['Programming']]
df['Programming_stddev'].mean()
vals = df['Programming_stddev'].value_counts().keys().tolist()
counts = df['Programming_stddev'].value_counts().tolist()
plt.bar(vals, counts)
plt.title("Students' actual deviation from mean Programming score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()

# Note: the 'Communication ' column name carries a trailing space in the dataset
df['Communication_stddev'] = [s - df['Communication '].mean() for s in df['Communication ']]
df['Communication_stddev'].mean()
vals = df['Communication_stddev'].value_counts().keys().tolist()
counts = df['Communication_stddev'].value_counts().tolist()
plt.bar(vals, counts)
plt.title("Students' actual deviation from mean Communication score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()
df.columns

# Sum each student's four deviations into a single combined deviation score
df['sum of individual student deviations'] = df[['Aptitude_stddev', 'Technicalskills_stddev', 'Programming_stddev', 'Communication_stddev']].sum(axis=1)
vals = df['sum of individual student deviations'].value_counts().keys().tolist()
counts = df['sum of individual student deviations'].value_counts().tolist()
plt.bar(vals, counts)
plt.title("Sum of individual student deviations")
plt.xlabel("Combined deviation from mean")
plt.ylabel("No. of students")
plt.show()
# NOTE: the definitions of 'Average_of_ind_score_dev', mu and x2 are missing from
# the extracted listing (lost at a page break); a plausible reconstruction is:
df['Average_of_ind_score_dev'] = df['sum of individual student deviations'] / 4
mu = df['Average_of_ind_score_dev'].mean()
x2 = df['Average_of_ind_score_dev'].max()

df['Average_of_ind_score_dev'].describe()
# Selecting students whose averaged deviation is above 50% of the mean
df['select_50_above_mean'] = [s > mu * 0.50 for s in df['Average_of_ind_score_dev']]
# The values fall in the range of:
print('select range, max, min ', mu * 0.50, x2, mu)
df['select_50_above_mean'].value_counts()

# Encode the boolean selection as 0/1 and derive its complement
df['selected'] = df.iloc[:, -1].astype(int)
df.columns
import numpy as np
df['unselected'] = np.logical_not(df['selected'].astype(bool)).astype(int)
df.iloc[:, -2:]
# Basic requirements
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn import metrics

df1 = pd.read_csv('fromscore.csv')
# Drop the target column from the features so the label is not leaked into X
X = df1.drop('is_selected', axis=1)
y = df1['is_selected']
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=100)
# ----------------------------------------------------------------------------
# Support Vector Classifier
from sklearn.svm import SVC
model = SVC(gamma='scale')
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
y_pred = model.predict(X_test)
print("Classification model SVM accuracy (in %):", metrics.accuracy_score(Y_test, y_pred) * 100)
8. TESTING
Unit Testing:
Unit testing is the first level of testing and is often performed by the developers
themselves. It is the process of ensuring individual components of a piece of software at
the code level are functional and work as they were designed to. Developers in a test-
driven environment will typically write and run the tests prior to the software or feature
being passed over to the test team. Unit testing can be conducted manually, but
automating the process will speed up delivery cycles and expand test coverage.
Integration Testing:
After each unit is thoroughly tested, it is integrated with other units to create
modules or components that are designed to perform specific tasks or activities. These
are then tested as a group through integration testing to ensure whole segments of an
application behave as expected (i.e., the interactions between units are seamless).
System Testing:
System testing is a black box testing method used to evaluate the completed and
integrated system, as a whole, to ensure it meets specified requirements. The functionality
of the software is tested end-to-end, typically by a testing team separate from the
development team, before the product is pushed into production.
Unit Testing:
Unit Testing is done on individual modules as they are completed and become
executable. It is confined only to the designer's requirements.
Integrating Testing
Integration testing ensures that software and subsystems work together as a whole. It
tests the interfaces of all the modules to make sure that the modules behave properly
when integrated together; in this case, the communication between the device and the
Google Translator Service.
System Testing
This involves in-house testing of the entire system in an emulator before delivery to the
user. Its aim is to satisfy the user that the system meets all requirements of the client
through any standard browser's specifications.
Acceptance Testing
This is pre-delivery testing in which the entire system is tested on a real Android device
with real-world data and usage to find errors.
Test Approach
Testing can be done in two ways:
Bottom-up approach
Top-down approach
Bottom-up Approach
Testing can be performed starting from the smallest and lowest-level modules and
proceeding one at a time. For each module in bottom-up testing, a short program executes
the module and provides the needed data, so that the module is asked to perform the way
it will when embedded within the larger system.
Validation
The system has been tested and implemented successfully, ensuring that all the
requirements listed in the software requirements specification are completely fulfilled.
In case of erroneous input, corresponding error messages are displayed.
9. OUTPUT SCREENS
10. CONCLUSION
11. BIBLIOGRAPHY
References:
1. Mangasuli Sheetal B and Prof. Savita Bakare, "Prediction of Campus Placement Using
Data Mining Algorithms: Fuzzy Logic and K-Nearest Neighbour", International Journal
of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 6,
June 2016.
2. Ajay Shiv Sharma, Swaraj Prince, Shubham Kapoor and Keshav Kumar, "PPS -
Placement Prediction System using Logistic Regression", IEEE International Conference
on MOOCs, Innovation and Technology in Education (MITE), December 2014.
3. Jai Ruby and Dr. K. David, "Predicting the Performance of Students in Higher
Education Using Data Mining Classification Algorithms - A Case Study", International
Journal for Research in Applied Science & Engineering Technology (IJRASET), Vol. 2,
Issue 11, November 2014.
4. Ankita A. Nichat and Dr. Anjali B. Raut, "Predicting and Analysis of Student
Performance Using Decision Tree Technique", International Journal of Innovative
Research in Computer and Communication Engineering, Vol. 5, Issue 4, April 2017.
5. Oktariani Nurul Pratiwi, "Predicting Student Placement Class using Data Mining",
IEEE International Conference, 2013.
6. Ajay Kumar Pal and Saurabh Pal, "Classification Model of Prediction for Placement
of Students", I. J. Modern Education and Computer Science, 2013, 11, 49-56.
7. Ravi Tiwari and Awadhesh Kumar Sharma, "A Data Mining Model to Improve
Placement", International Journal of Computer Applications (0975-8887), Vol. 120,
No. 12, June 2015.
8. Ms. Sonal Patil, Mr. Mayur Agrawal and Ms. Vijaya R. Baviskar, "Efficient Processing
of Decision Tree Using ID3 & Improved C4.5 Algorithm", International Journal of
Computer Science and Information Technologies, Vol. 6 (2), 2015, 1956-1961.