
CAMPUS PLACEMENT PREDICTION USING SUPERVISED

MACHINE LEARNING TECHNIQUES

A Project report submitted in fulfillment of the requirements for the award of the Degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by
P. RENUKA (19NP5A0507)
A. SUPRIYA (18NP1A0551)
K. CHANDANA (18NP1A0578)
CH. DIVYANJALI (18NP1A0553)

Under the Esteemed Guidance of


Dr. A. C. P. RANJANI, M.Tech, Ph.D
Associate Professor and HOD
Dept. of Computer Science & Engineering

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


VIJAYA INSTITUTE OF TECHNOLOGY FOR WOMEN
(Affiliated to JNTUK Kakinada, Approved by A.I.C.T.E, New Delhi)
An ISO 9001:2015 certified institution
ENIKEPADU, VIJAYAWADA – 521108
2021 - 2022

CERTIFICATE
This is to certify that the dissertation entitled “CAMPUS PLACEMENT
PREDICTION USING SUPERVISED MACHINE LEARNING TECHNIQUES” is a
bonafide work done by P. RENUKA (19NP5A0507), A. SUPRIYA (18NP1A0551),
K. CHANDANA (18NP1A0578) and CH. DIVYANJALI (18NP1A0553), submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, KAKINADA in fulfillment of the
requirements for the award of the Degree of “Bachelor of Technology” in CSE. It is a
record of work carried out by them under my guidance and supervision during the
academic year 2021-2022, and it has been found worthy of acceptance according to the
requirements of the university.

SIGNATURE OF PROJECT GUIDE SIGNATURE OF HOD


Dr. A. C. P. RANJANI, M.Tech, Ph.D Dr. A. C. P. RANJANI, M.Tech, Ph.D

SIGNATURE OF EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We wish to express our sincere thanks to various personalities who were responsible
for the successful completion of the main project.

We thank our chairman Sri. B. S. APPARAO and our secretary Sri B. SRI KRISHNA
for providing the necessary infrastructure required for our project.

We thank our principal Dr. CH. CHENCHAMMA for providing the necessary
infrastructure required for our project.

We are grateful to Dr. A. C. P. RANJANI, Head of the Department, for providing the
necessary facilities for completing the project within the specified time.

We express our deep-felt gratitude to Dr. A. C. P. RANJANI, whose valuable guidance
and unstinting encouragement enabled us to accomplish our project successfully and on
time.

Our special thanks to Mrs. K. NAGA BHAVANI, librarian and the entire library staff
of Vijaya Institute of Technology for Women for providing the necessary library facilities.

We also express our gratitude to the system administrator and the lab assistants for
their support in executing the project.

We express our earnest thanks to all other faculty members of CSE for extending their
helping hands and valuable suggestions when in need.

PROJECT MEMBERS:
P. RENUKA (19NP5A0507)
A. SUPRIYA (18NP1A0551)
K. CHANDANA (18NP1A0578)
CH. DIVYANJALI (18NP1A0553)

DECLARATION

We hereby declare that this project work entitled “CAMPUS PLACEMENT
PREDICTION USING SUPERVISED MACHINE LEARNING TECHNIQUES” is a
genuine work carried out by us, in fulfilment of the requirements for the degree of
Bachelor of Technology in the Dept. of Computer Science & Engineering during the
academic year 2021-2022, under the supervision of our project guide Dr. A. C. P. RANJANI.
It is submitted to the Dept. of CSE, Vijaya Institute of Technology for Women, and it has
not formed the basis for the award of any Degree/Diploma or other similar title to any
candidate of the university.

PROJECT MEMBERS:
P. RENUKA (19NP5A0507)
A. SUPRIYA (18NP1A0551)
K. CHANDANA (18NP1A0578)
CH. DIVYANJALI (18NP1A0553)

ABSTRACT

A placement predictor is designed to estimate the likelihood of a student being placed
in a company, subject to the criteria of that company. The predictor takes many
parameters that can be used to assess the skill level of the student. Some parameters are
taken from university records, while others are obtained from tests conducted in the
placement management system itself. By combining these data points, the predictor aims
to predict accurately whether or not the student will be placed in a company. Data from
past students are used to train the predictor.

In this study, the objective is to analyze data on students from previous years and use
it to predict the placement chances of current students. A model is proposed with an
algorithm to predict the placement chance of students, after suitable data pre-processing
methods are applied. The proposed model is compared, with respect to accuracy, with
traditional classification algorithms such as Decision Tree, Support Vector Machine and
Random Forest. From the results obtained, the proposed algorithm is found to perform
significantly better than the other algorithms mentioned. To reach this result, we trained
each algorithm on the data set that we acquired and tested it against separate test data.
For each algorithm we obtained the counts of True Positives (TP), True Negatives (TN),
False Positives (FP) and False Negatives (FN), and from these four values computed the
accuracy as (TP + TN) / (TP + TN + FP + FN).
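
The accuracy calculation described in the abstract can be sketched in a few lines of
Python. This is a minimal illustration, not the project's own code: the function name
`accuracy` and the example counts are assumptions chosen only to show the arithmetic.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts for one classifier on a 100-student test set.
example_accuracy = accuracy(tp=45, tn=35, fp=12, fn=8)
print(f"accuracy = {example_accuracy:.2f}")
```

With these assumed counts, 80 of the 100 predictions are correct. The same function can
be applied to the four counts of each algorithm to compare them on equal terms.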

TABLE OF CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
DECLARATION
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES

1. INTRODUCTION

2. LITERATURE SURVEY

3. SYSTEM ANALYSIS
3.1 Existing System
3.1.1 Limitations of Existing System
3.2 Proposed System
3.2.1 Advantages of Proposed System
3.3 Algorithms
3.4 Functional Requirements
3.5 Non-Functional Requirements
3.6 Software Requirements
3.7 Hardware Requirements

4. SYSTEM STUDY
4.1 Economical Feasibility
4.2 Technical Feasibility
4.3 Social Feasibility

5. SYSTEM DESIGN
5.1 System Architecture
5.2 UML Diagrams
5.2.1 Use Case Diagram
5.2.2 Sequence Diagram
5.2.3 Collaboration Diagram
5.2.4 Class Diagram

6. SOFTWARE ENVIRONMENT
6.1 Python Introduction
6.2 Anaconda
6.2.1 Overview
6.2.2 Anaconda Navigator
6.2.3 Anaconda Cloud
6.3 Jupyter Notebook

7. SYSTEM IMPLEMENTATIONS
7.1 MODULES
7.1.1 Data Gathering
7.1.2 Pre-processing
7.1.3 Processing
7.1.4 Interpretation
7.1.5 Weak Student Analysis
7.1.6 Interface
7.2 CODING

8. TESTING
8.1 Testing Methodologies
8.2 Testing Activities
8.3 Types of Testing
8.3.1 Black Box Testing
8.3.2 White Box Testing

9. OUTPUT SCREENS

10. CONCLUSION

11. FUTURE SCOPE

12. BIBLIOGRAPHY

LIST OF FIGURES

Fig 1 Agile model

Fig 2 Architecture

Fig 3 Use Case Diagram

Fig 4 Sequence Diagram

Fig 5 Collaboration Diagram

Fig 6 Class Diagram

Fig 7 Activity Diagram

Fig 8 State Chart Diagram

Fig 9 White Box Testing

Fig 10 Black Box Testing

CHAPTER 1
INTRODUCTION
Campus Placement Prediction Using Supervised Machine Learning Techniques

1. INTRODUCTION

Placements are considered very important for every college. The basic success of a
college is measured by the campus placement of its students, and students often choose a
college by looking at its placement percentage. This work therefore addresses the
prediction and analysis of placements, which helps colleges as well as students to
improve placement outcomes. In placement prediction, the system predicts the probability
of an undergraduate student getting placed in a company by applying classification
algorithms such as Decision Tree, Support Vector Machine and Random Forest. The main
objective of this model is to predict whether or not a student gets placed in campus
recruitment drives. The data considered is the academic history of the student, such as
overall percentage, CGPA, aptitude skills, communication skills and technical skills. The
algorithms are applied to the student data of previous years.

The training and placement activity in a college is one of the important activities in
the life of a student. It is therefore very important to make the process hassle-free, so
that students can get the information they need as and when they require it. With a good
system, it is also easier for the staff of the Training and Placement cell to keep
students updated, and their workload is reduced. The “College Placement Prediction using
Machine Learning” system is developed to overcome the problems prevailing in the manual
system. The software is intended to eliminate or, in some cases, reduce the hardships
faced under the existing system. Moreover, the system is designed to help a company carry
out its operations in a smooth and effective manner. The majority of companies focus on
campus recruitment to fill their positions, identifying talented and qualified
professionals before they have completed their education. This method is a good way for
companies to find the right resource at the right time, and for students to join good
companies at the beginning of their careers. Every organization, whether big or small,
faces challenges in managing information about placements, training, placement cells and
technical skills. Every training and placement management system has different needs; the
system is therefore designed to be adapted to the managerial requirements of the
institution.

Department of CSE, VITW 1



The system is designed to assist in strategic planning and to help ensure that the
organization is equipped with the right level of information about its future goals. For
busy executives who are always on the go, the system offers remote access features that
allow them to manage their workforce at any time and, ultimately, to manage resources.
Students in the final year of engineering focus on getting employed in reputed companies.
Predicting the placement status that B.E. students are most likely to achieve helps
students to work harder and make appropriate progress. It also helps the faculty and the
placement cell of an institution to give proper attention to the improvement of students
during the course. A high placement rate is key to building the reputation of an
educational institution. This system therefore has a significant place in the educational
system of any higher learning institution.



CHAPTER 2
LITERATURE SURVEY

2. LITERATURE SURVEY

“Data Mining Approach for Predicting Student and Institution's Placement Percentage”,
Professor Ashok M, Assistant Professor Apoorva A, 2016 International Conference on
Computational Systems and Information Systems for Sustainable Solutions. In this paper
the authors use data mining techniques to predict student placement. The data is divided
into two segments: the training segment, which is historic data of passed-out students,
and a second segment consisting of the data of current students. Based on the historic
data, the authors design an algorithm for calculating placement chances. Various data
mining algorithms such as decision tree, Naive Bayes and neural network are applied along
with the proposed algorithm, and decisions are made with the help of a confusion matrix.
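
To make the confusion matrix mentioned above concrete, the following sketch counts its
four cells from a list of actual outcomes and a list of predicted outcomes. It is an
illustration only: the function name `confusion_counts`, the label strings and the sample
data are assumptions, not taken from the cited paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="Placed"):
    """Count TP, TN, FP and FN for a two-class problem."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct if the student really was placed.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: correct if the student really was not placed.
            counts["FN" if actual == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "TN", "FP", "FN")}


# Hypothetical outcomes for six past students.
actual = ["Placed", "Placed", "Not Placed", "Not Placed", "Placed", "Not Placed"]
predicted = ["Placed", "Not Placed", "Not Placed", "Placed", "Placed", "Not Placed"]
matrix = confusion_counts(actual, predicted)
print(matrix)
```

Reading the four counts together shows not only how often a model is right but also which
kind of mistake it makes, which a single accuracy figure hides.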

“Student Placement Analyzer: A Recommendation System Using Machine Learning”,
Senthil Kumar Thangavel, Divya Bharathi P, Abijith Sankar, International Conference on
Advanced Computing and Communication Systems (ICACCS-2017), Jan. 06-07, 2017,
Coimbatore, India. In this paper the authors are concerned with the challenges faced by
institutes regarding placement. Placement prediction becomes very complex as the number
of entities in an institute increases. With the help of machine learning, this complex
prediction problem can be solved more easily. All the academic records of the student are
taken into consideration. Various classification and data mining algorithms are used,
such as Naïve Bayes, Decision Tree, SVM and regression. After prediction, each student is
placed in one of the given categories: core company, dream company or support services.

"A Placement Prediction System Using K-Nearest Neighbors Classifier", Animesh


Giri, M Vignesh V Bhagavath, Bysani Pruthvi, Naini Dubey, Second International
Conference on Cognitive Computing and Information Processing (CCIP), 2016. The
placement prediction system predicts the probability of students getting placed in
various companies by applying K-Nearest Neighbors classification. The result is also
compared with the results obtained from other machine learning models such as Logistic
Regression and SVM. The academic history of students is considered along with the skill
sets that companies test during the recruitment process, such as programming skills,
communication skills, analytical skills and teamwork. Data from the past two batches are
used for this system.
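
The K-Nearest Neighbors idea used in that paper can be sketched with the Python standard
library alone. The snippet below is a simplified illustration, not the cited authors'
implementation: the function name `knn_predict`, the two features (CGPA and an aptitude
score) and all sample values are assumptions. In practice the features would also be
scaled so that no single feature dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical past students: (CGPA, aptitude score) and their placement outcome.
train_X = [(8.5, 80), (9.0, 85), (7.8, 75), (6.0, 40), (5.5, 45), (6.2, 50)]
train_y = ["Placed", "Placed", "Placed", "Not Placed", "Not Placed", "Not Placed"]

print(knn_predict(train_X, train_y, query=(8.0, 78)))
print(knn_predict(train_X, train_y, query=(5.8, 42)))
```

An odd value of k is used here so that a two-class vote cannot end in a tie.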

"Class Result Prediction using Machine Learning", Pushpa S K, Associate


Professor, Manjunath T N, Professor and Head, Mrunal T V, Amartya Singh, C Suhas,
International Conference on Smart Technology for Smart Nation, 2017. In this paper, the
result of a class is predicted using machine learning. The performance of students in
past semesters, along with scores of internal examinations in the current semester, is
used to predict whether a student will pass or fail the current semester before
attempting the final examination. The authors use SVM, Naive Bayes, Random Forest
Classifier and Gradient Boosting to compute the result. Boosting is an ensemble learning
technique that combines several learners to obtain better predictive performance.

“Student Placement Analyzer: A Recommendation System Using Machine Learning”,
Apoorva Rao R, Deeksha K C, Vishal Prajwal R, Vrushak K, Nandini, JARIIE-ISSN(O)-
2395-4396. Nowadays institutions face many challenges regarding student placements.
For educational institutions it is a difficult task to keep a record of every single
student and to predict student placement manually. To overcome these challenges, machine
learning concepts and various algorithms are explored to predict the results of students.
The training data set is historical data of past students, which is used to train the
model. The software system predicts placement status in five categories: dream company,
core company, mass recruiter, not eligible and not interested in placements. The system
is also helpful to weaker students: institutions can give extra attention to weaker
students so that they can improve their performance. Using the Naïve Bayes algorithm, all
the data is monitored and an appropriate decision is provided.



CHAPTER 3
SYSTEM ANALYSIS

3. SYSTEM ANALYSIS

3.1 Existing System

The main objective of this project is prediction, and prediction requires a large
amount of data. In the existing system, a programming approach is used for prediction.
This approach is known as an expert system.

The present system generally considers academic performance as the single parameter
for judging whether a student can be placed during campus placements. When some data
mining algorithms are used to calculate the probability of a student getting placed, the
result is sometimes a probability of more than 100%, which is not feasible and gives the
student a wrong interpretation.

Academic performance is not the only parameter for judging a student; other
parameters are required as well. Updating records is another tedious task, and sorting
and searching problems arise as a result.

The placement officer has to identify the eligible students by looking through an
Excel sheet, checking the marks and eligibility of every student.

3.1.1 Limitations of Existing System

• Time-consuming
• Requires more programming
• Works only on small amounts of data
• Overloads the interpreter
• Not suited to large datasets


3.2 Proposed System

The Placement Prediction system predicts the probability of undergraduate students
getting placed in a company by applying classification algorithms such as Decision Tree,
Support Vector Machine and Random Forest. The main objective of this model is to predict
whether or not a student gets placed in campus recruitment drives. The data considered is
the academic history of the student, such as overall percentage, CGPA, technical skills
score, aptitude score and communication skills score. The algorithms are applied to the
student data of previous years.
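
One way to picture the input data described above is as one record per past student,
converted to a numeric feature vector plus a placed/not-placed label. The sketch below is
an assumption about how such records could be laid out; the class name `StudentRecord`,
the field names, the score ranges and the sample values are hypothetical and are not
taken from the project's dataset.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class StudentRecord:
    """Hypothetical per-student inputs named after the parameters in the text."""
    overall_percentage: float
    cgpa: float
    technical_score: float
    aptitude_score: float
    communication_score: float

    def features(self):
        """Return the fields as a plain list, in declaration order."""
        return list(astuple(self))


# Two made-up past students with their outcome (1 = placed, 0 = not placed).
past_students = [
    (StudentRecord(78.0, 8.1, 70.0, 65.0, 72.0), 1),
    (StudentRecord(58.5, 6.0, 40.0, 45.0, 50.0), 0),
]

X = [record.features() for record, _ in past_students]  # feature matrix
y = [label for _, label in past_students]               # target labels
print(X, y)
```

A feature matrix `X` and label list `y` of this shape are what supervised classifiers
generally expect as training input.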

3.2.1 Advantages of Proposed System

• Machine learning approach
• Recommended for large datasets
• Supervised machine learning
• Strong classification algorithms

3.3 Algorithms
SVM: SVM stands for Support Vector Machine. It is a supervised machine learning
algorithm that can be used for both classification and regression problems, although it
is mostly used for classification. Each data item is plotted as a point in n-dimensional
space, where n is the number of features and the value of each feature is the value of a
particular coordinate. After plotting the data items, classification is performed by
finding the hyper-plane that best separates the two classes. The problem then lies in
choosing the right hyper-plane; SVM chooses the one with the largest margin, that is, the
greatest distance to the nearest data points of either class. Scikit-learn is a Python
library that implements various machine learning algorithms, including SVM.
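
The question of which hyper-plane is the right one can be illustrated by comparing the
margins of two candidate hyper-planes on a toy data set. This is only a sketch of the
selection criterion, not an SVM solver: a real SVM (for example scikit-learn's
`sklearn.svm.SVC`) finds the maximum-margin hyper-plane by optimization. The function
name `geometric_margin`, the points and the two candidates are assumptions made for the
example.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyper-plane w.x + b = 0.

    Labels are +1 or -1. A negative result means at least one point lies on
    the wrong side of the hyper-plane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy two-feature data: two students of each class (hypothetical values).
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Two hand-picked hyper-planes (w, b) that both separate the classes.
candidates = {
    "A": ((1.0, 1.0), -5.5),
    "B": ((1.0, 0.0), -3.0),
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, "-> preferred:", best)
```

Both candidates classify every point correctly, but the one with the larger margin leaves
more room between the classes and is the one an SVM would prefer.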

RFA (Random Forest Algorithm): A plethora of classification algorithms is available,
including, but not limited to, SVM, logistic regression, decision trees and the Naive
Bayes classifier. Among these, the Random Forest classifier is regarded as one of the
strongest. A random forest is a group of individual decision trees. Each tree, with its
own properties and classification rules, tries to find an appropriate class label for the
problem and gives its own answer. A vote is then taken within the random forest to see
which class label received the most votes, and the label with the most votes is taken as
the final class label. This generally provides a more accurate model for class label
prediction than a single tree. A random forest can balance errors in data sets where
classes are imbalanced, and it can handle large data sets with high dimensionality. It
can handle thousands of input variables and identify the most significant ones, which
makes it useful for feature selection and, in that sense, for dimensionality reduction.
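
The voting step described above can be sketched as follows. The three "trees" here are
hand-written one-rule functions standing in for trained decision trees; a real random
forest learns each tree from a bootstrap sample of the data with random feature subsets.
The names `forest_predict`, `tree_cgpa`, `tree_aptitude` and `tree_communication`, and
all thresholds, are hypothetical.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the majority label and the full vote tally over all trees."""
    votes = Counter(tree(sample) for tree in trees)
    label, _ = votes.most_common(1)[0]
    return label, votes


# Stand-in trees: each applies a single made-up threshold rule.
def tree_cgpa(s):
    return "Placed" if s["cgpa"] >= 7.0 else "Not Placed"


def tree_aptitude(s):
    return "Placed" if s["aptitude"] >= 60 else "Not Placed"


def tree_communication(s):
    return "Placed" if s["communication"] >= 65 else "Not Placed"


student = {"cgpa": 7.4, "aptitude": 55, "communication": 70}
label, votes = forest_predict(
    [tree_cgpa, tree_aptitude, tree_communication], student
)
print(label, dict(votes))
```

Here two of the three trees vote "Placed", so the forest outputs "Placed" even though one
tree disagrees; this is how individual tree errors are averaged out.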

Decision Tree Classifier: A Decision Tree is a classifier that uses a tree-like
structure to represent what it has learned about classification. Each decision node, or
non-leaf node, represents a test on a feature. The outcomes of the test are represented
by the branches of that decision node. Each target class is denoted by a leaf node of the
tree. To classify a sample, the tree is traversed from the root, following at each
decision node the branch that matches the test outcome, until a leaf node is reached. The
class at that leaf is the classification result.
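
The root-to-leaf traversal can be sketched with a small hand-built tree stored as nested
dictionaries. The tree below is an assumption for illustration only (its features and
thresholds are made up, not learned from the project's data), and `classify` is a
hypothetical helper name.

```python
def classify(node, sample):
    """Walk from the root to a leaf, following the branch chosen by each test."""
    while "label" not in node:  # decision (non-leaf) node
        passed = sample[node["feature"]] >= node["threshold"]
        node = node["yes"] if passed else node["no"]
    return node["label"]        # leaf node holds the target class


# Hand-built tree: test CGPA first, then the aptitude score.
tree = {
    "feature": "cgpa",
    "threshold": 7.0,
    "yes": {
        "feature": "aptitude",
        "threshold": 60,
        "yes": {"label": "Placed"},
        "no": {"label": "Not Placed"},
    },
    "no": {"label": "Not Placed"},
}

print(classify(tree, {"cgpa": 8.2, "aptitude": 71}))
print(classify(tree, {"cgpa": 6.5, "aptitude": 90}))
```

The second student fails the CGPA test at the root, so the aptitude test is never
evaluated; each sample follows exactly one path through the tree.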

3.4 Functional Requirements


In software engineering, a functional requirement defines a function of a software
system or its component. A function is described as a set of inputs, the behavior, and
outputs. Functional requirements may be calculations, technical details, data
manipulation and processing, and other specific functionality that define what a system
is supposed to accomplish. Behavioral requirements describing all the cases where the
system uses the functional requirements are captured in use cases. Generally, functional
requirements are expressed in the form “system shall do <requirement>”. The plan for
implementing functional requirements is detailed in the system design. In requirements
engineering, functional requirements specify particular results of a system. Functional
requirements drive the application architecture of a system. A requirements analyst
generates use cases after gathering and validating a set of functional requirements. The
hierarchy of functional requirements is: user/stakeholder request -> feature -> use
case -> business rule.

The specific functionality of this project is to provide information to the user.
The following are the functional requirements of our system:
1. Providing one query returns an efficient result.
2. The search for a query is based on the major intention of the user.
3. An effective ranking methodology is used.
4. A novel framework exploits the user’s social activities, such as annotations and
participation in interest groups, for personalized image search.

1. On-Demand Service
2. Pay-As-You-Use
3. VM Pricing Model
4. Elasticity
5. Flexibility

Pay-as-you-use has two meanings. First, according to customer resource demands such
as CPU and memory, the physical machines are dynamically segmented using virtualization
technologies and provided to customers in the form of virtual machines (VMs), and
customers pay according to the amount of resources they actually consume. Second, the VMs
can be dynamically allocated and deallocated at any time, and customers pay based on how
long the resources are actually used.

3.5 Non-Functional Requirements


In systems engineering and requirements engineering, a non-functional requirement is a
requirement that specifies criteria that can be used to judge the operation of a system,
rather than specific behaviors.

The non-functional requirements of the project include the following:

• Updating work status.
• Problem resolution.
• Error occurrence in the system.
• Customer requests.

Availability: A system’s “availability” or “uptime” is the proportion of time that it is
operational and available for use. It relates to the server providing the service to
users in displaying images. As our system will be used by thousands of users at any time,
it must always be available. Any updates must be performed in a short interval of time
without interrupting the normal services made available to the users.

Efficiency: Specifies how well the software utilizes scarce resources: CPU cycles, disk
space, memory, bandwidth etc. All of the above-mentioned resources can be used
effectively by performing most of the validations on the client side and by reducing the
workload on the server, using JSP instead of CGI, which is being implemented now.

Flexibility: If the organization intends to increase or extend the functionality of the
software after it is deployed, that should be planned from the beginning; it influences
choices made during the design, development, testing and deployment of the system. New
modules can be easily integrated into our system without disturbing the existing modules
or modifying the logical database schema of the existing applications.

Portability: Portability specifies the ease with which the software can be installed on
all necessary platforms, and the platforms on which it is expected to run. By using
appropriate server versions released for different platforms, our project can be easily
operated on any operating system and can therefore be considered highly portable.

Scalability: Software that is scalable has the ability to handle a wide variety of system
configuration sizes. The non-functional requirements should specify the ways in which the
system may be expected to scale up (by increasing hardware capacity, adding machines
etc.). Our system is easily expandable. Any additional hardware or software that
increases the performance of the system can be easily added. An additional server would
be useful to speed up the application.

Integrity: Integrity requirements define the security attributes of the system,
restricting access to features or data to certain users and protecting the privacy of
data entered into the software. Access to certain features, such as adding the details of
files and searching, which are the sole responsibility of the server, must be disabled
for normal users. Access can be restricted by providing users with logins that grant only
the appropriate access.


Usability: Ease-of-use requirements address the factors that constitute the capacity of
the software to be understood, learned, and used by its intended users. Hyperlinks will
be provided for every service the system offers, which makes navigation easier. A system
with high usability makes the work of the user easier.

Performance: The performance constraints specify the timing characteristics of the
software.

Completing the application form online and providing the invigilation list and
examination hall list are given high priority compared to other services and can be
identified as the critical aspects of the system.
1. Our system introduces user-specific search performance.
2. The query-related search is effective; it returns results within a short period, so
the speed of the system is very high.
The ranking optimization scheme is available for the personalized image search system.

3.6 Software Requirements


Operating system : Windows 10.
Coding Language : Python
IDE : Anaconda
Dataset : Student Dataset

3.7 Hardware Requirements


System : i3 processor
RAM : 4 GB
HDD : 1 TB



CHAPTER 4
SYSTEM STUDY

4. SYSTEM STUDY

Study of the System


The optimal multiserver configuration and VM pricing problem of cloud
brokers for profit maximization is formulated and a heuristic algorithm combining a
brute force search with the partial derivation method is proposed to calculate the
numerical solutions for the optimization problem.

Software Model or Architecture Analysis:


Structured project management techniques (SDLC) enhance management’s control over
projects by dividing complex tasks into manageable sections. A software life cycle model
is either a descriptive or prescriptive characterization of how software is or should be
developed. However, none of the SDLC models discuss key issues such as change management,
incident management and release management within the SDLC process itself; these are
addressed in the overall project management. In the proposed hypothetical model, the
concept of user-developer interaction in the conventional SDLC model is converted into a
three-dimensional model. The user processes under the overall project management miss
key technical issues pertaining to the software development process; that is, these
issues are discussed in project management at the surface level but not at the ground
level.

What Is SDLC?
A software cycle deals with various parts and phases, from planning to testing and
deploying software. These activities are carried out in different ways, as per the needs.
Each way is known as a Software Development Lifecycle (SDLC) model. A software life cycle
model is either a descriptive or prescriptive characterization of how software is or
should be developed.

The SDLC models:


The Linear model (Waterfall) - Separate and distinct phases of specification and
development. All activities proceed in linear fashion. The next phase starts only when
the previous one is complete.

Evolutionary development - Specification and development are interleaved (spiral,
incremental, prototype-based, Rapid Application Development). The Incremental Model
(Waterfall in iteration) and RAD (Rapid Application Development) focus on developing a
quality product in less time.

Spiral Model - We start from a smaller module and keep building on it like a spiral. It
is also called component-based development.

Formal systems development - A mathematical system model is formally transformed into
an implementation.

Agile Methods - Inducing flexibility into development.

Reuse-based development - The system is assembled from existing components.

The General Model


Software life cycle models describe the phases of the software cycle and the order in
which those phases are executed. There are many models, and many companies adopt their
own, but all follow very similar patterns. Each phase produces deliverables required by
the next phase in the life cycle. Requirements are translated into design. Code is
produced during implementation, which is driven by the design. Testing verifies the
deliverable of the implementation phase against the requirements.

SDLC Methodology
AGILE MODEL:
The Agile SDLC model is a combination of iterative and incremental process models,
with a focus on process adaptability and customer satisfaction through rapid delivery of
a working software product. Agile methods break the product into small incremental
builds, which are provided in iterations. Each iteration typically lasts from about one
to three weeks. Every iteration involves cross-functional teams working simultaneously on
various areas such as:
• Planning
• Requirements Analysis
• Design
• Coding
• Unit Testing
• Acceptance Testing


Feasibility Study
The feasibility of the project is analyzed in this phase, and a business proposal is put
forth with a very general plan for the project and some cost estimates. During system
analysis, the feasibility study of the proposed system is carried out.

This is to ensure that the proposed system is not a burden to the company. For
feasibility analysis, some understanding of the major requirements of the system is
essential.

Three key considerations involved in the feasibility analysis are


4.1 ECONOMICAL FEASIBILITY
4.2 TECHNICAL FEASIBILITY
4.3 SOCIAL FEASIBILITY

4.1 ECONOMICAL FEASIBILITY


Development of this application is highly economically feasible. The organization need
not spend much money on development, because the required system resources are already
available. The only thing to be done is to set up an environment for development with
effective supervision. By doing so, we can attain the maximum usability of the
corresponding resources. Even after development, the organization will not need to invest
more in the system. Therefore, the system is economically feasible.

4.2 TECHNICAL FEASIBILITY


We can say with confidence that the system is technically feasible, since there will
not be much difficulty in obtaining the resources required to develop and maintain it.
All the resources needed for the development and maintenance of the software are
available in the organization; we are utilizing resources that are already available.


4.3 SOCIAL FEASIBILITY


Not everything we think of is feasible. It is wise to consider the feasibility of
any problem we undertake. Feasibility is the study of the impact that the development
of a system has on the organization. The impact can be either positive or
negative. When the positives outweigh the negatives, the system is considered
feasible. Here the feasibility study can be performed in two ways, namely technical
feasibility and economical feasibility.



CHAPTER 5
SYSTEM DESIGN

5. SYSTEM DESIGN
System Design
Use-oriented techniques are widely used in software requirement analysis and
design.

Use cases and usage scenarios facilitate system understanding and provide a
common language for communication. This paper presents a scenario-based modeling
technique and discusses its applications. In this model, scenarios are organized
hierarchically and they capture the system functionality at various abstraction levels
including scenario groups, scenarios, and sub-scenarios. Combining scenarios or sub-
scenarios can form complex scenarios. Data are also separately identified, organized, and
attached to scenarios. This scenario model can be used to cross check with the UML
model. It can also direct systematic scenario-based testing including test case generation,
test coverage analysis with respect to requirements, and functional regression testing.

5.1 ARCHITECTURE:

5.1 System Architecture


5.2 UML Diagrams


UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed, and was created, by the Object Management Group.

The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two major
components: a meta-model and a notation. In the future, some form of method or process
may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of a software system, as well as for business
modeling and other non-software systems.


5.2.1 USE CASE DIAGRAM:


A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of actors,
their goals (represented as use cases), and any dependencies between those use cases.

The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted. Interaction
among actors is not shown on the use case diagram. If this interaction is essential to a
coherent description of the desired behavior, perhaps the system or use case boundaries
should be re-examined. Alternatively, interaction among actors can be part of the
assumptions used in the use case.


Use cases:
A use case describes a sequence of actions that provide something of measurable
value to an actor and is drawn as a horizontal ellipse.

Actors:
An actor is a person, organization, or external system that plays a role in one or
more interactions with the system.


System boundary boxes:


A rectangle is drawn around the use cases, called the system boundary box, to
indicate the scope of system. Anything within the box represents functionality that is in
scope and anything outside the box is not.
Four relationships among use cases are used often in practice.

Include:
In one form of interaction, a given use case may include another. Include is a
directed relationship between two use cases, implying that the behavior of the included
use case is inserted into the behavior of the including use case.

The first use case often depends on the outcome of the included use case. This is
useful for extracting truly common behaviors from multiple use cases into a single
description. The notation is a dashed arrow from the including to the included use case,
with the label "«include»". There are no parameters or return values. To specify the
location in a flow of events in which the base use case includes the behavior of another,
you simply write include followed by the name of use case you want to include, as in the
following flow for track order.

Extend:
In another form of interaction, a given use case (the extension) may extend
another. This relationship indicates that the behavior of the extension use case may be
inserted in the extended use case under some conditions. The notation is a dashed arrow
from the extension to the extended use case, with the label "«extend»". Modelers use the
«extend» relationship to indicate use cases that are "optional" to the base use case.

Generalization:
In the third form of relationship among use cases, a generalization/specialization
relationship exists. A given use case may share common behaviors, requirements,
constraints, and assumptions with a more general use case. In this case, describe the
shared elements once in the general use case and describe only the differences in the
specialized cases. The notation is a solid line ending in a hollow triangle drawn from the
specialized to the more general use case (following the standard generalization notation).


Associations:
Associations between actors and use cases are indicated in use case diagrams by
solid lines. An association exists whenever an actor is involved with an interaction
described by a use case. Associations are modeled as lines connecting use cases and
actors to one another, with an optional arrowhead on one end of the line. The arrowhead
is often used to indicate the direction of the initial invocation of the relationship or to
indicate the primary actor within the use case.

Identified Use Cases


The “user model view” encompasses the problem and solution from the
perspective of those individuals whose problem the solution addresses. The view
presents the goals and objectives of the problem owners and their requirements of the
solution. This view is composed of “use case diagrams”. These diagrams describe the
functionality provided by a system to external interactors. These diagrams contain actors,
use cases, and their relationships.


Fig: 2 Use case Diagram


5.2.2 SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what order.
It is a construct of a Message Sequence Chart.

Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different
processes or objects that live simultaneously, and, as horizontal arrows, the messages
exchanged between them, in the order in which they occur. This allows the specification
of simple runtime scenarios in a graphical manner. If the lifeline is that of an object, it
demonstrates a role. In order to display interaction, messages are used. These are
horizontal arrows with the message name written above them. Solid arrows with full
heads are synchronous calls, solid arrows with stick heads are asynchronous calls and
dashed arrows with stick heads are return messages. This definition is true as of UML 2,
considerably different from UML 1.x.

Activation boxes, or method-call boxes, are opaque rectangles drawn on top of
lifelines to represent that processes are being performed in response to the message
(Execution Specifications in UML).

Objects calling methods on themselves use messages and add new activation
boxes on top of any others to indicate a further level of processing. When an object is
destroyed (removed from memory), an X is drawn on top of the lifeline, and the dashed
line ceases to be drawn below it. The destruction should be the result of a message,
either from the object itself or from another object.

A message sent from outside the diagram can be represented by a message
originating from a filled-in circle (found message in UML) or from a border of the
sequence diagram.

A Sequence diagram is dynamic and, more importantly, is time ordered. A
Collaboration diagram is very similar to a Sequence diagram in the purpose it achieves; in
other words, it shows the dynamic interaction of the objects in a system. A distinguishing
feature of a Collaboration diagram is that it shows the objects and their association with
other objects in the system, apart from how they interact with each other. The association
between objects is not represented in a Sequence diagram.

A Collaboration diagram is easily represented by modeling objects in a system
and representing the associations between the objects as links. The interaction between
the objects is denoted by arrows. To identify the sequence of invocation of these objects,
a number is placed next to each of these arrows.

Fig: 3 Sequence Diagram


5.2.3 COLLABORATION DIAGRAM:


A sophisticated modeling tool can easily convert a collaboration diagram into a
sequence diagram and vice versa. Hence, the elements of a Collaboration diagram are
essentially the same as those of a Sequence diagram.

A collaboration diagram shows an interaction organized around the objects in the
interaction and their links to each other. Unlike a sequence diagram, a collaboration
diagram shows the relationships among the objects. On the other hand, a collaboration
diagram does not show time as a separate dimension, so sequence numbers determine the
sequence of messages and the concurrent threads.

Fig: 4 Collaboration Diagram


5.2.4 CLASS DIAGRAM:


In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a system by
showing the system's classes, their attributes, operations (or methods), and the
relationships among the classes. It explains which class contains information.


In the object-oriented model, objects are entities that combine state (i.e., data),
behavior (i.e., procedures, or methods) and identity (unique existence among all other
objects). The structure and behavior of an object are defined by a class, which is a
definition, or blueprint, of all objects of a specific type. An object must be explicitly
created based on a class, and an object thus created is considered to be an instance of that
class. An object is similar to a structure, with the addition of method pointers, member
access control, and an implicit data member which locates instances of the class (i.e.,
actual objects of that class) in the class hierarchy (essential for runtime inheritance
features).


The class diagram is the main building block in object-oriented modeling. It is
used both for general conceptual modeling of the semantics of the application, and for
detailed modeling, translating the models into programming code. The classes in a class
diagram represent both the main objects and/or interactions in the application and the
objects to be programmed. In the class diagram these classes are represented with boxes
which contain three parts:
● The upper part holds the name of the class.
● The middle part contains the attributes of the class.
● The lower part contains the operations of the class.
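
As a rough illustration of how the three compartments of a class box map onto code, the sketch below defines a hypothetical `Student` class. The class name, its attributes and the `is_eligible` operation (including the cut-off value) are assumptions chosen only for illustration; they are not taken from the class diagram of this project.

```python
class Student:
    """Upper compartment: the name of the class (Student)."""

    def __init__(self, name, cgpa, backlogs):
        # Middle compartment: the attributes of the class.
        self.name = name
        self.cgpa = cgpa
        self.backlogs = backlogs

    # Lower compartment: the operations of the class.
    def is_eligible(self, min_cgpa=6.0):
        """Return True when the student meets a hypothetical cut-off."""
        return self.cgpa >= min_cgpa and self.backlogs == 0


# An object is an instance created explicitly from the class.
student = Student("Asha", 7.4, 0)
print(student.is_eligible())
```

Each object created from the class carries its own attribute values, while the operations are defined once in the class.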


Fig: 5 Class Diagram


5.2.5 ACTIVITY DIAGRAM:
Activity diagram is another important diagram in UML to describe the dynamic
aspects of the system. Activity diagram is basically a flowchart to represent the flow from
one activity to another activity. The activity can be described as an operation of the
system.

An activity diagram shows the overall flow of control.


Activity diagrams are constructed from a limited repertoire of shapes, connected
with arrows. The most important shape types:
 Rounded rectangles represent activities;
 Diamonds represent decisions;
 Bars represent the start (split) or end (join) of concurrent activities;
 A black circle represents the start (initial state) of the workflow;
 An encircled black circle represents the end (final state).

Arrows run from the start towards the end and represent the order in which
activities happen. However, the join and split symbols in activity diagrams only resolve
this for simple cases; the meaning of the model is not clear when they are arbitrarily
combined with the decisions or loops.


Fig: 11 Activity Diagram


5.2.6 STATE CHART DIAGRAM:


Objects have behaviors and states. The state of an object depends on its current
activity or condition. A state chart diagram shows the possible states of the object and the
transitions that cause a change in state. A state diagram, also called a state machine
diagram or state chart diagram, is an illustration of the states an object can attain as well
as the transitions between those states in the Unified Modeling Language. A state diagram
resembles a flowchart in which the initial state is represented by a large black dot and
subsequent states are portrayed as boxes with rounded corners. There may be one or two
horizontal lines through a box, dividing it into stacked sections. In that case, the upper
section contains the name of the state, the middle section (if any) contains the
state variables and the lower section contains the actions performed in that state. If there
are no horizontal lines through a box, only the name of the state is written inside it.
External straight lines, each with an arrow at one end, connect various pairs of boxes.
These lines define the transitions between states. The final state is portrayed as a large
black dot with a circle around it. Historical states are denoted as circles with the letter H
inside.
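
The idea of states connected by transitions can be sketched in code as a small transition table. The state names (`IDLE`, `TRAINING`, `READY`) and event names below are hypothetical and serve only to show the mechanism; they are not the states of the state chart in this report.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    TRAINING = "training"
    READY = "ready"


# (current state, event) -> next state; each entry is one arrow in the diagram.
TRANSITIONS = {
    (State.IDLE, "start_training"): State.TRAINING,
    (State.TRAINING, "training_done"): State.READY,
    (State.READY, "reset"): State.IDLE,
}


def next_state(state, event):
    """Follow a transition if one is defined; otherwise stay in the same state."""
    return TRANSITIONS.get((state, event), state)


state = State.IDLE  # initial state (the black dot in the diagram)
for event in ["start_training", "training_done"]:
    state = next_state(state, event)
print(state.name)
```

Keeping the transitions in one table makes it easy to compare the code against the arrows drawn in the diagram.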



CHAPTER 6
SOFTWARE ENVIRONMENT

6. SOFTWARE ENVIRONMENT

6.1 Python
Python is a general purpose, dynamic, high level and interpreted programming
language. It supports Object Oriented programming approach to develop applications. It
is simple and easy to learn and provides lots of high-level data structures.

Python is an easy to learn yet powerful and versatile scripting language which
makes it attractive for Application Development.

Python’s syntax and dynamic typing, together with its interpreted nature, make it an
ideal language for scripting and rapid application development. Python supports multiple
programming paradigms, including object-oriented, imperative, functional and procedural
programming styles.
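
To make the different programming styles concrete, the following minimal sketch computes the same average in three ways. The list of marks and the names `average_procedural`, `MarkSheet` and `average_functional` are made up for illustration.

```python
from functools import reduce

marks = [72, 85, 64, 90]  # hypothetical data


# Procedural/imperative style: an explicit loop and accumulator.
def average_procedural(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)


# Object-oriented style: data and behaviour bundled in a class.
class MarkSheet:
    def __init__(self, values):
        self.values = list(values)

    def average(self):
        return sum(self.values) / len(self.values)


# Functional style: the result is built by combining functions.
def average_functional(values):
    return reduce(lambda acc, value: acc + value, values, 0) / len(values)


print(average_procedural(marks), MarkSheet(marks).average(), average_functional(marks))
```

All three versions return the same value; which style to use is largely a matter of how the surrounding program is organized.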

Python is not restricted to one special area such as web programming. That is
why it is known as a multipurpose language: it can be used for web, enterprise, 3D CAD
and other applications. We don't need to declare the data type of a variable because
Python is dynamically typed, so we can simply write a=10 to assign an integer value to a
variable.
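
A short sketch of what dynamic typing means in practice: the variable name `a` is the one used in the text, and the other values are arbitrary examples.

```python
a = 10                      # no type declaration; a now refers to an int
print(type(a).__name__)

a = "ten"                   # the same name can later refer to a str
print(type(a).__name__)

a = 10.5                    # and later to a float
print(isinstance(a, float))
```

The type belongs to the value, not to the variable name, which is why no declaration is needed.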

Python makes development and debugging fast because there is no
compilation step included in Python development, and the edit-test-debug cycle is very fast.

Features of Python
Easy to learn and use-
Python is easy to learn and use. It is developer-friendly and high level
programming language.

Expressive Language –
The Python language is expressive, meaning that its code is easy to understand and
read.


Interpreted Language –
Python is an interpreted language, i.e., the interpreter executes the code one line at
a time. This makes debugging easy and thus makes Python suitable for beginners.

Cross-platform language –
Python can run equally on different platforms such as Windows, Linux, Unix and
Macintosh etc. So, we can say that Python is a portable language.

Free and open source –


Python language is freely available at the official site. The source-code is also
available. Therefore, it is open source.

Object oriented language-


Python supports the object-oriented approach, in which the concepts of classes and
objects come into play.

Extensible-
It implies that code written in other languages such as C/C++ can be compiled and
then used from our Python code.

Large standard library-


Python has a large and broad library and provides rich set of modules and
functions for rapid application development.
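
As a small example of what the standard library offers without installing anything extra, the sketch below parses comma-separated text with `csv`, summarizes it with `statistics`, and serializes the result with `json`. The sample records are invented for illustration.

```python
import csv
import io
import json
import statistics

# Hypothetical records; in practice these would be read from a file.
raw = "name,cgpa\nAsha,7.4\nBhanu,8.1\nChitra,6.5\n"

rows = list(csv.DictReader(io.StringIO(raw)))
cgpas = [float(row["cgpa"]) for row in rows]

summary = {
    "count": len(cgpas),
    "mean_cgpa": round(statistics.mean(cgpas), 2),
    "max_cgpa": max(cgpas),
}
print(json.dumps(summary))
```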

GUI Programming support-


Graphical user interfaces can be developed using Python.

Integrated-
It can be easily integrated with languages like C, C++, JAVA etc.

Python Application-
Python is known for its general purpose nature that makes it applicable in almost
each domain of software development. Python as a whole can be used in any sphere of
development.


How to Install Python (Environment Set-up)


Download the latest release of Python. In this process, we will install Python 3.6.7
on our Windows operating system.

Double-click the executable file which is downloaded; the following window will
open. Select Customize installation and proceed.


The following window shows all the optional features. All the features need to be
installed and are checked by default; we need to click next to continue.

Fig: 4.2 Python Installation Step 2

The following window shows a list of advanced options. Check all the options
which you want to install and click next. Here, we must notice that the first check-box
(install for all users) must be checked.


Now, try to run Python from the command prompt. Type the command python for
Python 2 or python3 for Python 3. It will show an error as given in the image below.
This is because we haven't set the path yet.


To set the path of Python, we need to right-click on "My Computer" and go to
Properties → Advanced → Environment Variables.


Add the new path variable in the user variable section


Type PATH as the variable name and set the path to the installation directory of
the python shown in the below image.


Now that the path is set, we are ready to run Python on our local system. Restart
CMD and type python again. It will open the Python interpreter shell, where we can
execute Python statements.
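
Once the interpreter starts, a few statements can confirm which Python is running and whether the `python` command is reachable through PATH. This is only a sketch of one way to check; the printed values depend entirely on the local machine.

```python
import shutil
import sys

# Version and location of the interpreter running this script.
print(sys.version_info.major, sys.version_info.minor)
print(sys.executable)

# shutil.which searches the directories listed in PATH;
# it returns None when the command cannot be found.
location = shutil.which("python")
print(location if location else "python is not on PATH")
```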


Web Application-
We can use Python to develop web applications. It provides libraries to handle
internet data formats and protocols such as HTML, XML, JSON and email processing,
along with libraries such as Requests, Beautiful Soup and Feedparser. It also provides
frameworks such as Django, Pyramid and Flask to design and develop web-based
applications. Some important developments are PythonWikiEngines, Pocoo,
PythonBlogSoftware, etc.

Desktop Application-
Python provides the Tk GUI library to develop user interfaces in Python-based
applications. Other useful toolkits include wxWidgets, Kivy and PyQt, which are usable on
several platforms. Kivy is popular for writing multitouch applications.

Software Development-
Python is helpful for software development process. It works as a support
language and can be used for build control and management, testing etc.

Scientific and Numeric-


Python is popular and widely used in scientific and numeric computing. Some
useful libraries and packages are SciPy, Pandas, IPython, etc. SciPy is a group of packages
for engineering, science and mathematics.

Business Application-
Python is used to build Business applications like ERP and e-commerce systems.
Tryton is a high level application platform.

Console Based Application-


We can use Python to develop console based applications. For example: IPython.

Audio or Video based applications-


Python is well suited to performing multiple tasks and can be used to develop
multimedia applications. Some real applications are TimPlayer, cplay, etc.


3D CAD Applications-
Python can be used to create CAD applications. Fandango is a real application which
provides full CAD features.

Enterprise Applications-
Python can be used to create applications which can be used within an Enterprise
or an Organization. Some real time applications are: OpenErp, Tryton, Picalo etc

6.2 Anaconda (Python distribution)


Anaconda is a free and open-source distribution of the Python and R
programming languages for scientific computing (data science, machine learning
applications, large-scale data processing, predictive analytics, etc.), that aims to simplify
package management and deployment. Package versions are managed by the package
management system conda. The Anaconda distribution includes data-science packages
suitable for Windows, Linux, and MacOS.

6.2.1 Overview
Anaconda distribution comes with more than 1,500 packages as well as the conda
package and virtual environment manager. It also includes a GUI, Anaconda Navigator, as
a graphical alternative to the command line interface (CLI).

The big difference between conda and the pip package manager is in how package
dependencies are managed, which is a significant challenge for Python data science and
the reason conda exists.

When pip installs a package, it automatically installs any dependent Python
packages without checking if these conflict with previously installed packages. It will
install a package and any of its dependencies regardless of the state of the existing
installation. Because of this, a user with a working installation of, for example, Google
Tensorflow, can find that it stops working having used pip to install a different package
that requires a different version of the dependent numpy library than the one used by
Tensorflow. In some cases, the package may appear to work but produce different results
in detail.


In contrast, conda analyses the current environment, including everything currently
installed, and, together with any version limitations specified (e.g. the user may wish to
have Tensorflow version 2.0 or higher), works out how to install a compatible set of
dependencies, and shows a warning if this cannot be done.
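
Before installing a new package, it can help to see which versions are already present in the environment. The sketch below does not reproduce conda's dependency solver; it only inspects installed versions using the standard `importlib.metadata` module, which assumes Python 3.8 or later. The helper name `installed_version` and the package names in the loop are illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Example package names; any of them may be missing on a given machine.
for name in ["numpy", "tensorflow", "pandas"]:
    print(name, installed_version(name))
```

Comparing this listing before and after an installation shows whether a shared dependency such as numpy was changed.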

Open source packages can be individually installed from the Anaconda
repository[8], Anaconda Cloud (anaconda.org), or your own private repository or mirror,
using the conda install command. Anaconda Inc compiles and builds all the packages in
the Anaconda repository itself, and provides binaries for Windows 32/64 bit, Linux 64
bit and MacOS 64-bit. Anything available on PyPI may be installed into a conda
environment using pip, and conda will keep track of what it has installed itself and what
pip has installed.

Custom packages can be made using the conda build command, and can be shared
with others by uploading them to Anaconda Cloud,[9] PyPI or other repositories.

The default installation of Anaconda2 includes Python 2.7 and Anaconda3
includes Python 3.7. However, it is possible to create new environments that include any
version of Python packaged with Anaconda3.

6.2.2 Anaconda Navigator


Anaconda Navigator is a desktop graphical user interface (GUI) included in
Anaconda distribution that allows users to launch applications and manage conda
packages, environments and channels without using command-line commands. Navigator
can search for packages on Anaconda Cloud or in a local Anaconda Repository, install
them in an environment, run the packages and update them. It is available for Windows,
macOS and Linux.

The following applications are available by default in Navigator:

 JupyterLab
 Jupyter Notebook
 QtConsole
 Spyder
 Glue
 Orange
 RStudio
 Visual Studio Code

Conda


Conda is an open source, cross-platform, language-agnostic package manager and
environment management system that installs, runs, and updates packages and their
dependencies. It was created for Python programs, but it can package and distribute
software for any language (e.g., R), including multi-language projects. The conda
package and environment manager is included in all versions of Anaconda, Miniconda,
and Anaconda Repository.

6.2.3 Anaconda Cloud


Anaconda Cloud is a package management service by Anaconda where you can
find, access, store and share public and private notebooks, environments, and conda and
PyPI packages. [20] Cloud hosts useful Python packages, notebooks and environments
for a wide variety of applications. You do not need to log in or have a Cloud
account to search for public packages, download them and install them.

You can build new packages using the Anaconda Client command line interface
(CLI), then manually or automatically upload the packages to Cloud.

6.3 Jupyter Notebook


This tutorial explains how to install, run, and use Jupyter Notebooks for data
science, including tips, best practices, and examples.

As a web application in which you can create and share documents that contain
live code, equations, visualizations as well as text, the Jupyter Notebook is one of the
ideal tools to help you to gain the data science skills you need.

This tutorial will cover the following topics:


 A basic overview of the Jupyter Notebook App and its components,
 The history of Project Jupyter to show how it's connected to IPython,
 An overview of the three most popular ways to run your notebooks: with the help
of a Python distribution, with pip or in a Docker container,
 A practical introduction to the components that were covered in the first section,
complete with examples of Pandas DataFrames, an explanation on how to make your
notebook documents magical, and answers to frequently asked questions, such as
"How to toggle between Python 2 and 3?", and
 The best practices and tips that will help you to make your notebook an added value
to any data science project!

What Is A Jupyter Notebook?


In this case, "notebook" or "notebook documents" denote documents that contain
both code and rich text elements, such as figures, links and equations. Because of the mix
of code and text elements, these documents are the ideal place to bring together an
analysis description and its results, and they can also be executed to perform the data
analysis in real time.
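
On disk, a notebook document is stored as a JSON file (with the `.ipynb` extension) whose cells hold either text or code. The sketch below builds a simplified notebook structure with one text cell and one code cell. It is a minimal illustration under that assumption and has not been validated against the official notebook format schema; the cell contents are invented.

```python
import json

# A simplified notebook: one markdown (text) cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis description"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialize to JSON text and read it back, as a notebook application would.
text = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in json.loads(text)["cells"]])
```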

The Jupyter Notebook App produces these documents. We'll talk about this in a
bit. For now, you should know that "Jupyter" is a loose acronym meaning Julia, Python,
and R. These programming languages were the first target languages of the Jupyter
application, but nowadays, the notebook technology also supports many other languages.
And there you have it: the Jupyter Notebook.

As you just saw, the main components of the whole environment are, on the one
hand, the notebooks themselves and the application. On the other hand, you also have a
notebook kernel and a notebook dashboard.
Let's look at these components in more detail.

What Is The Jupyter Notebook App?


As a server-client application, the Jupyter Notebook App allows you to edit and
run your notebooks via a web browser. The application can be executed on a PC without
Internet access, or it can be installed on a remote server, where you can access it through
the Internet.

Its two main components are the kernels and a dashboard.


A kernel is a program that runs and introspects the user’s code. The Jupyter
Notebook App has a kernel for Python code, but there are also kernels available for other
programming languages.

The dashboard of the application not only shows you the notebook documents that
you have made and can reopen, but can also be used to manage the kernels: you can see
which ones are running and shut them down if necessary.


The History of IPython and Jupyter Notebooks


To fully understand what the Jupyter Notebook is and what functionality it has to
offer, you need to know how it originated.

Let's back up briefly to the late 1980s. Guido Van Rossum begins to work on
Python at the National Research Institute for Mathematics and Computer Science in the
Netherlands.

Fast forward to 2001, when Fernando Pérez started developing IPython. The IPython
team kept on working, and in 2007 they made another attempt at implementing a
notebook-type system. By October 2010, there was a prototype of a web notebook, and in
the summer of 2011 this prototype was incorporated into IPython and released with
version 0.12 on December 21, 2011. In subsequent years, the team received awards, such as
the Award for the Advancement of Free Software for Fernando Pérez on 23 March 2013 and
the Jolt Productivity Award, and funding from the Alfred P. Sloan Foundation, among others.

Lastly, in 2014, Project Jupyter started as a spin-off project from IPython. IPython
is now the name of the Python backend, which is also known as the kernel. Recently, the
next generation of Jupyter Notebooks has been introduced to the community. It's called
JupyterLab.

After all this, you might wonder where this idea of notebooks originated or how
it occurred to its creators.

A brief look into the history of these notebooks shows that Fernando Pérez and
Robert Kern were working on a notebook at the same time as the Sage notebook was
a work in progress. Since the layout of the Sage notebook was based on the layout of
Google notebooks, you can also conclude that Google had a notebook feature around
that time.

As for the idea of the notebook, both Fernando Pérez and William Stein, one of
the creators of the Sage notebook, have confirmed that they were avid users of
Mathematica notebooks and Maple worksheets. The Mathematica notebooks were created
as a front end or GUI in 1988 by Theodore Gray.


The concept of a notebook, which contains ordinary text and calculation and/or
graphics, was definitely not new.

Also, the developers had close contact with one another. This, together with
other failed attempts at GUIs for IPython and the rise of "AJAX" web applications,
which did not require users to refresh the whole page every time they did something,
further motivated William Stein's team to start developing the Sage notebook.

CHAPTER 7
SYSTEM IMPLEMENTATION

7. SYSTEM IMPLEMENTATION

7.1 Modules
7.1.1 Data gathering
The sample data was collected from our college placement department, which
holds the records of students from previous years. The collected dataset consists of over
1000 student instances.

7.1.2 Pre-processing

Data preprocessing is a technique that is used to convert raw data into a clean
dataset. Data gathered from different sources is in a raw format that is not suitable
for analysis. Pre-processing for this approach takes four simple yet effective steps.

• Attribute selection: Attributes in the initial dataset that were not pertinent
(relevant) to the goal of the experiment were ignored. The attributes name, roll
no and phone number are not used. The main attributes used for this study are
Technical skills, Communication, Aptitude and CGPA.
• Cleaning missing values: In some cases, the dataset contains missing values, and we
need to be equipped to handle them when we come across them. You could simply
remove the entire row of data, but you might inadvertently remove crucial
information, so we would rather not do that. One of the most common ways to
handle the problem is to take the mean of all the values in the same column and
use it to replace the missing data. The library used for the task is scikit-learn.
Older releases provide a class called Imputer in sklearn.preprocessing for this
purpose; in current releases it has been replaced by SimpleImputer in sklearn.impute.

• Training and test data: The next step is to split the dataset into two parts, a
training set and a test set. We train our machine learning models on the training
set, i.e., the models try to learn any correlations in the training set, and then we
test the models on the test set to examine how accurately they predict. A general
rule of thumb is to assign 80% of the dataset to the training set and the
remaining 20% to the test set.

• Feature scaling: The final step of data preprocessing is feature scaling, a method
used to standardize the range of the independent variables or features of the data.
It is necessary because many machine learning models are based on Euclidean
distance. If, for example, the values in one column (x) are much higher than the
values in another column (y), (x2 - x1) squared will give a far greater value than
(y2 - y1) squared, so one squared difference dominates the other. In the machine
learning equations, the smaller squared difference is then treated almost as if it
did not exist. We do not want that to happen, which is why it is necessary to
transform all our variables onto the same scale.
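
The four steps above can be sketched on a toy table using only the Python standard library. This is a hedged illustration, not the project's code: the records, the helper names (impute_mean, min_max_scale) and the fixed random seed are assumptions, and min-max scaling is just one common way to bring columns onto the same range. In scikit-learn the corresponding tools are SimpleImputer, train_test_split and MinMaxScaler.

```python
import random

# Toy records (invented values); None marks a missing entry.
records = [
    {"name": "S1", "roll_no": 1, "Aptitude": 70, "CGPA": 8.1, "placed": 1},
    {"name": "S2", "roll_no": 2, "Aptitude": None, "CGPA": 6.4, "placed": 0},
    {"name": "S3", "roll_no": 3, "Aptitude": 90, "CGPA": 9.0, "placed": 1},
    {"name": "S4", "roll_no": 4, "Aptitude": 50, "CGPA": None, "placed": 0},
    {"name": "S5", "roll_no": 5, "Aptitude": 60, "CGPA": 7.5, "placed": 1},
]
features = ["Aptitude", "CGPA"]

# 1. Attribute selection: keep only the columns relevant to the prediction.
rows = [[r[f] for f in features] for r in records]
labels = [r["placed"] for r in records]

# 2. Cleaning missing values: replace None with the mean of that column.
def impute_mean(rows):
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in rows if row[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]

rows = impute_mean(rows)

# 3. Training and test data: shuffle the indices, then take 80% / 20%.
indices = list(range(len(rows)))
random.Random(100).shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

# 4. Feature scaling: min-max scale each column using the training rows
#    only, so the test rows do not influence the scale.
def min_max_scale(rows, fit_rows):
    n_cols = len(rows[0])
    lo = [min(r[j] for r in fit_rows) for j in range(n_cols)]
    hi = [max(r[j] for r in fit_rows) for j in range(n_cols)]
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(n_cols)] for r in rows]

train_rows = [rows[i] for i in train_idx]
scaled = min_max_scale(rows, train_rows)
X_train = [scaled[i] for i in train_idx]
X_test = [scaled[i] for i in test_idx]
y_train = [labels[i] for i in train_idx]
y_test = [labels[i] for i in test_idx]
print(len(X_train), len(X_test))
```

After scaling, Aptitude (tens of marks) and CGPA (single digits) both lie between 0 and 1 on the training rows, so neither column dominates a Euclidean distance.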

7.1.3 Processing:
Classification of data is a two-phase process. In phase one, which is called the
training phase, a classifier is built using the training set of tuples. The second phase is
the classification phase, where the testing set of tuples is used to validate the model
and the performance of the model is analyzed.
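
The two phases can be illustrated with a deliberately simple stand-in classifier. The sketch below uses a nearest-centroid rule rather than the Support Vector Machine or Random Forest models applied in this project; the class name CentroidClassifier, the toy data and the meaning of the features are assumptions made for the example.

```python
class CentroidClassifier:
    """Toy classifier: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        # Training phase: compute one centroid (mean vector) per class.
        self.centroids = {}
        for label in set(y):
            members = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        return self

    def predict(self, X):
        # Classification phase: assign each tuple to the nearest centroid.
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]

# Invented, already scaled features: [aptitude, communication]; 1 = placed.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.3], [0.1, 0.2]]
y_train = [1, 1, 0, 0]
X_test = [[0.85, 0.7], [0.15, 0.4]]
y_test = [1, 0]

model = CentroidClassifier().fit(X_train, y_train)   # phase one: training
predictions = model.predict(X_test)                   # phase two: classification
print(predictions)
```

Comparing predictions with y_test is what validates the model in the second phase.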

7.1.4 Interpretation:
The data set is further split into two sets, with two thirds used as the
training set and one third as the testing set. Of the algorithms applied, Random Forest
showed the best results. The efficiency of the approaches is compared in terms of
accuracy. The accuracy of a prediction model/classifier is defined as the number of
correctly predicted/classified instances divided by the total number of instances.
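
As a small worked example of this measure, the sketch below computes accuracy as the share of correct predictions. The function name accuracy and the label lists are made up for illustration; scikit-learn's metrics.accuracy_score, used in the code of Section 7.2, returns the same fraction.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Invented labels: 8 of the 10 predictions match the true labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print("Accuracy: %.2f%%" % (accuracy(y_true, y_pred) * 100))
```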

7.1.5 Weak Student Analysis:


Based on the interpretation data, weak student analysis is carried out using the
attributes considered during the analysis. From these attributes, we provide a
detailed analysis of weak students, showing which skills they are good at and where
they have to improve.
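
One way such an analysis could work is sketched below, following the idea used in the code of Section 7.2 of measuring each score against the class mean. The student records, the function name skill_report and the rule "below the class mean means needs improvement" are assumptions for illustration.

```python
from statistics import mean

# Invented scores for four skills.
students = {
    "A": {"Aptitude": 80, "Technical skills": 60, "Programming": 75, "Communication": 50},
    "B": {"Aptitude": 55, "Technical skills": 70, "Programming": 65, "Communication": 90},
    "C": {"Aptitude": 60, "Technical skills": 80, "Programming": 40, "Communication": 70},
}
skills = ["Aptitude", "Technical skills", "Programming", "Communication"]

# Class mean of every skill.
class_mean = {s: mean(rec[s] for rec in students.values()) for s in skills}

def skill_report(name):
    """Split a student's skills into those at/above and those below the class mean."""
    deviations = {s: students[name][s] - class_mean[s] for s in skills}
    strong = [s for s in skills if deviations[s] >= 0]
    weak = [s for s in skills if deviations[s] < 0]
    return strong, weak

strong, weak = skill_report("C")
print("good at:", strong)
print("needs improvement:", weak)
```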

7.1.6 Interface:
The data set is further split into two sets, with two thirds used as the
training set and one third as the testing set. Of the algorithms applied, Random Forest
showed the best results. The efficiency of the approaches is compared in terms of
accuracy. The accuracy of a prediction model/classifier is defined as the number of
correctly predicted/classified instances divided by the total number of instances.

Based on the interpretation data, weak student analysis is carried out using
the attributes considered during the analysis. From these attributes, we provide a
detailed analysis of weak students, showing which skills they are good at and
where they have to improve.


7.2 CODE
import pandas as pd
import numpy as np
df = pd.read_excel('placement.xlsx')  # load the raw placement records
df.head()
df.shape
df.info()
df.columns
df['Programming'].mean()
x=df['Programming']
y=df['Technical skills']
import seaborn as sns
sns.jointplot(x=x, y=y, data=df);
import matplotlib.pyplot as plt
# Deviation of each student's Aptitude score from the class mean.
# The '_stddev' suffix is kept because later cells refer to these column names,
# but the values are plain deviations from the mean, not standard deviations.
df['Aptitude_stddev']=df['Aptitude']-df['Aptitude'].mean()
df['Aptitude_stddev'].mean()
vals=df['Aptitude_stddev'].value_counts().keys().tolist()
counts=df['Aptitude_stddev'].value_counts().tolist()
plt.bar(vals,counts)
plt.title("Students' deviation from the mean Aptitude score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()
# Deviation of each student's Technical skills score from the class mean
df['Technicalskills_stddev']=df['Technical skills']-df['Technical skills'].mean()
df['Technicalskills_stddev'].mean()
vals=df['Technicalskills_stddev'].value_counts().keys().tolist()
counts=df['Technicalskills_stddev'].value_counts().tolist()
plt.bar(vals,counts)
plt.title("Students' deviation from the mean Technical skills score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()
# Deviation of each student's Programming score from the class mean
df['Programming_stddev']=df['Programming']-df['Programming'].mean()
df['Programming_stddev'].mean()
vals=df['Programming_stddev'].value_counts().keys().tolist()
counts=df['Programming_stddev'].value_counts().tolist()
plt.bar(vals,counts)
plt.title("Students' deviation from the mean Programming score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()
# Deviation of each student's Communication score from the class mean
# (the column name 'Communication ' is written with a trailing space, as in the original listing)
df['Communication_stddev']=df['Communication ']-df['Communication '].mean()
df['Communication_stddev'].mean()
vals=df['Communication_stddev'].value_counts().keys().tolist()
counts=df['Communication_stddev'].value_counts().tolist()
plt.bar(vals,counts)
plt.title("Students' deviation from the mean Communication score")
plt.xlabel("Deviation from mean")
plt.ylabel("No. of students")
plt.show()
df.columns
df['sum of individual student deviations']=df[['Aptitude_stddev', 'Technicalskills_stddev',
'Programming_stddev', 'Communication_stddev']].sum(axis=1)
vals=df['sum of individual student deviations'].value_counts().keys().tolist()
counts=df['sum of individual student deviations'].value_counts().tolist()
plt.bar(vals,counts)
plt.title("Distribution of the summed deviations per student")
plt.xlabel("Sum of deviations from the mean")
plt.ylabel("No. of students")
plt.show()
# Average deviation over the four skills
df['Average_of_ind_score_dev']=df['sum of individual student deviations']/4
vals=df['Average_of_ind_score_dev'].value_counts().keys().tolist()
counts=df['Average_of_ind_score_dev'].value_counts().tolist()
plt.bar(vals,counts)
plt.title("Distribution of the average deviation per student")
plt.xlabel("Average deviation from the mean")
plt.ylabel("No. of students")
plt.show()
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# If using a Jupyter notebook, include this magic command
# (it is not valid syntax in a plain .py script):
%matplotlib inline
mu=df['Average_of_ind_score_dev'].mean()
sigma=df['Average_of_ind_score_dev'].std()
x1=df['Average_of_ind_score_dev'].min()
x2=df['Average_of_ind_score_dev'].max()
# calculate the z-transform
z1 = ( x1 - mu ) / sigma
z2 = ( x2 - mu ) / sigma
x = np.arange(z1, z2, 0.001) # range of x in spec
y = norm.pdf(x,0,1)# mean = 0, stddev = 1, since Z-transform was calculated
# build the plot (the style is set before the figure is created so that it applies fully)
plt.style.use('fivethirtyeight')
fig, ax = plt.subplots(figsize=(9,6))
ax.fill_between(x,y,0, alpha=0.3, color='b')
ax.set_xlim([-4,4])
ax.set_xlabel('# of Standard Deviations Outside the Mean')
ax.set_yticklabels([])
ax.set_title('Normal Gaussian Curve')
plt.show()
mu
x1,x2
df['Average_of_ind_score_dev'].describe()
# Flag students whose average deviation is above the threshold mu*0.50
threshold=mu*0.50
df['select_50_above_mean']=df['Average_of_ind_score_dev']>threshold
# Show the threshold, the maximum average deviation and the mean
print('threshold, max, mean:',threshold,x2,mu)
df['select_50_above_mean'].value_counts()
# Encode the flag as 1 (selected) / 0 (not selected)
df['selected']=df['select_50_above_mean'].astype(int)
df.columns
# 'unselected' is the complement of 'selected'
df['unselected']=1-df['selected']
df.iloc[:,-2:]
#Basic requirements
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn import metrics
df1=pd.read_csv('fromscore.csv')
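# Separate the features from the target; leaving 'is_selected' in X would leak the label to the models
X=df1.drop(columns=['is_selected'])
y=df1['is_selected']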
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=0.2,random_state=100)
#----------------------------------------------------------------------------------------------------
#SupportvectorClassifier
from sklearn.svm import SVC
model = SVC(gamma='scale')
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
y_pred = model.predict(X_test)
print("classification model SVM accuracy (in %):", metrics.accuracy_score(Y_test, y_pred) * 100)
print("Accuracy: %.2f%%" % (result*100.0))


#------------------------------------------------------------------------------------------------------
#RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
model1=RandomForestClassifier()
model1.fit(X_train,Y_train)
res1=model1.score(X_test,Y_test)
y_pred1= model1.predict(X_test)
print("classification model random forest accuracy (in %):",
metrics.accuracy_score(Y_test, y_pred1) * 100)
print("Accuracy: %.2f%%" % (res1*100.0))
sns.set(rc={'axes.facecolor':'cyan', 'figure.facecolor':'w'})
plt.figure(figsize=[15,7])
sns.lineplot(x=["SVC","RFC"],y=[89,100] ,markers=True, dashes=True,palette="Set1",
lw=8,markersize=10)
import seaborn as sns
#sns.set_theme(style="whitegrid")
plt.figure(figsize=[15,7])
ax = sns.barplot(x=["SVC","RFC"],y=[89,100])
print("not selected:\n",df.unselected.value_counts())
print("reversing values \n")
print("selected candidate:\n",df.selected.value_counts())
uns=df['unselected'].to_numpy()
uns.shape
df.columns
subj_dev=df[['Aptitude_stddev','Technicalskills_stddev',
'Programming_stddev','Communication_stddev']].to_numpy()
subj_dev.shape
subj_mat=subj_dev.transpose()
#df_std_sub=df[['std_skilltest_indv_C','std_skilltest_indv_P','std_skilltest_indv_A','std_skilltest_indv_T']]
#df_std_sub.shape
print("transpose matrix 'subj_mat ': ",subj_mat.shape)
print("unselected student matrix shape 'uns': ",uns.shape)
print("reason students not selected due to marginal deviation in test skills")


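# Mask the deviations so that only unselected students keep non-zero rows
subj_mat.shape
uns=uns.reshape(-1,1)  # column vector; -1 infers the row count instead of hard-coding 484
uns.shape
weak_std= subj_dev*uns  # broadcasting zeroes out the rows of selected students
weak_std.shape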
df.to_csv('final_dataset.csv')
#read for weaker subject classification
df_s=pd.read_csv('final_dataset.csv')
df_s.columns
dd_s=df[['Aptitude_stddev','Technicalskills_stddev', 'Programming_stddev',
'Communication_stddev']]
weak_df1=pd.DataFrame(weak_std,columns=['Aptitude_stddev','Technicalskills_stddev',
'Programming_stddev', 'Communication_stddev'])
weak_df1.loc[16]
a=int(input("enter student roll no:"))
weak_df1.loc[a]
print("overall statistics of student profile")
inp=int(input("enter user number:"))
df_s.loc[inp]

CHAPTER 8
TESTING

8. TESTING

8.1 Testing Methodologies


Software testing methodologies are the various strategies or approaches used to
test an application to ensure it behaves and looks as expected. These encompass
everything from front-end to back-end testing, including unit and system testing. This
chapter highlights the testing techniques used by quality assurance professionals.

Unit Testing:
Unit testing is the first level of testing and is often performed by the developers
themselves. It is the process of ensuring individual components of a piece of software at
the code level are functional and work as they were designed to. Developers in a test-
driven environment will typically write and run the tests prior to the software or feature
being passed over to the test team. Unit testing can be conducted manually, but
automating the process will speed up delivery cycles and expand test coverage.
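
For instance, an automated unit test for one small component of this project could look like the sketch below. The function label_selected is a hypothetical stand-in for the selection rule, not the project's actual code, and the threshold value is arbitrary; the tests use Python's built-in unittest module.

```python
import unittest

def label_selected(avg_deviation, threshold=0.0):
    """Hypothetical unit under test: 1 if the score deviation exceeds the threshold."""
    return 1 if avg_deviation > threshold else 0

class LabelSelectedTest(unittest.TestCase):
    def test_above_threshold_is_selected(self):
        self.assertEqual(label_selected(2.5), 1)

    def test_below_threshold_is_not_selected(self):
        self.assertEqual(label_selected(-1.0), 0)

    def test_value_equal_to_threshold_is_not_selected(self):
        self.assertEqual(label_selected(0.0), 0)

# Run the tests programmatically so the sketch also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LabelSelectedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests are plain code, they can be re-run automatically after every change to the unit.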

Integration Testing:
After each unit is thoroughly tested, it is integrated with other units to create
modules or components that are designed to perform specific tasks or activities. These are
then tested as a group through integration testing to ensure whole segments of an
application behave as expected (i.e., the interactions between units are seamless).

System Testing:
System testing is a black box testing method used to evaluate the completed and
integrated system as a whole, to ensure it meets the specified requirements. The
functionality of the software is tested from end to end, typically by a testing team
separate from the development team, before the product is pushed into production.

8.2 Testing Activities


In order to make sure that the system does not have errors, the different levels of
testing strategies that are applied at differing phases of software development are:

Unit Testing:
Unit Testing is done on individual modules as they are completed and become
executable. It is confined only to the designer's requirements.

8.3 Types of Testing


Each module can be tested using the following two strategies:

8.3.1 Black Box Testing


In this strategy, test cases are generated as input conditions that fully exercise
all functional requirements of the program. This testing has been used to find errors in
the following categories:
• Incorrect or missing functions
• Interface errors
• Errors in data structures or external database access
• Performance errors
• Initialization and termination errors
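
A black-box test case only states inputs and the expected outputs, without looking at how the result is produced. The sketch below checks a hypothetical function placement_status against such input conditions, including erroneous input; the function, its 0 to 100 score range and the pass mark are illustrative assumptions.

```python
def placement_status(score, pass_mark=60):
    """Hypothetical function under test: maps a 0-100 score to a status string."""
    if not isinstance(score, (int, float)) or isinstance(score, bool):
        raise TypeError("score must be a number")
    if score < 0 or score > 100:
        raise ValueError("score must be between 0 and 100")
    return "placed" if score >= pass_mark else "not placed"

# Each case is (input, expected result); only the requirement is consulted.
cases = [
    (85, "placed"),        # typical valid input
    (60, "placed"),        # value exactly on the pass mark
    (59, "not placed"),    # value just below the pass mark
    (0, "not placed"),     # lowest valid input
    (101, ValueError),     # out-of-range input must be rejected
    ("high", TypeError),   # wrong input type must be rejected
]

failures = []
for given, expected in cases:
    try:
        actual = placement_status(given)
    except Exception as exc:       # record the error type as the outcome
        actual = type(exc)
    if actual != expected:
        failures.append((given, expected, actual))

print("failed cases:", failures)
```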

8.3.2 White Box testing


In this strategy, the test cases are generated from the logic of each module by
drawing flow graphs of that module, and the logical decisions are tested on all cases.
It has been used to generate test cases that:
• Guarantee that all independent paths have been executed.
• Execute all logical decisions on their true and false sides.
• Execute all loops at their boundaries and within their operational bounds.
• Exercise internal data structures to ensure their validity.
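
In contrast to the black-box case, white-box test inputs are chosen by reading the code. The sketch below uses a hypothetical function count_weak_skills with one decision and one loop, and picks inputs that take the decision on both its true and false sides and run the loop zero times, once and several times.

```python
def count_weak_skills(deviations):
    """Hypothetical unit: counts skills whose deviation from the mean is negative."""
    weak = 0
    for d in deviations:      # loop: may run zero, one or many times
        if d < 0:             # decision: has a true side and a false side
            weak += 1
    return weak

# Inputs derived from the structure of the code above.
assert count_weak_skills([]) == 0                 # loop body never runs
assert count_weak_skills([-1.5]) == 1             # one pass, decision true
assert count_weak_skills([2.0]) == 0              # one pass, decision false
assert count_weak_skills([-3, 0, 4, -0.5]) == 2   # many passes, both sides taken
print("all paths exercised")
```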

Integration Testing
Integration testing ensures that software and subsystems work together as a whole. It
tests the interfaces of all the modules to make sure that the modules behave properly when
integrated together. In this case, it covers the communication between the device and the
Google Translator Service.

System Testing
System testing involves in-house testing of the entire system in an emulator before
delivery to the user. Its aim is to satisfy the user that the system meets all the
requirements of the client within any standard browser's specifications.

Acceptance Testing
It is pre-delivery testing in which the entire system is tested on a real Android device,
with real-world data and usage, to find errors.

Test Approach
Testing can be done in two ways:
• Bottom-up approach
• Top-down approach

Bottom-up approach
Testing can be performed starting from the smallest and lowest-level modules and
proceeding one at a time. For each module in bottom-up testing, a short program executes
the module and provides the needed data, so that the module is asked to perform the way it
will when embedded within the larger system.
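
The short program mentioned above is usually called a test driver. A minimal sketch follows, assuming a hypothetical low-level module function deviation_from_mean and invented input data.

```python
def deviation_from_mean(scores):
    """Low-level module under test: deviation of each score from the mean."""
    avg = sum(scores) / len(scores)
    return [s - avg for s in scores]

def driver():
    """Test driver: feeds the module the data it would get in the full system."""
    sample_scores = [50, 60, 70]           # invented input data
    result = deviation_from_mean(sample_scores)
    return result == [-10.0, 0.0, 10.0]

print("module passed:", driver())
```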

Top down approach


This type of testing starts from the upper-level modules. Since the detailed activities
usually performed in the lower-level routines are not yet provided, stubs are written. A
stub is a module shell called by an upper-level module that, when reached properly, returns
a message to the calling module indicating that proper interaction occurred. No attempt is
made to verify the correctness of the lower-level module.
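
A stub of this kind can be sketched as follows. The upper-level function predict_placement and the lower-level load_scores_stub are hypothetical names; the stub returns a fixed message and canned data instead of doing real work.

```python
def load_scores_stub(roll_no):
    """Stub for a lower-level routine that is not written yet.

    It only reports that it was reached and returns canned data.
    """
    return {"message": "load_scores called for roll no %d" % roll_no,
            "scores": [70, 65, 80, 75]}

def predict_placement(roll_no, load_scores=load_scores_stub):
    """Upper-level module under test; it depends on a lower-level loader."""
    reply = load_scores(roll_no)
    average = sum(reply["scores"]) / len(reply["scores"])
    status = "placed" if average >= 60 else "not placed"
    return status, reply["message"]

status, message = predict_placement(16)
print(status, "|", message)
```

Passing the loader as a parameter is a design choice made for this sketch: the stub can later be swapped for the real routine without changing the upper-level module.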

Validation
The system has been tested and implemented successfully, thus ensuring that
all the requirements listed in the software requirements specification are completely
fulfilled. In case of erroneous input, corresponding error messages are displayed.

CHAPTER 9
OUTPUT SCREENS

9. SCREENSHOTS



CHAPTER 10
CONCLUSION

10. CONCLUSION

The placement prediction system predicts the placement status of final-year B.Tech
students. Different machine learning algorithms are used for data analysis and
prediction in the Python environment. To validate the approach, the placement data was
analysed and predictions were made using two classification algorithms, the Support
Vector Machine and the Random Forest, and their accuracies were compared. The
algorithms are applied to the data set, and the attributes are used to build the model.
The accuracy obtained is 84% for the Support Vector Machine and 86% for the Random
Forest. Hence, from this analysis, the Random Forest algorithm is the better choice for
predicting the placement results.

CHAPTER 11
FUTURE SCOPE

11. FUTURE SCOPE

Future enhancements of the project will focus on adding more parameters so that
the placement status can be predicted more accurately. The project can also be
enhanced by generating solutions or suggestions based on the output produced by
the system.

CHAPTER 12
BIBLIOGRAPHY

12. BIBLIOGRAPHY
References:
1. Mangasuli Sheetal B, Prof. Savita Bakare, "Prediction of Campus Placement Using
Data Mining Algorithm-Fuzzy Logic and K Nearest Neighbour", International Journal of
Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 6,
June 2016.
2. Ajay Shiv Sharma, Swaraj Prince, Shubham Kapoor, Keshav Kumar, "PPS-Placement
Prediction System Using Logistic Regression", IEEE International Conference on
MOOC, Innovation and Technology in Education (MITE), December 2014.
3. Jai Ruby, Dr. K. David, "Predicting the Performance of Students in Higher Education
Using Data Mining Classification Algorithms - A Case Study", International Journal
for Research in Applied Science & Engineering Technology (IJRASET), Vol. 2, Issue
11, November 2014.
4. Ankita A Nichat, Dr. Anjali B Raut, "Predicting and Analysis of Student Performance
Using Decision Tree Technique", International Journal of Innovative Research in
Computer and Communication Engineering, Vol. 5, Issue 4, April 2017.
5. Oktariani Nurul Pratiwi, "Predicting Student Placement Class Using Data Mining",
IEEE International Conference, 2013.
6. Ajay Kumar Pal and Saurabh Pal, "Classification Model of Prediction for Placement
of Students", I. J. Modern Education and Computer Science, 2013, 11, 49-56.
7. Ravi Tiwari and Awadhesh Kumar Sharma, "A Data Mining Model to Improve
Placement", International Journal of Computer Applications (0975 – 8887), Volume
120, No. 12, June 2015.
8. Ms. Sonal Patil, Mr. Mayur Agrawal, Ms. Vijaya R. Baviskar, "Efficient Processing of
Decision Tree Using ID3 & Improved C4.5 Algorithm", International Journal of
Computer Science and Information Technologies, Vol. 6 (2), 2015, 1956-1961.
