
STUDENT PERFORMANCE ANALYSIS BASED

ON MACHINE LEARNING
A Project report Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, KAKINADA
in partial fulfillment of the requirement
For the award of Degree of
BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE & ENGINEERING
BY
K.SAI KALPANA—(19KQ5A0501)

B.MAHESH BABU—(18KQ1A0537)

G.TULASI RAM—(18KQ1A0549)

Under the Esteemed Guidance of


Mr B.Srinivasulu, M.Tech, (Ph.D)
Assistant Professor
Dept of CSE

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

PACE INSTITUTE OF TECHNOLOGY & SCIENCES


(Approved by A.I.C.T.E., New Delhi & Govt of Andhra Pradesh, Affiliated to JNTU Kakinada)
ACCREDITED BY NAAC WITH ‘A’ GRADE & NBA(An ISO 9001:2015 Certified Institution)
NH-5,Near Valluramma Temple,ONGOLE-523272,Contact No:08592-201167,www.pace.ac.in

2018-2022
PACE INSTITUTE OF TECHNOLOGY & SCIENCES
(Approved by A.I.C.T.E., New Delhi & Govt of Andhra Pradesh, Affiliated to JNTU Kakinada)
ACCREDITED BY NAAC WITH ‘A’ GRADE & NBA (An ISO 9001:2015 Certified Institution)
NH-5,Near Valluramma Temple,ONGOLE-523272,Contact No:08592-201167,www.pace.ac.in

CERTIFICATE

This is to certify that the project entitled “STUDENT PERFORMANCE ANALYSIS BASED ON MACHINE LEARNING” is a bonafide work done by KONKALA SAI KALPANA (Regd No 19KQ5A0501), BADUKURI MAHESH BABU (Regd No 18KQ1A0537), and GORANTLA TULASI RAM (Regd No 18KQ1A0549), submitted in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering during the academic years 2018-2022. The results embodied in this project report have not been submitted to any other University or Institute for the award of any other degree or diploma.

Internal Guide Head of Department


Mr B.Srinivasulu,M.Tech,(Ph.D) Dr M Srinivasa Rao, M.Tech.,Ph.D
Assistant Professor Associate Professor & HOD
Dept of CSE Dept of CSE

Internal Examiner External Examiner


PACE INSTITUTE OF TECHNOLOGY & SCIENCES
(Approved by A.I.C.T.E., New Delhi & Govt of Andhra Pradesh, Affiliated to JNTU Kakinada)
ACCREDITED BY NAAC WITH ‘A’ GRADE & NBA (An ISO 9001:2015 Certified Institution)
NH-5,Near Valluramma Temple,ONGOLE-523272,Contact No:08592-201167,www.pace.ac.in

DECLARATION

We, KONKALA SAI KALPANA (Regd No 19KQ5A0501), BADUKURI MAHESH BABU (Regd No 18KQ1A0537), and GORANTLA TULASI RAM (Regd No 18KQ1A0549), hereby declare that the project report titled “STUDENT PERFORMANCE ANALYSIS BASED ON MACHINE LEARNING”, carried out under the guidance of Mr B.SRINIVASULU, M.Tech, (Ph.D), Assistant Professor, Computer Science & Engineering, is submitted in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering.
This is a record of bonafide work carried out by us, and the results embodied in this project report have not been reproduced or copied from any source. The results embodied in this project report have not been submitted to any other University or Institute for the award of any other degree or diploma.

K.SAI KALPANA— (19KQ5A0501)


PLACE:-
DATE:-
B.MAHESH BABU — (18KQ1A0537)

G.TULASI RAM — (18KQ1A0549)


PACE INSTITUTE OF TECHNOLOGY & SCIENCES
(Approved by A.I.C.T.E., New Delhi & Govt of Andhra Pradesh, Affiliated to JNTU
Kakinada)
ACCREDITED BY NAAC WITH ‘A’ GRADE & NBA (An ISO 9001:2015 Certified Institution)
NH-5,Near Valluramma Temple,ONGOLE-523272,Contact No:08592-201167,www.pace.ac.in

ACKNOWLEDGEMENT

We are ineffably indebted to our honorable Chairman Er. M. Venu Gopal Rao, B.E, MBA, D.M.M, our honorable Secretary & Correspondent Er. M. Sridhar, B.E, and our beloved Principal Dr. M. Sreenivasan, M.S, Ph.D, for their kind support and help.
We would like to extend our sincere and heartfelt thanks to Dr. M. Srinivasa Rao, M.Tech., Ph.D, Head of the Computer Science & Engineering Department, who has given valuable advice and suggestions and has encouraged us by providing all the facilities necessary to complete this project report successfully.
We take this opportunity to express our deep respect and profound gratitude to our esteemed guide Mr B. Srinivasulu, M.Tech, (Ph.D), Assistant Professor, Computer Science & Engineering, PACE Institute Of Technology & Sciences, Valluru, for his exemplary guidance, monitoring and constant encouragement throughout the course of this project.
We are thankful to all the Professors and Faculty Members in the department for their teachings and academic support, and we thank the Technical Staff and Non-teaching Staff in the department for their support.
We also acknowledge, with a deep sense of reverence, our gratitude towards our parents, family members and friends for their constant support and encouragement.

K.SAI KALPANA(19KQ5A0501)
B.MAHESHBABU(18KQ1A0537)

G.TULASI RAM(18KQ1A0549)
INDEX

CONTENTS

1. ABSTRACT
2. SYSTEM ANALYSIS
   2.1 EXISTING SYSTEM
   2.2 PROPOSED SYSTEM
3. LITERATURE SURVEY
4. SYSTEM LOW LEVEL DESIGN
   4.1 MODULES
5. DATA FLOW DIAGRAM
6. SYSTEM DESIGN
   6.1 UML DIAGRAMS
       6.1.1 USE CASE DIAGRAM
       6.1.2 CLASS DIAGRAM
       6.1.3 SEQUENCE DIAGRAM
       6.1.4 ACTIVITY DIAGRAM
7. ALGORITHM
8. SOFTWARE REQUIREMENTS
   8.1 ANACONDA NAVIGATOR
   8.2 PYTHON
   8.3 NUMPY
   8.4 PYTHON ENVIRONMENT
9. IMPLEMENTATION
   9.1 SAMPLE CODE
10. RESULTS AND SCREENSHOTS
    10.1 RESULTS
    10.2 SCREENSHOTS
11. FUTURE WORK
12. CONCLUSION
1. ABSTRACT

Performance analysis of outcomes based on learning is a system which strives for excellence at different levels and diverse dimensions in the field of students’ interests. This paper proposes a complete EDM (Educational Data Mining) framework in the form of a rule-based recommender system that is developed not only to analyze and predict the student’s performance, but also to exhibit the reasons behind it. The proposed framework analyzes the students’ demographic data, study-related attributes and psychological characteristics to extract all possible knowledge from students, teachers and parents, seeking the highest possible accuracy in academic performance prediction using a set of powerful data mining techniques. The framework succeeds in highlighting the student’s weak points and providing appropriate recommendations. A realistic case study conducted on 200 students demonstrates the outstanding performance of the proposed framework in comparison with existing ones.


2. SYSTEM ANALYSIS

2.1 EXISTING SYSTEM :-

The previous predictive models focused only on the student’s demographic data, such as gender, age, family status, family income and qualifications, in addition to study-related attributes including homework and study hours as well as previous achievements and grades. These previous works were limited to predicting academic success or failure, without illustrating the reasons behind the prediction. Most previous research gathered more than 40 attributes in the data set to predict the student’s academic performance. These attributes came from the same type of data category, whether demographic, study-related, or both, which led to a lack of diversity in the predicting rules.

Disadvantages:

● As a result, the generated rules did not fully extract the knowledge about the reasons behind a student’s dropout.

● Apart from the previously mentioned work, there were statistical analysis models from the perspective of educational psychology that conducted a couple of studies to examine the correlation between mental health and academic performance.

● The recommendations were too brief and did not illustrate the methodologies needed to apply them.
2.2 PROPOSED SYSTEM:-

The proposed framework firstly focuses on merging the demographic and study-related attributes with the field of educational psychology, by adding the student’s psychological characteristics to the previously used data set (i.e., the students’ demographic data and study-related attributes). After surveying the previously used factors for predicting students’ academic performance, we picked the most relevant attributes based on their rationale and correlation with academic performance.

Advantage:

● The proposal aims to analyze the student’s demographic data, study-related details and psychological characteristics in terms of the final state, to figure out whether the student is on the right track, struggling, or even failing, in addition to an extensive comparison of our proposed model with previous related models.

System Architecture:
3. LITERATURE SURVEY

Title: Learning patterns of university student retention
Author: A. Nandeshwar, T. Menzies and A. Nelson

Learning predictors for student retention is very difficult. After reviewing the literature, it is evident that there is considerable room for improvement in the current state of the art. As shown in this paper, improvements are possible if we explore a wide range of learning methods; take care when selecting attributes; assess the efficacy of the learned theory not just by its median performance, but also by the variance in that performance; and study the delta of student factors between those who leave and those who are retained. Using these techniques, for the goal of predicting whether students will remain for the first three years of an undergraduate degree, the following factors were found to be informative: family background and the family's socio-economic status, high school GPA and test scores.

Author: G. Kesavaraj and S. Sukumaran
Title: A study on classification techniques in data mining

Data mining is a process of inferring knowledge from huge data. Data mining has three major components: clustering or classification, association rules and sequence analysis. By simple definition, classification/clustering analyzes a set of data and generates a set of grouping rules which can be used to classify future data. Data mining is the process of extracting information from a data set and transforming it into an understandable structure. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Data mining involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression and summarization. Classification is a major technique in data mining and is widely used in various fields. Classification is a data mining (machine learning) technique used to predict group membership for data instances. In this paper, we present the basic classification techniques, covering several major kinds of classification methods including decision tree induction, Bayesian networks and the k-nearest neighbor classifier. The goal of this study is to provide a comprehensive review of different classification techniques in data mining.
Author: Salam Ismaeel, Ali Miri et al.
Title: Using the Extreme Learning Machine (ELM) technique for heart disease diagnosis

One of the most important applications of machine learning systems is the diagnosis of heart disease, which affects the lives of millions of people. Patients suffering from heart disease have many independent factors in common, such as age, sex, serum cholesterol and blood sugar, which can be used very effectively for diagnosis. In this paper an Extreme Learning Machine (ELM) algorithm is used to model these factors. The proposed system can replace a costly medical checkup with a warning system that alerts patients to the probable presence of heart disease. The system is implemented on real data collected by the Cleveland Clinic Foundation, where around 300 patients' records have been collected. Simulation results show this architecture has about 80% accuracy in determining heart disease.

Author: Shadab Adam Pattekari and Asma Parveen
Title: Prediction System for heart disease using Naïve Bayes

The main objective of this research is to develop an intelligent system using a data mining modeling technique, namely Naive Bayes. It is implemented as a web-based application in which the user answers predefined questions. It retrieves hidden data from the stored database and compares the user's values with the trained data set. It can answer complex queries for diagnosing heart disease and thus assist healthcare practitioners in making intelligent clinical decisions which traditional decision support systems cannot. By providing effective treatments, it also helps to reduce treatment costs.

Author: N. Oliver and F. F. Mangas
Title: HealthGear: a real-time wearable system for monitoring and analyzing physiological signals

We present HealthGear, a real-time wearable system for monitoring, visualizing and analyzing physiological signals. HealthGear consists of a set of noninvasive physiological sensors wirelessly connected via Bluetooth to a cell phone which stores, transmits and analyzes the physiological data, and presents it to the user in an intelligible way. In this paper, we focus on an implementation of HealthGear using a blood oximeter to monitor the user's blood oxygen level and pulse while sleeping. We also describe two different algorithms for automatically detecting sleep apnea events, and illustrate the performance of the overall system in a sleep study with 20 volunteers.
4. SYSTEM LOW LEVEL DESIGN

4.1 MODULES

• DATA COLLECTION

• DATA PRE-PROCESSING

• FEATURE EXTRACTION

• MODEL EVALUATION

DATA COLLECTION

The data used in this paper is a set of student details from school records. This step is concerned with selecting the subset of all available data that you will be working with. ML problems start with data, preferably lots of data (examples or observations), for which you already know the target answer. Data for which you already know the target answer is called labelled data.
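
For illustration, a minimal sketch of this step with the pandas library might look like the following (the file name and the 'Class' target column follow the dataset used later in the implementation; everything printed here is just a quick inspection of what was collected):

import pandas as pd

# Load the labelled student records; the 'Class' column (L/M/H) is the target answer
df = pd.read_csv('xAPI-Edu-Data.csv')

# Inspect what was collected: size, attribute types and the label distribution
print(df.shape)
print(df.dtypes)
print(df['Class'].value_counts())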

DATA PRE-PROCESSING

Organize your selected data by formatting, cleaning and sampling from it. Three common data pre-processing steps are:

1. Formatting

2. Cleaning

3. Sampling

Formatting: The data you have selected may not be in a format that is suitable for you to work with. The data may be in a relational database and you would like it in a flat file, or the data may be in a proprietary file format and you would like it in a relational database or a text file.

Cleaning: Cleaning data is the removal or fixing of missing data. There may be data instances that are incomplete and do not carry the data you believe you need to address the problem. These instances may need to be removed. Additionally, there may be sensitive information in some of the attributes, and these attributes may need to be anonymized or removed from the data entirely.

Sampling: There may be far more selected data available than you need to work with. More data can result in much longer running times for algorithms and larger computational and memory requirements. You can take a smaller representative sample of the selected data that may be much faster for exploring and prototyping solutions before considering the whole dataset.
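
A minimal sketch of these three steps with pandas is shown below (the 'PlaceofBirth' column exists in the dataset used later; treating it as a sensitive attribute and the 30% sample size are illustrative assumptions only):

import pandas as pd

df = pd.read_csv('xAPI-Edu-Data.csv')

# Formatting: export the selected data to a flat CSV file
df.to_csv('students_flat.csv', index=False)

# Cleaning: drop incomplete instances and remove an attribute entirely
df = df.dropna()
df = df.drop(columns=['PlaceofBirth'])   # example of removing/anonymizing an attribute

# Sampling: work with a smaller representative sample while prototyping
sample = df.sample(frac=0.3, random_state=42)
print(len(df), len(sample))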

FEATURE EXTRACTION

The next step is feature extraction, which is an attribute reduction process. Unlike feature selection, which ranks the existing attributes according to their predictive significance, feature extraction actually transforms the attributes. The transformed attributes, or features, are linear combinations of the original attributes. Finally, our models are trained using a classifier algorithm. We use the classify module of the Natural Language Toolkit (NLTK) library in Python and the labelled dataset gathered earlier. The rest of our labelled data is used to evaluate the models. Some machine learning algorithms were used to classify the pre-processed data. The chosen classifier was Random Forest. These algorithms are very popular in text classification tasks.
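
As a concrete illustration of attributes being transformed into linear combinations, here is a small sketch using scikit-learn's PCA, one common feature-extraction technique (the toy matrix of student attributes is invented purely for illustration and is not the project's actual data):

import numpy as np
from sklearn.decomposition import PCA

# Toy matrix: 5 students x 4 original attributes
X = np.array([[10, 15, 2, 20],
              [70, 80, 25, 30],
              [15, 20, 7, 25],
              [80, 90, 30, 40],
              [30, 35, 12, 22]], dtype=float)

# Extract 2 new features; each is a linear combination of the 4 original attributes
pca = PCA(n_components=2)
features = pca.fit_transform(X)

print(features.shape)       # (5, 2)
print(pca.components_)      # weight of each original attribute in the new features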

MODEL EVALUATION

Model evaluation is an integral part of the model development process. It helps to find the best model that represents our data and shows how well the chosen model will work in the future. Evaluating model performance with the data used for training is not acceptable in data science because it can easily generate overoptimistic and overfitted models. There are two methods of evaluating models in data science, Hold-Out and Cross-Validation; to avoid overfitting, both methods use a test set (not seen by the model) to evaluate model performance. The performance of each classification model is estimated based on its average score, and the results are presented in visualized form as graphs of the classified data. Accuracy is defined as the percentage of correct predictions for the test data. It can be calculated easily by dividing the number of correct predictions by the total number of predictions.
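
A hedged sketch of the two evaluation methods mentioned above, using scikit-learn (the classifier and the toy dataset are placeholders, not the project's own model or data):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Hold-Out: keep a test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
holdout_acc = accuracy_score(y_test, model.predict(X_test))   # correct / total predictions

# Cross-Validation: rotate the held-out fold and average the scores
cv_acc = cross_val_score(model, X, y, cv=5).mean()

print(holdout_acc, cv_acc)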
5. DATA FLOW DIAGRAMS
[Data flow diagram: the figure shows the flow Dataset → Feature Extraction → EDA → Train Algorithm → Predict Result.]
6. SYSTEM DESIGN

6.1 UML DIAGRAMS

6.1.1 USE CASE DIAGRAM

6.1.2 CLASS DIAGRAM


6.1.3 SEQUENCE DIAGRAM

6.1.4 ACTIVITY DIAGRAM


7. ALGORITHM
Logistic regression:- Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The nature of the target or dependent variable is dichotomous, which means there are only two possible classes. In simple words, the dependent variable is binary in nature, having data coded as either 1 (stands for success/yes) or 0 (stands for failure/no). Mathematically, a logistic regression model predicts P(Y=1) as a function of X. It is one of the simplest ML algorithms and can be used for various classification problems such as spam detection, diabetes prediction, cancer detection, etc.

Types of Logistic Regression

Generally, logistic regression means binary logistic regression with a binary target variable, but it can also predict target variables with more than two categories. Based on the number of categories, logistic regression can be divided into the following types:

Binary or Binomial: In this kind of classification, the dependent variable has only two possible types, either 1 or 0. For example, these variables may represent success or failure, yes or no, win or loss, etc.

Multinomial: In this kind of classification, the dependent variable can have 3 or more possible unordered types, or types having no quantitative significance. For example, these variables may represent “Type A”, “Type B” or “Type C”.

Ordinal: In this kind of classification, the dependent variable can have 3 or more possible ordered types, or types having a quantitative significance. For example, these variables may represent “poor”, “good”, “very good” or “excellent”, and each category can have a score like 0, 1, 2, 3.
HOW LOGISTIC REGRESSION WORKS

Logistic regression uses an equation as its representation, very much like linear regression. Input values (x) are combined linearly using weights or coefficient values (referred to as the Greek capital letter Beta) to predict an output value (y); the linear combination is then passed through the logistic (sigmoid) function to produce a probability between 0 and 1.
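
A minimal sketch of that representation in Python (the coefficient values below are arbitrary, for illustration only):

import numpy as np

def predict_probability(x, beta0, beta1):
    # Linear combination of the input, squashed by the logistic (sigmoid) function
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + np.exp(-z))      # P(Y = 1 | x)

# Arbitrary coefficients; probabilities above 0.5 would be classified as 1 (success)
print(predict_probability(x=2.5, beta0=-1.0, beta1=0.8))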

ADVANTAGES OF USING LOGISTIC REGRESSION:-

● Logistic regression is easy to implement and interpret, and very efficient to train.

● It makes no assumptions about the distributions of classes in feature space.

● It can easily be extended to multiple classes (multinomial regression) and provides a natural probabilistic view of class predictions.

● It not only provides a measure of how appropriate a predictor is (coefficient size), but also its direction of association (positive or negative).

● It is very fast at classifying unknown records.

● It gives good accuracy for many simple datasets and performs well when the dataset is linearly separable.

● Its model coefficients can be interpreted as indicators of feature importance.

● Logistic regression is less inclined to overfitting, but it can overfit in high-dimensional datasets. One may consider regularization (L1 and L2) techniques to avoid overfitting in these scenarios.
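
For illustration, a hedged sketch of how L2 or L1 regularization can be requested in scikit-learn (the hyper-parameter values here are arbitrary, not tuned for this project):

from sklearn.linear_model import LogisticRegression

# L2 regularization (the scikit-learn default); a smaller C means stronger regularization
l2_model = LogisticRegression(penalty='l2', C=0.5, max_iter=1000)

# L1 regularization requires a compatible solver such as 'liblinear' or 'saga'
l1_model = LogisticRegression(penalty='l1', C=0.5, solver='liblinear', max_iter=1000)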

Domain Specification

MACHINE LEARNING
Machine Learning is a system that can learn from examples through self-improvement, without being explicitly coded by a programmer. The breakthrough comes with the idea that a machine can learn singularly from the data (i.e., examples) to produce accurate results.
Machine learning combines data with statistical tools to predict an output. This output is then used by corporations to derive actionable insights. Machine learning is closely related to data mining and Bayesian predictive modeling. The machine receives data as input and uses an algorithm to formulate answers.
A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data. Tech companies are using unsupervised learning to improve the user experience with personalized recommendations. Machine learning is also used for a variety of tasks like fraud detection, predictive maintenance, portfolio optimization, task automation and so on.

Machine Learning vs. Traditional Programming


Traditional programming differs significantly from machine learning. In traditional programming, a programmer codes all the rules in consultation with an expert in the industry for which the software is being developed. Each rule is based on a logical foundation; the machine executes an output following the logical statement. When the system grows complex, more rules need to be written, and it can quickly become unsustainable to maintain.

How does Machine learning work?


Machine learning is the brain where all the learning takes place. The way the machine learns is similar to a human being. Humans learn from experience. The more we know, the more easily we can predict. By analogy, when we face an unknown situation, the likelihood of success is lower than in a known situation. Machines are trained the same way. To make an accurate prediction, the machine sees examples. When we give the machine a similar example, it can figure out the outcome. However, like a human, if it is fed a previously unseen example, the machine has difficulty predicting.
The core objective of machine learning is learning and inference. First of all, the machine learns through the discovery of patterns. This discovery is made thanks to the data. One crucial part of the data scientist's job is to choose carefully which data to provide to the machine. The list of attributes used to solve a problem is called a feature vector. You can think of a feature vector as a subset of the data that is used to tackle a problem.
The machine uses some fancy algorithms to simplify reality and transform this discovery into a model. Therefore, the learning stage is used to describe the data and summarize it into a model.

For instance, the machine is trying to understand the relationship between the wage of an individual and the likelihood of going to a fancy restaurant. It turns out the machine finds a positive relationship between wage and going to a high-end restaurant: this is the model.

Inferring: When the model is built, it is possible to test how powerful it is on never-seen-before data. The new data are transformed into a feature vector, go through the model and give a prediction. This is the beautiful part of machine learning. There is no need to update the rules or train the model again. You can use the model previously trained to make inferences on new data.

The life of Machine Learning programs is straightforward and can be summarized in the following points:
1. Define a question
2. Collect data
3. Visualize data
4. Train algorithm
5. Test the Algorithm
6. Collect feedback
7. Refine the algorithm
8. Loop 4-7 until the results are satisfying
9. Use the model to make a prediction
Once the algorithm gets good at drawing the right conclusions, it applies that knowledge to new sets of data.
Machine learning algorithms and where they are used

Machine learning can be grouped into two broad learning tasks: supervised and unsupervised (there are many other algorithms and variations besides these).
Supervised learning
An algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output. For instance, a practitioner can use marketing expense and weather forecasts as input data to predict the sales of cans.
You can use supervised learning when the output data is known. The algorithm will then predict for new data.
There are two categories of supervised learning:
● Classification task
● Regression task
Classification
Imagine you want to predict the gender of a customer for a commercial. You will start gathering data on the height, weight, job, salary, purchasing basket, etc. from your customer database. You know the gender of each of your customers; it can only be male or female. The objective of the classifier will be to assign a probability of being a male or a female (i.e., the label) based on the information (i.e., the features you have collected). When the model has learned how to recognize male or female, you can use new data to make a prediction. For instance, you just got new information from an unknown customer, and you want to know whether it is a male or female. If the classifier predicts male = 70%, it means the algorithm is 70% sure that this customer is a male and 30% sure it is a female.
The label can have two or more classes. The above example has only two classes, but if a classifier needs to predict objects, there can be dozens of classes (e.g., glass, table, shoes, etc.; each object represents a class).
Regression
When the output is a continuous value, the task is a regression. For instance, a financial analyst may need to forecast the value of a stock based on a range of features like equity, previous stock performances and macroeconomic indices. The system will be trained to estimate the price of the stocks with the lowest possible error.
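
A small hedged sketch of the two task types with scikit-learn (the data below is synthetic and exists only to show the difference between predicting a label and predicting a continuous value):

from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: predict a discrete label (e.g. 0 = female, 1 = male)
X_cls = [[150, 50], [180, 85], [160, 55], [175, 80]]   # height, weight
y_cls = [0, 1, 0, 1]
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict_proba([[170, 70]]))   # probability of each class

# Regression: predict a continuous value (e.g. a stock price)
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [10.5, 12.0, 13.4, 15.1]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5.0]]))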
Unsupervised learning

In unsupervised learning, an algorithm explores input data without being given an explicit output variable (e.g., it explores customer demographic data to identify patterns). You can use it when you do not know how to classify the data and you want the algorithm to find patterns and classify the data for you.
Application of Machine learning

Augmentation:

● Machine learning that assists humans with their day-to-day tasks, personally or commercially, without having complete control of the output. Such machine learning is used in different ways, such as virtual assistants, data analysis and software solutions. The primary use is to reduce errors due to human bias.

Automation:

● Machine learning that works entirely autonomously in any field without the need for any human intervention. For example, robots performing the essential process steps in manufacturing plants.

Finance Industry

● Machine learning is growing in popularity in the finance industry. Banks are mainly using ML to find patterns inside the data but also to prevent fraud.

Government organization

● The government makes use of ML to manage public safety and utilities. Take the example of China with its massive face recognition systems. The government uses artificial intelligence to prevent jaywalking.

Healthcare industry

● Healthcare was one of the first industries to use machine learning, with image detection.

Marketing

● AI is used broadly in marketing thanks to abundant access to data. Before the age of mass data, researchers developed advanced mathematical tools like Bayesian analysis to estimate the value of a customer. With the boom of data, marketing departments rely on AI to optimize customer relationships and marketing campaigns.

Example of application of Machine Learning in Supply Chain

Machine learning gives terrific results for visual pattern recognition, opening up many potential applications in physical inspection and maintenance across the entire supply chain network. Unsupervised learning can quickly search for comparable patterns in a diverse dataset. In turn, the machine can perform quality inspection throughout the logistics hub and detect shipments with damage and wear.

For instance, IBM's Watson platform can determine shipping container damage. Watson combines visual and systems-based data to track, report and make recommendations in real time.

In past years, stock managers relied extensively on primary methods to evaluate and forecast inventory. By combining big data and machine learning, better forecasting techniques have been implemented (an improvement of 20 to 30% over traditional forecasting tools). In terms of sales, this means an increase of 2 to 3% due to the potential reduction in inventory costs.

Example of Machine Learning Google Car

For example, everybody knows the Google car. The car is full of lasers on the roof which tell it where it is in relation to the surrounding area. It has radar in the front, which informs the car of the speed and motion of all the cars around it. It uses all of that data to figure out not only how to drive the car but also to figure out and predict what potential drivers around the car are going to do. What's impressive is that the car is processing almost a gigabyte of data a second.

Deep Learning

Deep learning is computer software that mimics the network of neurons in a brain. It is a subset of machine learning and is called deep learning because it makes use of deep neural networks. The machine uses different layers to learn from the data. The depth of the model is represented by the number of layers in the model. Deep learning is the new state of the art in terms of AI. In deep learning, the learning phase is done through a neural network.

Reinforcement Learning

Reinforcement learning is a subfield of machine learning in which systems are trained by receiving virtual "rewards" or "punishments", essentially learning by trial and error. Google's DeepMind has used reinforcement learning to beat a human champion at the game of Go. Reinforcement learning is also used in video games to improve the gaming experience by providing smarter bots.

Some of the most famous algorithms are:

● Q-learning

● Deep Q network

● State-Action-Reward-State-Action (SARSA)

● Deep Deterministic Policy Gradient (DDPG)


Applications/ Examples of deep learning applications

AI in Finance: The financial technology sector has already started using AI to save time, reduce costs, and add value. Deep learning is changing the lending industry by enabling more robust credit scoring. Credit decision-makers can use AI for robust credit lending applications to achieve faster, more accurate risk assessment, using machine intelligence to factor in the character and capacity of applicants.

Underwrite.ai is a fintech company providing an AI solution for credit decision makers. Underwrite.ai uses AI to detect which applicant is more likely to pay back a loan. Their approach radically outperforms traditional methods.

AI in HR: Under Armour, a sportswear company, revolutionized hiring and modernized the candidate experience with the help of AI. In fact, Under Armour reduced hiring time for its retail stores by 35%. Under Armour faced growing popularity back in 2012. They had, on average, 30,000 resumes a month. Reading all of those applications and starting the screening and interview process was taking too long. The lengthy process to get people hired and on-boarded impacted Under Armour's ability to have their retail stores fully staffed, ramped up and ready to operate.

At that time, Under Armour had all of the 'must have' HR technology in place, such as transactional solutions for sourcing, applying, tracking and onboarding, but those tools weren't useful enough. Under Armour chose HireVue, an AI provider for HR solutions, for both on-demand and live interviews. The results were impressive; they managed to decrease the time to fill positions by 35%. In addition, they hired higher quality staff.

AI in Marketing: AI is a valuable tool for customer service management and personalization challenges. Improved speech recognition in call-center management and call routing as a result of the application of AI techniques allows a more seamless experience for customers.

For example, deep-learning analysis of audio allows systems to assess a customer's emotional tone. If the customer is responding poorly to the AI chatbot, the system can reroute the conversation to real, human operators who take over the issue.

Apart from the three examples above, AI is widely used in other sectors/industries.

Difference between Machine Learning and Deep Learning


                        Machine Learning                        Deep Learning
Data dependencies       Excellent performance on a              Excellent performance on a big
                        small/medium dataset                    dataset
Hardware dependencies   Works on a low-end machine              Requires a powerful machine,
                                                                preferably with a GPU: DL performs
                                                                a significant amount of matrix
                                                                multiplication
Feature engineering     Need to understand the features         No need to understand the best
                        that represent the data                 features that represent the data
Execution time          From a few minutes to hours             Up to weeks. A neural network needs
                                                                to compute a significant number of
                                                                weights
Interpretability        Some algorithms are easy to             Difficult to impossible
                        interpret (logistic, decision tree),
                        some are almost impossible
                        (SVM, XGBoost)

When to use ML or DL?

In the table below, we summarize the difference between machine learning and deep learning.
                        Machine learning        Deep learning

Training dataset        Small                   Large

Choose features         Yes                     No

Number of algorithms    Many                    Few

Training time           Short                   Long

With machine learning, you need less data to train the algorithm than with deep learning. Deep learning requires an extensive and diverse set of data to identify the underlying structure. Besides, machine learning provides a faster-trained model; the most advanced deep learning architectures can take days to a week to train. The advantage of deep learning over machine learning is that it is highly accurate. You do not need to understand which features are the best representation of the data; the neural network learns how to select critical features. In machine learning, you need to choose for yourself which features to include in the model.

TensorFlow

The most famous deep learning library in the world is Google's TensorFlow. Google uses machine learning in all of its products to improve the search engine, translation, image captioning and recommendations.

To give a concrete example, Google users can experience a faster and more refined search with AI. If the user types a keyword in the search bar, Google provides a recommendation about what the next word could be.

Google wants to use machine learning to take advantage of its massive datasets to give users the best experience. Three different groups use machine learning:

● Researchers
● Data scientists
● Programmers.
They can all use the same toolset to collaborate with each other and improve their efficiency.

Google does not just have any data; they have the world's most massive computer, so TensorFlow was
built to scale. TensorFlow is a library developed by the Google Brain Team to accelerate machine
learning and deep neural network research.

It was built to run on multiple CPUs or GPUs and even mobile operating systems, and it has several
wrappers in several languages like Python, C++ or Java.


TensorFlow Architecture

Tensorflow architecture works in three parts:

● Preprocessing the data


● Build the model
● Train and estimate the model

It is called TensorFlow because it takes input as a multi-dimensional array, also known as a tensor. You can construct a sort of flowchart of operations (called a Graph) that you want to perform on that input. The input goes in at one end, flows through this system of multiple operations, and comes out the other end as output.

This is why it is called TensorFlow: the tensor goes in, flows through a list of operations, and then comes out the other side.

Where can Tensorflow run?

TensorFlow hardware and software requirements can be classified into two phases.

Development Phase: This is when you train the model. Training is usually done on your desktop or laptop.

Run Phase or Inference Phase: Once training is done, TensorFlow can be run on many different platforms. You can run it on:

● Desktop running Windows, macOS or Linux


● Cloud as a web service
● Mobile devices like iOS and Android

You can train the model on multiple machines and then run it on a different machine once you have the trained model.

The model can be trained and used on GPUs as well as CPUs. GPUs were initially designed for video games. In late 2010, Stanford researchers found that GPUs were also very good at matrix operations and algebra, which makes them very fast for these kinds of calculations. Deep learning relies on a lot of matrix multiplication. TensorFlow is very fast at computing matrix multiplication because it is written in C++. Although it is implemented in C++, TensorFlow can be accessed and controlled by other languages, mainly Python.

Finally, a significant feature of TensorFlow is TensorBoard, which enables you to monitor graphically and visually what TensorFlow is doing.

List of Prominent Algorithms supported by TensorFlow

● Linear regression: tf.estimator.LinearRegressor

● Classification: tf.estimator.LinearClassifier
● Deep learning classification: tf.estimator.DNNClassifier
● Deep learning wide and deep: tf.estimator.DNNLinearCombinedClassifier
● Boosted tree regression: tf.estimator.BoostedTreesRegressor
● Boosted tree classification: tf.estimator.BoostedTreesClassifier
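
A minimal sketch of one of the estimators listed above (assuming a TensorFlow version where the Estimator and feature-column APIs are still available, since they are deprecated in recent releases; the feature name 'raisedhands' and the tiny dataset are invented for illustration):

import numpy as np
import tensorflow as tf

# One numeric feature; the name is only an illustrative placeholder
feature_columns = [tf.feature_column.numeric_column('raisedhands')]

def input_fn():
    # A tiny in-memory dataset, repeated a finite number of times
    features = {'raisedhands': np.array([5., 10., 70., 90.], dtype=np.float32)}
    labels = np.array([0, 0, 1, 1], dtype=np.int32)
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat(20).batch(2)

classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns)
classifier.train(input_fn=input_fn)
print(classifier.evaluate(input_fn=input_fn))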

PYTHON OVERVIEW

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.

Python is Interpreted: Python is processed at runtime by the interpreter. You do not need
to compile your program before executing it. This is similar to PERL and PHP.
Python is Interactive: You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
Python is Object-Oriented: Python supports Object-Oriented style or
technique of programming that encapsulates code within objects.

Python is a Beginner's Language: Python is a great language for the beginner-level


programmers and supports the development of a wide range of applications from simple
text processing to WWW browsers to games.

History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the National
Research Institute for Mathematics and Computer Science in the Netherlands
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, Unix shell, and other scripting languages.

Python is copyrighted. Like Perl, Python source code is available under an open-source license (the Python Software Foundation License, which is GPL-compatible).

Python is now maintained by a core development team, although Guido van Rossum still holds a vital role in directing its progress.

Python Features
Python's features include:

Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax.
This allows the student to pick up the language quickly.

Easy-to-read: Python code is more clearly defined and visible to the eyes.

Easy-to-maintain: Python's source code is fairly easy-to-maintain.

A broad standard library: The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
Databases: Python provides interfaces to all major commercial databases.

GUI Programming: Python supports GUI applications that can be created and ported to
many system calls, libraries, and windows systems, such as Windows MFC, Macintosh,
and the X Window system of Unix.
Scalable: Python provides a better structure and support for large programs than shell
scripting.

Apart from the above-mentioned features, Python has a big list of good features, a few of which are listed below:

It supports functional and structured programming methods as well as OOP.

It can be used as a scripting language or can be compiled to byte-code for building large
applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It supports automatic garbage collection.

It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
8. SOFTWARE REQUIREMENTS

SOFTWARE REQUIREMENTS

● Python
● Anaconda Navigator
● Python built-in modules

• Numpy
• Pandas
• Matplotlib
• Sklearn
• Seaborn

8.1 ANACONDA NAVIGATOR

Anaconda Navigator is a desktop graphical user interface (GUI) included in the Anaconda distribution that allows you to launch applications and easily manage conda packages, environments and channels without using command-line commands. Navigator can search for packages on Anaconda Cloud or in a local Anaconda Repository. It is available for Windows, macOS and Linux.

Why use Navigator?

In order to run, many scientific packages depend on specific versions of other packages. Data scientists
often use multiple versions of many packages, and use multiple environments to separate these different
versions.

The command line program conda is both a package manager and an environment manager, to help
data scientists ensure that each version of each package has all the dependencies it requires and works
correctly.

Navigator is an easy, point-and-click way to work with packages and environments without needing to
type conda commands in a terminal window. You can use it to find the packages you want, install them
in an environment, run the packages and update them, all inside Navigator.

WHAT APPLICATIONS CAN I ACCESS USING NAVIGATOR?

The following applications are available by default in Navigator:


● JupyterLab
● Jupyter Notebook
● QTConsole
● Spyder
● VSCode
● Glueviz
● Orange 3 App
● Rodeo
● RStudio
Advanced conda users can also build their own Navigator applications.
How can I run code with Navigator?

The simplest way is with Spyder. From the Navigator Home tab, click Spyder, and write and execute
your code.

You can also use Jupyter Notebooks the same way. Jupyter Notebooks are an increasingly popular
system that combine your code, descriptive text, output, images and interactive interfaces into a single
notebook file that is edited, viewed and used in a web browser.
What’s new in 1.9?

● Add support for Offline Mode for all environment related actions.
● Add support for custom configuration of main windows links.
● Numerous bug fixes and performance enhancements.

8.2 PYTHON
Python
Python is a general-purpose, versatile and popular programming language. It's great as a first language because it is concise and easy to read, and it is also a good language to have in any programmer's stack, as it can be used for everything from web development to software development and scientific applications.
It has a simple, easy-to-use syntax, making it a perfect language for someone trying to learn computer programming for the first time.

Features of Python

A simple language which is easy to learn: Python has a very simple and elegant syntax. It's much easier to read and write Python programs compared to other languages like C++, Java or C#. Python makes programming fun and allows you to focus on the solution rather than the syntax. If you are a newbie, it's a great choice to start your journey with Python.

Free and open source


You can freely use and distribute Python, even for commercial use. Not only can you use and distribute software written in it, you can even make changes to Python's source code. Python has a large community constantly improving it in each iteration.

Portability
You can move Python programs from one platform to another and run them without any changes. Python runs seamlessly on almost all platforms including Windows, Mac OS X and Linux.

Extensible and Embeddable
Suppose an application requires high performance. You can easily combine pieces of C/C++ or other languages with Python code. This will give your application high performance as well as scripting capabilities which other languages may not provide out of the box.

A high-level, interpreted language


Unlike C/C++, you don't have to worry about daunting tasks like memory management, garbage collection and so on. Likewise, when you run Python code, it automatically converts your code to the language your computer understands. You don't need to worry about any lower-level operations.

Large standard libraries to solve common tasks
Python has a number of standard libraries which make the life of a programmer much easier, since you don't have to write all the code yourself. For example, need to connect to a MySQL database on a web server? You can use the MySQLdb library by writing import MySQLdb. Standard libraries in Python are well tested and used by hundreds of people, so you can be sure that they won't break your application.

● Object-oriented
Everything in Python is an object. Object-oriented programming (OOP) helps you solve complex problems intuitively.
With OOP, you are able to divide these complex problems into smaller sets by creating objects.

Python
History and Versions:
Python is predominantly a dynamically typed programming language which was initiated by Guido van Rossum in the year 1989. The major design philosophy that was given more importance was the readability of the code and expressing an idea in fewer lines of code, rather than the verbose way of expressing things as in C++ and Java [K-8][K-9]. The other design philosophy worth mentioning is that there should always be a single, obvious way to express a given task, which is contradictory to other languages such as C++, Perl etc. [K-10]. Python compiles to an intermediary code and this in turn is interpreted by the Python Runtime Environment into native machine code. The initial versions of Python were heavily inspired by Lisp (for functional programming constructs). Python also borrowed the module system, exception model and keyword arguments from the Modula-3 language [K-10]. Python's developers strive not to entertain premature optimization, even though it might increase the performance by a few basis points [K-9]. During its design, the creators conceptualized the language as being very extensible, and hence they designed it to have a small core library extended by a huge standard library [K-7]. Thus, Python is often used as a scripting language, as it can be easily embedded into any application, though it can also be used to develop a full-fledged application. The reference implementation of Python is CPython. There are also other implementations like Jython and IronPython, which use Python syntax and can also use any class of Java (Jython) or .NET (IronPython).

Versions: Python has two major versions, the 2.x series and the 3.x series. The 3.x version is a backward-incompatible release that was released to fix many design issues which plagued the 2.x series. At the time of the cited comparison, the latest in the 2.x series was 2.7.6 and the latest in the 3.x series was 3.4.0.

Paradigms: Python supports multiple paradigms: object-oriented, imperative, functional, procedural, and reflective. In the object-oriented paradigm, Python supports most OOP concepts such as inheritance (including multiple inheritance) and polymorphism, but its lack of support for encapsulation is a blatant omission, as Python doesn't have private or protected members: all class members are public [K-11]. Earlier Python 2.6 versions didn't support some OOP concepts such as abstraction through interfaces and abstract classes [K-19]. It also supports the concurrent paradigm, but with Python we are not able to make truly multitasking applications, as the built-in threading API is limited by the GIL (Global Interpreter Lock); hence applications that use the threading API cannot run on multiple cores in parallel [K-12]. The only remedy is for the user to either use the multiprocessing module, which forks processes, or use interpreters that haven't implemented the GIL, such as Jython or IronPython [K-12].

Compilation, Execution and Memory Management: Just like other managed languages, Python compiles to an intermediary code which is interpreted by the Python Runtime Environment into native machine code. The reference implementation (i.e. CPython) doesn't come with a JIT compiler, because of which the execution speed is slow compared to native programming languages [K-17]. We can use the PyPy interpreter, which includes a JIT compiler, rather than the default Python interpreter if speed of execution is one of the important factors [K-18]. The Python Runtime Environment also takes care of all allocation and deallocation of memory through the garbage collector. When a new object is created, the GC allocates the necessary memory, and once the object goes out of scope, the GC doesn't release the memory immediately; instead, the object becomes eligible for garbage collection, which eventually releases the memory.

Typing Strategies: Python is a strongly, dynamically typed language. Python 3 also supports optional static typing [K-20]. There are a few advantages in using a dynamically typed language, the most prominent being that the code is more readable as there is less code (in other words, less boiler-plate code). But the main disadvantage of Python being a dynamic programming language is that there is no way to guarantee that a particular piece of code will run successfully for all the different data-type scenarios simply because it ran successfully with one type. Basically, we don't have any means to find an error in the code until the code has started running.

Strengths, Weaknesses and Application Areas: Python is predominantly used as a scripting language for developing standalone applications that are otherwise built with statically typed languages, because of the flexibility its dynamic typing provides. Python favours rapid application development, which qualifies it for prototyping. To a certain extent, Python is also used for developing websites. Due to its dynamic typing and the presence of a virtual machine, there is a considerable overhead, which translates to far lower performance compared with native programming languages, and hence it is not suited to performance-critical applications.

8.3 NUMPY
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

• The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.

• NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

• A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.

The points about sequence size and speed are particularly important in scientific computing. As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, a and b, we could iterate over each element:
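
The example this passage refers to was not preserved in this copy, so the following is a reconstruction: first the element-wise multiplication with plain Python lists, then the equivalent one-line NumPy version.

import numpy as np

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Element-wise multiplication with plain Python lists: an explicit loop
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])

# The same operation with NumPy arrays: one vectorized expression in compiled code
c_np = np.array(a) * np.array(b)

print(c)
print(c_np)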

The Numeric Python extensions (NumPy henceforth) are a set of extensions to the Python programming language which allow Python programmers to efficiently manipulate large sets of objects organized in grid-like fashion. These sets of objects are called arrays, and they can have any number of dimensions: one-dimensional arrays are similar to standard Python sequences, and two-dimensional arrays are similar to matrices from linear algebra. Note that one-dimensional arrays are also different from any other Python sequence, and that two-dimensional matrices are also different from the matrices of linear algebra, in ways which we will mention later in this text. Why are these extensions needed? The core reason is a very prosaic one: manipulating a set of a million numbers in Python with the standard data structures such as lists, tuples or classes is much too slow and uses too much space. Anything which we can do in NumPy we can do in standard Python – we just may not be alive to see the program finish. A more subtle reason for these extensions, however, is that the kinds of operations that programmers typically want to do on arrays, while sometimes very complex, can often be decomposed into a set of fairly standard operations. This decomposition has been developed similarly in many array languages. In some ways, NumPy is simply the application of this experience to the Python language – thus many of the operations described in NumPy work the way they do because experience has shown that way to be a good one, in a variety of contexts. The languages which were used to guide the development of NumPy include the infamous APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. This heritage will be obvious to users of NumPy who already have experience with these other languages. This tutorial, however, does not assume any such background, and all that is expected of the reader is a reasonable working knowledge of the standard Python language. This document is the “official” documentation for NumPy. It is both a tutorial and the most authoritative source of information about NumPy with the exception of the source code. The tutorial material will walk you through a set of manipulations of simple, small arrays of numbers, as well as image files. This choice was made because:

• a concrete data set makes explaining the behavior of some functions much easier to motivate than simply talking about abstract operations on abstract data sets;

• every reader will have at least an intuition as to the meaning of the data and the organization of image files; and

• the result of various manipulations can be displayed simply, since the data set has a natural graphical representation.

All users of NumPy, whether interested in image processing or not, are encouraged to follow the tutorial with a working NumPy installation at their side, testing the examples, and, more importantly, transferring the understanding gained by working on images to their specific domain. The best way to learn is by doing – the aim of this tutorial is to guide you along this “doing.”
8.4 PYTHON ENVIRONMENT

Python is available on a wide variety of platforms including Linux and Mac OS X. Let's understand how to set up our Python environment.

9. IMPLEMENTATION
9.1 SAMPLE CODE
Coding And Test Cases :

Code

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset and inspect it
df = pd.read_csv('xAPI-Edu-Data.csv')
df.head()
df.shape
df.info()
df.dropna()          # returns a copy with missing rows removed
df.isnull().sum()

# EDA: plot each attribute, alone and split by the target class (L/M/H)
sns.countplot(x="gender", order=['F','M'], data=df, palette="Set1")
plt.show()
sns.countplot(x="gender", order=['F','M'], hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
df['NationalITy'].value_counts(normalize=True).plot(kind='bar')
plt.show()
df['PlaceofBirth'].value_counts(normalize=True).plot(kind='bar')
plt.show()
sns.countplot(y="NationalITy", data=df, palette="muted")
plt.show()
sns.countplot(y="NationalITy", hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
sns.countplot(x="Relation", order=['Mum','Father'], data=df, palette="Set1")
plt.show()
sns.countplot(x="Relation", order=['Mum','Father'], hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
sns.countplot(x="StageID", data=df, palette="muted")
plt.show()
sns.countplot(x="StageID", hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
sns.countplot(x="GradeID", data=df, palette="muted")
plt.show()
sns.countplot(x="GradeID", hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
plt.subplot(1, 2, 1)
sns.countplot(x="SectionID", order=['A','B','C'], data=df, palette="muted")
plt.subplot(1, 2, 2)
sns.countplot(x="SectionID", order=['A','B','C'], hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
plt.subplot(1, 2, 1)
sns.countplot(y="Topic", data=df, palette="muted")
plt.subplot(1, 2, 2)
sns.countplot(y="Topic", hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
sns.countplot(x="ParentschoolSatisfaction", data=df, palette="muted")
plt.show()
sns.countplot(x="ParentschoolSatisfaction", hue="Class", hue_order=['L','M','H'], data=df, palette="muted")
plt.show()
plt.figure(figsize=(8, 8))
sns.countplot(x="Class", data=df)
plt.title('Balanced Classes')
plt.show()

# Pre-processing: label-encode every categorical attribute
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
df['LGender'] = le.fit_transform(df['gender'])
df['LNationalITy'] = le.fit_transform(df['NationalITy'])
df['LPlaceofBirth'] = le.fit_transform(df['PlaceofBirth'])
df['LStageID'] = le.fit_transform(df['StageID'])
df['LGradeID'] = le.fit_transform(df['GradeID'])
df['LSectionID'] = le.fit_transform(df['SectionID'])
df['LTopic'] = le.fit_transform(df['Topic'])
df['LSemester'] = le.fit_transform(df['Semester'])
df['LRelation'] = le.fit_transform(df['Relation'])
df['LParentschoolSatisfaction'] = le.fit_transform(df['ParentschoolSatisfaction'])
df['LParentAnsweringSurvey'] = le.fit_transform(df['ParentAnsweringSurvey'])
df['LStudentAbsenceDays'] = le.fit_transform(df['StudentAbsenceDays'])
df['LClass'] = le.fit_transform(df['Class'])
df.head(1)

# Drop the original (unencoded) categorical columns
df = df.drop(["gender"], axis=1)
df = df.drop(["NationalITy"], axis=1)
df = df.drop(["PlaceofBirth"], axis=1)
df = df.drop(["StageID"], axis=1)
df = df.drop(["GradeID"], axis=1)
df = df.drop(["SectionID"], axis=1)
df = df.drop(["Topic"], axis=1)
df = df.drop(["Semester"], axis=1)
df = df.drop(["Relation"], axis=1)
df = df.drop(["ParentAnsweringSurvey"], axis=1)
df = df.drop(["StudentAbsenceDays"], axis=1)
df = df.drop(["ParentschoolSatisfaction"], axis=1)
df = df.drop(["Class"], axis=1)
df.head()
df.to_csv('data.csv')

# Univariate feature selection with the chi-squared test
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
x = df.iloc[:, df.columns != 'LClass']
y = df.iloc[:, df.columns == 'LClass']
bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(x, y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(x.columns)
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']
featureScores.nlargest(10, 'Score')

# Plot the correlation heatmap
corr = df.corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns)

# Splitting and classification
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Logistic regression
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(x_train, y_train)
predict1 = lr.predict(x_test)
model1 = accuracy_score(y_test, predict1)
print(model1)
plt.show()
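
The modules section names Random Forest as the chosen classifier, although the sample code above ends with logistic regression. A hedged sketch of how a Random Forest comparison and a confusion matrix could be appended to the script above (reusing the same x_train/x_test split and the predict1 values already computed; the hyper-parameters are illustrative only):

# Continuation of the script above: compare a Random Forest classifier on the same split
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(x_train, y_train.values.ravel())    # ravel() because y_train is a one-column DataFrame
predict2 = rf.predict(x_test)
model2 = accuracy_score(y_test, predict2)
print("Random forest accuracy:", model2)

# Confusion matrix for the logistic regression predictions computed above
print(confusion_matrix(y_test, predict1))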

Test Case
1. Test case related to Dataset
10. RESULTS AND SCREENSHOTS
Data Visualization with Matplotlib and Seaborn

The EDA part has been implemented as a module to represent each attribute of the dataset graphically.
[Screenshots: count plots of the dataset attributes (gender, NationalITy, PlaceofBirth, Relation, StageID, GradeID, SectionID, Topic and ParentschoolSatisfaction), each shown alone and split by the target class (L/M/H), followed by a plot of the balanced class distribution.]
Accuracy with logistic regression:-

11. FUTURE WORK


In the future we could provide technical solutions to improve the efficiency of student performance analysis. A user interaction model could be derived to record student data dynamically, and it could give staff an alert message about students whose performance is low. We could also build the prediction using a Neural Network and expect improved results, and we can add non-academic attributes along with academic attributes.

12. CONCLUSION
Finally, the performance analysis of students is a major problem, and it is important that it is addressed. The work reported in this project applies machine learning techniques with supervised learning algorithms to student records in order to understand the performance of the algorithms, analyzing each student's performance and categorizing it into three classes (high, average, low) with an accuracy of 64%.
