Project Report

Uploaded by Mouhamadou DEME

Multilayer Perceptron

Copyright © 2017 Innodatatics Inc. All Rights Reserved


Machine Learning: Objective

Predict whether the annual income of an individual exceeds $50K/yr based on census data. The classification goal is to predict whether a person's income is over $50,000 a year (>50K) or not (<=50K).
Project Architecture / Project Flow

1. Pre-processing the data
2. EDA: Exploratory Data Analysis
3. Model building
4. Evaluate the model
5. Data visualizations
6. Deployment frame
Exploratory Data Analysis (EDA) and Feature Engineering

Data set details

1) The dataset has 45211 observations: 45211 rows and 17 columns.
2) There are no missing values in the data set.
3) The data set contains a mix of categorical and numeric values, so the categorical values need to be converted to numeric.
4) There are no duplicate values in the data set.
5) The target column is the dependent variable, with values yes or no.
6) The top 5 rows of the data set are shown below.
7) Following are the columns in the data set:

Dependent variable: 'Target'
Independent variables: 'age', 'job', 'marital', 'education', 'default', 'balance', 'housing', 'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays', 'previous', 'poutcome'
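The checks listed above (shape, missing values, duplicates, a peek at the top rows) can be sketched in pandas. This is a minimal illustration on a tiny made-up stand-in frame; the real project would load the full CSV instead.

```python
import pandas as pd

# Tiny stand-in frame; the project would load the real file, e.g.
# df = pd.read_csv("data.csv")  # hypothetical path
df = pd.DataFrame({
    "age":    [30, 45, 30],
    "job":    ["admin.", "technician", "admin."],
    "Target": ["no", "yes", "no"],
})

n_rows, n_cols = df.shape                 # observations and columns
n_missing = int(df.isnull().sum().sum())  # total missing values
n_dupes = int(df.duplicated().sum())      # fully duplicated rows

print(n_rows, n_cols, n_missing, n_dupes)
print(df.head())                          # top 5 rows of the data set
```

The same three calls (`shape`, `isnull`, `duplicated`) back the claims made in points 1, 2 and 4 above.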
Data set details
1) Age: continuous.
2) Workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
3) Fnlwgt: continuous.
4) Education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
5) Education-num: continuous.
6) Marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
7) Occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
8) Relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
9) Race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
10) Sex: Female, Male.
11) Capital-gain: continuous.
12) Capital-loss: continuous.
13) Hours-per-week: continuous.
14) Native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
15) Income_class: >50K, <=50K.
Data set details

The table above shows the data set.

The income_class column is the dependent variable, with values <=50K or >50K. The target value counts are (<=50K: 24720, >50K: 7841).
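The counts above imply a noticeable class imbalance. A quick arithmetic check (pure Python, using only the counts reported above) shows the baseline accuracy of always predicting the majority class:

```python
# Class counts reported for the income_class target
counts = {"<=50K": 24720, ">50K": 7841}

total = sum(counts.values())              # labelled rows in total
majority_share = counts["<=50K"] / total  # accuracy of always predicting <=50K

print(total, round(majority_share, 3))
```

Roughly 76% of rows are <=50K, so any model should be judged against that baseline, not against 50%.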
Data visualization:

1) The above plot is the box plot for the numeric features in the data set. The main advantage of a box plot is that it shows outliers. An outlier is a data point that differs significantly from other observations.
Data visualization:

1) The above plot shows the pairplot for the numeric features in the data set. A pairplot plots pairwise relationships in a dataset: each numeric variable is plotted against every other numeric variable, so each panel is a scatter plot of one pair of features.
Data visualization:

1) The above plot represents the correlation plot for the numeric features in the data set. A correlation matrix is a table showing correlation coefficients between variables; each cell shows the correlation between two variables. A correlation matrix is used to summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced analyses.
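Computing such a correlation matrix takes one pandas call. The sketch below uses made-up illustrative columns, not the project's data; `balance` is constructed to be perfectly correlated with `age` so the coefficient of 1.0 is visible in the output.

```python
import pandas as pd

# Illustrative numeric columns (made-up values); the project would call
# df.corr() on its real numeric features.
df = pd.DataFrame({
    "age":      [25, 35, 45, 55],
    "balance":  [100, 200, 300, 400],  # perfectly linear in age -> corr 1.0
    "duration": [40, 10, 30, 20],
})

corr = df.corr()  # Pearson correlation coefficients, one cell per pair
print(corr.round(2))
```

The diagonal is always 1.0 (each variable is perfectly correlated with itself), which is why heat maps of this matrix have a bright diagonal.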

1) The diagram shows the heat map for the numeric features in the data set. Heatmaps provide a visual approach to understanding numeric values: data values are represented as colours on a map or diagram. A heat map uses colour the way a bar graph uses height and width: as a data visualization tool.
Data visualization:

We have a few features that are categorical, so we have to convert them to numeric: 'job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'poutcome'. Some of these variables need to be converted to dummy variables. Below are the count plots for the categorical features, from which we can get insights.
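The dummy-variable conversion mentioned above is typically done with `pd.get_dummies`. A minimal sketch on a hypothetical toy frame (the column names only illustrate the pattern):

```python
import pandas as pd

# Small illustrative frame with two categorical features and one numeric one
df = pd.DataFrame({
    "job":     ["admin.", "technician", "admin."],
    "marital": ["married", "single", "single"],
    "age":     [30, 45, 52],
})

# One-hot encode the categorical columns; numeric columns pass through
encoded = pd.get_dummies(df, columns=["job", "marital"])
print(list(encoded.columns))
```

Each category becomes its own 0/1 column (e.g. `job_admin.`, `marital_single`), which is what lets models that expect numeric input consume these features.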
Data visualization :
Model Building

Following are the models used for Model building in this


project:
1)Logistic regression.
2)Decision tree classifier.
3) Random forest classifier.
4) Extra tree classifier.
5) Support Vector Machine(SVM) Classifier.
6)Neural Networks
7)Bagging classifier method.
8)Catboost Classifier
9)XGB Classifier
Logistic regression:
1) Logistic regression, often referred to as the logit model, is a technique used to predict the probability associated with each dependent variable category.
2) The logistic regression model is a generalized form of the linear regression model. It is a very good discrimination tool.
3) Logistic regression measures the relationship between the dependent variable (our label, what we want to predict) and one or more independent variables (our features) by estimating probabilities using its underlying logistic function.
4) The probability in logistic regression is given by the logistic (sigmoid) function: p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn)).
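The logistic function above can be sketched in a few lines of Python (a generic illustration, not the project's code):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# z stands for the linear combination b0 + b1*x1 + ... + bn*xn
print(sigmoid(0.0))  # 0.5: the decision boundary
print(sigmoid(4.0))  # close to 1: confident positive prediction
```

Thresholding this probability at 0.5 turns the continuous output into the binary <=50K / >50K decision.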
Logistic regression:
Advantages of Logistic Regression:

1. Logistic regression performs well when the dataset is linearly separable.
2. Logistic regression is less prone to over-fitting, but it can overfit in high-dimensional datasets. You should consider regularization (L1 and L2) techniques to avoid over-fitting in these scenarios.
3. Logistic regression not only gives a measure of how relevant a predictor is (coefficient size), but also its direction of association (positive or negative).
4. Logistic regression is easy to implement and interpret, and very efficient to train.

Disadvantages of Logistic Regression:

1. The main limitation of logistic regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, data is rarely linearly separable; most of the time it is a jumbled mess.
2. If the number of observations is less than the number of features, logistic regression should not be used, as it may overfit.
3. Logistic regression can only be used to predict discrete outcomes, so its dependent variable is restricted to a discrete set of classes. This restriction is problematic when the goal is to predict continuous values.
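The L1/L2 regularization advice from the advantages above can be sketched with scikit-learn. This uses synthetic stand-in data rather than the project's encoded census frame, so it only illustrates the API shape:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded census features
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L2 penalty (the default); C is the inverse regularization strength,
# so smaller C means stronger regularization
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(X_tr, y_tr)

acc = clf.score(X_te, y_te)  # held-out accuracy
print(round(acc, 3))
```

Swapping `penalty="l1"` (with a compatible solver such as `liblinear`) gives the sparsity-inducing variant mentioned above.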
Decision tree classifier:
1) It is a greedy algorithm and a supervised classification model.
2) A decision tree has a tree-like structure that combines a root node, branch nodes and leaf nodes.
3) The root node is chosen with the help of entropy and information gain.
4) The outcomes are the leaf nodes.
5) Overfitting is the main problem with decision tree classifiers, because the greedy splitting keeps growing the tree to fit the training data. We can use pruning to remove branches without losing much information.

Root node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
Leaf node: Leaf nodes are the final output nodes; the tree cannot be split further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
Branch/sub-tree: A tree formed by splitting the tree.
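The entropy and information-gain criterion used to pick splits can be illustrated in pure Python (a generic sketch, not the project's implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy reduction from splitting `parent` into the `splits` subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5       # maximally impure node: entropy 1.0
pure_split = [["yes"] * 5, ["no"] * 5]  # a perfect split: gain 1.0
print(entropy(parent), information_gain(parent, pure_split))
```

The tree greedily picks, at each node, the feature split with the highest information gain, which is exactly why the root node ends up being the most informative feature overall.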
Decision tree classifier:
Advantages of the Decision Tree:

1) It is simple to understand, as it follows the same process a human follows while making any decision in real life.
2) It can be very useful for solving decision-related problems.
3) It helps to think about all the possible outcomes for a problem.
4) It requires less data cleaning compared to other algorithms.

Disadvantages of the Decision Tree:

1) The decision tree can contain many layers, which makes it complex.
2) It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
3) With more class labels, the computational complexity of the decision tree may increase.
Neural Networks:
1) A neural network is a network or circuit of neurons, or, in a modern sense, an artificial neural network composed of artificial neurons or nodes. Thus a neural network is either a biological neural network, made up of real biological neurons, or an artificial neural network, used for solving artificial intelligence (AI) problems.
2) Neural network architecture:
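A multilayer perceptron of the kind this project's title refers to can be sketched with scikit-learn's `MLPClassifier`. The data is a synthetic stand-in and the layer sizes are illustrative assumptions; the slides do not specify the actual architecture used:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the encoded census features
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X = StandardScaler().fit_transform(X)  # MLPs train better on scaled inputs

# One hidden layer of 16 units with ReLU activation (illustrative choice)
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=1000, random_state=0)
mlp.fit(X, y)

acc = mlp.score(X, y)  # training accuracy, just to show the fit worked
print(round(acc, 3))
```

The `hidden_layer_sizes` tuple is the architecture knob: `(16,)` is one hidden layer, `(32, 16)` would be two, and so on.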
Neural Networks :
Different types of Activation
functions:
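The common activation functions can be sketched in pure Python (generic definitions, not tied to any particular library):

```python
import math

def sigmoid(z):
    """Range (0, 1): often used for probability outputs."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Range (-1, 1): a zero-centred variant of the sigmoid."""
    return math.tanh(z)

def relu(z):
    """Range [0, inf): a common default for hidden layers."""
    return max(0.0, z)

for f in (sigmoid, tanh, relu):
    print(f.__name__, f(-1.0), f(0.0), f(1.0))
```

The non-linearity is the point: without it, stacking layers would collapse into a single linear transformation.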
Model Deployment Using Flask

Flask File Creation:

• Import Flask from the flask module
• Create an instance of the Flask class
• We use @app.route('/') to execute the home function and @app.route('/predict', methods=['POST']) to execute the predict function
• index.html is used to render the results page
• After executing the whole deployment code, Flask serves a link like http://127.0.0.1:5000; open this link to get results.
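The deployment steps above can be sketched as a minimal Flask app. The form markup and the `age` field are hypothetical placeholders; the real app renders index.html and calls the trained model inside `predict`:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def home():
    # The real app would render index.html here
    return "<form action='/predict' method='post'><input name='age'></form>"

@app.route("/predict", methods=["POST"])
def predict():
    age = request.form.get("age", "0")  # a real app would call model.predict
    return f"received age={age}"

# Exercise the routes without starting a server; calling app.run()
# instead would serve on http://127.0.0.1:5000 by default
client = app.test_client()
print(client.post("/predict", data={"age": "40"}).data.decode())
```

`test_client()` is also how such an app would be unit-tested before deployment.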
Thank you
