Samana Tatheer-Assign 7-20U00323.Ipynb - Colaboratory

The document analyzes census data from a CSV file containing 48842 rows and 15 columns. It loads the data into a pandas dataframe, explores the data types and distribution of variables, performs outlier treatment on the 'capital-gain' variable, drops an unnecessary variable, and generates summary statistics and visualizations to analyze relationships between variables like gender, income, race, and hours worked.

Uploaded by

Samana Tatheer


9/20/23, 11:44 AM Samana Tatheer-Assign 7-20U00323.ipynb - Colaboratory

import pandas as pd

import numpy as np

df323= pd.read_csv("/content/census (1).csv")

df323.shape

(48842, 15)

df323.head()

   age  workclass  fnlwgt  education     educational-num  marital-status      occupation         relationship
0   25  Private    226802  11th                        7  Never-married       Machine-op-inspct  Own-child
1   38  Private     89814  HS-grad                     9  Married-civ-spouse  Farming-fishing    Husband
2   28  Local-gov  336951  Assoc-acdm                 12  Married-civ-spouse  Protective-serv    Husband
3   44  Private    160323  Some-college               10  Married-civ-spouse  Machine-op-inspct  Husband

df323.describe(include='all')

                 age  workclass        fnlwgt  education  educational-num
count   48842.000000      48842  4.884200e+04      48842     48842.000000
unique           NaN          9           NaN         16              NaN
top              NaN    Private           NaN    HS-grad              NaN
freq             NaN      33906           NaN      15784              NaN
mean       38.643585        NaN  1.896641e+05        NaN        10.078089
std        13.710510        NaN  1.056040e+05        NaN         2.570973
min        17.000000        NaN  1.228500e+04        NaN         1.000000
25%        28.000000        NaN  1.175505e+05        NaN         9.000000
50%        37.000000        NaN  1.781445e+05        NaN        10.000000

df323.dtypes

age int64
workclass object
fnlwgt int64
education object
educational-num int64
marital-status object
occupation object
relationship object
race object
gender object
capital-gain int64
capital-loss int64
hours-per-week int64
native-country object

income object
dtype: object

cols=['education','workclass','marital-status','occupation','relationship','race','gender',]
df323[cols] =df323[cols].astype('category')

df323['educational-num'] =df323['educational-num'].astype('float64')

df323.dtypes

age int64
workclass category
fnlwgt int64
education category
educational-num float64
marital-status category
occupation category
relationship category
race category
gender category
capital-gain int64
capital-loss int64
hours-per-week int64
native-country object
income object
dtype: object

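Q1,Q3= np.percentile(df323['capital-gain'],[25,75])

IQR=Q3-Q1

# np.where returns a tuple holding the row positions where the condition is True
upper=np.where(df323['capital-gain']> (Q3+1.5*IQR))
lower= np.where(df323['capital-gain']<(Q1-1.5*IQR))

# Caution: Series.replace matches on cell values, not on row positions,
# while upper and lower hold row positions
df323['capital-gain']= df323['capital-gain'].replace(upper, np.nan)

df323['capital-gain']= df323['capital-gain'].replace(lower, np.nan)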
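The IQR rule above can also be applied with a boolean mask, which flags rows by position rather than by value. The following is a minimal sketch on a hypothetical toy column (`toy` and its values are assumptions, not the census data); applying it to `df323` would give a different missing-value count than the one recorded later in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for df323['capital-gain']
toy = pd.DataFrame({"capital-gain": [0, 0, 0, 0, 10, 20, 30, 40, 50, 99999]})

# Quartiles and IQR fences
q1, q3 = toy["capital-gain"].quantile([0.25, 0.75])
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Boolean mask: True for rows that fall outside the fences
is_outlier = (toy["capital-gain"] < lower_bound) | (toy["capital-gain"] > upper_bound)

# Series.mask sets the flagged rows to NaN and leaves the others unchanged
toy["capital-gain"] = toy["capital-gain"].mask(is_outlier)

n_outliers = int(is_outlier.sum())
print(n_outliers, int(toy["capital-gain"].isna().sum()))
```

Because the mask is aligned with the rows, the number of flagged outliers and the number of resulting NaN values always agree.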

df3 = df323.drop(['fnlwgt'],axis=1)

df3.dtypes

age int64
workclass category
education category
educational-num float64
marital-status category
occupation category
relationship category
race category
gender category
capital-gain float64
capital-loss int64
hours-per-week int64
native-country object
income object
dtype: object

df3.isnull().sum()

age 0
workclass 0
education 0

educational-num 0
marital-status 0
occupation 0
relationship 0
race 0
gender 0
capital-gain 342
capital-loss 0
hours-per-week 0
native-country 0
income 0
dtype: int64

df3.describe(include='all')

                 age  workclass  education  educational-num      marital-status
count   48842.000000      48842      48842     48842.000000               48842
unique           NaN          9         16              NaN                   7
top              NaN    Private    HS-grad              NaN  Married-civ-spouse
freq             NaN      33906      15784              NaN               22379
mean       38.643585        NaN        NaN        10.078089                 NaN
std        13.710510        NaN        NaN         2.570973                 NaN
min        17.000000        NaN        NaN         1.000000                 NaN
25%        28.000000        NaN        NaN         9.000000                 NaN
50%        37.000000        NaN        NaN        10.000000                 NaN

import matplotlib.pyplot as plt

import seaborn as sns
%matplotlib inline

my_tab = pd.crosstab(index=df323["income"], columns="count")
my_tab

col_0 count

income

<=50K 37155

>50K 11687
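The counts above show that the two income classes are unbalanced. As a small worked sketch, the class shares can be derived directly from those two counts (the variable names `counts` and `shares` are assumptions for illustration):

```python
import pandas as pd

# Counts copied from the crosstab output above
counts = pd.Series({"<=50K": 37155, ">50K": 11687})

# Share of each class in the full dataset
shares = counts / counts.sum()
print(shares.round(4))

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = shares.max()
print(round(majority_baseline, 4))
```

The majority-class share (roughly 76%) is a useful reference point when reading the accuracy of any classifier fitted on this target.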

GC_DF = df323[['gender','income','hours-per-week']].groupby(['gender','income']).mean().reset_index()

GC_DF1 = df323[['gender','income','capital-gain']].groupby(['gender','income']).mean().reset_index()

GC_DF2 = df323[['gender','income','capital-loss']].groupby(['gender','income']).mean().reset_index()
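The three group means above can also be computed in a single `groupby` call by selecting all three numeric columns at once. This is a sketch on a hypothetical four-row frame (`toy` and its values are assumptions), using plain string keys rather than the categorical columns of `df323`:

```python
import pandas as pd

# Hypothetical toy rows with the same column names as the notebook
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female"],
    "income": ["<=50K", ">50K", "<=50K", "<=50K"],
    "hours-per-week": [40, 50, 30, 38],
    "capital-gain": [0.0, 7688.0, 0.0, 0.0],
    "capital-loss": [0, 0, 0, 0],
})

value_cols = ["hours-per-week", "capital-gain", "capital-loss"]

# One groupby produces the mean of every selected column per (gender, income) pair
summary = toy.groupby(["gender", "income"])[value_cols].mean().reset_index()
print(summary)
```

Keeping the means in one frame makes it easier to pass the same table to several plots.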

sns.barplot(x='gender',y='capital-gain',data=df3)
plt.title('Average capital gain among males and females')


Text(0.5, 1.0, 'Average capital gain among males and females')

work_DF = df323[['income','race','hours-per-week']].groupby(['income','race']).mean().reset_index()
work_DF

income race hours-per-week

0 <=50K Amer-Indian-Eskimo 39.816867

1 <=50K Asian-Pac-Islander 38.012613

2 <=50K Black 37.824958

3 <=50K Other 38.488764

4 <=50K White 38.994736

5 >50K Amer-Indian-Eskimo 43.709091

6 >50K Asian-Pac-Islander 44.965770

7 >50K Black 44.222615

8 >50K Other 44.280000

Q10

fig1,ax1=plt.subplots(figsize=(13,7))
sns.barplot(x='income',y='hours-per-week',hue= 'race',data = work_DF)
plt.title("Average working hours across income levels and race")


Text(0.5, 1.0, 'Average working hours across income levels and race')

Q11

from IPython.display import IFrame

url= 'https://your-html-file-url.com'
IFrame(url, width=700, height=500)

Q12

df= df323[['hours-per-week', 'capital-gain','capital-loss','gender','income', 'race']]

       hours-per-week  capital-gain  capital-loss  gender  income   race
0                  40           0.0             0    Male   <=50K  Black
1                  50           0.0             0    Male   <=50K  White
2                  40           0.0             0    Male    >50K  White
3                  40        7688.0             0    Male    >50K  Black
4                  30           0.0             0  Female   <=50K  White
...               ...           ...           ...     ...     ...    ...
48837              38           0.0             0  Female   <=50K  White
48838              40           0.0             0    Male    >50K  White
48839              40           0.0             0  Female   <=50K  White
48840              20           0.0             0    Male   <=50K  White
48841              40       15024.0             0  Female    >50K  White


Q13

df_gender = pd.get_dummies(df['gender'],drop_first=True)
df_gender.head()

Male

0 1

1 1

2 1

3 1

4 0

df_race = pd.get_dummies(df['race'],drop_first=True)
df_race.head()

Asian-Pac-Islander Black Other White

0 0 1 0 0

1 0 0 0 1

2 0 0 0 1

3 0 1 0 0

4 0 0 0 1

df_income = pd.get_dummies(df['income'],drop_first=True)
df_income.head()

>50K

0 0

1 0

2 1

3 1

4 0
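The three separate `get_dummies` calls above can be combined into one call with the `columns` argument. The sketch below uses a hypothetical three-row frame (`toy` is an assumption); note that this form prefixes each dummy column with its source column name, so the resulting names differ from the ones in this notebook.

```python
import pandas as pd

# Hypothetical toy rows with the same categorical columns as the notebook
toy = pd.DataFrame({
    "gender": ["Male", "Female", "Male"],
    "race": ["Black", "White", "White"],
    "income": ["<=50K", "<=50K", ">50K"],
})

# Encode all three columns at once; drop_first removes the first level of each,
# and dtype=int yields integer indicators directly
encoded = pd.get_dummies(
    toy, columns=["gender", "race", "income"], drop_first=True, dtype=int
)
print(encoded)
```

Requesting `dtype=int` up front avoids a separate type conversion of the indicator columns afterwards.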

Q14

data_final = pd.concat([df323[['hours-per-week','capital-gain','capital-loss']], df_gender,df_race,df_income],axis=1)

data_final

       hours-per-week  capital-gain  capital-loss  Male  Asian-Pac-Islander  Black  Other  White
0                  40           0.0             0     1                   0      1      0      0
1                  50           0.0             0     1                   0      0      0      1
2                  40           0.0             0     1                   0      0      0      1
3                  40        7688.0             0     1                   0      1      0      0
4                  30           0.0             0     0                   0      0      0      1
...               ...           ...           ...   ...                 ...    ...    ...    ...
48837              38           0.0             0     0                   0      0      0      1
48838              40           0.0             0     1                   0      0      0      1
48839              40           0.0             0     0                   0      0      0      1

Q15

data_final.dtypes

hours-per-week int64
capital-gain float64
capital-loss int64
Male uint8
Asian-Pac-Islander uint8
Black uint8
Other uint8
White uint8
>50K uint8
dtype: object

Q16

cols= ['Male','Asian-Pac-Islander','Black','Other','White','>50K']
data_final[cols] =data_final[cols].astype('int')
data_final

       hours-per-week  capital-gain  capital-loss  Male  Asian-Pac-Islander  Black  Other  White
0                  40           0.0             0     1                   0      1      0      0
1                  50           0.0             0     1                   0      0      0      1
2                  40           0.0             0     1                   0      0      0      1
3                  40        7688.0             0     1                   0      1      0      0
4                  30           0.0             0     0                   0      0      0      1
...               ...           ...           ...   ...                 ...    ...    ...    ...
48837              38           0.0             0     0                   0      0      0      1
48838              40           0.0             0     1                   0      0      0      1
48839              40           0.0             0     0                   0      0      0      1

Q17

from pandas import read_csv

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

Q18

x = data_final.drop('>50K',axis=1)
y = data_final['>50K']

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2, random_state=5)
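Since the income classes are unbalanced, one option is to pass `stratify=y` so that the train and test sets keep the same class ratio. This is a sketch on hypothetical toy data (the frame `x` and series `y` below are assumptions that shadow the notebook's names only for illustration); it is not what the split above does.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 25% of them in the positive class
x = pd.DataFrame({"hours-per-week": np.arange(100)})
y = pd.Series([1] * 25 + [0] * 75, name=">50K")

# stratify=y keeps the 25% positive share in both partitions
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=5, stratify=y
)
print(len(x_test), int(y_test.sum()), int(y_train.sum()))
```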

Q19

score=[]

clf1=LogisticRegression()
clf1.fit(x_train,y_train)
pred1=clf1.predict(x_test)
s1=accuracy_score(y_test,pred1)
score.append(s1*100)
print(s1)

knn = KNeighborsClassifier()
knn.fit(x_train,y_train)
pred2 = knn.predict(x_test)
s2 = accuracy_score(y_test,pred2)
score.append(s2*100)
print(s2)

dtc = DecisionTreeClassifier()
dtc.fit(x_train,y_train)
pred3 = dtc.predict(x_test)
s3 = accuracy_score(y_test,pred3)
score.append(s3*100)
print(s3)

clf = LinearDiscriminantAnalysis()
clf.fit(x_train,y_train)
pred4 = clf.predict(x_test)
s4 = accuracy_score(y_test,pred4)
score.append(s4*100)
print(s4)
0.7783805916675197

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-79-488fff80522e> in <cell line: 4>()
      2
      3 clf1=LogisticRegression()
----> 4 clf1.fit(x_train,y_train)
      5 pred1=clf1.predict(x_test)
      6 s1=accuracy_score(y_test,pred1)

4 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    159                 "#estimators-that-handle-nan-values"
    160             )
--> 161         raise ValueError(msg_err)
    162
    163

ValueError: Input X contains NaN.

LogisticRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively.
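The error arises because `capital-gain` contains missing values after the outlier step (342 according to the `isnull().sum()` output). One possible way around it, sketched here on hypothetical synthetic data rather than the census frame, is to impute the missing values inside a scikit-learn pipeline before the classifier; dropping the affected rows with `dropna()` would be another option. The median strategy and the toy columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical synthetic features standing in for x_train
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "hours-per-week": rng.integers(20, 60, size=200).astype(float),
    "capital-gain": rng.choice([0.0, 5000.0], size=200),
})
y = (X["hours-per-week"] > 40).astype(int)

# Inject missing values, mimicking the NaN entries in capital-gain
X.loc[::10, "capital-gain"] = np.nan

# The imputer fills NaN with the column median before the model sees the data
model = make_pipeline(
    SimpleImputer(strategy="median"),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
preds = model.predict(X)
print(preds[:10])
```

Placing the imputer in the pipeline means the fill value is learned from the training data only and then reused when predicting on the test set.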
