
Binarization

This notebook trains a scikit-learn DecisionTreeClassifier on four columns of the Titanic training data (train.csv). It drops rows with missing values, combines SibSp and Parch into a single family-size feature, and then binarizes that feature with Binarizer. Accuracy is measured on a held-out test split and with 10-fold cross-validation, both before and after binarization.

Uploaded by Rudraksh Amar


In [28]:

import numpy as np
import pandas as pd

In [29]:
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score

from sklearn.compose import ColumnTransformer

In [30]:
df = pd.read_csv('train.csv')[['Age','Fare','SibSp','Parch','Survived']]

In [31]:
df.dropna(inplace=True)
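dropna() with its defaults removes every row that has at least one missing value. A minimal sketch on a small hypothetical frame (the values are made up, only the column names match the notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical rows with the notebook's columns; the second row is missing Age.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 8.4583, 7.925, 53.1],
    "SibSp": [1, 0, 0, 1],
    "Parch": [0, 0, 0, 0],
    "Survived": [0, 0, 1, 1],
})

# Default how="any": a single NaN anywhere in a row is enough to drop that row.
cleaned = df.dropna()
print(len(df), "->", len(cleaned))  # 4 -> 3
```

Note that this discards whole passengers rather than imputing Age, which shrinks the dataset the classifier sees.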

In [32]:
df.head()

Out[32]: Age Fare SibSp Parch Survived

0 22.0 7.2500 1 0 0

1 38.0 71.2833 1 0 1

2 26.0 7.9250 0 0 1

3 35.0 53.1000 1 0 1

4 35.0 8.0500 0 0 0

In [33]:
df['family'] = df['SibSp'] + df['Parch']
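The family feature is an element-wise sum of the sibling/spouse count and the parent/child count. A small sketch on the five rows printed in Out[32], re-entered by hand, that also applies the column drop from In [35]:

```python
import pandas as pd

# The five rows shown in Out[32], typed in manually.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 0],
    "Survived": [0, 1, 1, 1, 0],
})

# family = siblings/spouses + parents/children travelling with the passenger
df["family"] = df["SibSp"] + df["Parch"]

# Drop the two source columns; family stays at the end of the column list
df = df.drop(columns=["SibSp", "Parch"])
print(df["family"].tolist())  # [1, 1, 0, 1, 0]
print(list(df.columns))       # ['Age', 'Fare', 'Survived', 'family']
```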

In [34]:
df.head()

Out[34]: Age Fare SibSp Parch Survived family

0 22.0 7.2500 1 0 0 1

1 38.0 71.2833 1 0 1 1

2 26.0 7.9250 0 0 1 0

3 35.0 53.1000 1 0 1 1

4 35.0 8.0500 0 0 0 0

In [35]:
df.drop(columns=['SibSp','Parch'],inplace=True)

In [36]:
df.head()
Out[36]:
    Age     Fare  Survived  family
0  22.0   7.2500         0       1
1  38.0  71.2833         1       1
2  26.0   7.9250         1       0
3  35.0  53.1000         1       1
4  35.0   8.0500         0       0

In [37]:
X = df.drop(columns=['Survived'])
y = df['Survived']
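With X and y separated, train_test_split(test_size=0.2) reserves 20% of the rows for testing, rounding the test count up. The 571 training rows reported in Out[44] are consistent with 714 rows after dropna and a 143-row test set. A sketch on hypothetical random data of that size; random_state=0 here is an arbitrary choice, not the notebook's value:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with 714 rows and the notebook's feature columns.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Age": rng.uniform(1, 80, 714),
    "Fare": rng.uniform(0, 500, 714),
    "family": rng.integers(0, 8, 714),
})
y = pd.Series(rng.integers(0, 2, 714), name="Survived")

# Test set size is ceil(0.2 * 714) = 143; the remaining 571 rows go to training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 571 143
```

The split shuffles rows but keeps their original index labels, which is why X_train.head() in Out[39] shows labels such as 328 and 73 rather than 0 to 4.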

In [38]:
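# Note: this line is cut off in the source after "random_stat"; the random_state value is not recoverable.
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_stat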

In [39]:
X_train.head()

Out[39]:
      Age     Fare  family
328  31.0  20.5250       2
73   26.0  14.4542       1
253  30.0  16.1000       1
719  33.0   7.7750       0
666  25.0  13.0000       0

In [40]:
# Without binarization

clf = DecisionTreeClassifier()

clf.fit(X_train,y_train)

y_pred = clf.predict(X_test)

accuracy_score(y_test,y_pred)

Out[40]: 0.6293706293706294
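accuracy_score is the fraction of test labels predicted correctly. The value above equals 90/143, which is consistent with 90 correct predictions on a 143-row test set (an inference from the number; the notebook does not print the counts). A worked check with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 143 test passengers, the first 90 predictions correct.
y_true = np.zeros(143, dtype=int)
y_pred = np.zeros(143, dtype=int)
y_pred[90:] = 1  # the last 53 predictions are wrong

acc = accuracy_score(y_true, y_pred)  # correct / total = 90 / 143
print(acc, 90 / 143)
```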

In [41]:
np.mean(cross_val_score(DecisionTreeClassifier(),X,y,cv=10,scoring='accuracy'))

Out[41]: 0.6429381846635367
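For a classifier with binary labels, cv=10 makes cross_val_score use StratifiedKFold with 10 unshuffled folds, fitting a fresh clone of the estimator on each training part and scoring on the held-out part. The sketch below reproduces that loop by hand on synthetic data from make_classification (a stand-in, not the Titanic columns). random_state=0 is set on the tree so both runs are comparable; the notebook's trees have no random_state, so its scores can vary slightly between runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the notebook's features.
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

auto = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                       cv=10, scoring="accuracy")

# Same procedure written out: stratified folds, fit on 9 parts, score on the 10th.
manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    manual.append(np.mean(clf.predict(X[test_idx]) == y[test_idx]))

print(np.mean(auto), np.mean(manual))
```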

In [20]:
# Applying Binarization

from sklearn.preprocessing import Binarizer
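Binarizer maps each value strictly greater than threshold to 1 and everything else to 0; the default threshold is 0.0. Applied to family, that turns "travelling with any relatives" into 1 and "alone" into 0. A short sketch on hypothetical family sizes, with a second, hypothetical threshold for comparison:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical family sizes as a single-column 2-D array (Binarizer expects 2-D input).
family = np.array([[0], [1], [2], [3]])

# threshold=0.0 (default): 0 stays 0, and 1, 2, 3 all become 1
default_bin = Binarizer().fit_transform(family).ravel()

# threshold=1 (hypothetical): only families of 2 or more become 1
strict_bin = Binarizer(threshold=1).fit_transform(family).ravel()

print(default_bin, strict_bin)
```

Binarizer learns nothing in fit; the threshold is fixed up front, so the transform depends only on each individual value.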

In [42]:
trf = ColumnTransformer([
    # threshold defaults to 0.0, so family > 0 becomes 1 and family == 0 becomes 0
    ('bin',Binarizer(copy=False),['family'])
],remainder='passthrough')

In [43]:
X_train_trf = trf.fit_transform(X_train)
X_test_trf = trf.transform(X_test)
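ColumnTransformer places the output of its listed transformers first and appends the remainder='passthrough' columns afterwards in their original order, so the result is [family, Age, Fare] rather than [Age, Fare, family]. It also returns a plain NumPy array, so the original index labels are lost. A sketch on the five rows from Out[39], re-entered by hand:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Binarizer

# The five rows shown in Out[39], with their original index labels.
X_train = pd.DataFrame(
    {"Age": [31.0, 26.0, 30.0, 33.0, 25.0],
     "Fare": [20.5250, 14.4542, 16.1000, 7.7750, 13.0000],
     "family": [2, 1, 1, 0, 0]},
    index=[328, 73, 253, 719, 666],
)

trf = ColumnTransformer([("bin", Binarizer(), ["family"])], remainder="passthrough")
out = trf.fit_transform(X_train)  # ndarray: binarized family first, then Age, Fare

# Column labels must be supplied by hand, in the transformer's output order.
print(pd.DataFrame(out, columns=["family", "Age", "Fare"]))
```

Wrapping the array in a new DataFrame gives it a fresh 0-based index, which is why Out[44] is numbered 0 to 570 instead of 328, 73, and so on.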

In [44]:
pd.DataFrame(X_train_trf,columns=['family','Age','Fare'])

Out[44]:
     family   Age      Fare
0       1.0  31.0   20.5250
1       1.0  26.0   14.4542
2       1.0  30.0   16.1000
3       0.0  33.0    7.7750
4       0.0  25.0   13.0000
...     ...   ...       ...
566     1.0  46.0   61.1750
567     0.0  25.0   13.0000
568     0.0  41.0  134.5000
569     1.0  33.0   20.5250
570     0.0  33.0    7.8958

571 rows × 3 columns

In [45]:
clf = DecisionTreeClassifier()
clf.fit(X_train_trf,y_train)
y_pred2 = clf.predict(X_test_trf)

accuracy_score(y_test,y_pred2)

Out[45]: 0.6363636363636364

In [46]:
X_trf = trf.fit_transform(X)
np.mean(cross_val_score(DecisionTreeClassifier(),X_trf,y,cv=10,scoring='accuracy'))

Out[46]: 0.6304186228482003
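Fitting trf on all of X before cross-validation is harmless here because Binarizer learns nothing from the data. If it were swapped for a transformer that does learn (a scaler or imputer, say), the usual pattern is to put the transformer and the tree in a Pipeline so each fold fits the transformer on its own training part only. A sketch of that alternative on hypothetical random data, not the notebook's own code:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with the notebook's three feature columns.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(0, 300, n),
    "family": rng.integers(0, 8, n),
})
y = rng.integers(0, 2, n)

# The ColumnTransformer is refit inside every fold, on that fold's training rows.
pipe = Pipeline([
    ("trf", ColumnTransformer([("bin", Binarizer(), ["family"])],
                              remainder="passthrough")),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
print(np.mean(scores))
```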

In [ ]:
