Automatically Select Imputer Parameters

This notebook walks through a machine learning workflow in Python that predicts survival on the Titanic dataset. It imputes missing values, scales the numerical features, and one-hot encodes the categorical ones, then fits a logistic regression model whose imputation strategies and regularization strength C are tuned with GridSearchCV. The best parameters and the internal cross-validation score (0.788) are reported.

Uploaded by

Rudraksh Amar

In [33]:

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, GridSearchCV

from sklearn.compose import ColumnTransformer

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

In [34]:
df = pd.read_csv('train.csv')

In [35]:
df.head()

Out[35]:    PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket
0           1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171
1           2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599
2           3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282
3           4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803
4           5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450

In [36]:
df.drop(columns=['PassengerId','Name','Ticket','Cabin'],inplace=True)

In [37]:
df.head()
Out[37]:    Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
0           0         3       male    22.0  1      0      7.2500   S
1           1         1       female  38.0  1      0      71.2833  C
2           1         3       female  26.0  0      0      7.9250   S
3           1         1       female  35.0  1      0      53.1000  S
4           0         3       male    35.0  0      0      8.0500   S

In [38]:
X = df.drop(columns=['Survived'])
y = df['Survived']

In [39]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_stat

In [40]:
X_train.head()

Out[40]:    Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
30          1       male    40.0  0      0      27.7208  C
10          3       female  4.0   1      1      16.7000  S
873         3       male    47.0  0      0      9.0000   S
182         3       male    9.0   4      2      31.3875  S
876         3       male    20.0  0      0      9.8458   S

In [41]:
numerical_features = ['Age', 'Fare']
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_features = ['Embarked', 'Sex']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('ohe', OneHotEncoder(handle_unknown='ignore'))
])

In [42]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)
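To make the effect of the preprocessor concrete, the following minimal sketch applies the same layout of transformers to a small toy frame. The rows in `toy` are hypothetical and not taken from `train.csv`; the names `toy` and `toy_preprocessor` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows (assumption, not from train.csv), with gaps in
# two numerical columns and one categorical column.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 26.0],
    "Fare": [7.25, 71.28, np.nan, 8.05],
    "Embarked": pd.Series(["S", "C", np.nan, "S"], dtype=object),
    "Sex": pd.Series(["male", "female", "female", "male"], dtype=object),
})

# Same structure as the notebook's preprocessor: impute + scale the
# numerical columns, impute + one-hot encode the categorical columns.
toy_preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]), ["Age", "Fare"]),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("ohe", OneHotEncoder(handle_unknown="ignore")),
    ]), ["Embarked", "Sex"]),
])

toy_out = toy_preprocessor.fit_transform(toy)
# The combined result can be sparse or dense; convert to dense for inspection.
if hasattr(toy_out, "toarray"):
    toy_out = toy_out.toarray()

print(toy_out.shape)            # rows x (scaled numeric + one-hot columns)
print(np.isnan(toy_out).any())  # whether any missing values remain
```

Each categorical column expands into one column per category seen during fitting, so the output width depends on the data rather than on the number of input columns.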

In [43]:
clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])

In [21]:
from sklearn import set_config

set_config(display='diagram')
clf

Out[21]: Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='median')),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  ['Age', 'Fare']),
                                                 ('cat',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='most_frequent')),
                                                                  ('ohe',
                                                                   OneHotEncoder(handle_unknown='ignore'))]),
                                                  ['Embarked', 'Sex'])])),
                ('classifier', LogisticRegression())])

In [44]:
param_grid = {
    'preprocessor__num__imputer__strategy': ['mean', 'median'],
    'preprocessor__cat__imputer__strategy': ['most_frequent', 'constant'],
    'classifier__C': [0.1, 1.0, 10, 100]
}

grid_search = GridSearchCV(clf, param_grid, cv=10)
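The keys in `param_grid` follow scikit-learn's double-underscore convention: each `__` steps one level down into a nested estimator (pipeline step, then transformer name, then inner step, then parameter). The sketch below rebuilds the same pipeline layout under the hypothetical name `demo_clf`, checks that every grid key is a parameter the pipeline actually exposes, and counts the candidate combinations.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Same layout as the notebook's clf, rebuilt so this block is self-contained.
demo_clf = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("num", Pipeline(steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]), ["Age", "Fare"]),
        ("cat", Pipeline(steps=[
            ("imputer", SimpleImputer(strategy="most_frequent")),
            ("ohe", OneHotEncoder(handle_unknown="ignore")),
        ]), ["Embarked", "Sex"]),
    ])),
    ("classifier", LogisticRegression()),
])

param_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median"],
    "preprocessor__cat__imputer__strategy": ["most_frequent", "constant"],
    "classifier__C": [0.1, 1.0, 10, 100],
}

# Every grid key must appear among the pipeline's (deep) parameter names.
valid_names = demo_clf.get_params().keys()
missing = [name for name in param_grid if name not in valid_names]
print("Unknown parameter names:", missing)

# Number of parameter combinations, and fits needed with cv=10
# (not counting the final refit on the full training set).
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates, "candidates,", n_candidates * 10, "cross-validation fits")
```

A misspelled key would show up in `missing` here, which is cheaper to catch than an error raised partway through `grid_search.fit`.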

In [45]:
grid_search.fit(X_train, y_train)

print("Best params:")
print(grid_search.best_params_)

Best params:
{'classifier__C': 1.0, 'preprocessor__cat__imputer__strategy': 'most_frequent', 'preprocessor__num__imputer__strategy': 'mean'}
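The two categorical strategies in the grid behave quite differently: `'most_frequent'` fills a gap with the most common category, while `'constant'` (with no `fill_value` given, as in the grid above) fills it with scikit-learn's default placeholder string for object data. A minimal sketch on a hypothetical `embarked` column, not taken from the dataset:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-column sample with one missing entry (assumption).
embarked = np.array([["S"], [np.nan], ["C"], ["S"]], dtype=object)

# 'most_frequent' replaces the gap with the modal category.
filled_mode = SimpleImputer(strategy="most_frequent").fit_transform(embarked)

# 'constant' without fill_value uses the default placeholder for object data.
filled_const = SimpleImputer(strategy="constant").fit_transform(embarked)

print(filled_mode.ravel())
print(filled_const.ravel())
```

With `'constant'`, the one-hot encoder downstream ends up with an extra category for the placeholder, so missingness itself becomes a feature the classifier can use.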

In [26]:
print(f"Internal CV score: {grid_search.best_score_:.3f}")

Internal CV score: 0.788
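The internal CV score is computed only on folds of the training data. Because `GridSearchCV` refits the best configuration on the full training set by default, the fitted search object can also be scored on the held-out split. The sketch below shows the pattern on synthetic data from `make_classification`; the data, the simplified pipeline, and the names `demo_search`, `X_tr`, and `X_te` are assumptions for illustration, so the numbers it prints say nothing about the Titanic model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 300 samples, 5 numeric features.
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Simplified pipeline: scaling followed by logistic regression.
demo_search = GridSearchCV(
    Pipeline(steps=[
        ("scaler", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]),
    {"classifier__C": [0.1, 1.0, 10, 100]},
    cv=10,
)
demo_search.fit(X_tr, y_tr)

# best_score_ is the mean cross-validated accuracy of the best candidate;
# score() evaluates the refit best estimator on data the search never saw.
cv_acc = demo_search.best_score_
test_acc = demo_search.score(X_te, y_te)
print(f"Internal CV score: {cv_acc:.3f}")
print(f"Held-out test score: {test_acc:.3f}")
```

In the notebook's own variables, the equivalent call would be `grid_search.score(X_test, y_test)`.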

In [46]:
import pandas as pd

cv_results = pd.DataFrame(grid_search.cv_results_)
cv_results = cv_results.sort_values("mean_test_score", ascending=False)
cv_results[['param_classifier__C','param_preprocessor__cat__imputer__strategy'
Out[46]:    param_classifier__C  param_preprocessor__cat__imputer__strategy  param_preprocess
4           1                    most_frequent
5           1                    most_frequent
6           1                    constant
7           1                    constant
8           10                   most_frequent
9           10                   most_frequent
10          10                   constant
11          10                   constant
12          100                  most_frequent
13          100                  most_frequent
14          100                  constant
15          100                  constant
0           0.1                  most_frequent
1           0.1                  most_frequent
2           0.1                  constant
3           0.1                  constant
