0% found this document useful (0 votes)
14 views

4.4. Data Standardization - Ipynb - Colaboratory

The document discusses data standardization using the StandardScaler. It loads breast cancer data, splits it into training and test sets, then standardizes the training data using the StandardScaler. The StandardScaler transforms the training data to have mean 0 and standard deviation 1 based on the training data statistics. It then transforms the test data using the same parameters to put it on the same scale as the training data.

Uploaded by

lokesh k
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views

4.4. Data Standardization - Ipynb - Colaboratory

The document discusses data standardization using the StandardScaler. It loads breast cancer data, splits it into training and test sets, then standardizes the training data using the StandardScaler. The StandardScaler transforms the training data to have mean 0 and standard deviation 1 based on the training data statistics. It then transforms the test data using the same parameters to put it on the same scale as the training data.

Uploaded by

lokesh k
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

StandardScaler(copy=True, with_mean=True, with_std=True)

Data Standardization:

The process of standardizing the data to a common format and common range X_train_standardized = scaler.transform(X_train)

import numpy as np print(X_train_standardized)


import pandas as pd
import sklearn.datasets [[ 1.40381088 1.79283426 1.37960065 ... 1.044121 0.52295995
from sklearn.preprocessing import StandardScaler 0.64990763]
from sklearn.model_selection import train_test_split [ 1.16565505 -0.14461158 1.07121375 ... 0.5940779 0.44153782
-0.85281516]
[-0.0307278 -0.77271123 -0.09822185 ... -0.64047556 -0.31161687
# loading the dataset -0.69292805]
dataset = sklearn.datasets.load_breast_cancer() ...
[ 1.06478904 0.20084323 0.89267396 ... 0.01694621 3.06583565
-1.29952679]
# loading the data to a pandas dataframe [ 1.51308238 2.3170559 1.67987211 ... 1.14728703 -0.16599653
df = pd.DataFrame(dataset.data, columns=dataset.feature_names) 0.82816016]
[-0.73678981 -1.02636686 -0.74380549 ... -0.31826862 -0.40713129
-0.38233653]]
df.head()

X_test_standardized = scaler.transform(X_test)
mean mean
mean mean mean mean mean mean mean mean radius texture perimeter
concave fractal
radius texture perimeter area smoothness compactness concavity symmetry error error error e
points dimension print(X_train_standardized.std())

0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 1.0950 0.9053 8.589 1 1.0
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 0.5435 0.7339 3.398

2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 0.7456 0.7869 4.585 print(X_test_standardized.std())

3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 0.4956 1.1560 3.445 0.8654541077212674

4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 0.7572 0.7813 5.438

df.shape

(569, 30)

X = df
Y = dataset.target

print(X)

mean radius mean texture ... worst symmetry worst fractal dimension
0 17.99 10.38 ... 0.4601 0.11890
1 20.57 17.77 ... 0.2750 0.08902
2 19.69 21.25 ... 0.3613 0.08758
3 11.42 20.38 ... 0.6638 0.17300
4 20.29 14.34 ... 0.2364 0.07678
.. ... ... ... ... ...
564 21.56 22.39 ... 0.2060 0.07115
565 20.13 28.25 ... 0.2572 0.06637
566 16.60 28.08 ... 0.2218 0.07820
567 20.60 29.33 ... 0.4087 0.12400
568 7.76 24.54 ... 0.2871 0.07039

[569 rows x 30 columns]

Splitting the data into training data and test data

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)

print(X.shape, X_train.shape, X_test.shape)

(569, 30) (455, 30) (114, 30)

Standardize the data

print(dataset.data.std())

account_circle 228.29740508276657
Code Text
scaler = StandardScaler()

scaler.fit(X_train)

You might also like