Python Scikit-Learn Cheat Sheet For Machine Learning
Let's create a basic example dataset to use with the scikit-learn library:
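import numpy as np
# Feature matrix: 10 samples, 5 features, values drawn uniformly from [0, 1)
X = np.random.random((10, 5))
# One label per sample: y needs as many entries as X has rows (10)
y = np.array(['M','M','F','F','M','F','M','M','F','F'])
# Zero out every value below 0.7 so that most entries are 0
X[X < 0.7] = 0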
Standardization
Data standardization is a preprocessing step that rescales one or more
attributes so that each has a mean of 0 and a standard deviation of 1.
Standardization works best when the data is roughly Gaussian (bell curve)
distributed, although it can be applied to any attribute with nonzero variance.
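As a minimal sketch of the idea, the snippet below standardizes a small random matrix with scikit-learn's StandardScaler. The seed, the array shape, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data (assumed): 10 samples, 5 features, fixed seed for repeatability
rng = np.random.default_rng(0)
X = rng.random((10, 5))

# fit() learns each column's mean and standard deviation;
# transform() subtracts the mean and divides by the standard deviation
scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)

# Each column should now have mean ~0 and standard deviation ~1
print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```

In practice the scaler is usually fitted on the training data only and then reused to transform the test data, so that no information from the test set leaks into preprocessing.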
Binarization
Binarization is a common operation on text count data. It maps every value
above a chosen threshold to 1 and every other value to 0, so the analyst can
consider only the presence or absence of a feature rather than its number of
occurrences.
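A short sketch of binarizing count data with scikit-learn's Binarizer follows. The count matrix is made up for illustration, and the threshold of 0.0 is an assumption that treats any nonzero count as "present".

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical term counts: rows are documents, columns are terms
counts = np.array([[0, 3, 1],
                   [2, 0, 0],
                   [0, 0, 5]])

# Values strictly greater than the threshold become 1, all others become 0
binarizer = Binarizer(threshold=0.0)
presence = binarizer.fit_transform(counts)

print(presence)
```

Raising the threshold changes the question from "does the term occur?" to "does the term occur more than this many times?".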
Normalization
Normalization is a technique commonly used to prepare data for machine
learning. Its main goal is to bring the values of the numeric columns in the
dataset onto a common scale without losing information or distorting the
differences in the ranges of values.
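scikit-learn offers more than one transformer that could be called normalization, and which one this section has in mind is an assumption here. The sketch below shows two common options on made-up data: MinMaxScaler, which rescales each column to the range [0, 1], and Normalizer, which rescales each row (sample) to unit length.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Illustrative data (assumed): two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 800.0]])

# Column-wise: each feature is mapped linearly onto [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

# Row-wise: each sample is divided by its own L2 norm
X_unit = Normalizer(norm="l2").fit_transform(X)

print(X_minmax)
print(np.linalg.norm(X_unit, axis=1))
```

The column-wise variant matches the "common scale for numeric columns" description most directly, while the row-wise variant is often used when only the direction of each sample vector matters, for example with text features.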