Categorical Variable
1. Drop Categorical Variables
The easiest approach is to drop the categorical columns from the dataset. This only works well if those columns do not contain useful information.
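A minimal sketch of the drop approach, assuming pandas and small hypothetical training and validation frames (the column names and values are made up for illustration). `select_dtypes(exclude=["object"])` keeps only the columns whose dtype is not `object`.

```python
import pandas as pd

# Hypothetical toy data; dtype=object makes the text columns use the object dtype
X_train = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Type": pd.Series(["h", "u", "h"], dtype=object),
    "Distance": [2.5, 13.5, 3.1],
})
X_valid = pd.DataFrame({
    "Rooms": [3, 2],
    "Type": pd.Series(["u", "t"], dtype=object),
    "Distance": [5.0, 8.2],
})

# Drop every object (categorical) column, keeping only the numeric ones
drop_X_train = X_train.select_dtypes(exclude=["object"])
drop_X_valid = X_valid.select_dtypes(exclude=["object"])

print(list(drop_X_train.columns))  # ['Rooms', 'Distance']
```

The same call is applied to both frames so that training and validation data end up with identical columns.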
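2. Label Encoding
Label encoding assigns each unique value a different integer, which implies an ordering of the categories.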
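Label encoding maps each category to an integer. The sketch below uses scikit-learn's `OrdinalEncoder` (the feature-column counterpart of `LabelEncoder`, which is intended for targets) on a hypothetical "Breakfast" column. The category order is passed explicitly so that the integers follow the ranking; without it, scikit-learn sorts the categories alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordinal column: how often someone eats breakfast
X = pd.DataFrame({
    "Breakfast": pd.Series(["Never", "Rarely", "Most days", "Every day", "Rarely"], dtype=object),
})

# Explicit order: Never < Rarely < Most days < Every day
encoder = OrdinalEncoder(categories=[["Never", "Rarely", "Most days", "Every day"]])

X_encoded = X.copy()
# fit_transform returns a 2-D float array; take its single column
X_encoded["Breakfast"] = encoder.fit_transform(X[["Breakfast"]])[:, 0]

print(X_encoded["Breakfast"].tolist())  # [0.0, 1.0, 2.0, 3.0, 1.0]
```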
3. One-Hot Encoding
One-hot encoding creates new columns indicating the presence (or absence) of
each possible value in the original data.
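As a quick illustration of the new columns one-hot encoding creates, here is a sketch using pandas' `get_dummies` on a hypothetical "Color" column (scikit-learn's `OneHotEncoder` does the same job in a pipeline-friendly way). Each distinct value becomes its own 0/1 column, and every row has exactly one 1.

```python
import pandas as pd

# Hypothetical nominal column with three distinct values
colors = pd.DataFrame({"Color": pd.Series(["Red", "Yellow", "Green", "Red"], dtype=object)})

# One indicator column per value; dtype=int gives 0/1 instead of True/False
one_hot = pd.get_dummies(colors["Color"], dtype=int)
print(one_hot)
```

The columns come out as `Green`, `Red`, `Yellow`, and the first row ("Red") is encoded as `0, 1, 0`.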
In contrast to label encoding, one-hot encoding does not assume an ordering
of the categories.
We refer to categorical variables without an intrinsic ranking as nominal
variables.
One-hot encoding generally does not perform well if the categorical variable
takes on a large number of values, because every distinct value adds a new column
(as a rule of thumb, you generally won't use it for variables taking more than 15
different values).
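One way to apply this rule of thumb is to count the distinct values (the cardinality) of each categorical column and only one-hot encode the low-cardinality ones. A minimal sketch on hypothetical data, using the 15-value cutoff from the text as an illustrative threshold:

```python
import pandas as pd

n = 20
# Hypothetical data: "Type" has 3 distinct values, "Address" has 20
X_train = pd.DataFrame({
    "Type": pd.Series(["h", "u", "t", "h"] * 5, dtype=object),
    "Address": pd.Series([f"{i} Example St" for i in range(n)], dtype=object),
    "Rooms": list(range(n)),
})

MAX_CATEGORIES = 15  # rule-of-thumb cutoff

# Categorical columns are the ones stored with the object dtype
object_cols = [col for col in X_train.columns if X_train[col].dtype == "object"]

# Number of distinct values per categorical column
cardinality = {col: X_train[col].nunique() for col in object_cols}

low_cardinality_cols = [col for col in object_cols if cardinality[col] <= MAX_CATEGORIES]
high_cardinality_cols = [col for col in object_cols if cardinality[col] > MAX_CATEGORIES]

print(cardinality)           # {'Type': 3, 'Address': 20}
print(low_cardinality_cols)  # ['Type']
```

The high-cardinality columns can then be dropped or handled with a different encoding.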
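Get the list of categorical variables (columns stored with the object dtype):
# X_train is the training-feature DataFrame loaded earlier
s = (X_train.dtypes == 'object')   # True for every object-dtype column
object_cols = list(s[s].index)     # names of those columns
print("Categorical variables:")
print(object_cols)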
We use the OneHotEncoder class from scikit-learn to get one-hot encodings. There
are a number of parameters that can be used to customize its behavior.