Data Preprocessing Steps
1. Read the data and inspect it: check its shape, view a sample of rows, remove
unnecessary columns (e.g. ID, Name), and review describe(), info(), and nunique().
2. Check the datatype of every column against a sample value from that column.
* Numerical columns should be of type int64 / float64, and categorical columns
should be of a categorical type.
* If a numerical column has datatype object, convert it using pd.to_numeric
with errors='coerce', which replaces the non-numerical values with NaN.
* Categorical columns in X (the independent variables) should be converted to
numbers using one-hot encoding. Categorical values in Y (the target variable)
should also be converted to numeric, using manual replacement or label encoding.
However, label encoding assigns integer codes in an arbitrary order, which implies
a hierarchy that gives some classes more importance than others and can bias the
predictions.
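The two steps above can be sketched with pandas as follows. This is only a minimal illustration: the toy dataset, its column names (ID, Name, Age, Income, City, Purchased), and the "yes"/"no" target values are assumptions made for the example, not part of the original notes.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a real file; in practice this would
# usually be a call such as pd.read_csv("your_file.csv").
raw = io.StringIO(
    "ID,Name,Age,Income,City,Purchased\n"
    "1,Ann,34,52000,Paris,yes\n"
    "2,Bob,unknown,48000,Rome,no\n"
    "3,Cid,29,61000,Paris,yes\n"
    "4,Dee,41,?,Oslo,no\n"
)
df = pd.read_csv(raw)

# Step 1: inspect the data, then drop identifier-like columns.
print(df.shape)
print(df.sample(2, random_state=0))
df = df.drop(columns=["ID", "Name"])
print(df.describe(include="all"))
df.info()
print(df.nunique())

# Step 2: compare each column's dtype with a sample value from that column.
print(df.dtypes)
print(df.iloc[0])

# Age and Income contain stray text, so they were not read as numbers.
# errors="coerce" turns every value that cannot be parsed into NaN.
for col in ["Age", "Income"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# One-hot encode the categorical feature in X.
X = pd.get_dummies(df.drop(columns=["Purchased"]), columns=["City"], dtype=int)

# Convert the target Y to numbers with a manual replacement.
y = df["Purchased"].map({"no": 0, "yes": 1})

print(X)
print(y)
```

The NaN values created by the conversion still need to be handled (for example by imputing or dropping rows) before most models can be trained on X.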
Nominal Data – Nominal data is a basic data type that categorizes data by
labeling or naming values, such as gender, hair color, or type of animal. It does
not have any hierarchy.
Ordinal Data – Ordinal data involves classifying data based on rank, such as
social status in categories like ‘wealthy’, ‘middle income’, or ‘poor’. However,
there are no set intervals between these categories.
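The difference between the two data types matters when encoding. The sketch below, using made-up values for a nominal column (hair_color) and an ordinal column (status), shows one possible approach: one-hot encoding for the nominal column and an explicitly ordered mapping for the ordinal one. The column names and the chosen rank order are assumptions for illustration.

```python
import pandas as pd

# Hypothetical example data: one nominal and one ordinal column.
df = pd.DataFrame({
    "hair_color": ["brown", "black", "blonde", "black"],
    "status": ["poor", "wealthy", "middle income", "poor"],
})

# Nominal: no hierarchy, so each category gets its own 0/1 column.
hair = pd.get_dummies(df["hair_color"], prefix="hair", dtype=int)

# Ordinal: state the rank order explicitly so the codes follow it.
order = ["poor", "middle income", "wealthy"]
status_codes = pd.Categorical(df["status"], categories=order, ordered=True).codes

# For contrast: without an explicit order, pandas sorts the categories
# alphabetically, so the codes no longer reflect the intended ranking.
alphabetical = df["status"].astype("category").cat.codes

print(hair)
print(status_codes.tolist())
print(alphabetical.tolist())
```

Even with the correct order, the integer codes are evenly spaced, while the categories themselves have no set intervals between them, so the codes express rank only.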