Activity 2 M3 Data Management and Preprocessing
Data Transformation
Definition: The process of converting data from its original format into another format that is
more useful, cleaner, or better suited to analysis or processing. This may include:
Practical example: You have a column of dates named 'fecha'.
Transformation: Extract only the year, month, or day for temporal analysis.
import pandas as pd
df['year'] = pd.to_datetime(df['fecha']).dt.year  # parse 'fecha' (date) and keep only the year
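A self-contained sketch of the same idea, extracting the year, month, and day into separate columns. The three dates below are hypothetical sample values invented for illustration; only the column name 'fecha' comes from the example above.

```python
import pandas as pd

# Hypothetical sample data; 'fecha' is the date column name used above
df = pd.DataFrame({"fecha": ["2023-01-15", "2023-06-30", "2024-12-01"]})

# Parse the strings once, then pull out each calendar component
fechas = pd.to_datetime(df["fecha"])
df["year"] = fechas.dt.year
df["month"] = fechas.dt.month
df["day"] = fechas.dt.day

print(df)
```

Parsing once with pd.to_datetime and reusing the result avoids converting the same strings three times.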
• Another example: You have a column with categorical values such as “low”, “medium”, and “high”.
Transformation:
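One common transformation for ordered categories like these is ordinal encoding, which replaces each label with an integer that preserves the order. A minimal sketch, assuming the order low < medium < high, the codes 0, 1, 2, and a hypothetical column named 'level':

```python
import pandas as pd

# Hypothetical sample data; the column name 'level' is an assumption
df = pd.DataFrame({"level": ["low", "high", "medium", "low"]})

# Assumed ordering: low < medium < high, encoded as 0, 1, 2
order = {"low": 0, "medium": 1, "high": 2}
df["level_code"] = df["level"].map(order)

print(df)
```

If the categories had no natural order, one-hot encoding (for example with pd.get_dummies) would usually be a better fit, since integer codes imply a ranking.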
Data Normalization
Definition:
The process of rescaling the numerical values of a feature so that they fall within a specific range,
such as [0, 1] (min-max scaling), or so that they have mean 0 and standard deviation 1
(standardization). This is especially important for machine learning algorithms that are
sensitive to the scale of the data, such as k-NN, neural networks, and SVMs.
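The two rescalings in the definition are usually written as follows, where x is a value in a column, min(x) and max(x) are that column's minimum and maximum, and μ and σ are its mean and standard deviation:

```latex
x' = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad \text{(min-max scaling to } [0, 1]\text{)}

z = \frac{x - \mu}{\sigma} \qquad \text{(standardization: mean 0, standard deviation 1)}
```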
Practical example:
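You have a column with salaries ranging from 20,000 to 150,000 and another column with ages
between 18 and 65. Without scaling, salary can dominate scale-sensitive models such as k-NN,
because its values differ by amounts thousands of times larger than those in the age column.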
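from sklearn.preprocessing import MinMaxScaler, StandardScaler
scaler = MinMaxScaler()    # Option 1: rescale each feature to [0, 1]
scaler = StandardScaler()  # Option 2: rescale each feature to mean 0, standard deviation 1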
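The two scalers can be applied to a small dataset that spans the salary and age ranges from the example. The three rows below are hypothetical values chosen for illustration; the middle salary, 85,000, maps to (85,000 - 20,000) / (150,000 - 20,000) = 0.5 under min-max scaling.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data spanning the salary and age ranges described above
df = pd.DataFrame({
    "salary": [20_000, 85_000, 150_000],
    "age": [18, 40, 65],
})

# Min-max scaling: each column is mapped to [0, 1] independently
minmax = MinMaxScaler().fit_transform(df)

# Standardization: each column is shifted to mean 0 and scaled to standard deviation 1
standard = StandardScaler().fit_transform(df)

print(np.round(minmax, 3))
print(np.round(standard, 3))
```

In a real pipeline the scaler would be fit on the training data only and then applied to the test data with transform, so that statistics from the test set do not leak into training.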