
Activity 2 M3 Data Management and Preprocessing

Data transformation converts data into a format better suited to analysis, for example by changing data types or applying functions. Examples include extracting date components and mapping categorical values to numbers. Data normalization rescales numerical values to a specific range, which matters for machine learning algorithms that are sensitive to scale; min-max scaling and standardization are given as examples.

Data Transformation

Definition: The process of converting data from its original format to another format that is
more useful, cleaner, or suitable for analysis or processing. This may include:

• Changing data types (e.g., from text to number).
• Joining or splitting columns.
• Applying mathematical or logical functions.
• Converting categorical values to numeric (one-hot encoding, for example).
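
The four kinds of transformation listed above can be sketched with pandas roughly as follows. The table and its column names ('price', 'full_name', 'size') and the 1.2 tax factor are invented for illustration and are not part of the original activity.

```python
import pandas as pd

# Hypothetical data, invented for this sketch.
df = pd.DataFrame({
    "price": ["10", "25", "7"],                      # numbers stored as text
    "full_name": ["Ana Ruiz", "Luis Soto", "Eva Diaz"],
    "size": ["S", "M", "S"],                         # categorical values
})

# Changing data types: text -> number.
df["price"] = pd.to_numeric(df["price"])

# Splitting one column into two at the first space.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

# Applying a mathematical function (an assumed 20% tax, for illustration).
df["price_with_tax"] = df["price"] * 1.2

# One-hot encoding: one 0/1 indicator column per category value.
df = pd.get_dummies(df, columns=["size"], dtype=int)
```

After the last step, the 'size' column is replaced by indicator columns named 'size_M' and 'size_S'.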

Practical example:

Suppose you have a table whose dates are stored as text, such as '2025-05-15', in a column named 'fecha' (Spanish for "date").

Transformation: Parse the text as a date, then extract only the year, month, or day for temporal analysis.

df['year'] = pd.to_datetime(df['fecha']).dt.year
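
A self-contained version of this step might look like the sketch below. The two sample dates are made up for illustration; parsing once and reusing the result avoids converting the column again for each component.

```python
import pandas as pd

# Hypothetical sample data; 'fecha' holds dates as text.
df = pd.DataFrame({"fecha": ["2025-05-15", "2024-12-01"]})

# Parse the text once, then pull out each component.
dates = pd.to_datetime(df["fecha"])
df["year"] = dates.dt.year
df["month"] = dates.dt.month
df["day"] = dates.dt.day
```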

Another example: You have a column with ordered categorical values such as “low”,
“medium”, “high”.

Transformation:

Replace these values with numbers that preserve their order so you can analyze them. Any value that is missing from the mapping becomes NaN:

df['level'] = df['level'].map({'low': 1, 'medium': 2, 'high': 3})
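
One thing to watch with map is that values absent from the dictionary turn into NaN without any warning. The sketch below, using made-up data with a stray "unknown" entry, writes the result to a separate column (the name 'level_num' is an assumption of this example) so the unmapped originals can still be inspected.

```python
import pandas as pd

# Hypothetical data; "unknown" is deliberately not in the mapping.
df = pd.DataFrame({"level": ["low", "high", "medium", "unknown"]})

order = {"low": 1, "medium": 2, "high": 3}

# Keep the original column and store the numeric codes next to it.
df["level_num"] = df["level"].map(order)

# Collect the original values that had no entry in the mapping.
unmapped = df.loc[df["level_num"].isna(), "level"].unique().tolist()
```

If 'unmapped' is not empty, the mapping is probably incomplete or the column contains typos that should be cleaned first.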

Data Normalization

Definition:

The process of rescaling numerical values so that they fall within a specific range, such as
[0, 1], or have mean 0 and standard deviation 1. This is especially important for machine
learning algorithms that are sensitive to the scale of the data (such as k-NN, neural networks, and SVM).

Practical example:

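You have a column with salaries ranging from 20,000 to 150,000. Another column represents
age, between 18 and 65. Because salary values are about three orders of magnitude larger than
age values, salary can dominate the model unless both columns are brought to a comparable scale.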
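
To see why the difference in scale matters for a distance-based method such as k-NN, consider the following sketch. The three people are invented for illustration, and the rescaling uses the salary and age ranges given above.

```python
import math

# (salary, age) for three hypothetical people.
a = (30_000, 25)
b = (31_000, 60)   # similar salary to a, very different age
c = (60_000, 26)   # similar age to a, very different salary

def scale(person):
    """Rescale salary and age to [0, 1] using the ranges from the example."""
    salary, age = person
    return ((salary - 20_000) / (150_000 - 20_000), (age - 18) / (65 - 18))

# Euclidean distances on the raw values: salary differences dominate.
raw_ab, raw_ac = math.dist(a, b), math.dist(a, c)

# The same distances after both columns share the [0, 1] scale.
scaled_ab = math.dist(scale(a), scale(b))
scaled_ac = math.dist(scale(a), scale(c))
```

On the raw values, b is the nearer neighbour of a because the 35-year age gap is negligible next to the salary gap to c. After rescaling, the age gap counts and c becomes the nearer neighbour.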

Min-max normalization (scale 0 to 1):

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

df[['salary', 'age']] = scaler.fit_transform(df[['salary', 'age']])
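
With its default settings, MinMaxScaler computes (x - min) / (max - min) for each column. The plain-Python sketch below reproduces that arithmetic on made-up values; the helper name min_max_scale and the handling of a constant column are choices of this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to rescale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values spanning the ranges from the example.
salaries = [20_000, 85_000, 150_000]
ages = [18, 30, 65]

scaled_salaries = min_max_scale(salaries)
scaled_ages = min_max_scale(ages)
```

Here 85,000 sits exactly halfway between 20,000 and 150,000, so it maps to 0.5.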

Standardization (mean 0, standard deviation 1):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

df[['salary', 'age']] = scaler.fit_transform(df[['salary', 'age']])
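
Standardization replaces each value x with z = (x - mean) / std. StandardScaler uses the population standard deviation (dividing by n rather than n - 1), and the sketch below does the same in plain Python. The sample ages and the helper name standardize are assumptions of this example.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (divides by n)
    if std == 0:
        # Constant column: every value equals the mean, so return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical ages within the 18 to 65 range from the example.
ages = [18, 30, 42, 65]
z_ages = standardize(ages)
```

Unlike min-max scaling, the result is not confined to a fixed range: values above the mean come out positive and values below it negative.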
