Data Processing

The document provides instructions for preprocessing an 'Absenteeism_data.csv' file for analysis. It instructs the reader to: 1) Drop the 'ID' column, split the 'Reason for Absence' into dummy variables grouped into 4 categories, and drop the 'Reason for Absence' column. 2) Extract the month and day of week from the 'Date' column before dropping the 'Date' column. 3) Transform the 'Education' column into binary data by mapping specific values. The final preprocessed data should match the 'df_preprocessed.csv' file.

Uploaded by

Sandesh More
Copyright
© All Rights Reserved

Data Preprocessing

Homework

Examine the ‘Absenteeism_data.csv’ file carefully. Then use the following as a guide for preparing the data for further
analysis:

• Drop the ‘ID’ column


• Split the reasons for absence into multiple dummy variables, and then group them in the following way:
➢ Group 1: Columns 1 to 14
➢ Group 2: Columns 15, 16, and 17
➢ Group 3: Columns 18, 19, 20, and 21
➢ Group 4: Columns 22 to 28
• After you’ve done that, don’t forget to drop the ‘Reason for Absence’ column.
• Extract the month value and the day of the week from the ‘Date’ column. Then, drop the ‘Date’ column as well.
• Turn the data from the ‘Education’ column into binary data by mapping the value 1 to 0, and all other values found in
this column to 1.
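One possible way to carry out the steps above with pandas is sketched below. It is only an illustration, not the official solution. Several details are assumptions that the handout does not state: the function name preprocess, the output column names (Reason_1 to Reason_4, Month Value, Day of the Week), the date format day/month/year, and the reading of "Columns 1 to 28" as the reason codes 1 to 28, so that a code of 0 ends up in no group.

```python
import pandas as pd


def preprocess(raw: pd.DataFrame, date_format: str = "%d/%m/%Y") -> pd.DataFrame:
    """Hypothetical helper that applies the homework steps to a copy of `raw`."""
    df = raw.copy()

    # Step 1: drop the 'ID' column.
    df = df.drop(columns=["ID"])

    # Step 2: one dummy column per reason code. Reindexing guarantees a column
    # for every code 1..28 even if a code never occurs, and leaves out code 0
    # (assumption: 0 means "no reason given" and belongs to no group).
    dummies = pd.get_dummies(df["Reason for Absence"], dtype=int)
    dummies = dummies.reindex(columns=list(range(1, 29)), fill_value=0)

    # Group the dummies as listed in the handout; a group is 1 if any of its
    # member columns is 1. The Reason_N names are assumed, not given.
    groups = {
        "Reason_1": range(1, 15),   # columns 1 to 14
        "Reason_2": range(15, 18),  # columns 15, 16, 17
        "Reason_3": range(18, 22),  # columns 18, 19, 20, 21
        "Reason_4": range(22, 29),  # columns 22 to 28
    }
    for name, codes in groups.items():
        df[name] = dummies[list(codes)].max(axis=1)
    df = df.drop(columns=["Reason for Absence"])

    # Step 3: month and day of the week (Monday = 0), then drop 'Date'.
    # The date format is an assumption; adjust it to match the file.
    dates = pd.to_datetime(df["Date"], format=date_format)
    df["Month Value"] = dates.dt.month
    df["Day of the Week"] = dates.dt.weekday
    df = df.drop(columns=["Date"])

    # Step 4: 'Education' becomes 0 where it was 1, and 1 everywhere else.
    df["Education"] = (df["Education"] != 1).astype(int)
    return df
```

With the real file this could be called as preprocess(pd.read_csv("Absenteeism_data.csv")), provided the column names and date format match the assumptions above.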

Don’t forget to create checkpoints as you go. If you have worked correctly, the final version of your DataFrame should contain
the same data as the one stored in the ‘df_preprocessed.csv’ file.
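A checkpoint can be as simple as an independent copy of the DataFrame taken after each step, and the final comparison can be done by loading the reference file and checking the values. The sketch below is a minimal illustration; the helper name same_data is hypothetical, and it assumes your result uses the same column names as ‘df_preprocessed.csv’, which the handout does not list.

```python
import pandas as pd


def same_data(result: pd.DataFrame, reference: pd.DataFrame) -> bool:
    """Hypothetical check: True if both frames hold the same values,
    ignoring column order, index labels, and dtypes."""
    if set(result.columns) != set(reference.columns):
        return False
    # Put the columns in the reference order and reset both indexes.
    aligned = result[list(reference.columns)].reset_index(drop=True)
    expected = reference.reset_index(drop=True)
    try:
        pd.testing.assert_frame_equal(aligned, expected, check_dtype=False)
    except AssertionError:
        return False
    return True


# Checkpoint pattern: copy() gives an independent snapshot, so later changes
# to the working frame do not alter the saved state.
df = pd.DataFrame({"Education": [1, 3, 2]})
df_checkpoint = df.copy()
df["Education"] = (df["Education"] != 1).astype(int)
```

Once the preprocessing is finished, something like same_data(df, pd.read_csv("df_preprocessed.csv")) would report whether the two DataFrames agree.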

Good luck!
