
KPMG Data Analytics - Task 1

The document provides recommendations to improve data quality issues in three datasets provided by Sprocket Central including customer demographics, addresses, and transactions. It identifies inconsistent representations of attributes, different data types for fields, and additional customer IDs in some tables that are not present in others. Moving forward, the data will be cleaned, modeled, and analyzed to create insightful reports, and it is recommended to involve Sprocket Central SMEs to ensure assumptions are aligned.

To

Sprocket Central Pty Ltd


Thank you for providing us with the three datasets:

• Customer Demographic
• Customer Addresses
• Transaction data for the past 3 months

The data quality issues identified in these datasets are summarised below. Recommendations are also
provided to avoid the recurrence of these issues and to improve the accuracy of the underlying data
used to drive business decisions.

1. Inconsistent values for the same attribute

The same value is represented in several ways (e.g. Victoria appears as “V”, “Vic” and “Victoria”).

Recommendation: To construct meaningful variables for the model, the data has been cleaned so
that each value has a single representation. Additionally, gender records marked ‘U’ have been
replaced based on the distribution from the training dataset.
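
The clean-up described above could look roughly like the following sketch. It is illustrative only: the alias mapping, the function names and the fixed random seed are assumptions, and the distribution is taken here from the known values in the same column rather than from a separate training dataset.

```python
import random
from collections import Counter

# Hypothetical mapping of observed spellings to one canonical value.
STATE_ALIASES = {"v": "Victoria", "vic": "Victoria", "victoria": "Victoria"}


def normalise_state(value):
    """Return the canonical state name, or the trimmed input if it is unmapped."""
    cleaned = value.strip()
    return STATE_ALIASES.get(cleaned.lower(), cleaned)


def impute_gender(values, unknown="U", seed=0):
    """Replace `unknown` entries by sampling from the distribution of known values."""
    known = [v for v in values if v != unknown]
    if not known:
        # Nothing to sample from, so leave the data unchanged.
        return list(values)
    counts = Counter(known)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    rng = random.Random(seed)  # assumed fixed seed so the result is repeatable
    return [
        rng.choices(labels, weights=weights)[0] if v == unknown else v
        for v in values
    ]
```

Sampling in proportion to the known values keeps the overall gender split close to what was observed, at the cost of assigning individual records a value that may not be true, so this assumption is worth confirming with the data owner.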

2. Inconsistent data types for the same attribute

Some fields contain values of different data types, which makes the results difficult to interpret at
a later stage. Appropriate data transformations have therefore been applied to ensure a consistent
data type for each field.
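
As a minimal sketch of such a transformation, assuming a numeric field that arrives as a mix of integers, floats and formatted strings, the values could be coerced to a single type as below. The function name and the sample formats (currency symbol, thousands separator) are hypothetical and not taken from the datasets.

```python
def to_float(value):
    """Coerce ints, floats and numeric strings to float; return None otherwise."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    # Assumed formatting: optional "$" prefix and "," thousands separators.
    text = str(value).strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        # Leave unparseable entries as None so they can be reviewed separately.
        return None


mixed = [71, "71.5", "$1,234.50", "n/a"]
cleaned = [to_float(v) for v in mixed]
```

Returning None for unparseable entries, rather than raising an error, keeps the row count intact and makes the problem records easy to list for follow-up.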

3. Customer Demographics

The transaction table and the customer address table contain additional customer IDs that are not
present in the customer master table. This indicates that the datasets received may not be in sync
with each other, and the missing records may skew the analysis results. Please refer to the Excel file
‘data_outliers.xlsx’ for the list of outliers between tables.
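
A check of this kind can be sketched as a set difference between the ID columns. The function name and the sample IDs below are made up for illustration and do not come from the datasets.

```python
def unmatched_ids(master_ids, other_ids):
    """Return the sorted, de-duplicated IDs in `other_ids` that are absent from `master_ids`."""
    return sorted(set(other_ids) - set(master_ids))


# Hypothetical sample values standing in for the customer ID columns.
master = [1, 2, 3]
transactions = [2, 3, 4, 4, 5]
missing_from_master = unmatched_ids(master, transactions)
```

Running the same check in the other direction (master IDs with no transactions or no address) would show whether the mismatch is one-sided.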

As the next step, we will continue with cleaning and modelling the data, then analysing it and
transforming it into an insightful report. It would be great to spend some time with your data SME
to ensure that all assumptions are aligned with Sprocket Central’s understanding.

Thanks and Regards

Vandana Prajapati

Junior Data Analyst
