
KPMG Data Quality Assessment

The document provides a summary of an assessment of customer data including demographic, address, and transaction sheets. It notes that there are missing values in key fields like last name, date of birth, job title, and tenure. An abnormal age of 178 was detected. Gender values need normalization. Missing values in fields like job, industry, and date of birth will impact analysis of target customers. The transaction sheet is missing many values for fields like online order, brand, and product details. Standard costs and list prices differ significantly in some cases. Overall the data has inconsistencies and missing values that need to be addressed.

Uploaded by

member2 mtri

Hi,

I have summarized the assessment of your data and presented it below.

Customer Demographic Sheet


- Provided data - 4000

1. There is a column named 'default' which has no relevant information.


2. Missing values: 125 last names, 87 dates of birth, 506 job titles, 656 job industry categories, and
87 tenure values.
3. An abnormal age of 178 years was detected. On a closer look, this customer has made a lot of
transactions, so we can assume the birth year was mistyped as 1843 instead of 1943.
4. Three nonstandard values appear in the gender column: M, Femal and F. We can assume these are
mistyped forms of Male, Female and Female respectively.
5. Looking at the bike purchases column by age, customers whose date of birth (and so age) is
unknown contribute a significant amount. This group could play a big role in analyzing the
target customers, so these values need to be filled in.
6. The same goes for the job title and job industry category columns. With them we could analyze
which job titles or industry categories are more likely to buy the product, but a significant
amount of data is missing in these two fields.
7. Wealth segment looks fine: mass customers buy the most, while affluent customers and high net
worth customers purchase in almost the same proportion.
8. Two customers are marked as deceased. Transactions are available for customer id 753 but not
for customer id 3789. Since both customers bought quite a lot of products for an individual,
the missing transaction data could be helpful as well.
9. Tenure is relevant too, since it would show whether long-term customers keep purchasing.
Not much data is missing in the tenure column, though, and no abnormalities were found in it.
10. Customers who own a car and customers who do not are in almost equal proportion.
11. No duplicates were found in name or customer id. Two customers with no last name had matching
names, but on checking their addresses they turned out to be different people.
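
The checks in points 2 to 4 could be reproduced roughly as below. This is only a minimal sketch in plain Python: the sample rows are made up, the column names (`last_name`, `dob`, `job_title`, `gender`), the reference date and the 120-year cut-off are all assumptions, not values taken from the workbook.

```python
from datetime import date

# Made-up sample rows; the column names are assumptions, not the workbook's.
customers = [
    {"customer_id": 1, "last_name": "Smith", "gender": "F", "dob": "1843-06-01", "job_title": ""},
    {"customer_id": 2, "last_name": "", "gender": "Femal", "dob": "1980-03-15", "job_title": "Engineer"},
    {"customer_id": 3, "last_name": "Lee", "gender": "M", "dob": "", "job_title": ""},
    {"customer_id": 4, "last_name": "Ray", "gender": "Male", "dob": "1975-11-30", "job_title": "Nurse"},
]

# The three mistyped variants from point 4 and their assumed full forms.
GENDER_FIXES = {"M": "Male", "F": "Female", "Femal": "Female"}


def count_missing(rows, columns):
    """Count empty or absent values in each listed column."""
    return {col: sum(1 for row in rows if not row.get(col)) for col in columns}


def normalize_gender(value):
    """Map the mistyped variants to their full form; leave other values alone."""
    return GENDER_FIXES.get(value, value)


def age_on(dob_text, today):
    """Whole years between an ISO date of birth and `today`; None if missing."""
    if not dob_text:
        return None
    dob = date.fromisoformat(dob_text)
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def implausible_ages(rows, today, max_age=120):
    """Customer ids whose computed age exceeds `max_age` (an assumed cut-off)."""
    flagged = []
    for row in rows:
        age = age_on(row["dob"], today)
        if age is not None and age > max_age:
            flagged.append(row["customer_id"])
    return flagged


reference_day = date(2021, 12, 31)  # assumed reference date for the example
print(count_missing(customers, ["last_name", "dob", "job_title"]))
print([normalize_gender(c["gender"]) for c in customers])
print(implausible_ages(customers, reference_day))
```

Flagging an implausible age rather than silently correcting it keeps the decision (for example, treating 1843 as 1943) with whoever owns the data.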
Customer Address Sheet
- Provided data - 3999
1. No values are missing in the customer address sheet.
2. Comparing the shapes of the address and demographic sheets (3999 rows against 4000), we can
say that addresses are not available for every customer.
3. Three pairs of similar addresses were found. On checking, their postal codes were different,
so the data are assumed to be accurate.
4. Overall, the data in this sheet look fine.
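
Points 2 and 3 could be checked along these lines. Again this is a sketch on made-up rows, and the column names (`customer_id`, `address`, `postcode`) are assumptions rather than the sheet's actual headers.

```python
from collections import defaultdict

# Made-up sample data; ids, addresses and postcodes are illustrative only.
demographic_ids = [1, 2, 3, 4]
addresses = [
    {"customer_id": 1, "address": "12 High St", "postcode": "2000"},
    {"customer_id": 2, "address": "12 High St", "postcode": "3000"},
    {"customer_id": 4, "address": "7 Park Rd", "postcode": "4000"},
]


def ids_without_address(demographic_ids, addresses):
    """Customer ids present in the demographic sheet but not in the address sheet."""
    with_address = {row["customer_id"] for row in addresses}
    return sorted(set(demographic_ids) - with_address)


def repeated_addresses(addresses):
    """Group rows by street address text and keep the groups with more than one row."""
    groups = defaultdict(list)
    for row in addresses:
        key = row["address"].strip().lower()
        groups[key].append((row["customer_id"], row["postcode"]))
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


def same_place(entries):
    """True when a repeated street address also shares one postcode."""
    return len({postcode for _, postcode in entries}) == 1


print(ids_without_address(demographic_ids, addresses))
for street, entries in repeated_addresses(addresses).items():
    print(street, entries, "likely duplicate" if same_place(entries) else "different postcodes")
```

A repeated street address with different postcodes is treated as two distinct places, which matches the reasoning in point 3.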

Transactions Sheet
- Provided data - 20000
1. There are a lot of missing data in this sheet: 360 online order, 197 brand, 197 product line,
197 product class, 197 product size, 197 standard cost and 197 product first sold date
values are missing.

2. The missing counts for brand, product line, product class, product size, standard cost and
first sold date are equal. On a closer look, all of these missing values belong to the same
set of rows, so these rows could be removed, as they won't be of much help.
3. The values in the online order column are integers, so the missing values in this column
could create some problems. But as we have about 20000 records here and only 358
missing values, it might not cause damage if we removed these rows as well.
4. The product first sold date column is not clear. It does not look like a date, and it does
not look like a price either. It seems ambiguous.
5. The standard cost and list price of many products differ by a huge margin. This is
concerning.
6. The brand, product line, product size and product class columns do not have any
anomalies. Solex was the most purchased brand, Standard the most purchased product
line, and medium the most purchased product class and product size.
7. The data cover transactions made in 2017; newer data could help to analyze
more. Still, we can see at which time of the year more transactions were made.
8. Looking at the number of orders by month, the most orders were placed in October
and the fewest in June.
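
The reasoning in points 2 and 8 could be sketched as follows. The rows are made up, the column names are assumptions, and only a subset of the product fields is shown to keep the example short. Apart from Solex and Standard, which are named above, the brand and product line values are placeholders.

```python
from collections import Counter
from datetime import date

# Assumed column names; only a subset of the product fields is shown.
PRODUCT_FIELDS = ["brand", "product_line", "standard_cost"]

# Made-up sample rows for illustration.
transactions = [
    {"transaction_date": "2017-10-03", "online_order": "1", "brand": "Solex", "product_line": "Standard", "standard_cost": "53.62"},
    {"transaction_date": "2017-10-20", "online_order": "", "brand": "ExampleBrand", "product_line": "Road", "standard_cost": "20.00"},
    {"transaction_date": "2017-06-11", "online_order": "0", "brand": "", "product_line": "", "standard_cost": ""},
    {"transaction_date": "2017-10-25", "online_order": "0", "brand": "Solex", "product_line": "Standard", "standard_cost": "100.00"},
    {"transaction_date": "2017-02-14", "online_order": "1", "brand": "ExampleBrand", "product_line": "Road", "standard_cost": "20.00"},
]


def rows_missing(rows, field):
    """Indexes of the rows where `field` is empty or absent."""
    return {i for i, row in enumerate(rows) if not row.get(field)}


def missing_together(rows, fields):
    """True when every field is missing in exactly the same set of rows."""
    missing_sets = [rows_missing(rows, field) for field in fields]
    return all(s == missing_sets[0] for s in missing_sets)


def drop_incomplete(rows, fields):
    """Keep only the rows where all the listed fields are filled in."""
    return [row for row in rows if all(row.get(field) for field in fields)]


def orders_per_month(rows):
    """Count transactions per calendar month number (1 to 12)."""
    return Counter(date.fromisoformat(row["transaction_date"]).month for row in rows)


if missing_together(transactions, PRODUCT_FIELDS):
    # Equal counts alone do not prove this; comparing row sets does.
    transactions_clean = drop_incomplete(transactions, PRODUCT_FIELDS)
else:
    transactions_clean = transactions

monthly = orders_per_month(transactions_clean)
print(len(transactions_clean), monthly.most_common(1))
```

Comparing the sets of affected rows, rather than just the counts, is what justifies removing those rows in one step.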

Regards,

Abinash Bajgain
