KPMG Data Quality Assessment
Transactions Sheet
- Provided data: 20000 records
1. A lot of data is missing in this sheet: 360 online order, 197 brand, 197 product line,
197 product class, 197 product size, 197 standard cost and 197 product first sold date
values are missing.
2. The counts of missing values in brand, product line, product class, product size, standard
cost and product first sold date are equal. After looking into it, all of these missing values
belong to the same set of rows, so these rows could be removed, as they won't be of much help.
3. The values in the online order column are integers, so the missing values in this column
could create some problems. However, as we have about 20000 records here and only 358
missing values, it might not cause damage if we removed these rows as well.
4. The product first sold date column is not clear. Its values do not look like dates, and
they do not look like prices either, so the column is ambiguous.
5. The standard cost and list price of many products differ by a huge margin. This is
truly concerning.
6. Looking at the brand, product line, product size and product class columns, the data do
not show any anomalies. Solex was the most purchased brand, Standard was the most
purchased product line, and medium was the most purchased product class and product size.
7. The data cover transactions performed in 2017; newer data could help to analyze
more. Still, we can see at which time of the year more transactions were performed.
8. Looking at the number of orders per month, we can see that the most orders were
placed in October and the fewest in June.
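
The checks behind points 1, 2 and 8 could be reproduced roughly as follows. This is only a minimal sketch using the Python standard library. The column names (brand, product_line, transaction_date and so on) are assumptions about how the sheet's headers are spelled, and the three sample rows are made up to show the calls; they are not taken from the real data.

```python
from collections import Counter
from datetime import date

# Assumed column names; the real sheet's headers may be spelled differently.
PRODUCT_COLUMNS = [
    "brand", "product_line", "product_class",
    "product_size", "standard_cost", "product_first_sold_date",
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_counts(rows, columns):
    """Count missing values per column (point 1)."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


def rows_missing_all(rows, columns):
    """Indices of rows where every listed column is missing (point 2)."""
    return [i for i, row in enumerate(rows)
            if all(is_missing(row.get(col)) for col in columns)]


def orders_per_month(rows, date_column="transaction_date"):
    """Number of transactions per calendar month, 1 to 12 (point 8)."""
    return Counter(row[date_column].month for row in rows)


# Made-up sample rows, for illustration only.
complete = {"brand": "Solex", "product_line": "Standard",
            "product_class": "medium", "product_size": "medium",
            "standard_cost": 53.62, "product_first_sold_date": 41245}
empty = {col: None for col in PRODUCT_COLUMNS}
sample = [
    {"transaction_date": date(2017, 10, 3), "online_order": 1, **complete},
    {"transaction_date": date(2017, 10, 20), "online_order": None, **empty},
    {"transaction_date": date(2017, 6, 11), "online_order": 0, **complete},
]

counts = missing_counts(sample, PRODUCT_COLUMNS + ["online_order"])
shared_rows = rows_missing_all(sample, PRODUCT_COLUMNS)
per_month = orders_per_month(sample)
```

If the number of rows returned by rows_missing_all equals the per-column missing count for each product column, the missing values all sit in the same rows, which is the situation described in point 2.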
Regards,
Abinash Bajgain