Module 1 Answer PDF

This document summarizes data received from Sprocket Central Pty Ltd, noting data quality issues encountered like inconsistent customer IDs between tables and missing or inconsistent values. It outlines the methods used to mitigate issues, like only using customer records that exist across all tables and standardizing address values. Recommendations are made to enforce input restrictions and data type constraints. The client is asked to validate the data statistics and cleaning assumptions. Further engagement with a data subject matter expert is proposed before continuing model analysis.

Uploaded by

Yeji

Note: The data and information in this document is reflective of a hypothetical situation and client.

This document is to be used for KPMG Virtual Internship purposes only.

Dear [Client point-of-contact],

Thank you for providing us with the three datasets from Sprocket Central Pty Ltd. The below table
highlights the summary statistics from the three datasets received. Please let us know if the figures are
not aligned with your understanding.

Table name              No. of records    Distinct Customer IDs    Date Data Received
Customer Demographic    -insert value-    -insert value-           -insert value-
Customer Address        -insert value-    -insert value-           -insert value-
Transaction Data        -insert value-    -insert value-           -insert value-

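As an illustration only, the record and distinct customer ID counts in the table above could be produced along the following lines. This is a minimal sketch: the function name `summarise`, the `customer_id` key, and the sample rows are assumptions for the example, not part of the datasets received.

```python
def summarise(rows, key="customer_id"):
    """Count records and distinct customer IDs in a list of row dicts."""
    return {
        "records": len(rows),
        "distinct_customer_ids": len({row[key] for row in rows}),
    }

# Hypothetical rows: customer 1 appears twice, so there are 2 distinct IDs.
sample_rows = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]
summary = summarise(sample_rows)
```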
The notable data quality issues we encountered, and the methods used to mitigate them, are set out
below. We have also provided recommendations to prevent these issues from recurring and to improve
the accuracy of the underlying data used to drive business decisions.
● Additional customer_ids in the ‘Transactions’ and ‘Customer Address’ tables that do not appear
in the ‘Customer Master (Customer Demographic)’ table
Mitigation: Please ensure that all tables are from the same period. Only customers in the Customer Master
list will be used as the training set for our model.
This indicates that the datasets received may not be in sync with one another, which may skew the
analysis results if data records are missing. Please refer to the Excel file ‘data_outliers.xlsx’ for
the list of outliers between tables.
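As an illustration only, the filter described in this mitigation could look like the sketch below. The function name `split_by_master`, the `customer_id` key, and the sample records are assumptions for the example and are not taken from the client files.

```python
def split_by_master(master_ids, records, key="customer_id"):
    """Separate records whose customer ID is in the master list from the rest."""
    master = set(master_ids)
    matched = [r for r in records if r[key] in master]
    unmatched = [r for r in records if r[key] not in master]
    return matched, unmatched

# Hypothetical sample data.
master_ids = [1, 2, 3]
transactions = [
    {"transaction_id": 10, "customer_id": 1},
    {"transaction_id": 11, "customer_id": 3},
    {"transaction_id": 12, "customer_id": 5034},  # not in the master list
]
training_rows, outlier_rows = split_by_master(master_ids, transactions)
```

Here `training_rows` holds the records kept for the training set, and `outlier_rows` holds the unmatched records that would be listed for review.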

● Various columns, such as the brand of a purchase or the job title, have empty values in
certain records
Mitigation: If only a small number of rows have empty values, remove those records from the training set
entirely. Otherwise, if the field is a core field, impute the missing values based on the distribution in
the training dataset.
For key datasets, such as transactions, fewer than 1% of transactions (totalling less than 0.1% of
revenue) have missing fields. These records have been removed from the training dataset.
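As an illustration only, the drop-or-impute rule could be sketched as follows. The function name `handle_missing`, the 1% cut-off for "a small number of rows", and the fixed random seed are assumptions made for the example; the appropriate threshold should be agreed per field.

```python
import random
from collections import Counter

def handle_missing(rows, field, drop_threshold=0.01, seed=0):
    """Drop rows with an empty `field` if they are rare; otherwise impute them.

    Imputed values are sampled in proportion to the observed values of `field`.
    """
    def is_empty(row):
        return row.get(field) in (None, "")

    missing_count = sum(1 for r in rows if is_empty(r))
    observed = [r[field] for r in rows if not is_empty(r)]
    if missing_count == 0 or not observed:
        # Nothing to fix, or no observed values to learn a distribution from.
        return list(rows)
    if missing_count / len(rows) <= drop_threshold:
        return [r for r in rows if not is_empty(r)]

    counts = Counter(observed)
    values = list(counts)
    weights = [counts[v] for v in values]
    rng = random.Random(seed)  # assumed fixed seed, for reproducibility
    return [
        {**r, field: rng.choices(values, weights=weights)[0]} if is_empty(r) else r
        for r in rows
    ]
```

Sampling from the observed distribution, rather than always filling in the most common value, keeps the proportions of the field roughly unchanged after imputation.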

● Inconsistent values for the same attribute
(e.g. Victoria being represented as “V”, “Vic” and “Victoria”)
Mitigation: Use regular expressions to replace extended values with abbreviations to ensure consistency
across addresses.
Recommendation: Enforce a drop-down list for the user entering the data rather than a free text field.
In order to construct meaningful variables for the model, the data has been cleaned to avoid
multiple representations of the same value. Additionally, gender records with the value ‘U’ have been
replaced based on the distribution in the training dataset.
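As an illustration only, the regular-expression standardisation could be sketched as follows. The Victoria variants come from the example above; the target abbreviation "VIC", the New South Wales entry, and the function name `standardise_state` are assumptions for the example.

```python
import re

# Pattern -> abbreviation. Only the Victoria variants are taken from the
# example above; the New South Wales entry is an assumed extra.
STATE_PATTERNS = [
    (re.compile(r"v(ic(toria)?)?", re.IGNORECASE), "VIC"),
    (re.compile(r"nsw|new\s+south\s+wales", re.IGNORECASE), "NSW"),
]

def standardise_state(value):
    """Map known spellings of a state to one abbreviation."""
    text = value.strip()
    for pattern, abbreviation in STATE_PATTERNS:
        if pattern.fullmatch(text):
            return abbreviation
    return text  # unrecognised values are left untouched for manual review
```

Using `fullmatch` rather than `search` avoids rewriting unrelated values that merely contain the letter "v".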

● Inconsistent data type for the same attribute
(e.g. numeric values for some fields and strings for others)
Mitigation: Convert selected records from characters to numeric values. Remove non-numeric characters from
strings.
Recommendation: Ensure that fact tables in the given database have constraints on data types.
Having different data types for a given field makes it difficult to interpret results at a later stage.
Therefore, appropriate data transformations have been made to ensure consistent data types for a given
field.
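As an illustration only, the conversion of mixed string and numeric values could be sketched as follows. The function name `to_number` and the choice to return `None` for unrecoverable values are assumptions for the example.

```python
import re

def to_number(value):
    """Return `value` as a float, or None if no number can be recovered."""
    if isinstance(value, (int, float)):
        return float(value)
    # Keep only digits, the decimal point and the minus sign.
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None  # flag for manual review instead of guessing
```

For instance, `to_number("$1,234.50")` returns `1234.5`, while `to_number("n/a")` returns `None`.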
Moving forward, the team will continue with the data cleaning, standardisation and transformation process
for the purpose of model analysis. Questions will be raised along the way and assumptions documented.
After we have completed this, it would be great to spend some time with your data SME to ensure that all
assumptions are aligned with Sprocket Central’s understanding.

Kind regards,
[Junior Consultant Name]
