(M3S1) Data Analytics Framework
D.A.T.A
Decisions – What is the objective of your data analysis?
Professional – How much revenue can you expect by the next quarter?
Personal – What is the ideal way to maximize savings?
Data Pre-processing
Data Integration – Getting data from multiple sources
Data Transformation – Normalizing data, for example when comparing values on scales of 100 – 200 and 1000 – 2000
Data Cleaning – Getting data ready for analysis and storage
Data Reduction – Simplifying data for faster processing
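As an illustration of why normalization helps when scales differ, here is a minimal Python sketch of min-max scaling. The function name min_max_scale and the sample values are assumptions made for this example. They only show how values on a 100 – 200 scale and a 1000 – 2000 scale become directly comparable once both are mapped onto 0 to 1.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measurements on two different scales.
scale_a = [100, 125, 150, 200]      # values between 100 and 200
scale_b = [1000, 1250, 1500, 2000]  # values between 1000 and 2000

print(min_max_scale(scale_a))  # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(scale_b))  # [0.0, 0.25, 0.5, 1.0]
```

After scaling, both lists share the same range, so a value of 0.25 means the same relative position in either data set.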
The Basics in Preprocessing
Core Concept: Real World Data is Dirty
Definition – Data collected from the real world can be incomplete, noisy, and inconsistent
Incomplete – The data set lacks important information, such as empty rows or columns
Noisy – The data set contains errors or outliers that can distort the results of the analysis
Inconsistent – The data set contains values that contradict each other
Incomplete
Caused by
Not-applicable data or human error – for example, data-entry mistakes or form fields that are not required
Software / Hardware Errors
Example – A missing name or ID number
Noisy
Caused by
Faulty Data Collection Instruments
Human / Computer Error at Data Entry
Errors in Data Transmission from one medium to another
Inconsistent
Example – Age = 30, Birthday = 10/25/1955
Example – Ratings recorded as 1, 2, 3 in some records and as Easy, Medium, Challenging in others
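The three kinds of dirty data can be checked for programmatically. The Python sketch below is one possible approach, not a standard routine: the record layout, the field names, the 0 to 120 plausibility range for age, and the reference date are all assumptions chosen for illustration. It flags an empty name as incomplete and an age that disagrees with the birthday as inconsistent.

```python
from datetime import date


def age_on(birthday, today):
    """Whole years elapsed between birthday and today."""
    years = today.year - birthday.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_problems(record, today):
    """Return a list of problems found in one record (hypothetical layout)."""
    problems = []
    # Incomplete: a required field is missing or empty.
    for field in ("name", "id"):
        if record.get(field) in (None, ""):
            problems.append(f"incomplete: missing {field}")
    age, birthday = record.get("age"), record.get("birthday")
    # Noisy: the value falls outside an assumed plausible range.
    if age is not None and not 0 <= age <= 120:
        problems.append(f"noisy: implausible age {age}")
    # Inconsistent: the stated age disagrees with the birthday.
    if age is not None and birthday is not None:
        expected = age_on(birthday, today)
        if age != expected:
            problems.append(f"inconsistent: age {age} but birthday implies {expected}")
    return problems


record = {"name": "", "id": 1001, "age": 30, "birthday": date(1955, 10, 25)}
today = date(2024, 5, 5)  # assumed reference date for the example

for problem in find_problems(record, today):
    print(problem)
# incomplete: missing name
# inconsistent: age 30 but birthday implies 68
```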
Major Tasks in Data Preprocessing
Core Concept: Four Major Tasks
Definition – Data preprocessing mainly deals with Integration, Transformation, Cleaning, and Reduction of Data
Data Integration – Getting data from multiple sources
Data Transformation – Changing data from one format to another. Example – Converting between categorical and numerical values
Data Integration
Schema Integration – Example Student ID = Student Number
Entity Identification Problem – Example Pacman = Manny Pacquiao
Detecting and Resolving Data Conflicts – Example Fahrenheit vs Celsius, Imperial vs Metric
Handling Redundancy – Duplicate Data
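The four integration tasks above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the two sources, the temp_c and temp_f fields, the alias table, and the rule of keeping the first copy of a duplicate are assumptions used only to make the ideas concrete.

```python
# Entity identification: a hypothetical alias table mapping nicknames to one name.
ALIASES = {"Pacman": "Manny Pacquiao"}


def canonical_name(name):
    """Return the canonical name for a known alias, else the name unchanged."""
    return ALIASES.get(name, name)


def fahrenheit_to_celsius(f):
    """Resolve a unit conflict by converting Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9


def integrate(rows_a, rows_b):
    """Merge two sources into one list keyed by Student ID."""
    merged = {}
    for row in rows_a:
        merged[row["Student ID"]] = dict(row)
    for row in rows_b:
        # Schema integration: "Student Number" means the same as "Student ID".
        student_id = row["Student Number"]
        celsius = round(fahrenheit_to_celsius(row["temp_f"]), 1)
        # Handling redundancy: keep the first copy of a duplicated student.
        if student_id not in merged:
            merged[student_id] = {"Student ID": student_id, "temp_c": celsius}
    return [merged[key] for key in sorted(merged)]


source_a = [{"Student ID": 1, "temp_c": 37.0}, {"Student ID": 2, "temp_c": 36.5}]
source_b = [{"Student Number": 2, "temp_f": 97.7}, {"Student Number": 3, "temp_f": 98.6}]

for row in integrate(source_a, source_b):
    print(row)
print(canonical_name("Pacman"))  # Manny Pacquiao
```

Student 2 appears in both sources, so the merged result holds three students, all with temperatures in Celsius.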
Data Transformation
Normalization – Example Min-Max Normalization or Z-score standardization
Binning – Transforming numerical values into categorical counterparts. Example – Age groups 1-10, 11-19, 20-29, 30-39, and onwards
Some data analysis algorithms run better on categorical data and others on numerical data
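Z-score standardization and binning can both be sketched with the Python standard library. The function names and sample values below are assumptions for illustration. The sketch uses the population standard deviation, and it assumes that the "and onwards" age groups continue in decades (40-49, 50-59, and so on).

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def age_group(age):
    """Bin a numerical age into a categorical group label."""
    if age < 1:
        raise ValueError("age must be at least 1")
    if age <= 10:
        return "1-10"
    if age <= 19:
        return "11-19"
    # Assumption: later groups continue in decades (20-29, 30-39, 40-49, ...).
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

ages = [8, 15, 24, 37, 41]
print([age_group(a) for a in ages])
# ['1-10', '11-19', '20-29', '30-39', '40-49']
```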
Major Tasks in Data Preprocessing
Data Cleaning – Getting data ready for analysis and storage
Data Reduction / Manipulation – Simplifying data for faster processing
Data Cleaning
“Data cleaning is one of the three biggest problems in data warehousing” – Ralph Kimball
Tasks in Data Cleaning:
Fill in missing values
Identify and smooth outliers and noisy data
Remove or replace inconsistent data
Resolve any problems that occurred when data was merged
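The first three cleaning tasks can be illustrated with a short Python sketch. The choices here are assumptions rather than the only valid approach: missing values are filled with the median, outliers are smoothed by clamping to an assumed valid range of 0 to 100, and text ratings are mapped to numbers with Easy = 1, Medium = 2, Challenging = 3.

```python
from statistics import median

# Assumed mapping from text ratings to their numeric equivalents.
RATING_CODES = {"easy": 1, "medium": 2, "challenging": 3}


def fill_missing(values):
    """Replace None with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def clip_outliers(values, low, high):
    """Smooth outliers by clamping every value into the range [low, high]."""
    return [min(max(v, low), high) for v in values]


def normalize_rating(value):
    """Convert a rating given as text or number into a number from 1 to 3."""
    if isinstance(value, str):
        return RATING_CODES[value.strip().lower()]
    return int(value)


scores = [70, None, 80, 900, 75]         # hypothetical scores, one missing, one outlier
filled = fill_missing(scores)            # median of 70, 75, 80, 900 is 77.5
cleaned = clip_outliers(filled, 0, 100)  # assumed valid range of 0 to 100
print(cleaned)  # [70, 77.5, 80, 100, 75]

ratings = [1, "Medium", 3, "easy", "Challenging"]
print([normalize_rating(r) for r in ratings])  # [1, 2, 3, 1, 3]
```

The median is used for filling because, unlike the mean, it is barely affected by the outlier of 900.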
Instructions:
On a ½ sheet of yellow paper, answer the following questions:
Name:
One Chicken Recipe you like:
1. What do you think of the Data Analytics Framework? (2 – 3 Sentences)
2. How can you apply it in your daily life? (2 – 3 Sentences)
3. Is there a way to improve the Data Analytics Framework? (2 – 3 Sentences)