(M3S1) Data Analytics Framework


ALACC0001L: Decision Analytics using Spreadsheets

M3S1: Data Analytics Framework


Data Analytics Framework
Core Message: Break down tasks into manageable pieces
The Data Analytics Framework helps simplify the task while keeping it focused on the most crucial aspects

D.A.T.A
Decisions – What is the objective of your data analysis?
Professional – How much revenue can you expect by the next quarter?
Personal – What is the ideal way to maximize savings?

Acquisition – The data you will need to accomplish the task


Professional – Do you need sales data?
Personal – Find out where you spend most of your budget

Time – How long a period do you have?


Short Time Period – Will this take one year or less?
Long Time Period – Will this take longer than one year?

Analysis – How do you plan on executing this task?


Professional – What departments and products are the biggest sources of revenue?
Personal – Is my budget being spent mostly on gifts?

Think of the Data Analytics Framework as a method for working efficiently


Knowing the goal is important, as we would like to know when we achieve it.
Data Audit and Internal Controls
Core Message: Data Audits for Pre-established spreadsheets
The Data Analytics Framework is helpful when we start from an empty spreadsheet
However, many data analysis tasks already have existing spreadsheets
Data Audits that make use of the Data Analytics Framework help ensure that the spreadsheet can achieve its desired goal

Data Internal Controls


Spreadsheets are rarely worked on by a single individual; most of the time, many individuals share the same spreadsheet
The collaborative nature of working on a single spreadsheet helps utilize everyone’s expertise, but it also has a downside
Errors can easily creep in, which makes the spreadsheet susceptible to simple mistakes such as Data Entry Errors or Data Loss
Data Entry Errors – Mistyping certain values, like entering “ware house” instead of “warehouse”
Data Loss – Accidentally deleting data or editing the wrong values
Data Internal Controls like Data Validation are a good way to reduce these mistakes
It could be as simple as requiring all numerical values to use a “,” as a thousands separator, which makes it easier to distinguish between 100,000 and 1,000,000
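The two controls described above can be sketched in a few lines of Python. This is only an illustration, not a spreadsheet feature: the list of accepted labels and the function names are assumptions made for the example.

```python
# Hypothetical list of accepted labels; a real sheet would define its own.
ALLOWED_LOCATIONS = {"warehouse", "store", "office"}


def normalize_label(raw: str) -> str:
    """Lowercase the entry and remove spaces so 'Ware House' matches 'warehouse'."""
    return "".join(raw.lower().split())


def validate_location(raw: str) -> str:
    """Return the cleaned label, or raise ValueError if it is not accepted."""
    label = normalize_label(raw)
    if label not in ALLOWED_LOCATIONS:
        raise ValueError(f"Unexpected location: {raw!r}")
    return label


def with_thousands_separator(value: int) -> str:
    """Format an integer with commas so its magnitude is easier to read."""
    return f"{value:,}"


print(validate_location("ware house"))     # warehouse
print(with_thousands_separator(100000))    # 100,000
print(with_thousands_separator(1000000))   # 1,000,000
```

In a spreadsheet, the same idea is usually applied with a drop-down list of allowed values and a number format, so that the check happens at the moment of data entry.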
Data Pre-processing
Core Message: Data Analysis should be done after the data has been prepared
Regardless of your field of specialization, the fundamentals of having clean data remain the same
Data Pre-processing helps users prepare the data
It is different from the Data Analytics Framework
Data Pre-processing is about cleaning the data set; the Data Analytics Framework is about breaking down the task

Data Pre-processing
Data Integration – Getting data from multiple sources
Data Transformation – Normalizing data, for example if we are comparing scales of 100 – 200 and
1000 – 2000.
Data Cleaning – Getting data ready for analysis and storage
Data Reduction – Simplifying data for faster processing
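To see why normalizing matters when comparing the scales 100 – 200 and 1000 – 2000, here is a minimal sketch of min-max normalization in Python. The sample values and the function name are assumptions chosen for illustration.

```python
def min_max_normalize(values):
    """Rescale values to the range 0..1 using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


small_scale = [100, 150, 200]      # hypothetical values on the 100-200 scale
large_scale = [1000, 1500, 2000]   # hypothetical values on the 1000-2000 scale

print(min_max_normalize(small_scale))  # [0.0, 0.5, 1.0]
print(min_max_normalize(large_scale))  # [0.0, 0.5, 1.0]
```

After rescaling, both columns sit on the same 0 to 1 range, so a midpoint on one scale can be compared directly with a midpoint on the other.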
The Basics in Preprocessing
Core Concept: Real World Data is Dirty
Definition – Data collected from the real world can be incomplete, noisy, and inconsistent
Incomplete – Lacking important or crucial information, such as empty rows or columns in a data set
Noisy – Containing errors or outliers that can alter the results of the analysis
Inconsistent – Containing values that contradict each other or follow different conventions

Incomplete
Caused by
Not Applicable Data or Human Error – For example, data entry mistakes, or fields in a form that are not required
Software / Hardware Errors
Example – A missing Name or ID #

Noisy
Caused by
Faulty Data Collection Instruments
Human / Computer Error at Data Entry
Errors in Data Transmission from one medium to another

Example – Time In: 25:32:22 (an hour of 25 is not a valid time)

Inconsistent
Example – Age = 30, Birthday 10/25/1955
Example – Ratings recorded as 1, 2, 3 in some entries, then as Easy, Medium, Challenging in others
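The three kinds of dirty data above can each be caught with a simple check. The following Python sketch is one possible way to do it; the function names, field names, and the reference date are assumptions made for the example.

```python
from datetime import date, datetime


def is_complete(record: dict, required_fields) -> bool:
    """Incomplete check: every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required_fields)


def is_valid_time(text: str) -> bool:
    """Noisy check: the text must be a valid 24-hour HH:MM:SS time."""
    try:
        datetime.strptime(text, "%H:%M:%S")
        return True
    except ValueError:
        return False


def age_matches_birthday(age: int, birthday: date, as_of: date) -> bool:
    """Inconsistent check: the stated age must agree with the birthday."""
    had_birthday = (as_of.month, as_of.day) >= (birthday.month, birthday.day)
    expected = as_of.year - birthday.year - (0 if had_birthday else 1)
    return age == expected


# Hypothetical record with an empty Name field.
print(is_complete({"Name": "", "ID": "001"}, ["Name", "ID"]))  # False
print(is_valid_time("25:32:22"))                               # False
# The reference date is an assumption; any recent date gives the same result.
print(age_matches_birthday(30, date(1955, 10, 25), date(2024, 5, 5)))  # False
```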
Major Tasks in Data Preprocessing
Core Concept: Four Major Tasks
Definition – Data preprocessing mainly deals with Integration, Transformation, Cleaning, and
Reduction of Data
Data Integration – Getting data from multiple sources
Data Transformation – Changing data from one format to another, Example – Numerical to Categorical, or the reverse

Data Integration
Schema Integration – Example Student ID = Student Number
Entity Identification Problem – Example Pacman = Manny Pacquiao
Detecting and Resolving Data Conflicts – Example Fahrenheit vs Celsius, Imperial vs Metric
Handling Redundancy – Duplicate Data
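A rough Python sketch of three of these integration steps follows: mapping column names to one schema, converting units, and removing duplicates. The column names, sample rows, and helper names are all hypothetical.

```python
# Hypothetical alias table: both column names describe the same field.
COLUMN_ALIASES = {"Student Number": "Student ID"}


def unify_columns(row: dict) -> dict:
    """Schema integration: rename aliased columns to one standard name."""
    return {COLUMN_ALIASES.get(name, name): value for name, value in row.items()}


def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Resolve a unit conflict by converting Fahrenheit to Celsius."""
    return (degrees_f - 32) * 5 / 9


def drop_duplicates(rows):
    """Handle redundancy: keep only the first copy of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows coming from two different sources.
rows = [
    {"Student Number": "S001", "Name": "Ana"},
    {"Student ID": "S001", "Name": "Ana"},
    {"Student ID": "S002", "Name": "Ben"},
]
merged = drop_duplicates([unify_columns(r) for r in rows])
print(len(merged))                 # 2
print(fahrenheit_to_celsius(212))  # 100.0
```

The entity identification problem (Pacman = Manny Pacquiao) usually needs a lookup table maintained by a person, since names cannot be matched reliably by a formula alone.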

Data Transformation
Normalization – Example Min-Max Normalization or Z-score standardization
Binning – Transforming numerical values into categorical counterparts, Example – Age Groups 1-10, 11-19, 20-29, 30-39, and onwards
Some data analysis algorithms run better on categorical data, others on numerical data
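As a minimal sketch of the two transformations named above, the Python below computes z-scores and assigns ages to the age groups from the example. The function names and sample values are assumptions, and the population standard deviation is used here as one reasonable choice.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score standardization: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def age_group(age: int) -> str:
    """Binning: map an age to the groups 1-10, 11-19, 20-29, 30-39, and onwards."""
    if age < 1:
        raise ValueError("age must be at least 1")
    if age <= 10:
        return "1-10"
    if age <= 19:
        return "11-19"
    decade = (age // 10) * 10
    return f"{decade}-{decade + 9}"


print([age_group(a) for a in [7, 15, 20, 34]])  # ['1-10', '11-19', '20-29', '30-39']
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```

Z-scores keep the data numerical but put it on a common scale, while binning turns the numbers into categories.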
Data Cleaning – Getting data ready for analysis and storage
Data Reduction / Manipulation – Simplifying data for faster processing

Data Cleaning
“Data cleaning is one of the three biggest problems in data warehousing” – Ralph Kimball
Tasks in Data Cleaning:
Fill in missing values
Identify and smooth outliers and noisy data
Remove or replace inconsistent data
Resolve any problems that occurred when data was merged

Data Reduction / Manipulation


Data is not always balanced; it can be skewed towards one side or another
Sampling – A technique for selecting a smaller subset of the data to work with
Dimensions – More data isn’t always good; too many columns can distract from what matters
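One way to picture sampling and dimension reduction is the Python sketch below. It is an illustration under assumptions: the column names, the skewed sample data, and the choice of a fixed number of rows per group are all hypothetical.

```python
import random


def stratified_sample(rows, key, per_group, seed=0):
    """Sampling: draw up to per_group rows from each group so that a
    skewed data set does not drown out the smaller group."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def keep_columns(rows, columns):
    """Dimensions: keep only the columns needed for the analysis."""
    return [{name: row[name] for name in columns} for row in rows]


# Hypothetical skewed data: 90 "No" rows and 10 "Yes" rows.
rows = [{"id": i, "answer": "No", "note": "n/a"} for i in range(90)]
rows += [{"id": i, "answer": "Yes", "note": "n/a"} for i in range(90, 100)]

balanced = stratified_sample(rows, key="answer", per_group=5)
trimmed = keep_columns(balanced, ["id", "answer"])
print(len(trimmed))  # 10
```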
Major Tasks in Data Preprocessing
Core Concept: Handling Missing Data
Definition – There are many ways to handle missing values, and each has its own advantages
Causes of Missing Data:
Hardware or Software Malfunction
Inconsistent Data Records
Misunderstandings in Data Collection and Entry
Important data may be categorized as unimportant during collection
Changes in the data may not be tracked

How to handle missing data


Dropping Data – If even one value is missing from a record, then the record is simply removed
Manually Inferred – Filling in the data to avoid empty values
Example – We may take the average of all the known “Yes” values and use 200,000 and 205,000 and divide it by 3. There are only 3 “Yes” in this section, that is why we used it
Data Imputation – Filling data automatically with a “global constant”
Other techniques also involve using the mean
Do note that this will change your data

How to handle noisy data
Binning – Sort the data and partition it into bins, then smooth by bin means, medians, or boundaries
Clustering – Grouping data and removing outliers
Inspection – Using human and computer verification to check data
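The options for missing and noisy data can be compared side by side in a short Python sketch. The sample values, the use of None to mark a missing cell, and the function names are assumptions for illustration; they are not taken from the slide's table.

```python
from statistics import mean


def drop_incomplete(rows):
    """Dropping data: remove any row that has a missing (None) value."""
    return [row for row in rows if None not in row.values()]


def fill_with_constant(values, constant):
    """Imputation with a global constant: replace every missing value."""
    return [constant if v is None else v for v in values]


def fill_with_mean(values):
    """Imputation with the mean of the known values."""
    known = [v for v in values if v is not None]
    average = mean(known)
    return [average if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Noisy data: sort, split into equal-size bins, and replace each
    value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Hypothetical column with one missing value.
salaries = [10, None, 20]
print(fill_with_constant(salaries, 0))  # [10, 0, 20]
print(fill_with_mean(salaries))         # [10, 15, 20]

# Hypothetical sorted prices, smoothed in bins of three.
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
# [9, 9, 9, 22, 22, 22, 29, 29, 29]
```

Every option changes the data in a different way: dropping loses rows, a constant can distort averages, and the mean hides variation, so the choice depends on the decision the analysis is meant to support.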
Activity:
Reflection on Data Analytics Framework

Instructions:
On a ½ sheet of yellow paper, answer the following questions:
Name:
One chicken recipe you like:
1. What do you think of the Data Analytics Framework? (2 – 3 Sentences)
2. How can you apply it in your daily life? (2 – 3 Sentences)
3. Is there a way to improve the Data Analytics Framework? (2 – 3 Sentences)
