
Module 6 - Class 1 Slides

The document discusses the various forms of data and the importance of understanding data in the analytics process, emphasizing the need for data cleaning and preparation. It highlights the role of Alteryx as a tool for data extraction, transformation, and loading (ETL), enabling users to automate workflows and perform analyses without extensive coding knowledge. Additionally, it addresses common pitfalls in data analysis, including cognitive biases and misrepresentation of data, urging analysts to maintain critical thinking throughout the process.

Uploaded by Mary T. Scott
Copyright © All Rights Reserved

DATA ANALYTICS PLATFORMS
EXTRACT, TRANSFORM AND LOAD USING ALTERYX

ACC 4510
The Myths of Big Data
Vince Ebert

“Recognize what computers and artificial intelligence do well. Focus on what humans do well.”
DATA COMES IN MANY FORMS
• Images (Credit: Instagram/chloeclem)
• Video (Credit: YouTube, https://www.youtube.com/watch?v=sIlNIVXpIns)
• Audio
• Free Form Text, e.g., “I bought this chair four months ago and it has been…”

© 2022 ALTERYX, INC. All rights reserved.
DATA COMES IN MANY FORMS
Tabular Data
• Data in a table with rows and columns
◦ Examples include spreadsheets like Excel, Google Sheets, and Numbers
• Each row (record) represents an observation
• Each column (field) contains an attribute for those observations

Example (the first line holds the column headers; each following line is one row):

First Name | Last Name | Age | GPA  | Address              | City
Will       | Byers     | 15  | 3.44 | 3915 Forest Street   | Hawkins
Mike       | Wheeler   | 14  | 3.25 | 2530 Piney Wood Lane | Hawkins
Lucas      | Sinclair  | 14  | 3.37 | 2550 Piney Wood Lane | Hawkins
Max        | Mayfield  | 15  | 3.15 | 4819 Cherry Lane     | Hawkins
Dustin     | Henderson | 15  | 3.62 | 2886 Oak Hill Drive  | Hawkins
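In code, the same record/field vocabulary maps directly onto a list of records. The sketch below is plain Python using only the standard library (it is not how Alteryx stores data); the table is pasted in as CSV text, and the variable names are our own. `csv.DictReader` yields one dictionary per row, keyed by the column headers, with every value read as text.

```python
import csv
import io

# The student table as CSV text: one header line, then one line per record.
STUDENTS_CSV = """First Name,Last Name,Age,GPA,Address,City
Will,Byers,15,3.44,3915 Forest Street,Hawkins
Mike,Wheeler,14,3.25,2530 Piney Wood Lane,Hawkins
Lucas,Sinclair,14,3.37,2550 Piney Wood Lane,Hawkins
Max,Mayfield,15,3.15,4819 Cherry Lane,Hawkins
Dustin,Henderson,15,3.62,2886 Oak Hill Drive,Hawkins
"""

# Each row (record) becomes a dict keyed by the column headers (fields).
records = list(csv.DictReader(io.StringIO(STUDENTS_CSV)))
fields = list(records[0].keys())

print(len(records), "records")  # one per student (observation)
print(len(fields), "fields:", fields)

# One field read across all observations; values arrive as text, so convert them.
gpas = [float(record["GPA"]) for record in records]
print("Highest GPA:", max(gpas))
```

Keeping the values as text until they are explicitly converted mirrors a real import, where the type of each field has to be decided rather than assumed.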

DATA AROUND THE WORLD
Data is collected on everything

Nature
• Southern Sea Otter Populations
• Australian Forest Fires
• Temperatures and Rainfall

Broad Economic Measures
• Consumer Price Index (Inflation)
• Housing Supply
• Unemployment

Corporate Data
• Quarterly Sales
• Customer Satisfaction Surveys
• Inventory

Sports and Entertainment
• Box Office Sales
• Total Streams
• Advanced Performance Metrics

DATA ABOUT YOU
You generate more data than you may know

Social Media Platforms
• What are you watching?
• What do you like, comment on, or reshare?

Credit Score
• Total loans and debt
• Missed payments
• Recent credit applications

Shopping History
• What are you buying?
• When are you buying?
• What vendors do you use?

Health and Genetics
• Heart rate, blood pressure, weight
• Prior health incidents
• DNA

DATA ANALYTICS LIFECYCLE
Step 1 – Understand the Business Problem
Step 2 – Identify, Collect, and Investigate Data
Step 3 – Clean and Prepare Data
Step 4 – Perform Analyses
Step 5 – Interpret and Evaluate
Step 6 – Visualize and Present

References
• CRISP-DM Methodology
• https://www.northeastern.edu/graduate/blog/data-analysis-project-lifecycle/
• https://www.linkedin.com/pulse/six-data-analysis-phases-gert-l%C3%B5hmus/

THE 80-20 RULE
• 80% of analysts' time is typically spent on Step 3 – Clean and Prepare Data
• By using technologies like Alteryx Designer, this split can shift

Know Your Data
Content and Context Matter
KNOW YOUR DATA
What is the meaning of the data?

Column Headers From an Air Travel Dataset
Div1TotalGTime | DepartureDelayGroups | DepDel15
9              | -1                   | 0

Data Dictionary
• Column headers can be vague or cryptic
• May not specify units
• A data dictionary is a separate document that provides more detail on the types of measurements or classifications that are displayed within a given column
• Even if you know what the column stands for, the values in the column can still be abbreviations or codes
• Separate tables, often called lookup tables, are usually provided as a key to make sense of these values

Reference: https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGK&QO_fu146_anzr=b0-gvzr
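A minimal sketch of how a data dictionary and a lookup table work together, using the three air-travel columns above. The descriptions and code labels below are our own paraphrase and assumption, not copied from the BTS documentation; check the data dictionary and lookup tables published with the dataset before relying on them. The names `DATA_DICTIONARY`, `LOOKUPS`, and `describe` are hypothetical.

```python
# Hypothetical data dictionary: column name -> plain-English meaning.
# Paraphrased for illustration; the dataset's own documentation is authoritative.
DATA_DICTIONARY = {
    "Div1TotalGTime": "Total ground time away from gate at the first diversion airport (minutes)",
    "DepartureDelayGroups": "Departure delay, grouped into 15-minute intervals",
    "DepDel15": "Departure delay indicator: delayed 15 minutes or more",
}

# Hypothetical lookup tables: coded value -> label. Only a few codes are shown,
# and the labels are assumptions to be checked against the published lookup tables.
LOOKUPS = {
    "DepartureDelayGroups": {
        -1: "Left 1 to 15 minutes early",
        0: "Left 0 to 14 minutes late",
    },
    "DepDel15": {0: "No", 1: "Yes"},
}


def describe(record):
    """Pair each raw value with its documented meaning and, if a lookup exists, its decoded label."""
    described = {}
    for column, value in record.items():
        described[column] = {
            "value": value,
            "meaning": DATA_DICTIONARY.get(column, "not in data dictionary"),
            # None when the column has no lookup table or the code is not listed.
            "label": LOOKUPS.get(column, {}).get(value),
        }
    return described


# The example row from the slide.
row = {"Div1TotalGTime": 9, "DepartureDelayGroups": -1, "DepDel15": 0}
for column, info in describe(row).items():
    print(f"{column} = {info['value']}: {info['meaning']} (label: {info['label']})")
```

Div1TotalGTime gets no label because its value is a measurement (assumed here to be minutes), not a code; only coded columns need a lookup table.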

KNOW YOUR DATA
Rows

What is an observation?
• In the example provided, we can tell that each row represents a single student

First Name | Last Name  | Age | GPA  | Address              | City
Will       | Byers      | 15  | 3.44 | 3915 Forest Street   | Hawkins
Mike       | Wheeler    | 14  | 3.25 | 2530 Piney Wood Lane | Hawkins
Lucas      | Sinclair   | 14  | 3.37 | 2550 Piney Wood Lane | Hawkins
Max        | Mayfield   | 15  | 3.15 | 4819 Cherry Lane     | Hawkins
Dustin     | Henderson  | 15  | 3.62 | 2886 Oak Hill Drive  | Hawkins
Nancy      | Wheeler    | 17  | 3.71 | 2530 Piney Wood Lane | Hawkins
Steve      | Harrington | 18  | 2.85 | 1380 Glen Hills Road | Hawkins

• However, it's not always apparent what constitutes an observation
• What about a data table with shipping information for products that have been sold?
• Is there a record for each product, or for each shipment?

Order_ID | Product_ID | Customer_ID | Price  | Shipping Date
08993541 | 14562      | 945611      | 56.11  | 5/23/2022
45255368 | 31098      | 303014      | 98.19  | 5/24/2022
01248973 | 31098      | 415421      | 117.63 | 5/24/2022
36642859 | 42887      | 963478      | 102.48 | 5/25/2022
11429030 | 03012      | 048401      | 60.23  | 5/25/2022
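One quick way to probe what a row represents is to check which ID column never repeats. The sketch below is plain Python (the function name `is_unique_key` is our own) that loads the shipping table and checks each ID column. The IDs are kept as text so that leading zeros such as `08993541` and `03012` are not lost, which would happen if they were read as numbers.

```python
# The shipping table, one dict per row. IDs stay strings to preserve leading zeros.
shipments = [
    {"Order_ID": "08993541", "Product_ID": "14562", "Customer_ID": "945611", "Price": 56.11, "Shipping Date": "5/23/2022"},
    {"Order_ID": "45255368", "Product_ID": "31098", "Customer_ID": "303014", "Price": 98.19, "Shipping Date": "5/24/2022"},
    {"Order_ID": "01248973", "Product_ID": "31098", "Customer_ID": "415421", "Price": 117.63, "Shipping Date": "5/24/2022"},
    {"Order_ID": "36642859", "Product_ID": "42887", "Customer_ID": "963478", "Price": 102.48, "Shipping Date": "5/25/2022"},
    {"Order_ID": "11429030", "Product_ID": "03012", "Customer_ID": "048401", "Price": 60.23, "Shipping Date": "5/25/2022"},
]


def is_unique_key(rows, column):
    """True if no value in `column` repeats, i.e. each value identifies exactly one row."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


for column in ("Order_ID", "Product_ID", "Customer_ID"):
    print(column, "unique per row:", is_unique_key(shipments, column))
```

Order_ID never repeats, which is consistent with one row per order, while Product_ID 31098 appears twice, so the rows are not one per product. Customer_ID also happens to be unique in these five rows: a uniqueness check on a small sample can suggest, but not prove, what a row represents, so the data dictionary should confirm it.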

KNOW YOUR DATA
Data Investigation

Consistent Formatting
• Having consistent formatting is crucial
◦ In a data table on sales, a field lists the salesperson. Are these the same person?
  Rob Barker
  bob barker
  Robbie Barker
  Robert Barker
  Bobbie W. Barker
  Mr. Bob William Barker

Missing Values
• Are there lots of missing values?
• Are they limited to a few columns, or do the records with missing values have similarities?
• How might you address the missing values?

Summary Statistics
• What is the range of values?
• What values are most common or least common?
• Are there any values that are much larger than the others?
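A first pass at the salesperson question can be automated by normalizing each name to a comparable key. This is a minimal sketch under stated assumptions: the nickname table `NICKNAMES` and title list `TITLES` are hypothetical and deliberately tiny, and dropping middle names is a simplifying choice. The output only proposes candidate matches; deciding whether they really are one person still takes human judgment (for example, checking an employee ID).

```python
# Hypothetical, hand-built tables; a real project would review and extend them.
NICKNAMES = {"rob": "robert", "robbie": "robert", "bob": "robert", "bobbie": "robert"}
TITLES = {"mr", "mrs", "ms", "dr"}


def normalize_name(raw):
    """Reduce a free-text name to a lowercase 'first last' key, mapping known nicknames."""
    # Lowercase, split on whitespace, and strip periods ("W." -> "w", "Mr." -> "mr").
    tokens = [token.strip(".").lower() for token in raw.split()]
    # Drop empty tokens and courtesy titles.
    tokens = [token for token in tokens if token and token not in TITLES]
    if not tokens:
        return ""
    if len(tokens) == 1:
        return NICKNAMES.get(tokens[0], tokens[0])
    # Keep only the first and last tokens, so middle names and initials are ignored.
    first, last = tokens[0], tokens[-1]
    return f"{NICKNAMES.get(first, first)} {last}"


variants = [
    "Rob Barker",
    "bob barker",
    "Robbie Barker",
    "Robert Barker",
    "Bobbie W. Barker",
    "Mr. Bob William Barker",
]
# All six variants collapse to a single candidate key.
print({normalize_name(name) for name in variants})
```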

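The missing-value and summary-statistic questions can be answered with a quick profile. The sketch below uses only the Python standard library on a small hypothetical sales table (`None` marks a missing value); the column names and the ten-times-the-median cutoff for "much larger" values are assumptions chosen for illustration, not a standard rule.

```python
import statistics
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"region": "East", "units": 12, "discount": None},
    {"region": "East", "units": 15, "discount": 0.10},
    {"region": "West", "units": 11, "discount": None},
    {"region": None, "units": 14, "discount": 0.05},
    {"region": "West", "units": 240, "discount": None},
    {"region": "East", "units": 13, "discount": 0.00},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def numeric_summary(values, large_factor=10):
    """Range and median of the non-missing values, plus any far above the median."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return {
        "min": min(present),
        "max": max(present),
        "median": median,
        # Arbitrary illustrative cutoff: anything above large_factor times the median.
        "unusually_large": [v for v in present if v > large_factor * median],
    }


def most_common(values, n=1):
    """Most frequent non-missing values with their counts."""
    return Counter(v for v in values if v is not None).most_common(n)


print("Missing per column:", missing_counts(rows))
# Do the records with a missing discount share anything? Look at their regions.
print("Regions with missing discount:", [r["region"] for r in rows if r["discount"] is None])
print("Units:", numeric_summary([r["units"] for r in rows]))
print("Most common region:", most_common([r["region"] for r in rows]))
```

The profile only raises questions (is 240 units a typo or a bulk order? why is the discount blank?); answering them still requires knowing the business context.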
Data Fallacies: Intentional or Not

“There are three kinds of lies: lies, damned lies, and statistics”
- Mark Twain, Chapters from My Autobiography, quoting British Prime Minister Benjamin Disraeli

INTENTIONAL MISREPRESENTATION
Picking facts to meet a narrative

Use this table for an ice cream chain to portray a favorable picture of the business’s performance over the past 4 years. Now, use different statistics to portray a negative picture of the business’s performance.

Year | Total Sales (millions) | Total Locations | Total Expenses (millions) | Average Daily Value of Inventory | Peak Season Sales (millions)
2019 | 2.73                   | 5               | 2.18                      | 12,158                           | 1.16
2020 | 2.86                   | 6               | 2.55                      | 15,377                           | 1.35
2021 | 2.93                   | 6               | 2.61                      | 16,046                           | 1.37
2022 | 2.82                   | 5               | 2.42                      | 12,783                           | 1.43
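The exercise is mostly arithmetic, so it can be scripted. The sketch below loads the table into Python and computes percentage changes between chosen years. "Profit" here is simply total sales minus total expenses, an assumption on our part since the table does not say what expenses include; the function names are hypothetical. Choosing the metric and the start year is exactly where a narrative gets picked.

```python
# Figures from the ice cream chain table (sales, expenses, peak season sales in millions).
DATA = {
    2019: {"sales": 2.73, "locations": 5, "expenses": 2.18, "avg_inventory": 12158, "peak_sales": 1.16},
    2020: {"sales": 2.86, "locations": 6, "expenses": 2.55, "avg_inventory": 15377, "peak_sales": 1.35},
    2021: {"sales": 2.93, "locations": 6, "expenses": 2.61, "avg_inventory": 16046, "peak_sales": 1.37},
    2022: {"sales": 2.82, "locations": 5, "expenses": 2.42, "avg_inventory": 12783, "peak_sales": 1.43},
}


def metric(year, name):
    """Look up a column, or derive 'profit' as total sales minus total expenses (an assumption)."""
    row = DATA[year]
    if name == "profit":
        return row["sales"] - row["expenses"]
    return row[name]


def pct_change(name, start, end):
    """Percentage change in a metric from the start year to the end year."""
    return (metric(end, name) / metric(start, name) - 1) * 100


# Same table, different framings: each tuple is (metric, start year, end year).
framings = [
    ("peak_sales", 2019, 2022),
    ("sales", 2019, 2022),
    ("sales", 2021, 2022),
    ("profit", 2019, 2022),
]
for name, start, end in framings:
    print(f"{name} {start}->{end}: {pct_change(name, start, end):+.1f}%")
```

With these figures, peak season sales rise 23.3% from 2019 to 2022, while total sales minus total expenses falls 27.3% over the same period: one table, opposite stories.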

UNINTENTIONAL MISREPRESENTATION
Be conscious of letting biases slip into your analysis, findings, or recommendations

Sampling Bias
• Drawing conclusions from a non-representative sample
◦ Polling people outside of a bookstore to see if they prefer reading from a physical book or an e-reader

The Observer Effect
• The act of monitoring a group affects their behavior
◦ As part of an efficiency study for the shipping department, monitor truck loading behavior and procedures

Relying Solely on Summary Metrics
• Summary statistics don’t tell the whole story
◦ Two businesses with the same average sales can have very different stories
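A small numeric illustration of why a single summary metric can mislead. Both hypothetical businesses below average the same monthly sales, but their medians, spread, and dependence on one month differ sharply. The data and names are invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical monthly sales (thousands of dollars) for two businesses.
SALES = {
    "Business A": [100, 98, 102, 101, 99, 100],
    "Business B": [20, 25, 30, 35, 40, 450],
}


def profile(monthly):
    """Several summary metrics, each telling a different part of the story."""
    return {
        "mean": statistics.mean(monthly),
        "median": statistics.median(monthly),
        "std_dev": round(statistics.pstdev(monthly), 1),
        # Share of total sales that came from the single best month.
        "best_month_share": round(max(monthly) / sum(monthly), 2),
    }


for name, monthly in SALES.items():
    print(name, profile(monthly))
```

Reporting only the mean would make the two businesses look identical; the median and the best-month share show that Business B depends almost entirely on one month.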

Reference: https://www.geckoboard.com/best-practice/statistical-fallacies/
Reference: https://www.forbes.com/2009/02/19/incentives-compensation-bonuses-leadership_perverted_incentives.html?sh=328aa8ad5b3b
UNINTENTIONAL MISREPRESENTATION
Be conscious of letting biases slip into your analysis, findings, or recommendations

False Causality
• Assuming that if two events occur together, one caused the other
◦ The company starts a morning bagel program in the spring, and sales steadily climb starting in Q2

Perverse Incentives
• Also known as the “Cobra Effect”: incentivizing one behavior leads to unintended consequences
◦ IBM at one point paid its programmers by the number of lines of code they wrote

Simpson’s Paradox
• Lurking variables cause false conclusions
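Simpson's Paradox is easiest to see with numbers. In the hypothetical counts below, Rep A wins a higher share of deals than Rep B within both account sizes, yet has the lower overall win rate, because the lurking variable (the mix of account sizes) differs: Rep A pitched mostly large accounts, which are harder to win. The reps, segments, and counts are invented for illustration.

```python
# Hypothetical counts: (deals won, deals pitched) by account size for two sales reps.
DEALS = {
    "Rep A": {"small": (81, 87), "large": (192, 263)},
    "Rep B": {"small": (234, 270), "large": (55, 80)},
}


def rate(won, pitched):
    return won / pitched


def segment_rates(rep):
    """Win rate within each account size."""
    return {size: rate(*counts) for size, counts in DEALS[rep].items()}


def overall_rate(rep):
    """Win rate after pooling all account sizes together."""
    won = sum(w for w, _ in DEALS[rep].values())
    pitched = sum(p for _, p in DEALS[rep].values())
    return rate(won, pitched)


for rep in DEALS:
    by_size = {size: f"{r:.0%}" for size, r in segment_rates(rep).items()}
    print(rep, by_size, "overall:", f"{overall_rate(rep):.0%}")
```

Comparing only the pooled rates would credit Rep B; splitting by the lurking variable reverses the conclusion.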

Reference: https://www.geckoboard.com/best-practice/statistical-fallacies/
Reference: https://www.forbes.com/2009/02/19/incentives-compensation-bonuses-leadership_perverted_incentives.html?sh=328aa8ad5b3b
Critical Thinking Module
Module 5 will be spread throughout the rest of the semester and
is different from other modules in that it is not about technical
skills.

This module will challenge you to think about your thinking and
challenge assumptions you make. These skills will make your
data analysis stronger and will also enhance your everyday
reasoning and analysis abilities.

The final project will also ask you to document at least one
cognitive bias you overcame while analyzing.
Key Takeaways

• Data is everywhere
• Data analytics is a process that helps us solve problems using data techniques, but each problem and data set is a unique challenge
• A thorough understanding of your data is critical to any data project. This is where human reasoning will always be paramount.
• There are many pitfalls to avoid when performing data analysis. Becoming aware of cognitive biases is a useful antidote to data fallacies.
Class Organization

Technical Skills
• ETL
• Alteryx
• Data Workflows
• SQL
• APIs
• Data Cleansing, Transformation, and Manipulation

Artificial Intelligence
• Prompt Engineering
• LLMs
• Data Analysis
• Coding/Programming

Critical Thinking
• Cognitive Bias Awareness
• Data Fallacies
• Writing and Communicating
Alteryx Designer
What Is It and Why Do I Need It?

Alteryx Designer Analytics and Automation Platform
Easy-to-use, drag-and-drop, code-free, and code-friendly.

Access any data source:
• Handles billions of records
• Flat files, databases, API calls, and more

Any skill level:
• No coding knowledge required
• Can incorporate SQL, Python, and R directly into Designer

Designer’s Many Capabilities
• Automate data prep and analysis by building
dynamic, repeatable workflows that save time
• Easily integrates with visualization tools
• Visual workflows with easy-to-add notations make collaboration and documentation simple
• Additional functionality:
− Build reports
− Perform spatial analysis
− Create predictive and time series models
− Build macros and analytic apps
− Work inside of a database
− Perform natural language processing
− Analyze images
− And more!
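Designer workflows are built visually rather than written as code, so the sketch below is only a code analogue of what a small repeatable workflow does: extract rows from a source, transform them, and load the result somewhere queryable. It uses Python's standard `csv` and `sqlite3` modules; the sample data, the table name `sales`, and the function names are hypothetical. Wrapping the steps in one function is what makes the workflow repeatable: next month's file can be run through the same steps unchanged.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract with inconsistent formatting and one missing amount.
RAW_SALES = """region,amount
east, 120.50
EAST ,80
West,
west,200
"""


def extract(text):
    """Extract: read CSV text into one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: standardize region names, convert amounts, drop rows missing an amount."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue  # one possible way to handle a missing value
        cleaned.append((record["region"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_workflow(text, conn):
    """Run the same three steps in order; rerun it whenever a new extract arrives."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")
run_workflow(RAW_SALES, conn)
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
for region, total in conn.execute(query):
    print(region, total)
```

Dropping the row with a blank amount is just one choice; imputing a value or flagging the row for review are alternatives, and the right one depends on the business question.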

Top 10 Skills of 2025



ALTERYX COMMUNITY
Let’s explore Alteryx Community!
Community.Alteryx.com
Install Alteryx
You may use your own license key on a personal Windows computer, or you may use the license key provided on Canvas on a lab computer (or both!)
Homework
Watch “Data and Devices” and “Formatting Data”