Principles of Data Literacy - Introduction To Data Cheatsheet - Codecademy

The document discusses key concepts in data literacy including data gaps, bias, statistics, categorical and quantitative variables, tidy and messy data. It provides examples and definitions for common types of missing data such as structurally missing, missing at random, and missing completely at random.

Uploaded by

john.nstat

Cheatsheets / Principles of Data Literacy

Introduction to Data

Data Gaps
The ability to separate good, mediocre, and poor quality
data is a crucial data literacy skill. Data-driven
conclusions are only as strong, robust, and well-
supported as the data behind them. This is also often
referred to with the phrase “garbage in, garbage out.”

Addressing Bias
Bias in data collection leads to poorer quality data.
Recognizing bias in data is a crucial data literacy skill.
Some key questions about bias include “Who made the
data?”, “Who participated in the data?” and “Who is left
out of the data?”

What is Statistics?
Statistics helps to measure whether an event happens by
chance or by a systemic factor or factors. For example,
it’s statistically more likely to see traffic during peak rush
hour than outside of peak rush hour times.

Statistics at work
Statistics can reveal systemic patterns in a data set rather
than relying on individual experiences. This is important in
legal cases including those addressing discrimination or
class-action lawsuits.

Garbage In, Garbage Out
The quality of the predictions made during a predictive
analysis is deeply dependent on the quality of the data
used to generate the predictions.
For example, if a model is trained with mislabeled data, it
will produce inaccurate predictions no matter how good
the actual algorithm is. This is commonly referred to as
“garbage in, garbage out.”
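The effect of mislabeled training data can be sketched with a toy example. The snippet below is only an illustration under stated assumptions: the one-dimensional data, the threshold "model", and the specific labeling error (points 5, 6, and 7 recorded as 0) are all hypothetical and not part of the cheatsheet.

```python
def fit_threshold(xs, labels):
    """Pick the cutoff t that makes the fewest training errors
    for the rule 'predict 1 when x >= t'."""
    candidates = sorted(set(xs)) + [max(xs) + 1]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in zip(xs, labels))

    return min(candidates, key=errors)


def accuracy(threshold, xs, labels):
    """Share of points where the threshold rule agrees with the labels."""
    correct = sum((x >= threshold) == bool(y) for x, y in zip(xs, labels))
    return correct / len(xs)


xs = list(range(10))
true_labels = [int(x >= 5) for x in xs]  # ground truth: 1 when x >= 5

# Hypothetical labeling error: points 5, 6 and 7 were recorded as 0.
noisy_labels = [0 if x in (5, 6, 7) else y for x, y in zip(xs, true_labels)]

clean_model = fit_threshold(xs, true_labels)   # learns cutoff 5
noisy_model = fit_threshold(xs, noisy_labels)  # learns cutoff 8

print(accuracy(clean_model, xs, true_labels))  # 1.0
print(accuracy(noisy_model, xs, true_labels))  # 0.7
```

The same fitting procedure is used in both cases; only the labels differ, and the accuracy against the true labels drops accordingly.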

Binary Categorical Variables

Categorical variables can also be binary, or dichotomous,
variables. Binary variables are nominal categorical
variables that contain only two mutually exclusive
categories. Examples of binary variables are whether a person is
pregnant, or whether a house’s price is above or below a
particular threshold.

Categorical Variables
Categorical variables consist of data that can be grouped
into distinct categories, and are either ordinal or nominal.
Ordinal categorical variables are groups that
contain an inherent ranking, such as ratings of plays or
responses to a survey question with a point scale (e.g., “On
a scale from 1-7, how happy are you right now?”). Nominal
categorical variables are made of categories without an
inherent order; examples of nominal variables are species
of ants and people’s hair color.

Quantitative Vs. Categorical Variables

Variables can be either quantitative or categorical.
Quantitative variables are amounts or counts; for
example, age, number of children, and income are all
quantitative variables. Categorical variables represent
groupings; for example, type of pet, agreement rating, and
brand of shoes are all categorical variables.

Categorical Data Defined
Categorical Data refers to data represented by words
rather than numbers. Examples of categorical data are
tree species and survey responses (Agree, Neutral,
Disagree).

Ordinal and Nominal Categorical Data

Categorical variables can be either ordinal (ordered) or
nominal (unordered).
Examples of ordinal variables include places (1st, 2nd, 3rd)
and survey responses (on a scale of 1 to 5, how much do
you agree with a statement).
Examples of nominal variables include tree species,
student names, and account names.
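The practical difference between ordinal and nominal data can be sketched in code: ordinal values can be sorted and summarized by a middle value, while nominal values can only be counted. This is a minimal sketch; the response list and the tree species values are hypothetical examples, not data from the cheatsheet.

```python
from collections import Counter

# Ordinal: the categories have an explicit ranking.
AGREEMENT_ORDER = ["Disagree", "Neutral", "Agree"]
rank = {label: position for position, label in enumerate(AGREEMENT_ORDER)}

responses = ["Agree", "Disagree", "Neutral", "Agree", "Neutral"]  # hypothetical
sorted_responses = sorted(responses, key=rank.__getitem__)
median_response = sorted_responses[len(sorted_responses) // 2]

# Nominal: no meaningful order, so counting is the sensible summary.
species = ["oak", "maple", "oak", "pine"]  # hypothetical
species_counts = Counter(species)
most_common_species = species_counts.most_common(1)[0][0]

print(sorted_responses)     # ['Disagree', 'Neutral', 'Neutral', 'Agree', 'Agree']
print(median_response)      # Neutral
print(most_common_species)  # oak
```

Sorting the species list alphabetically would still run, but the resulting order would carry no meaning, which is what makes the variable nominal.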

Messy Data
Messy data is data that violates one of the tidy dataset
rules (1. Each variable forms a column; 2. Each
observation forms a row; 3. Each type of observational
unit forms a table).
Below is an example of messy data:

ID#  Name      ChemGrade2020  MathGrade2020  B
1    Brown     F              B
B    smith
3    Saito, K  A              90

Tabular Data
Tabular data is organized into rows, or observations, along
the vertical axis, and columns, also referred to as
variables or features, along the horizontal axis.

Row #  Variable 1   Variable 2   Variable 3
1      Observation  Observation  Observation
2      Observation  Observation  Observation
3      Observation  Observation  Observation

Tidy Data Rules

A tidy dataset follows three fundamental rules:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

Below is an example of a tidy dataset:

ID# Student Year Class Grade

1 Brown 2020 Chem F

1 Brown 2021 Chem B

1 Brown 2021 Math A

2 Smith 2020 Bio C

2 Smith 2021 CompSci B

3 Saito 2020 Chem A

3 Saito 2021 Math B
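One way to move from a wide, messy layout to the tidy layout above is to turn each grade column into its own row. The sketch below assumes a hypothetical wide table whose column names follow a `<Class>Grade<Year>` pattern; the `wide_rows` records and the `to_tidy` helper are illustrative names, and the values are taken from the Brown and Saito rows of the tidy example.

```python
import re

# Hypothetical wide-format records: one column per class/year combination.
wide_rows = [
    {"ID#": 1, "Student": "Brown",
     "ChemGrade2020": "F", "ChemGrade2021": "B", "MathGrade2021": "A"},
    {"ID#": 3, "Student": "Saito",
     "ChemGrade2020": "A", "MathGrade2021": "B"},
]

# Column names such as "ChemGrade2020" hide two variables: class and year.
COLUMN_PATTERN = re.compile(r"^(?P<cls>[A-Za-z]+)Grade(?P<year>\d{4})$")


def to_tidy(rows):
    """Return one row per (student, year, class) observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            match = COLUMN_PATTERN.match(column)
            if match is None:
                continue  # identifier columns such as ID# and Student
            tidy.append({
                "ID#": row["ID#"],
                "Student": row["Student"],
                "Year": int(match.group("year")),
                "Class": match.group("cls"),
                "Grade": value,
            })
    return tidy


tidy_rows = to_tidy(wide_rows)
for tidy_row in tidy_rows:
    print(tidy_row)
```

After the reshape, each variable (Year, Class, Grade) has its own column and each graded course is a single row, matching the first two tidy rules.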

Sample Set of Data

A sample set of data is a subset of data intended to be representative of
the entire population of interest. Random sampling is the
best way to make the sample representative of the
whole population, but it does not guarantee a
representative sample, especially if the sample is too
small.
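The point about small samples can be illustrated with a short simulation. This is a sketch under assumptions: the population (the integers 1 to 1000), the sample sizes, the number of trials, and the fixed seed are all arbitrary choices made for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))
population_mean = statistics.mean(population)


def sample_mean_spread(sample_size, trials=200):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


spread_small = sample_mean_spread(5)
spread_large = sample_mean_spread(200)

print(population_mean)             # 500.5
print(spread_small, spread_large)  # the small-sample spread is much larger
```

Every sample here is drawn at random, yet the means of the five-item samples scatter far more widely around the population mean, so any single small sample is more likely to be unrepresentative.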

Structurally Missing Data

Structurally Missing Data is data that is expected to be
missing.
For example, there are structurally missing data in the
‘Litters’ and ‘Pups/Litter’ columns for all the male dogs in
the table below because we would not expect male dogs
to have puppies.

ID#  Name     Breed             Sex  Litters  Pups/Litter
1    Gnasher  ACD               M
2    Cassie   Collie            F    1        3
3    Pepper   French Bulldog    F    4        2
4    Jed      Golden Retriever  M
5    Henry    Spaniel           M
6    Ruby     ACD               F    1        6
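A quick check can confirm that every gap in the table is the expected, structural kind. The sketch below re-enters the dog table as a list of dictionaries, using `None` for empty cells; that representation and the variable names are assumptions made for the example.

```python
# The dog table, with None standing in for the empty cells.
dogs = [
    {"ID#": 1, "Name": "Gnasher", "Breed": "ACD", "Sex": "M",
     "Litters": None, "Pups/Litter": None},
    {"ID#": 2, "Name": "Cassie", "Breed": "Collie", "Sex": "F",
     "Litters": 1, "Pups/Litter": 3},
    {"ID#": 3, "Name": "Pepper", "Breed": "French Bulldog", "Sex": "F",
     "Litters": 4, "Pups/Litter": 2},
    {"ID#": 4, "Name": "Jed", "Breed": "Golden Retriever", "Sex": "M",
     "Litters": None, "Pups/Litter": None},
    {"ID#": 5, "Name": "Henry", "Breed": "Spaniel", "Sex": "M",
     "Litters": None, "Pups/Litter": None},
    {"ID#": 6, "Name": "Ruby", "Breed": "ACD", "Sex": "F",
     "Litters": 1, "Pups/Litter": 6},
]

# Rows where the litter count is missing.
missing_litters = [dog for dog in dogs if dog["Litters"] is None]

# The gaps are structural if they all belong to male dogs.
all_gaps_structural = all(dog["Sex"] == "M" for dog in missing_litters)

# Any female dog with a missing value would be an unexpected gap.
unexpected_gaps = [
    dog["Name"] for dog in dogs
    if dog["Sex"] == "F" and dog["Litters"] is None
]

print(all_gaps_structural)  # True
print(unexpected_gaps)      # []
```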

Missing at Random Data

Missing at Random (MAR) data is missing because of some
random characteristic about the person or thing being
studied. Often, this type of data is reliably missing based
on the value of another variable in the dataset.
In the table below, the bacterial cell counts for all the
stool samples are ‘NaN’. If we looked into this, we might
find that there were too many bacterial cells to count in
all those samples. Therefore, the bacterial cell counts for
stool samples would be MAR data.

Sample ID  Sample Type  Bacterial Cell Counts
1          Hand Swab    1008
2          Stool        NaN
3          Mouth Swab   7876
4          Hand Swab    657
5          Stool        NaN
6          Hand Swab    2442
7          Mouth Swab   5444
8          Stool        NaN
9          Hand Swab    4654
10         Stool        NaN
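Whether missing values line up with another variable can be checked by computing the share of missing values per group. The sketch below re-enters the table as tuples and uses a hypothetical helper, `missing_rate_by_group`; a rate of 1.0 for one group and 0.0 for the others is the pattern described above.

```python
import math
from collections import defaultdict

NAN = float("nan")

# (sample ID, sample type, bacterial cell count) from the table above.
samples = [
    (1, "Hand Swab", 1008), (2, "Stool", NAN), (3, "Mouth Swab", 7876),
    (4, "Hand Swab", 657), (5, "Stool", NAN), (6, "Hand Swab", 2442),
    (7, "Mouth Swab", 5444), (8, "Stool", NAN), (9, "Hand Swab", 4654),
    (10, "Stool", NAN),
]


def missing_rate_by_group(rows):
    """Fraction of missing counts for each sample type."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for _sample_id, sample_type, count in rows:
        totals[sample_type] += 1
        if math.isnan(count):
            missing[sample_type] += 1
    return {group: missing[group] / totals[group] for group in totals}


rates = missing_rate_by_group(samples)
print(rates)  # {'Hand Swab': 0.0, 'Stool': 1.0, 'Mouth Swab': 0.0}
```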

Data Missing Completely at Random
Missing Completely at Random (MCAR) data has no
detectable underlying reason causing the values to be
missing.
The table below has MCAR data. The # of fruits is missing
for some plants, but the missing fruit data seems
unrelated to the height of the plant. Short and tall plants
are both missing fruit data. In addition, we are missing the
height for one of our plants!

Plant Height (cm) # of Fruits

1 65 10

2 87

3 987

4 44

5 105 35

6 547 74

7 876

8 55

9 875 95
