Chapter 1 & 2 - Stats

The document covers the fundamentals of data and data preparation, including types of data, branches of statistics, and methods of data collection. It discusses various measurement scales, the characteristics of big data, and the importance of data preparation techniques. Additionally, it outlines methods for visualizing data through tables and graphs, emphasizing the significance of effective communication of insights derived from data analysis.

Uploaded by Jvnz Lee

Chapter 1: Data and Data Preparation

Data and Statistics
• Data: Compilations of facts, figures, or other contents, both numerical and non-numerical.
- Data of all types and formats are generated from multiple sources.
- Customers and businesses use data to help make decisions.
- Statistics is the language of data.
• Statistics: The science that deals with collecting, preparing, analyzing, interpreting, and
presenting data.

• First: find the right data and prepare it for the analysis.
• Second: use the appropriate statistical tool, which depends on the data.
• Third: clearly communicate information with actionable business insights.

Branches of Statistics
• Descriptive Statistics: Summarizes IMPORTANT ASPECTS OF A DATA SET, including
collecting, organizing, and presenting data in charts and tables.
- Often involves calculating numerical measures (e.g., typical value, variability).
• Inferential Statistics: Draws conclusions about a LARGER SET OF DATA (population) based
on a smaller set of data (sample). It involves analyzing sample data to make inferences about
unknown population parameters.
- A population consists of all items/members of interest.
- A sample is a subset of the population.
GENERALLY: It is not feasible to obtain population data, so we rely on samples
- (e.g., data on every cellphone user in the Philippines)
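The population/sample distinction can be sketched in Python. This is a minimal illustration under stated assumptions: the "population" is 10,000 made-up incomes simulated with a fixed random seed, and the sample size of 200 is arbitrary. The sample mean is a descriptive summary of the sample; using it as an estimate of the unknown population mean is the inferential step.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated monthly incomes (illustrative only).
rng = random.Random(42)
population = [rng.gauss(30_000, 8_000) for _ in range(10_000)]

# In practice the full population is rarely available, so we draw a sample of 200.
sample = rng.sample(population, 200)

# Descriptive statistics: summarize the sample we actually observed.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: treat the sample mean as an estimate of the
# population mean (a parameter that is normally unknown).
population_mean = statistics.mean(population)  # known here only because we simulated it

print(f"Sample mean (estimate):   {sample_mean:,.0f}")
print(f"Sample std. deviation:    {sample_sd:,.0f}")
print(f"Population mean (target): {population_mean:,.0f}")
```

Drawing a different sample gives a slightly different estimate; that sampling variability is what inferential methods have to account for.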

Two Ways of Collecting Data

• Cross-sectional Data: refers to data collected by recording a characteristic of many subjects at
the SAME POINT IN TIME or without regard to differences in time.
o Example: NBA Eastern Conference standings for a specific season.
• Time Series Data: Data collected OVER SEVERAL TIME PERIODS focusing on certain
groups, events, or objects.
- Time series data can include hourly, daily, weekly, monthly, quarterly, or annual
observations.
o Example: Homeownership rates over several years.
Types of Data
• Structured Data: Resides in a PRE-DEFINED row-column format, such as spreadsheets or
databases. It is NUMERICAL AND OBJECTIVE.
- Today, only about 20% of all data used in business decisions are structured.
• Unstructured Data: DOES NOT conform to a PRE-DEFINED format and includes textual and
multimedia content, such as social media data.
- Does not conform to the row-column model required in most database systems.
Example: social media data such as Twitter, YouTube, Facebook, and blogs.

Big Data
• Three Characteristics of Big Data:
o Volume: Immense amount of data compiled from multiple sources.
o Velocity: Data generated at a rapid speed.
o Variety: Different types, forms, and granularity of data.

• Additional characteristics:
o Veracity: Credibility and quality of the data.
o Value: Methodological plan for formulating questions and unlocking hidden potential.
• Challenges: Managing, processing, and analyzing such large volumes of data is difficult with
traditional tools.

Variables and Scales of Measurement

Two Types of Variables
• Categorical Variables: Qualitative data representing categories (e.g., marital status).
• Numerical Variables: Quantitative data, either discrete (countable values) or continuous
(uncountable values within an interval).
NOTE: In order to choose the appropriate techniques for summarizing and analyzing variables, we
need to distinguish between the different measurement scales.

Measurement Scales
• Nominal Scale: LEAST SOPHISTICATED. Represents categories or groups without a specific
order (e.g., marital status).
• Ordinal Scale: STRONGER LEVEL OF MEASUREMENT. Categorizes and ranks data with
respect to some characteristic, but differences between ranks are not meaningful (e.g., star
ratings).
• Interval Scale: MEANINGFUL DIFFERENCES. Categorizes and ranks data with meaningful
differences, but zero is arbitrary (e.g., temperature). Ratios are NOT meaningful.
• Ratio Scale: STRONGEST LEVEL OF MEASUREMENT. Differences are CONSISTENT AND
MEANINGFUL, and there is a true zero point, allowing meaningful ratios (e.g., weight, height, profits).
Arithmetic operations are valid on interval- and ratio-scaled variables.
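A quick numerical check shows why ratios are not meaningful on an interval scale but are on a ratio scale. This Python sketch uses arbitrary illustrative values: it compares a naive Celsius ratio with the same temperatures in Kelvin (a scale with a true zero), then takes a ratio of two weights.

```python
def celsius_to_kelvin(c: float) -> float:
    """Kelvin has a true zero (absolute zero), so it is a ratio scale."""
    return c + 273.15

# Interval scale (Celsius): differences are meaningful...
warm_c, cool_c = 20.0, 10.0
difference_c = warm_c - cool_c   # 10 degrees warmer: a meaningful statement
naive_ratio_c = warm_c / cool_c  # 2.0, yet 20 °C is NOT "twice as hot" as 10 °C

# ...but the ratio changes once the same temperatures are expressed
# on a scale whose zero really means "none".
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale (weight): zero means "no weight", so the ratio is meaningful.
weight_a_kg, weight_b_kg = 80.0, 40.0
weight_ratio = weight_a_kg / weight_b_kg  # A really is twice as heavy as B

print(difference_c, naive_ratio_c, round(kelvin_ratio, 3), weight_ratio)
# prints: 10.0 2.0 1.035 2.0
```

Because the Celsius zero is arbitrary, the "2.0" depends on where zero was placed; moving to Kelvin shrinks it to about 1.035, while the weight ratio needs no such correction.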

Data Preparation
• Inspecting and Preparing Data: Involves counting, sorting, handling missing values, and
subsetting.
o Counting and Sorting: Help verify data completeness, identify missing values, and
review the range of values.
o Strategies for handling missing values:
▪ Omission Strategy: EXCLUDE OBSERVATIONS with missing values.
▪ Imputation Strategy: REPLACE missing values with reasonable imputed values:
• Numeric variables: replace with the average.
• Categorical variables: replace with the predominant category.
o Subsetting: EXTRACTING the RELEVANT PORTION of the data set for analysis,
eliminating low-quality data, and excluding redundant variables.
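The preparation steps above can be sketched with the Python standard library. This is a minimal illustration, not a prescribed workflow: the five records are hypothetical, `None` is assumed to mark a missing value, and a spreadsheet tool or a library such as pandas would normally be used instead.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "income": 52_000, "region": "North"},
    {"id": 2, "income": None,   "region": "South"},
    {"id": 3, "income": 61_000, "region": None},
    {"id": 4, "income": 47_000, "region": "North"},
    {"id": 5, "income": 58_000, "region": "North"},
]

# Counting: how many values are missing in each variable?
missing_counts = {col: sum(1 for r in records if r[col] is None) for col in records[0]}

# Sorting: order by income to review the range (missing incomes placed last).
by_income = sorted(records, key=lambda r: (r["income"] is None, r["income"] or 0))

# Omission strategy: exclude any observation that has a missing value.
omitted = [r for r in records if all(v is not None for v in r.values())]

# Imputation strategy: numeric -> average, categorical -> predominant category.
income_avg = mean(r["income"] for r in records if r["income"] is not None)
region_mode = Counter(r["region"] for r in records if r["region"] is not None).most_common(1)[0][0]
imputed = [
    {
        **r,
        "income": r["income"] if r["income"] is not None else income_avg,
        "region": r["region"] if r["region"] is not None else region_mode,
    }
    for r in records
]

# Subsetting: keep only the variables relevant to the analysis (drop "id").
subset = [{"income": r["income"], "region": r["region"]} for r in imputed]

print(missing_counts)
print(income_avg, region_mode)
```

Omission keeps only three of the five observations here, which shows the trade-off: it is simple but discards data, while imputation keeps every row at the cost of inserting estimated values.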
Chapter 2: Tabular and Graphical Methods
Introductory Case: House Prices in Punta Gorda
• Objective: Use sample information to:
1. Make summary statements concerning the range of house prices.
2. Comment on where house prices tend to cluster.
3. Examine the relationship between house price and size.

Methods to Visualize a Categorical Variable

• Frequency Distribution: Groups data into categories and records the number of observations
in each category (the frequency).
o Relative frequency = frequency of a category ÷ total number of observations.
o Percentage frequency = relative frequency × 100.
o Example: Myers-Briggs assessment personality types for 1,000 employees.
• Bar Chart: Depicts frequency or relative frequency for each category using horizontal or vertical
bars.
- Series of either horizontal or vertical bars.
- Bar lengths proportional to the values they are depicting.
Note: The vertical axis on a graph should not have an excessively high upper limit; stretching the
axis compresses the bars and hides differences between categories.
• Pie Chart: Segmented circle portraying relative frequencies of categories.
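A frequency distribution for a categorical variable can be computed in a few lines of Python. This sketch assumes ten hypothetical survey responses (favorite color), not the Myers-Briggs data from the example; it prints frequency, relative frequency, and percentage frequency, plus a crude text bar whose length is proportional to the frequency, as in a bar chart.

```python
from collections import Counter

# Hypothetical responses for one categorical variable (illustrative only).
responses = ["Red", "Blue", "Red", "Green", "Red", "Blue", "Yellow", "Red", "Green", "Blue"]

freq = Counter(responses)  # frequency: number of observations in each category
n = len(responses)

# category -> (frequency, relative frequency, percentage frequency)
table = {c: (freq[c], freq[c] / n, freq[c] / n * 100) for c in sorted(freq)}

print(f"{'Category':<10}{'Freq':>6}{'Rel':>8}{'Pct':>8}")
for category, (count, relative, percent) in table.items():
    # The '#' bar length is proportional to the frequency, like a bar chart.
    print(f"{category:<10}{count:>6}{relative:>8.2f}{percent:>7.0f}%  {'#' * count}")
```

The relative frequencies always sum to 1 (and the percentages to 100), which is a useful sanity check on any frequency table.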

Methods to Visualize the Relationship Between TWO Categorical Variables

• Contingency Table: Examines the relationship BETWEEN TWO categorical variables by
showing frequencies for each combination of values.
o Example: Myers-Briggs personality assessment and sex.
• Stacked Column Chart: Visualizes MORE THAN ONE categorical variable, allowing
comparison within each category.
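A contingency table is a count of every combination of two categorical variables. The Python sketch below builds one with `collections.Counter`; the eight (personality type, sex) pairs are hypothetical and only loosely modeled on the Myers-Briggs example, with row and column totals added as is common in such tables.

```python
from collections import Counter

# Hypothetical paired observations: (personality type, sex) for each person.
observations = [
    ("Introvert", "Female"), ("Extrovert", "Male"), ("Introvert", "Male"),
    ("Extrovert", "Female"), ("Introvert", "Female"), ("Extrovert", "Female"),
    ("Introvert", "Male"), ("Introvert", "Female"),
]

counts = Counter(observations)               # frequency of each combination
rows = sorted({r for r, _ in observations})  # categories of the first variable
cols = sorted({c for _, c in observations})  # categories of the second variable
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]

print(f"{'':<12}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    row_counts = [counts[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{k:>8}" for k in row_counts) + f"{sum(row_counts):>8}")
print(f"{'Total':<12}" + "".join(f"{k:>8}" for k in col_totals) + f"{len(observations):>8}")
```

Each row of this table is exactly what one column of a stacked column chart would display.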

Methods to Visualize a Numeric Variable

• Categorical variable: the raw data can be grouped into well-defined categories.
• Numerical variable: each observation represents a meaningful amount or count.
• Frequency Distribution: Summarizes a numerical variable by constructing intervals or classes.
o Example: House prices in Punta Gorda.
Intervals should be:
- Mutually exclusive (no observation falls into more than one interval).
- Exhaustive (every observation falls into some interval).
- Usually 5 to 20 in total.
- Easy to recognize and interpret.
Three other items to compute:
• Relative frequency: PROPORTION or fraction of observations that fall into EACH INTERVAL.
• Cumulative frequency: NUMBER OF OBSERVATIONS that fall BELOW THE UPPER LIMIT of each
interval.
• Cumulative relative frequency: PROPORTION or fraction of observations that fall BELOW THE
UPPER LIMIT of each interval.
• Histogram: Graphically represents a frequency distribution using rectangles with heights
representing frequency or relative frequency.
- Symmetric: the left and right halves are mirror images of each other.
- Skewed: Positive (elongated right tail) or negative (elongated left tail).
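• Polygon: Plots a point above the midpoint of each interval (at a height equal to its frequency or
relative frequency) and connects neighboring points with straight lines to show the shape of a
distribution.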
• Ogive: Depicts cumulative frequency or cumulative relative frequency using points
connected by a straight line.
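The interval table described above can be sketched in Python. The prices below are made up (in $1,000s) and are not the Punta Gorda data; the five equal-width intervals of the form [lower, upper) are an assumption chosen so they are mutually exclusive and exhaustive for this data. The midpoint column is what a polygon plots on the x-axis, and the cumulative columns are what an ogive plots.

```python
# Hypothetical house prices in $1,000s (made up; not the Punta Gorda data).
prices = [310, 345, 390, 405, 420, 455, 470, 510, 530, 560, 610, 690, 720]

# Equal-width intervals [300, 400), [400, 500), ..., [700, 800).
lower, width, k = 300, 100, 5
intervals = [(lower + i * width, lower + (i + 1) * width) for i in range(k)]

n = len(prices)
cumulative = 0
table = []
for lo, hi in intervals:
    # Strict upper bound: each price lands in exactly one interval.
    freq = sum(1 for p in prices if lo <= p < hi)
    cumulative += freq  # observations below this interval's upper limit
    table.append({
        "interval": f"{lo}-{hi}",
        "midpoint": (lo + hi) / 2,         # x-coordinate for a polygon
        "frequency": freq,
        "relative": freq / n,
        "cumulative": cumulative,          # plotted by an ogive
        "cumulative_relative": cumulative / n,
    })

# Exhaustive intervals account for every observation.
assert cumulative == n, "intervals are not exhaustive"

for row in table:
    print(f"{row['interval']:>9} {row['midpoint']:>7.1f} {row['frequency']:>3} "
          f"{row['relative']:>6.3f} {row['cumulative']:>3} {row['cumulative_relative']:>6.3f}")
```

The last cumulative relative frequency is always 1.0 when the intervals are exhaustive, which the `assert` checks directly.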

More Data Visualization Methods

• Scatterplot: Examines the relationship between two numerical variables, revealing linear,
nonlinear, or no relationship.
- Determines whether two numerical variables are related in some systematic way.
- Each point represents a pair of observations of the two variables.
- One variable is labeled x (x-axis) and the other y (y-axis).
o Example: House prices and square footage in Punta Gorda.
• Scatterplot with a Categorical Variable: Incorporates a categorical variable using color to show
its category.
A scatterplot with a categorical variable modifies a basic scatterplot:
- Incorporates a categorical variable in addition to the two numerical variables.
- Encodes the categorical variable with color.
- Giving each category a distinct hue makes it easy to see which category each point belongs to.
• Line Chart: Displays a numerical variable as a series of consecutive observations connected by a
line, useful for tracking changes or trends over time.
o Example: Monthly stock prices for Apple and Merck.

Stem-and-Leaf Diagram
• Stem-and-Leaf Diagram: Splits each observation into a stem (the leftmost digit or digits) and a
leaf (the last digit). Provides a visual method for displaying a numerical variable, showing where
observations are centered and how they are dispersed.
o Example: Age of the wealthiest people in the world.
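A text stem-and-leaf diagram is easy to produce in Python. This sketch assumes non-negative integers (such as ages) and a non-empty list; the ages are hypothetical and not the textbook's data set. `divmod(v, 10)` splits each value into its stem (all digits but the last) and leaf (the last digit), and empty stems are kept so gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(values: list[int]) -> dict[int, list[int]]:
    """Map each stem (all but the last digit) to its sorted leaves (last digits).

    Assumes a non-empty list of non-negative integers.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    # Include empty stems so gaps in the data remain visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


# Hypothetical ages (illustrative only).
ages = [48, 52, 55, 57, 61, 63, 63, 68, 70, 74, 79, 85, 91]

for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For these ages the output has one row per decade, from `4 | 8` through `9 | 1`, with the longest row (`6 | 1 3 3 8`) showing where the observations cluster.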
