Chapter 1 Data and Data Preparation - Jaggia4e - PPT
Business Statistics:
Communicating with Numbers, 4e
Copyright ©2022 McGraw Hill. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill.
1/22/23
Chapter 1 Learning Objectives (LOs)
Business Statistics: Communicating with Numbers, 4e | Jaggia, Kelly
Statistics
Merriam-Webster
• a branch of mathematics dealing with the
collection, analysis, interpretation, and
presentation of masses of numerical data
• a collection of quantitative data
• quantitative – can be measured
• The term statistics can refer to numerical facts
such as averages, medians, percentages, and
maximums that help us understand a variety
of business and economic situations.
Applications in Business and
Economics
• Accounting
Public accounting firms use statistical
sampling procedures when conducting
audits for their clients, e.g., consumption,
earnings, cash flows, etc.
• Economics
Economists use statistical information in
making forecasts about the future of the
economy or some aspect of it.
• Finance
Financial advisors use price-earnings ratios and
dividend yields to guide their investment advice.
Applications in Business and
Economics
• Marketing
Electronic point-of-sale scanners at retail
checkout counters are used to collect data for a
variety of marketing research applications.
• Production
A variety of statistical quality control charts are
used to monitor the output of a production
process.
• Information Systems
A variety of statistical information helps
administrators assess the performance of
computer networks.
• Statistics is used to analyze large amounts
of data for making better decisions.
• Decision makers
– Managers – boost a company’s revenue, deepen
customer engagement, and improve
marketing strategies
– Consumers – better options
– Sports enthusiasts – succeed in future events
– Politicians – elections
– Medical professionals – better diagnoses and
cures
Introductory Case: Retail Customer Data (1)
• Design a marketing campaign for Organic Food Superstore.
1.1: Types of Data (1)
• Data are compilations of facts, figures, or other
contents, both numerical and non-numerical.
– All types/formats are generated from multiple sources.
– Customers/businesses use data to help make decisions.
– Statistics is the language of data.
• Statistics is the science that deals with the collection,
preparation, analysis, interpretation, and presentation
of data.
– First: find the right data and prepare it for the analysis.
– Second: use the appropriate statistical tool, which depends
on the data.
– Third: clearly communicate information with actionable
business insights.
1.1: Branches of Statistics
• There are two branches of statistics: descriptive
and inferential statistics.
• Descriptive statistics refers to the summary of
important aspects of a data set.
– Includes collecting, organizing, and presenting the data
in the form of charts and tables.
– Often involves calculating numerical measures (typical value,
variability).
– Measures of central tendency: mean, median, mode
– Measures of variability: range, variance, standard deviation
Statistic   Cases    Deaths   Recovery
Min         32       3        0
Median      69,993   3,231    14,662
Mean        52,424   2,937    27,866
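A minimal Python sketch of the measures of central tendency and variability named above, using only the standard library. The `cases` values are made-up placeholders for illustration; they are not the data behind the slide's table.

```python
import statistics

# Hypothetical observations (placeholder values for illustration only).
cases = [32, 45, 45, 60, 78, 90, 120]

# Measures of central tendency
mean_value = statistics.mean(cases)
median_value = statistics.median(cases)
mode_value = statistics.mode(cases)

# Measures of variability (sample variance and sample standard deviation)
range_value = max(cases) - min(cases)
variance_value = statistics.variance(cases)
std_dev_value = statistics.stdev(cases)

print(f"mean={mean_value:.2f}, median={median_value}, mode={mode_value}")
print(f"range={range_value}, variance={variance_value:.2f}, std dev={std_dev_value:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.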
1.1: Types of Data (2)
• Inferential statistics refers to drawing
conclusions about a larger set of data (population)
based on a smaller set of data (sample).
– A population consists of all items/members of interest.
– A sample is a subset of the population.
– * Use sample data to make an inference or draw a
conclusion about the population.
– ** Use probability to determine how confident you are
that the conclusions are correct (confidence intervals
and margins of error).
• We rely on sample data to make inferences about
various characteristics of the population.
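As a rough illustration of a margin of error, the sketch below builds an approximate 95% interval around a sample mean. The sample values are hypothetical, and the normal approximation (z of about 1.96) is an assumption made for simplicity; with a sample this small, a t-based interval would usually be preferred.

```python
import math
import statistics

# Hypothetical sample drawn from a larger population (placeholder values).
sample = [52, 48, 61, 55, 47, 58, 50, 53, 49, 57]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Assumption: 95% confidence with a normal approximation (z is about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
margin_of_error = z * sample_sd / math.sqrt(n)

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(f"sample mean = {sample_mean:.2f}")
print(f"approximate 95% interval: {lower:.2f} to {upper:.2f}")
```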
Exercises: Descriptive or Inferential
1. By 2040 at least 3.5 billion people will run
short of water. (source: WFS)
2. Eight out of ten of the job fatalities are
men.
3. Experts say the mortgage rates may soon
hit bottom.
4. Steam inhalation cannot cure covid.
5. The national average annual medicine
expenditure per person is $1,052.
1.1: Types of Data (3)
• We analyze sample data and calculate a sample statistic to
make inferences about the unknown population parameter.
• It is generally not feasible to obtain population data.
– Obtaining information on the entire population is expensive.
– It is impossible to examine every member of the population.
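The relationship between a sample statistic and a population parameter can be sketched with a simulation. Everything here is hypothetical: the population is generated at random, so its mean can be computed, which is exactly what is not possible in most real settings.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated incomes (not real data).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# In practice the population parameter is unknown; it is computed here
# only because the population is simulated.
population_mean = statistics.mean(population)

# Draw a simple random sample and compute the sample statistic.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"population parameter (mean): {population_mean:,.0f}")
print(f"sample statistic (mean):     {sample_mean:,.0f}")
```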
Exercises: Population or Sample
1. The grade point averages (GPAs) of all
students in the University.
2. The income of a randomly selected
employee of FEU.
3. The status of every third customer who
enters the salon.
Exercises
1.1: How Data are collected
• Cross-sectional data refers to data collected by recording
a characteristic of many subjects at the same point in time,
or without regard to differences in time.
• Example: 2018-2019 (1 year) NBA Eastern Conference
standings
1.1: Types of Data (5)
• Time series data refers to data collected over several time
periods focusing on certain groups of people, specific events, or
objects.
• Time series data can include hourly, daily, weekly, monthly,
quarterly, or annual observations.
• Example: homeownership rates (%) in the US.
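The difference in shape between cross-sectional and time series data can be sketched with two small Python structures. The store names and sales figures are made-up placeholders, not data from the slides.

```python
# All values below are made-up placeholders for illustration.

# Cross-sectional: many subjects, one point in time.
cross_sectional = [
    {"store": "A", "sales_2023": 120},
    {"store": "B", "sales_2023": 95},
    {"store": "C", "sales_2023": 143},
]

# Time series: one subject, several time periods in order.
time_series = [
    {"year": 2020, "store": "A", "sales": 101},
    {"year": 2021, "store": "A", "sales": 108},
    {"year": 2022, "store": "A", "sales": 115},
    {"year": 2023, "store": "A", "sales": 120},
]

subjects = {row["store"] for row in cross_sectional}
periods = [row["year"] for row in time_series]
print(f"cross-sectional: {len(subjects)} subjects, 1 period")
print(f"time series: 1 subject, {len(periods)} periods ({periods[0]} to {periods[-1]})")
```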
1.1: Types of Data (6)
• Structured data
– Reside in a pre-defined, row-column format.
– Spreadsheet or database applications - Enter, store, query, and analyze.
– Numerical information that is objective and not open to interpretation.
1.1: Types of Data (7)
• Unstructured data
– Do not conform to a pre-defined, row-column
format.
– Textual (written reports, notes, surveys) and
multimedia content (photos, audio, videos).
– Do not conform to database structures.
– These data may have some implied structure.
• Still considered unstructured.
– Do not conform to a row-column model required in
most database systems.
– Example: social media data such as Twitter,
YouTube, Facebook, and blogs.
1.1: Types of Data (8)
• Businesses generate and gather more and more data at an
increasing pace: Big Data.
– A massive volume of structured and unstructured data
– Extremely difficult to manage, process, and analyze using traditional data
processing tools
– Presents great opportunities to gain knowledge and game-changing intelligence
• Big data (www.gartner.com): “[H]igh-volume, high-velocity
and/or high-variety information assets that demand cost-effective,
innovative forms of information processing that enable
enhanced insight, decision making, and process automation.”
• Does not imply complete (population) data
• Big data may not be used when available
– Inconvenient and computationally burdensome
– Benefits may not justify costs
1.1: Types of Data (9)
• There are three main characteristics of big data (with the two
additional characteristics, these form the 5 Vs).
– Volume: immense amount of data compiled from a single or
multiple sources.
– Velocity: generated at a rapid speed, management is a
critical issue
– Variety: all types, forms, granularity, structured or
unstructured
• Additional characteristics
– Veracity: credibility and quality of the data, reliability.
– Value: a methodological plan for formulating questions,
curating the right data and unlocking hidden potential
• Having an excessive amount of data does not
guarantee that useful insights or measurable
improvements will be generated.
1.1: Types of Data (10)
• There is an abundance of data on the Internet.
• Many experts believe that 90% of the data in the world
today was created in the last two years alone.
• It is easy to access and find data by using a search
engine like Google.
• There are several sources of data.
– Bureau of Economic Analysis
– Bureau of Labor Statistics
– Federal Reserve Economic Data (FRED)
– U.S. Census Bureau
– National Climatic Data Center
– Yahoo Finance, Google Finance
– Zillow
– ESPN
Exercises
6. The sale price of 20 single-family homes sold in LA,
Nevada, in the last 30 days. The data are represented in
a tabular format and include the sale price, number of
bedrooms, square footage, and age of the house. Are
the data cross-sectional or time series?
Exercises
1.2: Variables and Scales of Measurement (1)
• A variable is a characteristic of interest that differs
in kind or degree among various observations
(records). Example: marital status, income
• There are two types of variables: categorical and
numeric.
• Categorical Data
– Also called qualitative
– Represent categories like marital status
– Labels or names to identify distinguishing characteristics
– Can be defined by two or more categories
– Coded into numbers for data processing
– Example: marital status, grade in a course
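Coding a categorical variable into numbers might look like the sketch below. The observations and the integer codes are hypothetical; the codes are arbitrary labels, so arithmetic on them has no meaning.

```python
# Hypothetical observations of a categorical variable.
marital_status = ["single", "married", "married", "divorced", "single"]

# Assign an arbitrary integer code to each category (sorted for repeatability).
categories = sorted(set(marital_status))
code_of = {category: code for code, category in enumerate(categories)}

# Replace each label with its code for data processing.
coded = [code_of[status] for status in marital_status]
print(code_of)
print(coded)
```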
1.2: Variables and Scales of Measurement (2)
• For a numerical variable, we use numbers to identify the
distinguishing characteristic of each observation.
• Numeric Data
– Also called quantitative
– Represent meaningful numbers
– Either discrete or continuous
• A discrete variable assumes a countable number of precise
values.
– The values need not be whole numbers
– Example: number of children in a family, score, stock price
• A continuous variable assumes an uncountable number of
values within an interval.
– In practice, often measured in discrete values
– Example: weight of a newborn baby
• In order to choose the appropriate techniques for summarizing
and analyzing variables, we need to distinguish between the
different measurement scales.
Example
Exercises
1.2: Scales of Measurement (3)
• There are four major scales: nominal, ordinal, interval, ratio.
• Nominal and ordinal scales are used for categorical variables.
• Nominal
– Least sophisticated, lowest level of measurement, no natural order
– Represent categories or groups; may be dichotomous (only 2 categories, say 0 and 1)
– Values differ by label or name
– Example: marital status, color, religion, gender: 0 – male, 1 – female (numbers have no
real meaning; they only differentiate between objects)
• Ordinal
– Stronger level of measurement, order matters
– Categorize and rank data with respect to some characteristic
– Cannot interpret the difference between the ranked values, numbers are arbitrary
(unspecified)
– Example: reviews from 1 star (poor) to 5 stars (outstanding), Likert scales, pain scale
• Categorical variables are typically expressed in words but are coded into
numbers for purposes of data processing.
– Typically count the number of observations that fall into each category (or find
percentages)
– Unable to perform meaningful arithmetic operations
• Nominal: Example (Color Survey Results)
Code  Color   Count  Percent
1     Red     5      50%
2     Yellow  3      30%
3     Blue    2      20%
Note that the numerical value does not have meaning; order does not matter.
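Counting the observations in each category and converting the counts to percentages can be sketched in Python as follows. The raw `responses` list is an assumption, built only to be consistent with the color survey counts above (Red 5, Yellow 3, Blue 2).

```python
from collections import Counter

# Raw responses consistent with the survey counts (Red 5, Yellow 3, Blue 2).
responses = ["Red"] * 5 + ["Yellow"] * 3 + ["Blue"] * 2

# Count the observations in each category.
counts = Counter(responses)
total = sum(counts.values())

# Report each category with its count and its share of the total.
for color, count in counts.most_common():
    print(f"{color:<8}{count:>3}{count / total:>7.0%}")
```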
1.2: Variables and Scales of Measurement (4)
• Interval and ratio scales are used for numerical variables.
• Interval
– Categorize and rank, differences are meaningful and can be measured
– Zero value is arbitrary and does not reflect absence of the characteristic; it
represents just another measurement value.
– Ratios are not meaningful
– Example: temperature 30 °F – 60 °F (differences can be measured); grades
80-90, a score of 0 does not mean no knowledge; elevation; time of day
• Ratio
– Strongest level of measurement
– A true zero point, reflects absence of the characteristic
– Ratios are meaningful (10 lbs. is twice as much as 5 lbs.: 10/5 = 2)
– Example: precise physical measurements such as weight, height, pulse, blood
pressure, elapsed time
• Arithmetic operations are valid on interval- and ratio-scaled
variables.
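One way to see why ratios are not meaningful on an interval scale but are meaningful on a ratio scale is to change units and check whether the ratio survives. The sketch below reuses the 30 °F / 60 °F and 5 lbs. / 10 lbs. values from the slide; the helper function name is just for illustration.

```python
import math

def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

# Interval scale: the zero point is arbitrary, so the "ratio" depends on the unit.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print(f"60 F / 30 F = {ratio_f:.1f}, same temperatures in Celsius = {ratio_c:.1f}")

# Ratio scale: a true zero makes the ratio the same in any unit.
LB_TO_KG = 0.45359237
ratio_lb = 10 / 5
ratio_kg = (10 * LB_TO_KG) / (5 * LB_TO_KG)
print(f"10 lb / 5 lb = {ratio_lb:.1f}, same weights in kg = {ratio_kg:.1f}")
print("weight ratio preserved:", math.isclose(ratio_lb, ratio_kg))
```

The temperature ratio changes sign and size after conversion, while the weight ratio stays at 2.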
1.2: Variables and Scales of Measurement (5)
• Example: The owner of a ski resort gathers data on tweens.
• Music: nominal
• Food quality: ordinal
• Closing time: interval
• Own money spent: ratio
Exercises
Exercises
1.3: Data Preparation (1)
• We often spend a considerable amount of time inspecting
and preparing the data for the subsequent analysis.
– Counting and sorting
– Handling missing values
– Subsetting
• Counting and Sorting
– Among the very first tasks analysts perform
– Gain a better understanding and insights into the data
– Help to verify that the data set is complete or determine if there are
missing values
– Sorting allows us to review the range of values for each variable
– Sort based on a single or multiple variables
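Counting and sorting can be sketched in plain Python as below. The customer records are hypothetical, and `None` is used here as an assumed marker for a missing value.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "Ana", "age": 34, "spend": 250.0},
    {"name": "Ben", "age": 27, "spend": None},
    {"name": "Cal", "age": 34, "spend": 90.5},
    {"name": "Dee", "age": 45, "spend": 310.0},
]

# Counting: total records and non-missing values for one variable.
n_records = len(records)
n_spend = sum(1 for r in records if r["spend"] is not None)
print(f"{n_records} records, {n_records - n_spend} missing spend value(s)")

# Sorting on a single variable shows the range of values.
by_age = sorted(records, key=lambda r: r["age"])
print(by_age[0]["age"], "to", by_age[-1]["age"])

# Sorting on multiple variables: age ascending, then name descending.
# Python's sort is stable, so sort by the secondary key first.
by_age_then_name = sorted(records, key=lambda r: r["name"], reverse=True)
by_age_then_name = sorted(by_age_then_name, key=lambda r: r["age"])
print([r["name"] for r in by_age_then_name])
```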
1.3: Data Preparation (2)
• There are two common strategies for dealing with
missing values.
• The omission strategy recommends that
observations with missing values be excluded from
subsequent analysis.
• The imputation strategy recommends that the
missing values be replaced with some reasonable
imputed values.
– Numeric variables: replace with the average
– Categorical variables: replace with the predominant
category
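The two strategies can be sketched side by side in Python. The records are hypothetical, and `None` is an assumed marker for a missing value; the imputation follows the rule stated above (average for numeric variables, predominant category for categorical ones).

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "segment": "retail"},
    {"age": None, "segment": "retail"},
    {"age": 46, "segment": None},
    {"age": 25, "segment": "online"},
]

# Omission: keep only observations with no missing values.
complete = [r for r in records if None not in r.values()]

# Imputation: average for the numeric variable,
# predominant category for the categorical variable.
known_ages = [r["age"] for r in records if r["age"] is not None]
known_segments = [r["segment"] for r in records if r["segment"] is not None]
mean_age = statistics.mean(known_ages)
top_segment = statistics.mode(known_segments)

imputed = [
    {
        "age": r["age"] if r["age"] is not None else mean_age,
        "segment": r["segment"] if r["segment"] is not None else top_segment,
    }
    for r in records
]
print(len(complete), mean_age, top_segment)
```

Omission shrinks the data set from four observations to two, while imputation keeps all four at the cost of introducing estimated values.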
1.3: Data Preparation (3)
• Subsetting is the process of extracting a portion
of the data set that is relevant for subsequent
statistical analysis.
– Compare two subsets of the data when that is the
objective of the analysis.
– Eliminate observations that contain missing values,
low-quality data, or outliers.
– Exclude variables that contain redundant
information, or variables with excessive amounts of
missing values.
• We can also subset data based on data
ranges.
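A minimal subsetting sketch in Python, assuming hypothetical sales records: it extracts two subsets for comparison, a subset based on a range of values, and a subset of the variables.

```python
# Hypothetical records for illustration.
records = [
    {"id": 1, "region": "East", "sales": 120, "date": "2023-01-15"},
    {"id": 2, "region": "West", "sales": 80, "date": "2023-02-03"},
    {"id": 3, "region": "East", "sales": 200, "date": "2023-03-21"},
    {"id": 4, "region": "West", "sales": 150, "date": "2023-04-09"},
]

# Two subsets of observations to compare.
east = [r for r in records if r["region"] == "East"]
west = [r for r in records if r["region"] == "West"]

# Subset on a range of values; ISO-formatted dates compare correctly as strings.
first_quarter = [r for r in records if "2023-01-01" <= r["date"] <= "2023-03-31"]

# Subset of variables: keep only the columns needed for the analysis.
id_and_sales = [{"id": r["id"], "sales": r["sales"]} for r in records]

print(len(east), len(west), [r["id"] for r in first_quarter])
```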
Excel functions
• COUNT function – counts the number of cells
that contain NUMERIC observations.
• COUNTA function – counts the number of
cells that are NOT EMPTY.
• COUNTIF function – counts the cells in a range
that meet a single criterion.
• COUNTIFS function – counts the number of
cells in a range that meet one or
more criteria.
• COUNTBLANK function – counts the number of cells
in a range that are empty.
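For readers working outside Excel, rough Python counterparts of these counting functions are sketched below. This is an approximation under stated assumptions: the `cells` list is hypothetical, `None` stands in for an empty cell, and the helper `is_numeric` is introduced only for this illustration.

```python
from numbers import Number

# A column of worksheet-like cells; None stands in for an empty cell.
cells = [12, "East", None, 7.5, "West", None, 30, "East"]

def is_numeric(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)

# Like COUNT: cells holding numeric values.
count = sum(1 for c in cells if is_numeric(c))
# Like COUNTA: cells that are not empty.
counta = sum(1 for c in cells if c is not None)
# Like COUNTBLANK: cells that are empty.
countblank = sum(1 for c in cells if c is None)
# Like COUNTIF(range, "East"): cells meeting a single criterion.
countif = sum(1 for c in cells if c == "East")
# Like COUNTIFS(range, ">5", range, "<20"): cells meeting several criteria.
countifs = sum(1 for c in cells if is_numeric(c) and 5 < c < 20)

print(count, counta, countblank, countif, countifs)
```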