Chapter 1 Slides

The document discusses key concepts in statistics including data, descriptive statistics, inferential statistics, populations, samples, parameters, statistics, and sampling. It provides examples to illustrate population, sample, parameter, statistic, and how inferential statistics can be used to make inferences about populations based on sample data. It also covers different types of data, scales of measurement, sources of statistical data, and common data collection methods like surveys and experiments.

Uploaded by

Minh Ngọc

CHAPTER 1

DATA AND STATISTICS


Learning Goals
Data
Statistics
Descriptive vs. Inferential Statistics
Types of Descriptive Statistics
Elements of Inferential Statistics
Data Collection Methods
Inference errors from nonrandom samples
Survey
A random sample of students taking a statistics class are asked,
“What is your age?”
Responses: 26, 35, 21, 23, 22, 25, 20, 19, 25, 27, 24, 20, 26, 45, 30, 31, 25, 23, 23, 21
Data

22 35 21 26
23 27 25 20
20 19 25 24
25 26 23 21
23 30 45 31
Data
In general, data are facts and figures.
In reality, data sets are often very large
◦ Much larger than this example

Data are often stored in large computer databases.


Data
For example, the United Parcel Service (UPS) tracks every
package it ships from one place to another around the
world and stores these records in giant databases.

The database is so large that it is the same size as a
database that contains every book in the Library of
Congress!
Information
The question is:
What can anyone hope to do with all these data?
In other words,
“how do we extract useful information from this data?”

STATISTICS plays a role in making sense of complex data –


world.
What is Statistics?

Statistics is a way to get INFORMATION from DATA.
STATISTICS
Using STATISTICS, we draw conclusions (extract useful
information!) from data.

With the extracted information, statistics helps managers
make valid business decisions in response to such questions
as:
STATISTICS (cont.)
What is the effect of advertising on sales?
What is the relationship between shelf location and cereal
sales?
Do aggressive high-growth mutual funds really have higher
returns than more conservative funds?
Do university students from different parts of the world
perceive business ethics differently?
How reliable are the quarterly forecasts for your firm?
etc…
Business Analytics
In this course, we emphasize the use of STATISTICS for
business and economic decision making.

This course could be viewed as an introduction to BUSINESS
ANALYTICS.

Business Analytics is defined as the scientific process of
transforming data into insights for making better business
decisions.
Scales of measurement
Scales of data indicate the type of statistical
analyses that are most appropriate
3 major types of measurement scales:
◦ Nominal, Ordinal, Interval
Nominal Data: qualitative or categorical and labels
are used to denote the classes/categories
◦ Example: Students of a university are classified by the
school in which they are enrolled using a label such as
Business, Humanities, Education, and so on.
Scales of measurement (cont.)
Ordinal Data: Same as nominal data but there is an
ordering or ranking to them that is meaningful
◦ Example: Credit ratings of individuals can be classified
as Excellent, Good, Fair, Poor.
Interval Data: Quantitative or numerical in nature
◦ Example: SAT scores, grades on an exam (%), income,
height and weight, etc.
All arithmetic operations are possible for interval
data but not for nominal and ordinal data types
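As a rough illustration of why the scale matters, the sketch below applies to each scale only the kind of summary that is meaningful for it. The records are made up for illustration and are not from the slides; the variable names are assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical records (made up for illustration; not from the slides).
schools = ["Business", "Humanities", "Business", "Education", "Business"]  # nominal
ratings = ["Good", "Poor", "Excellent", "Fair", "Good"]                    # ordinal
exam_scores = [72, 85, 90, 64, 78]                                         # interval

# Nominal: labels can only be counted, so the most frequent label is a fair summary.
school_counts = Counter(schools)
most_common_school = school_counts.most_common(1)[0][0]

# Ordinal: labels can also be ranked, so sorting and picking the middle value makes sense.
rating_order = ["Poor", "Fair", "Good", "Excellent"]
sorted_ratings = sorted(ratings, key=rating_order.index)
middle_rating = sorted_ratings[len(sorted_ratings) // 2]

# Interval: arithmetic such as averaging is meaningful.
average_score = mean(exam_scores)

print(most_common_school, middle_rating, average_score)
```

Averaging the school labels or the credit ratings would have no meaning, which is the point of the last bullet above.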
Cross-sectional vs. time series data
Cross-sectional data are collected at the same or
approximately the same point of time
◦ Example: student grades, heights of 100 people
Time series data are collected over several time
periods
◦ Example: US average price per gallon of gasoline
between 2007 and 2012
◦ Graphs of time series data are frequently found in
business and economic publications
Types of Statistics
Two Types of Statistics
◦ Descriptive
◦ Inferential
Descriptive Statistics
Descriptive statistics refers to the summary of
important aspects of a data set.

This includes collecting data, organizing the data, and then presenting the
data in the forms of charts (figures), tables and numerical measures.
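A minimal sketch of descriptive statistics, using the 20 ages from the survey slide earlier in this chapter. It organizes the data into a small frequency table and computes a few numerical measures; the decade-wide age groups are an assumption chosen only for illustration.

```python
from collections import Counter
from statistics import mean, median, stdev

# The 20 survey responses to "What is your age?" from the Data slide.
ages = [22, 35, 21, 26, 23, 27, 25, 20, 20, 19,
        25, 24, 25, 26, 23, 21, 23, 30, 45, 31]

# Numerical measures that summarize the data set.
summary = {
    "n": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
    "stdev": stdev(ages),  # sample standard deviation
}

# A simple frequency table by decade (10-19, 20-29, ...), printed as a text chart.
age_groups = Counter((age // 10) * 10 for age in ages)
for start in sorted(age_groups):
    print(f"{start}-{start + 9}: {'*' * age_groups[start]}")

print(summary)
```

Everything here describes only the 20 students who answered; nothing is claimed about other students.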
Inferential Statistics
Inferential statistics goes beyond the data at our
disposal.

More formally, it refers to drawing conclusions about a large set of data
– called the population – based on a smaller set of sample data.
Can you identify the type of statistics?
1. Using a survey of a random sample of 5000 California
residents, a UCLA Economist told a local TV station that
over 55% of Californians have a positive view about the
future of the U.S. economy.

2. Say, from a survey of a random sample of 5000 CSUF
students, it has been learned that 80% of those sampled
are very excited about studying statistics.

Answers: 1) Inferential statistics; 2) Descriptive statistics


Basic Statistical Concepts in Inferential Statistics
Population
◦ A set of items (experimental units) under study
Parameter (Variable)
◦ A descriptive measure of the population that is of interest,
e.g. the population mean μ (unknown; denoted by a Greek letter)
(Random) Sample
◦ A (random) subset chosen from the population
Statistic
◦ A descriptive measure that is calculated from the sample,
e.g. the sample mean x̄ (denoted by a regular letter)
Purpose of Inferential Statistics
Making inferences about a parameter of a population
based on information obtained from a statistic of the sample
(with a certain degree of confidence)
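The four concepts can be lined up in a short simulation. This is only a sketch: the population below is simulated (50,000 made-up ages), and in practice the parameter would be unknown, which is why a sample is taken in the first place.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 50,000 people (simulated, not real data).
population = [rng.randint(12, 60) for _ in range(50_000)]
parameter = mean(population)          # population mean; unknown in practice

# Simple random sample without replacement: every unit has the same chance.
sample = rng.sample(population, 500)
statistic = mean(sample)              # sample mean; this is what we observe

print(f"parameter (population mean) = {parameter:.2f}")
print(f"statistic (sample mean)     = {statistic:.2f}")
```

The statistic will usually land close to the parameter but not exactly on it, which is why inference comes with a stated degree of confidence.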


Example 1
According to ABC Consulting (a made up company that does not exist), the
average age of viewers of “American Idol” is 23 years. But the producer of the
show thinks that the average age is higher than 23. To test her hypothesis, the
producer of the show samples 500 Idol viewers and determines the age of
each.
a) Describe the “population”
b) Describe the variable of interest and possibly the “parameter” of
interest
c) Describe the “sample”
d) Describe the “statistic”
e) Describe the “inference”
Answers
a) The population is all viewers of the American Idol TV
show.
b) The variable is the age of a viewer; the parameter is the
average age of all viewers of the show.
c) The 500 Idol viewers who have been randomly selected.
d) The average age of the sampled viewers.
e) How to infer about the average age of all viewers using
the sampled data – with a certain degree of confidence.
For Example
After you complete this course, you will be able to make statements like
this…

At the 5% level of risk, we reject the hypothesis that the average age is
23 in favor of the alternative hypothesis (not equal to 23, greater than
23 or less than 23).
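A statement like the one above comes out of a calculation along the following lines. This is a hedged sketch, not the method developed later in the course: it uses a large-sample z statistic with the one-sided alternative "greater than 23" (the producer's belief in Example 1), and the ages are made up because the slides give no sample data. The function name `one_sided_z_test` is an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sided_z_test(sample, mu0, alpha=0.05):
    """Test H0: mean = mu0 against H1: mean > mu0 (large-sample z approximation)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    critical_value = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return z, z > critical_value

# Hypothetical sample of 500 viewer ages (made up for illustration only).
ages = [24, 26, 25, 27, 23] * 100

z, reject = one_sided_z_test(ages, mu0=23)
print(f"z = {z:.2f}, reject H0 at the 5% level: {reject}")
```

With these made-up ages the sample mean is 25, well above 23, so the hypothesis of an average age of 23 is rejected.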
Example 2
An airline company is interested in the opinions of its frequent flyer customers
about its proposed new routes. Specifically, it wants to know what proportion
of them plan to use one of its new hubs in the next 6 months. It takes a
random sample of 10,000 from the database of all frequent flyers and sends them
an e-mail message with a request to fill out a survey in exchange for 1500 miles.
a) Describe the “population”
b) Describe the variable of interest and possibly the “parameter” of interest
c) Describe the “sample”
d) Describe the “statistic”
e) Describe the “inference”
Answers
a) The population is all frequent flyer customers of the
airline.
b) The proportion of frequent flyer customers that plan to
use one of the new hubs in the next 6 months.
c) The 10,000 frequent flyer customers who have been
randomly selected.
d) The proportion that plan to use one of the new hubs in
the next 6 months out of the 10,000 selected sample.
For Example
After you complete this course, you will be able to make
statements like this…

With a margin of error of 4% and 95% confidence,
40% of frequent flyer customers will use one of the new hubs
in the next 6 months.
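For orientation, the margin of error in such a statement is commonly computed with the normal approximation z·sqrt(p(1−p)/n). The sketch below assumes that formula; the slides do not say how many of the 10,000 customers actually answered, so the sample sizes used here are hypothetical. The function names are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for_margin(p_hat, margin, confidence=0.95):
    """Smallest n whose margin of error does not exceed `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)

# Hypothetical case: all 10,000 sampled customers respond and 40% say yes.
print(f"margin with n = 10,000: {margin_of_error(0.40, 10_000):.4f}")

# Number of responses needed for a 4% margin at 95% confidence when p is about 0.40.
print(f"n needed for a 4% margin: {sample_size_for_margin(0.40, 0.04)}")
```

Under these assumptions a 4% margin corresponds to a few hundred responses, far fewer than 10,000, so the figure on the slide is best read as illustrative.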
Sampling
1. Examine a part of the whole (the population)
◦ Examining the entire population is often impractical, prohibitive, or costly
2. Randomized sample
◦ Every item in the population has an equal chance of being
in any particular sample.
◦ Silly example: How do you test a pot of soup for saltiness?
3. Sample size matters, NOT the size of the population
◦ A random sample of 100 students represents the student
body as well as a random sample of 100 voters represents the
entire electorate in the USA.
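Point 3 can be checked with a small simulation. The sketch below draws repeated random samples of 100 from a small and a large simulated population and compares how much the sample means vary. The populations, their sizes, and the function name are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def spread_of_sample_means(population_size, sample_size, repeats=300, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    # Hypothetical population: values with mean 22 and standard deviation 4.
    population = [rng.gauss(22, 4) for _ in range(population_size)]
    means = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return stdev(means)

small = spread_of_sample_means(5_000, 100)    # e.g. a student body
large = spread_of_sample_means(500_000, 100)  # e.g. a much larger electorate

print(f"population   5,000: spread of sample means = {small:.3f}")
print(f"population 500,000: spread of sample means = {large:.3f}")
```

The two spreads come out nearly the same even though one population is 100 times larger, because the precision is driven by the sample size of 100.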
Goal of Data Collection
To obtain a “representative sample” that exhibits the characteristics of
the entire population.

Most common approach – taking random samples where each
experimental unit in the population theoretically has the same chance of
being selected for the sample.
Sources of Statistical Data
Data can be extracted from a public source
◦ Wall Street Journal, Orange County Business Journal

A designed experiment can be performed
◦ Test cavity prevention – divide subjects into groups

A survey can be taken
◦ Presidential poll (phone, mail), TV program (Nielsen)

Observational studies can be made
◦ Observe output of workers on morning/evening shifts
Example 3
How do consumers feel about using the Internet for online shopping?
To find out, a customer-experience software company commissioned a
nationwide survey of 1859 U.S. adults who had conducted at least one
online transaction in the past year. The findings, reported on
BusinessWeek.com (2006), revealed that 1655 respondents or 89%
experienced technical problems with an online transaction.
a. Identify the data-collection method
b. Identify the target population
c. Are the sample data representative of the population?
Answers
a. Data source – SURVEY
b. Population – all US online shoppers with at least one
online transaction last year
c. Are the sample data representative of the population?
Not enough information is given to tell. One wonders whether
self-selection bias and non-response bias may be present.
Non-Random Sampling Errors
Selection bias
◦ One subset of experimental units in the population has
either no chance, less of a chance, or more of a chance of
being selected than another subset
Non-response bias
◦ When data is unavailable or unattainable for certain
experimental units in the population
Measurement errors
◦ Inaccuracies in getting/recording data; ambiguous
questions on questionnaires, etc.
For the online example
Some shoppers may have been excluded from the survey for
several reasons: did not see the survey at all, did not have
time to respond, etc. On the other hand, some people may
be eager to respond because they had the most difficulty
with the online shopping experience.

So, in the end, we may have a non-random sample.
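The effect of eager respondents can be sketched with simple arithmetic. The response rates below are hypothetical and are not estimates for the survey in Example 3; the function name is an assumption.

```python
def observed_problem_rate(true_rate, respond_if_problem, respond_if_no_problem):
    """Share of respondents reporting a problem when response rates differ by group."""
    responders_with_problem = true_rate * respond_if_problem
    responders_without_problem = (1 - true_rate) * respond_if_no_problem
    return responders_with_problem / (responders_with_problem + responders_without_problem)

# Hypothetical: 50% of shoppers truly had a problem, but those who did are
# three times as likely to answer the survey (60% vs. 20%).
biased = observed_problem_rate(0.50, 0.60, 0.20)

# If both groups respond at the same rate, the survey recovers the true rate.
unbiased = observed_problem_rate(0.50, 0.40, 0.40)

print(f"observed rate with unequal response:  {biased:.2f}")
print(f"observed rate with equal response:    {unbiased:.2f}")
```

In this made-up case the survey would report a much higher problem rate than the true 50%, purely because of who chose to respond.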


Example 4
A local TV company with customers in 15 towns is
considering offering high-speed internet service on its cable
lines. Before starting the new service, the company wants to find
out whether customers would pay the $50 per month that it plans
to charge. A graduate of a business school who works for
the company has prepared several alternative plans for
assessing customer demand. For each, indicate what (if
any) biases might result.
Example 4 cont.
a) Put a big advertisement in the newspaper asking people to give
their opinions on the company website.
b) Randomly select one of the towns and contact every cable
subscriber by phone.
c) Send a survey to each customer and ask them to fill it out and
return.
d) Randomly select 20 customers from each town. Send them a
survey, and follow up with a phone call if they do not return the
survey within a week.
Answers
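a. Problem of voluntary response. Only those who both see
the advertisement and feel strongly enough will respond.
b. One town may not be typical of all, so the sample may not be representative.
c. Likely non-response (self-selection) bias: only some
customers will fill out and return the survey.
d. This is good and unbiased.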
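Plan (d) amounts to drawing a separate random sample of 20 customers within each town. A minimal sketch, assuming each town has a list of customer IDs; the IDs, the 200 customers per town, and the function name are hypothetical.

```python
import random

def sample_per_town(customers_by_town, per_town=20, seed=None):
    """Draw a simple random sample of `per_town` customers from every town."""
    rng = random.Random(seed)
    return {
        town: rng.sample(customers, per_town)  # without replacement within a town
        for town, customers in customers_by_town.items()
    }

# Hypothetical customer lists: 15 towns with 200 customers each.
customers_by_town = {
    f"Town {t}": [f"T{t}-C{c}" for c in range(1, 201)]
    for t in range(1, 16)
}

selected = sample_per_town(customers_by_town, seed=42)
print(len(selected), "towns,", sum(len(s) for s in selected.values()), "customers")
```

The follow-up phone call in plan (d) is what keeps non-response low; the random draw alone does not guarantee that the selected customers answer.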
Using Non-Random Samples
• Unintentionally
• Leads to unjustified or false conclusions

• Intentionally
• Designed to skew results on purpose
• Unethical statistical practice
Is this a biased sample situation?
Say CSUF is contemplating building a new stadium. This is
a fake situation I made up!!

To gauge the excitement of students about this idea, the
office of the dean of students conducts a survey asking students
if they approve of the idea, and the survey is distributed
to all athletes on campus.

An overwhelming 99% of the respondents are for the
idea.
In This Course
We do not study the various approaches to sampling that
aim at obtaining a representative sample for doing proper
inference.

As graduates of business, you should be very critical of how
the sample or data were obtained before rushing to read the
final report.

If the sample is biased in the sense that there are non-random
errors in it, then the conclusion is SUSPECT!!
Course Roadmap
• Descriptive Statistics
• Normal Distribution
• Sampling distributions
• Inference
• Estimation
• Hypothesis testing
• Two population Tests
• Experimental design and analysis of variance (ANOVA)
• Regression
