
2035 CH1 Notes

Statistics is the science of collecting, organizing, summarizing, and interpreting numerical data. It begins with asking questions that can be answered with data. Data is collected and organized, often in tables or graphs, and then summarized using numerical values like averages or measures of variability. The data is then analyzed using statistical methods and interpreted. Probability concepts are also important for modeling real-world scenarios. Statistics distinguishes between parameters that describe populations and statistics that describe samples. It also categorizes variables as nominal, ordinal, interval, or ratio based on their level of measurement. An example is provided on studying business opportunities in rural India using survey data on product consumption, demographics, and preferences. Big data is also introduced as large, complex datasets

Uploaded by

kejacob629

What is Statistics?

Statistics is the science of collecting,
organizing, summarizing and interpreting
numerical facts, which we call data

Statistics starts with a question

1. What percent of students at UWO
smoke?
2. Are there differences in the number of
calories in different types of hot dogs?
3. Is there a relationship between size of an
engine and the gas mileage of an
automobile?
4. Is there any difference in the percentage
of students who work while going to
school compared to 10 years ago?
Collecting Data
• we need to keep in mind where the data
comes from and how it was collected

Organizing Data
• raw data does not tell us anything
• the data may first need to be organized in a
table

Summarizing Data
• data organized in a table may not tell us
anything
• best to display the data in a graph
(chapter 2)
• data can then be further summarized with
specific numerical values such as the
average or a number representing how
spread out the data are (chapter 3)
Analysis of the Data and Interpreting Your
Results
• Analyze the data using the appropriate
statistical method(s)
• How confident are you in the numerical
values you have calculated?
• What do they mean?
• Chapters 8, 9, 10, 11, 12, 13, 14 and 16
discuss these questions in more detail

However, prior to getting to the statistical
analysis part of the course, we must first
discuss probability

Probability rules will be discussed in
chapter 4

We then move on to asking questions such as:

1. How do you mathematically model how
many electronic circuits in a large
shipment are defective?
2. What is the expected number of potholes
in a stretch of highway?
3. What percentage of soft drink cans have
a volume less than 95% of the volume
stated on the can?
4. What percentage of 10-kilogram bags of
cement have a weight greater than 10.2
kilograms?

To answer these questions, we need to know
something about probability distributions
(chapters 5, 6 and 7)
Basic Statistical Concepts/Variables and
Data (section 1.1)

What is a population?

• It is the set of all units that you are interested
in studying
• Units = people OR objects OR items of
interest

Examples
You are usually interested in studying a
certain characteristic of the population

• Any particular characteristic is called a
variable

Examples
Two Types of Variables

1. Quantitative (Measurement) Variable
• This is a variable that is assigned a
meaningful numerical value

2. Qualitative (Categorical) Variable
• This is a variable where the
characteristic can be assigned to
different categories
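
As a rough illustration of the two variable types, the sketch below sorts the variables in a few made-up student records into quantitative and categorical by checking whether every recorded value is a number. The records, field names and the variable_type helper are all hypothetical. The numeric check is only a shortcut: a category that happens to be coded as a number would still be categorical.

```python
from numbers import Number

# Hypothetical student records; every value here is invented for illustration.
students = [
    {"age": 19, "height_cm": 172.5, "gender": "F", "smokes": "no"},
    {"age": 21, "height_cm": 180.0, "gender": "M", "smokes": "yes"},
    {"age": 20, "height_cm": 165.2, "gender": "F", "smokes": "no"},
]


def variable_type(values):
    """Return 'quantitative' if every value is a number, else 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"


# Classify each variable (dictionary key) across all records.
types = {
    name: variable_type([s[name] for s in students])
    for name in students[0]
}

for name, kind in types.items():
    print(f"{name}: {kind}")
```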
Example
Suppose we are interested in obtaining the
following data from a population of students.
Which of the variables are quantitative (Q) and
which are categorical (C)?

• Age
• Gender
• Weight
• Do you smoke?
• Height
• In what year at Western are you?
• What cell phone brand do you have?
• How far is your parents’ home from Western?
• How many text messages do you send in a week (approximately)?
• Do you currently have a job where you work 10 or more hours/week?
If we examine every unit of the population
(for the variable of interest), we say we are
conducting a census of the population

• As you can imagine, many populations are
too large to study
• It would be too time consuming OR too
costly to conduct a census

Thus, it makes more sense to select and
analyze a subset (or portion) of the
population

This subset is called a sample
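
One common way to select a sample is a simple random sample, in which every unit has the same chance of being chosen. The notes do not say which sampling method is used, so the sketch below is only one possibility. The population of 10,000 student ID numbers, the sample size of 100 and the seed are all invented for illustration.

```python
import random

# Hypothetical population: ID numbers for 10,000 students (invented).
population = list(range(1, 10_001))

# Fixed seed so the sketch gives the same sample each time it is run.
rng = random.Random(2035)

# Simple random sample of 100 units, drawn without replacement.
sample = rng.sample(population, k=100)

print(len(sample), "units sampled out of", len(population))
```

Because the sample is drawn without replacement, no unit can appear in it twice.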


Once you have selected a sample from the
population you wish to study, you will want
to begin your analysis of the data by first
describing the sample data:

• Graph the sample data (chapter 2)
• Calculate some numbers that summarize
the data – these numbers are called
descriptive statistics (chapter 3)

Using the sample data to draw conclusions
about the population from which the sample
was drawn is a type of statistics called
inferential statistics (chapters 8 to 11)
Parameters vs. Statistics
Parameters
A descriptive measure of a population is
called a parameter
• Parameters are usually denoted by a
Greek letter

Examples:
μ = mean or average value of the population
σ² = measure of the spread of a population

Statistics
A descriptive measure of a sample is called
a statistic
• Statistics are usually denoted by
Roman letters

Examples:
x̄ = mean or average value of the sample
s² = measure of the spread of a sample
In Statistical Inference
As mentioned earlier, we are often interested
in using a sample to draw conclusions (make
inferences) about the population from which
it was drawn

• Since the value of μ is often unknown,
we will end up using x̄ as a point
estimate of μ

• Since the value of σ² is often unknown,
we will end up using s² as a point
estimate of σ²
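
As a small worked illustration of point estimates, the sketch below computes the sample mean and the sample measure of spread for a made-up sample. The data are invented. It also assumes that the measure of spread is the usual sample variance with an n − 1 divisor, which this section of the notes does not spell out.

```python
import statistics

# Hypothetical sample: weekly text messages sent by 8 students (invented).
sample = [120, 85, 300, 45, 150, 95, 210, 60]

# Sample mean: the point estimate of the population mean (mu).
x_bar = statistics.mean(sample)

# Sample variance (n - 1 divisor): the point estimate of the
# population variance (sigma squared).
s2 = statistics.variance(sample)

print(f"sample mean     = {x_bar}")
print(f"sample variance = {s2:.2f}")
```

A different sample from the same population would generally give different values, which is why these numbers are estimates of the parameters and not the parameters themselves.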
Data Measurement: Nominal, Ordinal,
Interval and Ratio – Section 1.2

As stated before, a variable may be
qualitative (categorical) or quantitative

1. Qualitative Variables
There are two levels of measurement:

o Nominal (nominative) level
o Ordinal (ranked) level

a. Nominal Level

A nominal variable is used only for
categorizing a qualitative variable
• There is no meaningful order
b. Ordinal Level

When categorical variables are ranked in
order, the numbers assigned have meaning
• this is called the ordinal level of
measurement

For example, if a person is asked to rank
their favourite cities that they have visited in
Canada, it might be something like:

1. Vancouver (least favourite city)
2. Montreal
3. Halifax
4. Toronto
5. Calgary (favourite city)

This ranking does not have to have equal
distances between points: the gap from 1 to 2
need not be the same as the gap from 2 to 3
• Only the order is meaningful
Example
Which of the following situations are
nominal data and which are ordinal data?

• Stats course letter mark: A, B, C, D, F

• Tossing a coin: head/tail

• TV show rating: C, C8, G, PG, 14+, 18+

• Personal computer ownership: yes/no

• Restaurant rating: *****, ****, ***, **, *

• Income tax filing status: married, divorced,
common-law, separated, widowed, single

• Top ten realtors in a district

• The grocery store aisle where the soup is


2. Quantitative Variables

There are two levels of measurement:

o Interval level
o Ratio level

a. Interval Level

If you are rating quantitative variables, the
numbers have meaning AND the distances
between the points are fixed and meaningful

For example, professor ratings
• You are asked a series of questions about
your course/professor and are asked to
give a rating on the scale:

1 2 3 4 5 6 7

• The distance between 3 and 4 is the same
as the distance from 5 to 6
• However, the above scale could have been
written as

−1 0 1 2 3 4 5

• Here, the value of 0 is arbitrary; it does not
mean that the course is rated as nothing
(zero); instead it represents a rating of “not
very good”
b. Ratio Level

When a quantitative variable has a
meaningful zero value AND equal distance
between points, then the variable is at the
ratio level

An example would be grades on a test
• A grade of 0 has meaning
• The distance between a grade of 65 and 70
is the same as the distance between a
grade of 89 and 94

Another example would be possible annual
rates of return on an investment portfolio
• You could earn 9.6% in a year or 2.7% or
−3.4%, or even 0%
Case Study – The State of Business in Rural
India

This was a study done in the mid-1990s to
determine if there were business
opportunities in rural India
• In particular, in the personal
care/household commodities market

Some background:
• India is the 2nd most populous country in the
world, with a population > 1 billion
• 75% of the population live in rural areas
• Yet the rural market accounts for only 1/3
of the total national sales
• Since the 1990s, India’s rural market has
become more open for trade in consumer
goods
• This was an untapped market at the time,
offering potential for large companies to
enter the Indian market
Some Data – Rural India

Income Level
• 65% earn less than $574 annually
• 23% earn between $574 and $1146 annually

Literacy Rates
• 66% of women are illiterate
• 38% of men are illiterate

Where do these numbers come from?


Personal Care Product Consumption in
Rural India (1990-1994)

Product          1990                  1994
Toothpaste       8,825 metric tons     17,023 metric tons
Laundry Soap     272,540 metric tons   422,741 metric tons
Bathroom Soap    158,919 metric tons   231,084 metric tons
Shampoo          497,000 litres        2,116,000 litres

These values come from a survey of 2,500
households in rural India
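
To get a feel for how quickly consumption grew, the percentage change from 1990 to 1994 can be computed for each product in the table. The sketch below copies the figures from the table. The percent_change helper is just an illustrative name for the usual formula (new − old) / old × 100.

```python
# Consumption figures copied from the table: (1990, 1994).
consumption = {
    "Toothpaste (metric tons)": (8_825, 17_023),
    "Laundry soap (metric tons)": (272_540, 422_741),
    "Bathroom soap (metric tons)": (158_919, 231_084),
    "Shampoo (litres)": (497_000, 2_116_000),
}


def percent_change(old, new):
    """Percentage change from the old value to the new value."""
    return (new - old) / old * 100


growth = {item: percent_change(*years) for item, years in consumption.items()}

for item, pct in growth.items():
    print(f"{item}: {pct:.1f}% increase from 1990 to 1994")
```

Toothpaste consumption roughly doubled over the four years, while shampoo consumption grew to more than four times its 1990 level.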
Other Survey Data

1. They collected information on the income
level and age of the head of the household

2. Households were also asked to rate the
likelihood of purchasing toothpaste on a
scale of 1 to 5

3. Households were also asked to rank a
variety of products in terms of which they
were likely to purchase

4. Households also provided geographic
information, such as what area of India
they were from
Big Data (section 1.3)

Big Data is defined as a collection of
large and complex datasets from
different sources that are difficult to
process using traditional data
management and processing
applications

• Not all data are created in the same
way, and they do not all represent the
same things

• As a result, there are at least four
characteristics or dimensions
associated with big data:
Variety
This refers to the many different
forms and sources of data

Velocity
This refers to the speed with which
the data are available and can be
processed
Veracity
This has to do with data quality, correctness,
and accuracy

Volume
This has to do with the ever-increasing size
of data and databases

Value
This is sometimes considered a fifth
characteristic; data that does not generate
value makes no contribution to an
organization
Business Analytics (section 1.4)

Business analytics refers to the application
of processes and techniques that transform
raw data into meaningful information to
improve decision-making

FIGURE 1.6
Business Analytics Add Value to Data
Categories of Business Analytics

1. Descriptive Analytics
• takes traditional data and describes what
has happened or is happening in a business
o Used to discover hidden
relationships and patterns
o Simplest and most commonly used
category
o Data visualization is key
o Also called reporting analytics

Topics include descriptive statistics,
frequency distributions, statistical inference,
correlation, clustering techniques, data
mining, and data visualization
2. Predictive Analytics
• finds relationships in the data that are
not readily apparent with descriptive
analytics
o Patterns or relationships are
extrapolated forward in time and the
past is used to make predictions
about the future

Topics include regression, time-series,
forecasting, simulation, data mining,
statistical modeling, machine learning
techniques, decision tree models, and neural
networks
3. Prescriptive Analytics
• examines current trends and likely
forecasts to make better decisions
o Takes uncertainty into account,
recommends ways to mitigate risks,
and tries to foresee the effects of
future decisions
o Uses a set of mathematical
techniques that determine optimal
decisions given a complex set of
objectives, requirements, and
constraints

Topics include management science or
operations research aimed at optimizing
performance of a system such as
mathematical programming, simulation, and
network analysis
Data Mining and Data Visualization (1.5)

Data Mining
This is the collecting, exploring, and
analyzing of large volumes of data to
uncover hidden patterns to enhance
decision-making
• Used by companies to turn raw data into
useful information

FIGURE 1.7 -- Process of Data Mining


Data Visualization
This is the study of the visual representation
of data and is employed to convey data or
information by imparting it as visual objects
displayed in graphics

Example
Here is some recent data of the top five
manufacturing firms to receive Canadian
Government funding

Company Name                  Dollars Funded
FCA Canada Inc. (Chrysler)    $85,800,000
Bombardier Inc.               $54,150,000
Produits Kruger               $39,500,000
Sonaca Montréal Inc.          $23,250,000
Hanwha L&C Canada Inc.        $15,000,000


Visualization of the Above Data

Bar Chart

Bubble Chart
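
Since the charts themselves are not reproduced here, the idea of a bar chart can be sketched in plain text: each firm gets a bar whose length is proportional to its funding. The sketch below uses the figures from the funding table. The 40-character maximum width and the use of "#" for the bars are arbitrary choices, and a plotting library would normally be used in place of printed text.

```python
# Funding figures copied from the table (dollars funded).
funding = {
    "FCA Canada Inc. (Chrysler)": 85_800_000,
    "Bombardier Inc.": 54_150_000,
    "Produits Kruger": 39_500_000,
    "Sonaca Montréal Inc.": 23_250_000,
    "Hanwha L&C Canada Inc.": 15_000_000,
}

MAX_WIDTH = 40  # characters used for the longest bar (arbitrary choice)
largest = max(funding.values())
label_width = max(len(name) for name in funding)

# Scale each amount so the largest one fills MAX_WIDTH characters.
bars = {
    name: "#" * round(amount / largest * MAX_WIDTH)
    for name, amount in funding.items()
}

for name, bar in bars.items():
    print(f"{name:<{label_width}} | {bar} ${funding[name]:,}")
```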
