Lecture 2
There are two types of data sets you will use when studying
statistics. These data sets are called populations and samples.
Solution:
The population consists of the responses of all small
business owners in the United States, and the sample
consists of the responses of the 614 small business owners
in the survey.
Data sets can consist of two types of data: qualitative data and
quantitative data.
Two data sets are shown. Which data set consists of data at
the nominal level? Which data set consists of data at the
ordinal level? Explain your reasoning. (Source: The Numbers)
Solution
The first data set lists the ranks of five movies. The data set
consists of the ranks 1, 2, 3, 4, and 5. Because the ranks can
be listed in order, these data are at the ordinal level. Note
that the difference between a rank of 1 and 5 has no
mathematical meaning.
Discrete
• Data that can only take on integer values, such as
counts.
Synonyms: integer, count
Key Terms for Data Types
Categorical
•Data that can only take on a specific set of
values.
•Example: Sex, type of chocolate, color
•Synonyms: enums, enumerated, factors, nominal,
polychotomous
Binary
•A special case of categorical data with just two
categories (0/1, true/false).
•Synonyms: dichotomous, logical, indicator
Ordinal
•Categorical data that has an explicit ordering.
•Synonyms: ordered factor
Data Types
Binary data is an important special case of
categorical data that takes on only one of two
values, such as 0/1, yes/no or true/false.
Synonyms: dichotomous, logical, indicator
Ordinal
•Categorical data that has an explicit ordering.
•Synonyms: ordered factor
An example of this is a numerical rating (1, 2, 3, 4,
or 5).
Data Types
There are two basic types of structured data:
numeric and categorical.
Type of chocolate
• Dark (1)
• Milk (2)
• White (3)
Sex
• Male (0)
• Female (1)
Color
• Red (1)
• Green (2)
• Blue (3)
• Yellow (4)
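The numeric codes attached to each category are only labels. A minimal sketch of what that means in practice, assuming a hypothetical code book `COLOR_CODES` that mirrors the color list and a made-up list of observations (neither comes from the lecture data):

```python
from collections import Counter

# Hypothetical code book mirroring the slide: integer labels for a nominal variable.
COLOR_CODES = {"Red": 1, "Green": 2, "Blue": 3, "Yellow": 4}

# Made-up observations, for illustration only.
observations = ["Red", "Blue", "Blue", "Yellow", "Red", "Blue"]

# Encoding replaces each label with its integer code.
encoded = [COLOR_CODES[color] for color in observations]

# Counting categories and finding the most frequent one is meaningful.
counts = Counter(observations)
mode = counts.most_common(1)[0][0]

# The mean of the codes can be computed, but it has no interpretation:
# a value of 2.5 does not describe a color "between Green and Blue".
mean_code = sum(encoded) / len(encoded)

print(counts, mode, mean_code)
```

Renumbering the categories (for example Red = 4, Yellow = 1) would change `mean_code` but not the counts or the mode, which is one way to see that only counting operations apply to nominal data.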
Ordinal scale
With ordinal scales, the order of the values is what is important
and significant, but the differences between them are not
really known.
Take a look at the example below. In each case, we know
that option 4 is better than option 3 or option 2, but we
don’t know, and cannot quantify, how much better it is.
For example, is the difference between “OK” and “Unhappy”
the same as the difference between “Very Happy” and
“Happy”? We can’t say.
Ordinal scales are typically measures of non-numeric
concepts like satisfaction, happiness, discomfort, etc.
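One way to see which operations an ordinal scale supports is the sketch below. It assumes a four-level scale built from the labels mentioned above (the exact number of levels on the original slide is an assumption), and the `responses` list is made up for illustration.

```python
from enum import IntEnum
from statistics import median_low


class Satisfaction(IntEnum):
    """Hypothetical four-level ordinal scale; the integers only encode order."""
    UNHAPPY = 1
    OK = 2
    HAPPY = 3
    VERY_HAPPY = 4


# Made-up survey responses, for illustration only.
responses = [
    Satisfaction.OK,
    Satisfaction.VERY_HAPPY,
    Satisfaction.HAPPY,
    Satisfaction.UNHAPPY,
    Satisfaction.HAPPY,
]

# Order-based operations are meaningful: comparison, sorting, median.
is_better = Satisfaction.VERY_HAPPY > Satisfaction.HAPPY
ranked = sorted(responses)
middle = median_low(responses)

# Differences are not meaningful: HAPPY - OK and VERY_HAPPY - HAPPY both
# evaluate to 1, but that equality is an artifact of the chosen codes.
print(is_better, [r.name for r in ranked], middle.name)
```

The median is used here instead of the mean because it depends only on the ordering of the responses, not on the spacing between the codes.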
Ordinal scale example
PAKISTAN: ROAD TRAFFIC ACCIDENTS
Deaths = 30,046
% = 2.42 (of total deaths in Pakistan)
Rate = 17.12 (age-adjusted, per 100,000 population)
World Rank = 95
According to the latest WHO data, published in 2018, road
traffic accident deaths in Pakistan reached 30,046, or 2.42%
of total deaths. The age-adjusted death rate of 17.12 per
100,000 population ranks Pakistan #95 in the world.
Reference: https://www.worldlifeexpectancy.com/pakistan-road-traffic-accidents
Road injuries killed 1.4 million people in 2016, about three-
quarters (74%) of whom were men and boys.
Basic concepts
a. Getting a two
II. P(∅) = 0
III. P(S) = 1.
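These properties can be checked on a small sample space. The sketch below assumes the event "getting a two" refers to one roll of a fair six-sided die (the slide does not state the experiment), and `probability` is a hypothetical helper using the classical definition for equally likely outcomes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})


def probability(event):
    """Classical probability: favorable outcomes divided by total outcomes."""
    favorable = set(event) & SAMPLE_SPACE
    return Fraction(len(favorable), len(SAMPLE_SPACE))


p_two = probability({2})                     # getting a two
p_empty = probability(set())                 # the impossible event
p_sample_space = probability(SAMPLE_SPACE)   # the certain event

print(p_two, p_empty, p_sample_space)
```

Using `Fraction` keeps the results exact, so the impossible event comes out as exactly 0 and the whole sample space as exactly 1.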
Basic concepts
When the probability of an event is close to zero, the
occurrence of the event is relatively unlikely. For
example, if the chances that you will win a certain
lottery are 0.001, or one in one thousand, you
probably won’t win, unless, of course, you are very
“lucky.”
Freshmen     4
Sophomores   6
Juniors      8
Seniors      7
Total       25
Empirical Probability [2]
P(E) = (Frequency of E) / (Sum of the frequencies)
P(E ) = 1/4
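As a worked illustration of the empirical probability formula, the sketch below applies it to the class-standing frequency table (4, 6, 8, and 7 students, 25 in total). The function name `empirical_probability` is a hypothetical helper, not something from the lecture.

```python
from fractions import Fraction

# Frequencies from the class-standing table; they sum to 25.
frequencies = {"Freshmen": 4, "Sophomores": 6, "Juniors": 8, "Seniors": 7}


def empirical_probability(event, freq):
    """P(E) = frequency of E divided by the sum of the frequencies."""
    total = sum(freq.values())
    return Fraction(freq[event], total)


p_freshman = empirical_probability("Freshmen", frequencies)   # 4/25
p_junior = empirical_probability("Juniors", frequencies)      # 8/25

# The probabilities of all classes together must add up to 1.
p_total = sum(empirical_probability(k, frequencies) for k in frequencies)

print(p_freshman, p_junior, p_total)
```

Here a randomly selected student is a freshman with probability 4/25 = 0.16 and a junior with probability 8/25 = 0.32.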
Questions:
What happens if we toss the coin 100 times? Will
we get exactly 50 heads?
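One way to explore this question is a small simulation. The sketch below assumes a fair coin (probability 0.5 of heads) and repeats the 100-toss experiment ten times; the seed value and the helper name `count_heads` are arbitrary choices made for this illustration.

```python
import random


def count_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))


# A seeded generator makes the run repeatable; the seed itself is arbitrary.
rng = random.Random(2)

# Repeat the 100-toss experiment ten times and record the heads each time.
results = [count_heads(100, rng) for _ in range(10)]

for trial, heads in enumerate(results, start=1):
    print(f"Trial {trial}: {heads} heads, relative frequency {heads / 100:.2f}")
```

The head counts typically land near 50 without being exactly 50 every time. The theoretical probability 1/2 describes the long-run relative frequency, not the outcome of any single set of 100 tosses.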