Lesson 1 - Data Definitions
The field of statistics is divided into two parts:
1. Descriptive statistics:
Describes data that have been collected. Commonly used
descriptive statistics include frequency counts, ranges
(high and low scores or values), means, medians, modes,
and standard deviations.
2. Inferential statistics:
Generalizes from samples to populations using
probabilities. It includes performing hypothesis tests, determining
relationships between variables, and making predictions.
Definitions:
• Data:
Observations (such as measurements,
genders, survey responses) that have been
collected.
• Variable:
A characteristic or attribute that can
assume (take) different values.
• Random Variable:
A variable whose values are determined by chance.
• Population:
The complete collection of all elements
(scores, people, measurements, and so on)
to be studied.
• Sample:
A subgroup or subset of the population.
                     Population   Sample
Size                 N            n
Mean                 µ            x̄
Variance             σ²           S²
Standard Deviation   σ            S
Populations and Samples:
Let X1, X2, …, XN be the population
values (in general, they are unknown).
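The population and sample measures listed above can be illustrated with a short Python sketch. The numbers are hypothetical and not taken from the lesson. It also assumes the usual convention that the sample variance S² divides by n - 1 while the population variance σ² divides by N; the lesson itself does not state the formulas.

```python
import statistics

# Hypothetical population of N = 5 values (an assumption for illustration).
population = [4, 8, 6, 5, 7]
# Hypothetical sample of n = 3 values taken from that population.
sample = [4, 6, 8]

# Population measures: size N, mean mu, variance sigma^2, standard deviation sigma.
N = len(population)
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # divides by N
sigma = statistics.pstdev(population)

# Sample measures: size n, mean x_bar, variance S^2, standard deviation S.
n = len(sample)
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)  # divides by n - 1 (assumed convention)
s = statistics.stdev(sample)

print(f"Population: N={N}, mean={mu}, variance={sigma2}, sd={sigma:.3f}")
print(f"Sample:     n={n}, mean={x_bar}, variance={s2}, sd={s:.3f}")
```

In practice only the sample measures can be computed, because the population values are generally unknown.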
Categorical data
  Nominal data: binary or not binary
  Ordinal data: binary or not binary
Nominal Data
• A type of categorical data in
which objects fall into unordered
categories.
Examples: Nominal Data
• Gender
– Male, Female
• Nationality
– French, Japanese, Egyptian, Chinese, etc.
• Smoking status
– smoker, non-smoker
Ordinal Data
• A type of categorical
data in which order is
important.
Examples: Ordinal Data
• Class of degree
– 1st class, 2nd class, 3rd class, fail
• Degree of illness
– none, mild, moderate, acute, chronic.
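The difference between nominal and ordinal data can be sketched in Python. This is a minimal illustration, assuming the class-of-degree categories from the example ranked from "fail" (lowest) to "1st class" (highest); the list of results is hypothetical.

```python
# Ordinal data: the categories have a meaningful order.
# Order assumed here: fail < 3rd class < 2nd class < 1st class.
DEGREE_ORDER = ["fail", "3rd class", "2nd class", "1st class"]
rank = {label: position for position, label in enumerate(DEGREE_ORDER)}

# Hypothetical results for five students (not from the lesson).
results = ["2nd class", "fail", "1st class", "3rd class", "2nd class"]

# Sorting and taking a maximum make sense because order is important.
sorted_results = sorted(results, key=rank.__getitem__)
best = max(results, key=rank.__getitem__)

# Nominal data: categories are unordered, so only equality checks
# (and counting) are meaningful, not sorting by "size".
nationalities = ["French", "Japanese", "Egyptian", "French"]
same_nationality = nationalities[0] == nationalities[3]

print(sorted_results)
print(best, same_nationality)
```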
Quantity Data
• The objects being studied are ‘measured’
based on some quantitative trait.
Examples: Quantity Data
• Pulse rate
• Height
• Age
• Exam marks
• Time to complete a statistics test
• Family Size
Quantity data can be classified as
‘Discrete or Continuous’
Quantity data
  Discrete
  Continuous
Discrete Data
Data are discrete if the values (observations) may take
only specific values (integers).
There are gaps between the possible values.
Continuous Data
Data are continuous if the values (observations) may
take on any value within a finite or infinite
interval (real numbers).
Discrete data: gaps between possible values (count)
0 1 2 3 4 5 6 7
Continuous data: no gaps between possible values (measure)
0 to 1000
Examples: Discrete Data
• Number of children in a family
• Number of students passing a stats exam
• Number of crimes reported to the police
• Number of cars sold in a day.
Category
  Nominal
  Ordinal (ordered categories, ranks)
Quantity
  Discrete (counting)
  Continuous (measuring)
Interval and ratio variables
• Interval:
– Numerical data
– Data can be ranked
– Data has equal intervals between data points
• Ratio:
– Numerical data
– Data can be ranked
– Data has equal intervals between data points
– True zero
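The role of a true zero can be sketched with temperatures. This example is an assumption added for illustration and is not part of the lesson: Celsius is treated as an interval scale (its zero is arbitrary) and Kelvin as a ratio scale (its zero is a true zero).

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

# Two hypothetical temperature readings in degrees Celsius.
a_c, b_c = 10.0, 20.0

# Differences are meaningful on an interval scale (equal intervals).
difference_c = b_c - a_c

# A ratio of Celsius values is not meaningful: 20 C is not "twice as hot" as 10 C.
naive_ratio = b_c / a_c

# On the Kelvin scale (true zero) the ratio is meaningful.
true_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"difference = {difference_c} degrees")
print(f"Celsius ratio = {naive_ratio:.3f}, Kelvin ratio = {true_ratio:.3f}")
```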
Organization and Presentation of Data
Introduction
• After the data have been collected, the main
tasks a statistician must accomplish are the
organization and presentation of the data.
• Raw data:
Data collected in original form (before they
have been organized).
• Example:
• The following data are raw data.
Definitions:
Class: A quantitative or qualitative category
in which the raw data are placed.
Categorical Frequency Distribution
• The categorical frequency distribution is used
for data that can be placed in specific
categories, such as nominal or ordinal data.
Example
• The blood type of different students:
Class   Tally         Frequency
A       /////         5
B       ///// //      7
O       ///// ////    9
AB      ////          4
Total                 25
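A categorical frequency distribution like the one above can be built in Python with `collections.Counter`. This is only a sketch: the raw blood-type list is not reproduced in the lesson, so it is rebuilt here from the table counts (the original order of the observations is unknown), and `tally` is a hypothetical helper name.

```python
from collections import Counter

# Raw observations rebuilt from the table counts (assumption: the order
# of the original 25 observations is not known).
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

def tally(count):
    """Return tally marks in groups of five, e.g. 7 -> '///// //'."""
    groups = ["/////"] * (count // 5)
    if count % 5:
        groups.append("/" * (count % 5))
    return " ".join(groups)

# Count how many observations fall in each class.
frequency = Counter(blood_types)
total = sum(frequency.values())

print(f"{'Class':<7} {'Tally':<13} Frequency")
for blood_class in ["A", "B", "O", "AB"]:
    count = frequency[blood_class]
    print(f"{blood_class:<7} {tally(count):<13} {count}")
print(f"{'Total':<7} {'':<13} {total}")
```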
Ungrouped Frequency Distribution
• When the range of data is small, the data must
be grouped into classes that are not more than
one unit in width.
Example
8 9 8 8 4
11 10 9 9 5
8 7 8 7 7
7 5 7 8 4
9 8 8 5 6
Example Cont.
• The range in the example is
R = highest value – lowest value
= 11 – 4 = 7
• Since the range is small, classes consisting
of a single data value can be used.
Example Cont.
Class   Tally         Frequency
4       //            2
5       ///           3
6       /             1
7       /////         5
8       ///// ///     8
9       ////          4
10      /             1
11      /             1
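The tally can be cross-checked by counting the 25 raw values of the example directly. The sketch below computes the range and then uses one class per single data value; the variable names are illustrative.

```python
from collections import Counter

# The 25 raw values from the example, row by row.
data = [
    8, 9, 8, 8, 4,
    11, 10, 9, 9, 5,
    8, 7, 8, 7, 7,
    7, 5, 7, 8, 4,
    9, 8, 8, 5, 6,
]

# R = highest value - lowest value
value_range = max(data) - min(data)

# The range is small, so each class is a single data value.
frequency = Counter(data)
table = [(value, frequency[value]) for value in range(min(data), max(data) + 1)]

print(f"Range R = {max(data)} - {min(data)} = {value_range}")
print("Class  Frequency")
for value, count in table:
    print(f"{value:>5}  {count}")
print(f"Total  {sum(count for _, count in table)}")
```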
Grouped Frequency Distribution
• When the range of the data is large, the data
must be grouped into classes that are more
than one unit in width.
In this case we have additional conditions for the
classes:
1. The class width should preferably be an odd
number.
2. The classes must be equal in width.
3. The classes must be continuous.
Example
Class limits   Tally              Frequency
1-3            ///// /////        10
4-6            ///// ///// ////   14
7-9            ///// /////        10
10-12          ///// /            6
13-15          /////              5
16-18          /////              5
❖ n: the total of the frequencies.
❖ The intervals must be of equal width.
❖ Use for qualitative and discrete data.
❖ The classes should cover all values and categories.
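The conditions on the classes can be checked mechanically. The sketch below tests the class limits of the grouped table (1-3 through 16-18) against the three conditions, assuming integer data so that a class from 1 to 3 has width 3; the variable names are illustrative.

```python
# Class limits and frequencies from the grouped frequency table.
classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
frequencies = [10, 14, 10, 6, 5, 5]

# Width of each class, counting both limits (assumes integer data).
widths = [upper - lower + 1 for lower, upper in classes]

# Condition 1: the class width is preferably an odd number.
odd_width = widths[0] % 2 == 1
# Condition 2: the classes are equal in width.
equal_width = len(set(widths)) == 1
# Condition 3: the classes are continuous (no gaps between neighbours).
continuous = all(
    following[0] == current[1] + 1
    for current, following in zip(classes, classes[1:])
)

# n is the total of the frequencies.
n = sum(frequencies)

print(f"width = {widths[0]}, odd: {odd_width}, equal: {equal_width}, "
      f"continuous: {continuous}, n = {n}")
```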
Example 2: Making a Frequency Table
The numbers of students enrolled in Western
Civilization classes at a university are given below.
Use the data to make a frequency table with
intervals.
12, 22, 18, 9, 25, 31, 28, 19, 22, 27, 32, 14
Step 1 Identify the least and greatest values.
The least value is 9. The greatest value is 32.
Step 2 Divide the data into equal intervals.
Step 3 List the intervals in the first column of the
table. Count the number of data values in each interval
and list the count in the last column. Give the table a
title.

Enrollment in Western Civilization Classes
Number Enrolled   Frequency
1 – 10            1
11 – 20           4
21 – 30           5
31 – 40           2
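Counting the data values in each interval can be sketched in Python. The intervals are the ones used in the table for this example; `frequency_table` is a hypothetical helper name, not something defined in the lesson.

```python
# Enrollment data from Example 2.
enrollment = [12, 22, 18, 9, 25, 31, 28, 19, 22, 27, 32, 14]

# Equal-width intervals as (lower limit, upper limit), both inclusive.
intervals = [(1, 10), (11, 20), (21, 30), (31, 40)]

def frequency_table(values, intervals):
    """Count how many values fall inside each inclusive interval."""
    table = []
    for lower, upper in intervals:
        count = sum(1 for value in values if lower <= value <= upper)
        table.append((lower, upper, count))
    return table

table = frequency_table(enrollment, intervals)

print("Enrollment in Western Civilization Classes")
print(f"{'Number Enrolled':<17} Frequency")
for lower, upper, count in table:
    print(f"{f'{lower} - {upper}':<17} {count}")
```

The frequencies add up to 12, the number of data values, which is a quick check that every value was placed in exactly one interval.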
Example 3
The lengths, in days, of Maria’s last 15 vacations are
listed below. Use the data to make a frequency table
with intervals.
4, 8, 6, 7, 5, 4, 10, 6, 7, 14, 12, 8, 10, 15, 12
Step 1 Identify the least and greatest values.
The least value is 4. The greatest value is 15.
Step 2 Divide the data into equal intervals.
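One possible way to carry out Step 2 is sketched below. The lesson does not say how many intervals to use, so the choice of 3 intervals is an assumption, and `equal_intervals` is a hypothetical helper; other choices of interval count or width are equally valid.

```python
import math

# Vacation lengths from Example 3.
vacation_days = [4, 8, 6, 7, 5, 4, 10, 6, 7, 14, 12, 8, 10, 15, 12]

# Step 1: identify the least and greatest values.
least, greatest = min(vacation_days), max(vacation_days)

def equal_intervals(least, greatest, number_of_intervals):
    """Split the integers least..greatest into equal-width inclusive intervals."""
    width = math.ceil((greatest - least + 1) / number_of_intervals)
    return [
        (least + i * width, least + (i + 1) * width - 1)
        for i in range(number_of_intervals)
    ]

# Step 2: divide the data into equal intervals (3 intervals is an assumption).
intervals = equal_intervals(least, greatest, 3)

# Count the data values in each interval.
counts = [
    sum(1 for days in vacation_days if lower <= days <= upper)
    for lower, upper in intervals
]

for (lower, upper), count in zip(intervals, counts):
    print(f"{lower} - {upper}: {count}")
```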
Homework 1
The marks of the students in the STAT course were found to be as follows: