Chapter 1 - Descriptive Statistics - Frequency Distributions

The document covers data analysis and probability, focusing on descriptive statistics, statistical variables, and frequency distributions. It explains the importance of summarizing data through tables and graphs, and classifies statistical variables into categorical and numerical types. Additionally, it discusses methods for sampling, measurement levels, and presents exercises for practical application.

Uploaded by

leonor


DATA ANALYSIS AND PROBABILITY

1. Descriptive Statistics
FREQUENCY DISTRIBUTIONS

MARIA JOÃO BRAGA | PATRÍCIA RAMOS

Agenda

1.1 – Introduction

1.2 – Statistical variables

1.3 – Frequency distributions

1.4 – Two-dimensional data


1.1 Introduction


Introduction

Descriptive statistics is a branch of statistics that applies several techniques to describe and summarize a set of data. This task, crucial for large volumes of data, materializes in building tables and charts, and in the computation of measures or indicators that represent the information contained in the data.

Tables and graphs help us gain a better understanding of data and provide visual support for improved decision making.

[Diagram: Statistics splits into two branches.
• Descriptive statistics: graphical and numerical procedures to summarize and process data.
• Statistical inference: use of data for forecasts and estimates for decision making.]

Statistical process

Define the problem → Collect data → Summarize data → Infer and decide based on data

Example (Newbold, p. 22): Before bringing a new product to market, a manufacturer wants to arrive at some assessment of the level of demand and may undertake a market research survey.
The manufacturer is interested in all potential buyers (the population); however, the survey is more likely to be applied to a subset (a sample).

Why?

[Diagram: a sample drawn as a subset of the population.]

How should we select a sample?

The most common procedure for selecting a sample is random sampling.
This procedure selects a set of 𝑛 objects from a population in such a way that:
• each member of the population has the same
probability of being selected;
• the selection of one member does not influence the
selection of any other member.
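
The procedure above can be sketched in a few lines of Python. This is only an illustration: the population of 1000 numbered buyers, the sample size of 50 and the helper name `simple_random_sample` are assumptions, not part of the slides. The standard-library call `random.sample` draws without replacement, so no member is selected twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the illustration reproducible
    return rng.sample(population, n)


# Hypothetical population: 1000 potential buyers identified by a number.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=42)

print(len(sample))        # size of the sample
print(sorted(sample)[:5]) # a few of the selected identifiers
```

Changing or omitting the seed gives a different sample each time, which is the point of the procedure: the choice does not depend on the analyst.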


1.2 Statistical variables


Statistical variables

Statistical variables are the characteristics of interest in a statistical study, and are associated with the population or with the sample. They are called variables because their values vary from element to element in the population or sample under study.
There are two ways of classifying statistical variables:
• by the type and amount of information they
contain; or
• by levels of measurement.


Classification of variables by type and amount of information

This method classifies variables into categorical or numerical.

Categorical variables produce responses that belong to groups or categories.
For example: gender, car brand, social class and marital status.

Numerical variables are expressed by numbers and include both discrete and continuous variables.

[Diagram: Variables → Categorical | Numerical; Numerical → Discrete | Continuous]

Discrete variables take values in a finite or countably infinite set.
For example: number of siblings, number of students enrolled in a course and shoe size.

Continuous variables may take any value within a real interval.
For example: height, weight, salary and age.

[Diagram: Variables → Categorical | Numerical; Numerical → Discrete | Continuous]

Exercise

Classify the following variables according to their type (categorical, discrete or continuous):

• Blood type.
• Grade in DAP's exam.
• Education degree.
• Nationality.
• Medal awarded to an athlete at the Olympics.
• Score given to a new beverage on a scale 1-10.
• Occupation.
• Number of shares one holds in a company.
• Inflation rate.
• Amount of ink, in litres, needed to paint a building.
• Course final grade, rounded to units.
• Daily temperature, in degrees Celsius, in Nepal.
• Duration of a trip, in hours and minutes.
• Number of people in a household.
• Production cost of a tablet.

Classification of variables by levels of measurement

The second method to classify statistical variables distinguishes between qualitative and quantitative variables.

In qualitative variables the difference between values has no meaning. This type of variable includes the nominal and ordinal measurement levels.

Nominal data is considered to be the weakest type of data. Numerical identification is used strictly for convenience and does not imply ranking of responses.
For example: gender, eye colour and occupation.

Ordinal data indicate the rank ordering of items. Numerical identification indicates order, but the difference between values has no meaning.
For example: social class (low, medium, high) and product quality ranking (poor, average, good).

As for quantitative variables, they include the interval and ratio measurement levels.

An interval scale indicates rank and distance from an arbitrary zero. For example: temperature and IQ.

Finally, a ratio scale indicates both rank and distance from a natural zero, with ratios between two measures having meaning. For example: weight, salary and age.

[Diagram: Variables → Qualitative (Nominal | Ordinal) | Quantitative (Interval | Ratio)]

Exercise

Classify the following variables according to their measurement levels (nominal, ordinal, interval or ratio):

• Blood type.
• Number of shares one holds in a company.

Exercise (chapter 1, proposed exercise 2)

Consider the following information about the 10 richest American citizens, published in Forbes magazine in 2019:

Name                Rank   Net worth   Age   Marital status   Company
Jeff Bezos             1       114.0    56   Divorced         Amazon
William Gates          2       106.0    64   Married          Microsoft
Warren Buffett         3        80.8    89   Married          Berkshire Hathaway
Mark Zuckerberg        4        69.6    35   Married          Facebook
Larry Ellison          5        65.0    75   Divorced         Oracle
Larry Page             6        55.5    46   Married          Google
Sergey Brin            7        53.5    46   Married          Google
Michael Bloomberg      8        53.4    77   Divorced         Bloomberg
Steve Ballmer          9        51.7    63   Married          Microsoft
Jim Walton            10        51.6    71   Married          Walmart

a) In this dataset, how many observations do we have?
b) And how many statistical variables?
c) How do you classify each variable?
d) For each variable, what was the measurement scale used?

1.3 Frequency distributions

CATEGORICAL DATA. NUMERICAL DATA.


Frequency distributions
CATEGORICAL DATA

A frequency distribution is a table used to organize data.
The first column includes all the possible values of the variable.
The absolute frequency is the number of elements in each category.
The relative frequency is the proportion of elements in each category. It is obtained by dividing the absolute frequency by 𝑛, the total number of observations, and becomes a percentage when multiplied by 100.

Example: Favorite airline of a group of 552 individuals

Airline       Absolute frequency   Relative frequency
TAP                          178                0.322
Iberia                        45                0.082
KLM                           20                0.036
Easyjet                      200                0.362
Air France                    38                0.069
Vueling                       71                0.129
Total                        552                1
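
As a small illustration of the definitions above, the following Python sketch rebuilds the relative frequencies of the airline example from the absolute frequencies. The variable names (`abs_freq`, `rel_freq`) are illustrative choices, not notation from the slides.

```python
# Absolute frequencies from the favorite-airline example.
abs_freq = {
    "TAP": 178,
    "Iberia": 45,
    "KLM": 20,
    "Easyjet": 200,
    "Air France": 38,
    "Vueling": 71,
}

n = sum(abs_freq.values())  # total number of observations

# Relative frequency = absolute frequency divided by n.
rel_freq = {airline: n_i / n for airline, n_i in abs_freq.items()}

for airline, n_i in abs_freq.items():
    print(airline.ljust(12), str(n_i).rjust(5), format(rel_freq[airline], ".3f"))
print("Total".ljust(12), str(n).rjust(5), format(sum(rel_freq.values()), ".3f"))
```

The relative frequencies always add up to 1, which is a quick check that no category was left out.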


Graphical representation of categorical data

Categorical data is usually represented by two types of charts:

Bar charts – when one wishes to draw attention to the absolute frequency in each category.

Pie charts – when the goal is to give more emphasis to the proportion of frequencies in each category.

[Figures: bar chart of the favorite airline data (vertical axis from 0 to 250) and pie chart of the same data, with labelled shares TAP 32%, Iberia 8%, Easyjet 36%, Air France 7%, Vueling 13%.]

“Although the absolute numbers make it look like the most critical situation is in Lisbon, with 3150 confirmed cases, this data might not be the best indicator to mirror the real impact of the disease.”
https://www.publico.pt/interactivo/como-esta-evoluir-pandemia-covid19-onde-vivo#/

The choice of the chart is crucial!

Misleading graphics

https://twitter.com/partidochega/status/1617176950002401281?s=48&t=OqUQ_cFYUTeO5bhMDkFbLA

Misleading graphics

[Figure: chart comparing the EU average (“Média EU”) with Portugal; vertical axis from 0 to 3.5.]

Frequency distributions
NUMERICAL DATA – DISCRETE VARIABLES

When dealing with numerical data, it is also possible to compute the cumulative frequencies.

The absolute cumulative frequency is the number of observations smaller than or equal to a given value of the variable.
The relative cumulative frequency is the proportion of observations smaller than or equal to a given value of the variable.

Example: Number of cell phones purchased in the last two years.

N. cell phones   Abs. freq.   Rel. freq.   Abs. cum. freq.   Rel. cum. freq.
0                        18        0.090                18             0.090
1                        39        0.195                57             0.285
2                        52        0.260               109             0.545
3                        67        0.335               176             0.880
4                        24        0.120               200             1
Total                   200        1
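
The cumulative columns are running sums of the simple frequencies. A minimal Python sketch of that computation for the cell phone example follows; the list names are illustrative, and `itertools.accumulate` from the standard library does the running sum.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]          # number of cell phones purchased
abs_freq = [18, 39, 52, 67, 24]   # absolute frequencies

n = sum(abs_freq)
rel_freq = [n_i / n for n_i in abs_freq]

# Cumulative frequencies: running totals of the absolute frequencies.
abs_cum = list(accumulate(abs_freq))
rel_cum = [N_i / n for N_i in abs_cum]

for row in zip(values, abs_freq, rel_freq, abs_cum, rel_cum):
    x, n_i, f_i, N_i, F_i = row
    print(x, n_i, format(f_i, ".3f"), N_i, format(F_i, ".3f"))
```

The last absolute cumulative frequency equals 𝑛 and the last relative cumulative frequency equals 1.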


Some notation

X_i      Absolute frequency (n_i)   Relative frequency (f_i)   Abs. cumulative frequency (N_i)   Rel. cumulative frequency (F_i)
X_1      n_1                        f_1                        N_1                               F_1
X_2      n_2                        f_2                        N_2                               F_2
…        …                          …                          …                                 …
X_i      n_i                        f_i                        N_i                               F_i
…        …                          …                          …                                 …
X_k      n_k                        f_k                        N_k                               F_k
Total    n                          1

k - number of categories (or groups) of X

Graphical representation of simple frequencies

[Figure: bar chart of the number of cell phones purchased; horizontal axis with the values 0 to 4.]

Graphical representation of cumulative frequencies

X_i     n_i    f_i     N_i    F_i
0        18    0.090    18    0.090
1        39    0.195    57    0.285
2        52    0.260   109    0.545
3        67    0.335   176    0.880
4        24    0.120   200    1
Total   200    1

• How many individuals purchased 2 or fewer cell phones?
• How many purchased 2.5 or fewer cell phones?
• How many purchased 2.8 or fewer cell phones?
• How many purchased 3 or fewer cell phones?
• How many purchased 3.1 or fewer cell phones?
• How many purchased 4.7 or fewer cell phones?
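
One way to reason about these questions is to treat the absolute cumulative frequency as a step function: between two observed values nothing new is counted, so the function stays flat. The sketch below is an illustration under that reading; the function name `cumulative_count` is hypothetical.

```python
from bisect import bisect_right

values = [0, 1, 2, 3, 4]               # observed values of the variable
abs_cum = [18, 57, 109, 176, 200]      # absolute cumulative frequencies


def cumulative_count(x):
    """Number of observations smaller than or equal to x (step function)."""
    pos = bisect_right(values, x)      # how many observed values are <= x
    return abs_cum[pos - 1] if pos > 0 else 0


for x in (2, 2.5, 2.8, 3, 3.1, 4.7):
    print(x, cumulative_count(x))
```

Because the variable is discrete, 2, 2.5 and 2.8 all give the same count, and the count only jumps when an observed value is reached.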


Graphical representation of cumulative frequencies

[Figure: absolute cumulative frequency distribution of the number of cell phones purchased; horizontal axis from -3 to 7, vertical axis from 0 to 250.]

Exercise (chapter 1, proposed exercise 6)

A group of 360 individuals was asked in how many banks they hold an account. Based on this information, the following bar chart was built:

[Figure: bar chart “Number of banks an individual has an account in”; horizontal axis with the values 1 to 6; bar labels 43.33%, 30.83%, 13.89%, 6.67%, 2.78% and 2.50%.]

a) Considering this data, state the maximum number of banks that an individual has an account in.

Exercise (chapter 1, proposed exercise 6, continued)

b) Using the same bar chart, create a table with both the absolute and relative frequencies. Add the cumulative values to your table.

Exercise (chapter 1, proposed exercise 6, continued)

c) Using the same bar chart, graphically represent the relative cumulative frequency.

Frequency distributions
NUMERICAL DATA – CONTINUOUS VARIABLES

When dealing with continuous variables (1), it is very common to group the data into classes.

This is done because a continuous variable has infinitely many possible values, and listing them individually would be useless and uninformative.

By grouping data into classes we lose the individuality of each observation, but we expect gains in terms of interpretation.

(1) It is also common to group data if we are dealing with a discrete variable with many different values.

Example (Newbold, p. 42): The following data set refers to the time (in seconds) a group of 110 employees took to complete a task.

271 236 294 252 254 263 266 222 262 278 288
262 237 247 282 224 263 267 254 271 278 263
262 288 247 252 264 263 247 225 281 279 238
252 242 248 263 255 294 268 255 272 271 291
263 242 288 252 226 263 269 227 273 281 267
263 244 249 252 256 263 252 261 245 252 294
288 245 251 269 256 264 252 232 275 284 252
263 274 252 252 256 254 269 234 285 275 263
263 246 294 252 231 265 269 235 275 288 294
263 247 252 269 261 266 269 236 276 248 299

How should we group this data set?

One way of answering the previous question is to use Sturges' rule.
This is a practical rule to find an appropriate number of classes, 𝑘, and the width of each class, ℎ:

$k = I(\log_2 n) + 1$

$h = I\left(\frac{\Delta}{k}\right) + 1$,

where 𝐼(𝑥) stands for the integer part of 𝑥, and Δ stands for the difference between the maximum and minimum observations in the data set.

Example (Newbold, p. 42): Time to complete a task (𝑛 = 110, minimum 222, maximum 299).

$k = I(\log_2 110) + 1 = I(6.78) + 1 = 7$ (7 classes)

$h = I\left(\frac{299 - 222}{7}\right) + 1 = 12$ (width 12)
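
The rule is easy to check numerically. The sketch below applies the formulas as stated on the slide to the summary figures of the example (110 observations, minimum 222, maximum 299); the function name `sturges` is an illustrative choice.

```python
import math


def sturges(n, minimum, maximum):
    """Number of classes k and class width h, following the rule on the slide."""
    k = int(math.log2(n)) + 1          # I(log2 n) + 1
    delta = maximum - minimum          # range of the data
    h = int(delta / k) + 1             # I(delta / k) + 1
    return k, h


k, h = sturges(110, 222, 299)
print(k, h)
```

With 7 classes of width 12 starting at 222, the classes cover the interval from 222 to 306, which contains every observation.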


Example (Newbold, p. 42): Time to complete a task.

Classes      ni    fi      Ni    Fi
[222,234[     7    0.064     7   0.064
[234,246[    11    0.100    18   0.164
[246,258[    30    0.273    48   0.436
[258,270[    32    0.291    80   0.727
[270,282[    15    0.136    95   0.864
[282,294[     9    0.082   104   0.945
[294,306]     6    0.055   110   1.000
Total       110    1

After grouping the values, the individuality of each observation is lost.
We will therefore assume that all the observations are uniformly distributed within each class.

For example, what is the percentage of employees that completed the task in less than 250 seconds?

$0.164 + 0.273 \times \frac{250 - 246}{258 - 246} = 0.255 = 25.5\%$
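
The grouped table and the interpolation can be reproduced from the raw data. The following Python sketch is one possible way to do it, assuming 7 classes of width 12 starting at 222 with the last class closed on the right; the names `times`, `counts` and `fraction_below` are illustrative.

```python
# Time (in seconds) taken by the 110 employees of the example.
times = [
    271, 236, 294, 252, 254, 263, 266, 222, 262, 278, 288,
    262, 237, 247, 282, 224, 263, 267, 254, 271, 278, 263,
    262, 288, 247, 252, 264, 263, 247, 225, 281, 279, 238,
    252, 242, 248, 263, 255, 294, 268, 255, 272, 271, 291,
    263, 242, 288, 252, 226, 263, 269, 227, 273, 281, 267,
    263, 244, 249, 252, 256, 263, 252, 261, 245, 252, 294,
    288, 245, 251, 269, 256, 264, 252, 232, 275, 284, 252,
    263, 274, 252, 252, 256, 254, 269, 234, 285, 275, 263,
    263, 246, 294, 252, 231, 265, 269, 235, 275, 288, 294,
    263, 247, 252, 269, 261, 266, 269, 236, 276, 248, 299,
]

lower, h, k = 222, 12, 7
edges = [lower + i * h for i in range(k + 1)]   # 222, 234, ..., 306

# Count observations per class; the last class also includes its upper limit.
counts = [0] * k
for t in times:
    counts[min((t - lower) // h, k - 1)] += 1

n = len(times)


def fraction_below(x):
    """Share of observations below x, assuming a uniform spread within each class."""
    total = 0.0
    for i, n_i in enumerate(counts):
        lo, hi = edges[i], edges[i + 1]
        if x >= hi:
            total += n_i                          # whole class lies below x
        elif x > lo:
            total += n_i * (x - lo) / (hi - lo)   # linear share of the class
    return total / n


print(counts)
print(round(fraction_below(250), 3))
```

The interpolation uses the exact counts rather than the rounded relative frequencies, so the result may differ from a hand calculation in the last decimal place.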


Graphical representation of simple frequencies

The most common chart used to plot continuous numerical data is called a histogram.

Unlike in bar charts, the width of the classes of a histogram is not irrelevant: the areas of the rectangles should be proportional to the frequencies.

If all the classes have the same width, this requirement is automatically satisfied; if not, it is necessary to adjust the heights of the rectangles.

[Figure: histogram of relative frequencies over the classes [222, 234[, [234, 246[, [246, 258[, [258, 270[, [270, 282[, [282, 294[ and [294, 306]; vertical axis from 0 to 0.35.]

If all the classes have the same width, we have that:

• Area of the 1st rectangle: 12×0.064 = 0.768
• Area of the 2nd rectangle: 12×0.100 = 1.2
• Area of the 3rd rectangle: 12×0.273 = 3.276
• …
• Total area: 12×(0.064 + 0.1 + … ) = 12

The areas of the rectangles are proportional to the frequencies:

$\frac{0.064}{1} = \frac{0.768}{12}, \quad \frac{0.1}{1} = \frac{1.2}{12}, \quad \dots$

[Figure: histogram of relative frequencies with bar heights 0.064, 0.1, 0.273, 0.291, 0.136, 0.082 and 0.055 over the classes [222, 234[ to [294, 306].]

If, for some reason, the first two classes were aggregated, we would obtain a new frequency distribution:

Original classes   fi          Aggregated classes   fi
[222,234[          0.064       [222,246[            0.164
[234,246[          0.100       [246,258[            0.273
[246,258[          0.273       [258,270[            0.291
[258,270[          0.291       [270,282[            0.136
[270,282[          0.136       [282,294[            0.082
[282,294[          0.082       [294,306]            0.055
[294,306]          0.055       Total                1
Total              1

Using the new frequencies to build a “histogram” we get:

[Figure: chart with bar heights equal to the relative frequencies of the aggregated classes [222, 246[, [246, 258[, [258, 270[, [270, 282[, [282, 294[ and [294, 306]; vertical axis from 0 to 0.35.]

Let us see why this is not a histogram:

• Area of the 1st rectangle: 24×0.164 = 3.936
• Area of the 2nd rectangle: 12×0.273 = 3.276
• Area of the 3rd rectangle: 12×0.291 = 3.492
• …
• Total area: 24×0.164 + 12×(0.273 + ⋯ ) = 13.98

The areas of the rectangles are not proportional to the frequencies:

$\frac{0.164}{1} \neq \frac{3.936}{13.98}$

[Figure: chart with bar heights 0.164, 0.273, 0.291, 0.136, 0.082 and 0.055 over the classes [222, 246[ to [294, 306].]

The previous chart is not a histogram because the areas of its rectangles are not proportional to the frequencies.

A class with twice the width of the others should have a height equal to half of its frequency, to preserve the proportions in the chart.

In general, if we are dealing with classes of different widths, histograms should be built using frequency densities (absolute or relative):

$\frac{n_i}{h_i}$ or $\frac{f_i}{h_i}$,

where $h_i$ is the width of class $i$.

[Figure: histogram of relative frequency densities over the classes [222, 246[, [246, 258[, [258, 270[, [270, 282[, [282, 294[ and [294, 306]; vertical axis from 0 to 0.03.]
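
To make the adjustment concrete, the sketch below computes relative frequency densities for the aggregated classes of the example and checks that the total area is 1. It uses the absolute counts of the example (18, 30, 32, 15, 9, 6) rather than the rounded relative frequencies, so that the total is not affected by rounding; the variable names are illustrative.

```python
# Aggregated classes of the example: (lower limit, upper limit, absolute frequency).
classes = [
    (222, 246, 18),
    (246, 258, 30),
    (258, 270, 32),
    (270, 282, 15),
    (282, 294, 9),
    (294, 306, 6),
]

n = sum(n_i for _, _, n_i in classes)

# Relative frequency density: relative frequency divided by the class width.
densities = [(n_i / n) / (hi - lo) for lo, hi, n_i in classes]

# Area of each rectangle = width x height; the areas should add up to 1.
areas = [(hi - lo) * d for (lo, hi, _), d in zip(classes, densities)]

for (lo, hi, _), d in zip(classes, densities):
    print(lo, hi, format(d, ".4f"))
print(format(sum(areas), ".3f"))
```

The first class is twice as wide as the others, so its height is its relative frequency divided by 24 instead of 12, and its area again matches its relative frequency.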


Property:
A histogram plotted with the absolute frequency densities has an area equal to 𝑛.

Proof:

$A_{\text{total}} = h_1 \times \frac{n_1}{h_1} + h_2 \times \frac{n_2}{h_2} + \dots + h_k \times \frac{n_k}{h_k} = \sum_{i=1}^{k} h_i \times \frac{n_i}{h_i} = \sum_{i=1}^{k} n_i = n$

Property:
A histogram plotted with the relative frequency densities has an area equal to 1.

Proof:

$A_{\text{total}} = h_1 \times \frac{f_1}{h_1} + h_2 \times \frac{f_2}{h_2} + \dots + h_k \times \frac{f_k}{h_k} = \sum_{i=1}^{k} h_i \times \frac{f_i}{h_i} = \sum_{i=1}^{k} f_i = 1$

Graphical representation of simple frequencies

The frequency polygon is a polygonal line obtained by joining the centres of the tops of the histogram bars, together with the centres of the “imaginary” classes created at the start and at the end of the histogram.

[Figure: frequency polygon over the classes [222, 234[ to [294, 306], with the imaginary classes [210, 222[ and ]306, 318[ at the two ends; vertical axis from 0 to 0.35.]

Graphical representation of simple frequencies

The frequency polygon is particularly useful if we wish to compare two distributions using the same chart.

[Figure: two frequency polygons drawn on the same axes, over the classes [210, 222[ to ]306, 318[; vertical axis from 0 to 0.4.]

Graphical representation of cumulative frequencies

The cumulative frequencies of continuous variables are graphically represented by a chart called an ogive.

Classes      Ni
[222,234[     7
[234,246[    18
[246,258[    48
[258,270[    80
[270,282[    95
[282,294[   104
[294,306]   110

[Figure: ogive of the absolute cumulative frequencies; horizontal axis with the class limits 222, 234, 246, 258, 270, 282, 294 and 306; vertical axis from 0 to 120.]

Summary - Graphical representation of statistical variables

Variable type            Frequencies              Chart
Categorical              Simple frequencies       Bar chart, pie chart
Numerical, discrete      Simple frequencies       Bar chart, pie chart
Numerical, discrete      Cumulative frequencies   Cumulative frequency distribution
Numerical, continuous    Simple frequencies       Histogram, frequency polygon
Numerical, continuous    Cumulative frequencies   Ogive

Exercise (chapter 1, proposed exercise 11)

We asked 300 retail employees (who work in downtown Lisbon) about the effective number of hours worked per week. The table below presents the results; the letters stand for missing values.

Classes     ci     ni     fi      Ni     Fi
[a, 20[     15     4      i       4      0.013
[20, 35[    27.5   f      0.050   19     o
[35, 40[    e      101    0.337   120    p
[40, b[     45     g      j       m      0.800
[50, 60[    55     50     k       n      q
[c, d]      65     h      l       300    1
Total              300    1

a) Find the missing information. Justify.
b) Plot the histogram of area 1.

Exercise (chapter 1, proposed exercise 11, continued)

c) Using the same table of hours worked per week by the 300 retail employees, clarify the following questions, asked by an amateur:
i. From the table, can I check the values for all the 300 employees?
ii. Where are the employees that answered 40 hours placed?
iii. Where are the employees that answered d hours and one second placed?
iv. How do the observations behave within each class?

1.4 Two-dimensional data


Two-dimensional data

There are situations in which we need to describe relationships between two variables.
In those cases, each element of the data set is evaluated on both variables and is therefore represented by a pair of observations, one for each variable.
For example, an individual can be asked about their age and height. In this case, the individual is represented by a pair of numbers: one corresponding to their age, and the other to their height.

When dealing with numerical raw data, the observations are usually listed side by side.

As for its graphical representation, it is usual to use a scatter plot. This chart uses the pairs of observations as coordinates, and each observation is represented by a point $(x_i, y_i)$, $i = 1, \dots, n$.

Example: Height (in m) and weight (in kg) of a group of 10 individuals.

[Figure: scatter plot of weight (kg, vertical axis, 50 to 95) against height (m, horizontal axis, 1.5 to 1.95).]

When dealing with tabulated data, either categorical or numerical, two-dimensional data is usually organized in a cross-table (or contingency table), which lists the number (or percentage) of observations for each possible combination of the values of the two variables.
The values inside a cross-table are the joint absolute frequencies ($n_{ij}$) or the joint relative frequencies ($f_{ij}$), as they refer to counts or percentages, respectively.

Example (Newbold, p. 30): The following cross-table contains data from a survey on health and nutrition of the U.S. population, conducted in 2005. The table contains information on the gender and activity level of a group of 4460 individuals.

               Male    Female
Sedentary       957      1226
Active          340       417
Very active     842       678

Joint frequency distribution of gender and activity level.


By adding the frequencies in each row and column we get the marginal frequencies for each variable.

               Male    Female    Total
Sedentary       957      1226     2183
Active          340       417      757
Very active     842       678     1520
Total          2139      2321     4460

The graphical representation of cross-tables is usually made with a component bar chart or with a cluster bar chart.

[Figures: component (stacked) bar chart and cluster bar chart of activity level (Sedentary, Active, Very active) by gender (Male, Female).]

We can also analyse the conditional frequencies. They are obtained by dividing the joint frequency by the marginal frequency of the conditioning category.

               Male    Female    Total
Sedentary       957      1226     2183
Active          340       417      757
Very active     842       678     1520
Total          2139      2321     4460

For example, using the previous data set, we can answer questions like:

• What is the percentage of men who are sedentary?

$\frac{957}{2139} = 0.447$

• What is the percentage of women inside the active group?

$\frac{417}{757} = 0.551$
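
The marginal and conditional frequencies of the example can be computed directly from the joint frequencies. The following Python sketch is one way to do so; the nested-dictionary layout and the variable names are illustrative assumptions.

```python
# Joint absolute frequencies: activity level (rows) by gender (columns).
joint = {
    "Sedentary": {"Male": 957, "Female": 1226},
    "Active": {"Male": 340, "Female": 417},
    "Very active": {"Male": 842, "Female": 678},
}

# Marginal frequencies: row totals and column totals.
row_totals = {level: sum(cells.values()) for level, cells in joint.items()}
col_totals = {}
for cells in joint.values():
    for gender, count in cells.items():
        col_totals[gender] = col_totals.get(gender, 0) + count

n = sum(row_totals.values())

# Conditional frequencies: joint frequency divided by the relevant marginal.
sedentary_given_male = joint["Sedentary"]["Male"] / col_totals["Male"]
female_given_active = joint["Active"]["Female"] / row_totals["Active"]

print(row_totals, col_totals, n)
print(round(sedentary_given_male, 3), round(female_given_active, 3))
```

Which marginal goes in the denominator depends on the group being conditioned on: the column total for "among men", the row total for "inside the active group".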


Exercise (chapter 1, proposed exercise 17)

A real estate agency wants to analyze the number of years of experience of their agents (X) and the number of houses they sold last month (Y). The information regarding the 100 agents working in the agency is summarized in the next table.

                              Sales (Y)
Experience (X)    [0, 2]    ]2, 4]    ]4, 6]    ]6, 8]
[0, 2]                 4         6         8         7
]2, 4]                 2         6        10        17
]4, 6]                 3         6         9        12
]6, 8]                 1         2         3         4

Classify each of the following sentences as True or False, quantifying your justification.

a) 10% of the agents have worked in the agency for more than 6 years.
b) 20% of the agents have worked in the agency for more than 4 years and sold more than 6 houses in the last month.

Exercise (chapter 1, proposed exercise 17, continued)

Using the same table of experience (X) and sales (Y) for the 100 agents, classify each of the following sentences as True or False, quantifying your justification.

c) Among the agents that sold more than 6 houses in the last month, 5% have worked in the agency for more than 6 years.
d) There is no relationship between the experience and the performance of the agents.

https://www.gapminder.org/fw/world-health-chart/
