Data Tabulation and Frequencies

The document provides an introduction to business analytics focusing on data tabulation and frequency distributions. It covers the creation of frequency tables for categorical and numeric data, including methods for constructing histograms and calculating cumulative and relative frequencies. Additionally, it discusses the importance of data visualization and techniques for summarizing data effectively.

Uploaded by

gerald.tanwh

BT1101 Introduction to Business Analytics

Data Tabulation & Frequencies

Learning objectives (TL;DR: learn to create tables for data)

• Appreciate the importance and role of data visualization through tabulation
• Be able to describe and summarize data using tabular techniques (e.g. frequency tables, contingency tables)
• Be able to use and construct frequency distributions, relative frequency distributions and histograms, and to compute cumulative relative frequencies, percentiles and quartiles for a data set

Frequency Table

• Frequency distribution (also known as a frequency table) - a table that shows the number of observations in each of several non-overlapping groups
• Categorical variables (variables that can be split into categories) naturally define the groups in a frequency distribution.

Frequency Distributions for Categorical Data – An Example

Home_Market_Value_Type

One-way frequency table for house type

House Type | Frequency or Number of Observations

[Note: this table shows the number of observations for each type of housing, expressing the frequency distribution for a categorical variable in tabular form.]

Frequency Distributions for Categorical Data – An Example

Home_Market_Value_Type

METHOD 1
Using “dplyr” package, “group_by” and “summarise” functions
[Note: this gives the number of observations for each of the non-overlapping groups.]

METHOD 2
Using Base R “table” function
[Note: table() creates a frequency distribution from the Home dataset for the variable Type; the table can then be converted into a dataframe.]

[Note: apart from tabular form, the frequency distribution for a categorical variable can also be expressed in graphical form.]
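
The base R route can be sketched as follows. This is a minimal illustration only: the `house_type` vector is a made-up stand-in for the Type column of the Home dataset, and the category names are assumptions. The dplyr route would give the same counts.

```r
# Hypothetical stand-in for the Type column of the Home dataset
house_type <- c("Condo", "Landed", "Condo", "HDB", "HDB",
                "Condo", "Landed", "HDB", "HDB")

# One-way frequency table: number of observations per category
freq_tab <- table(house_type)

# Convert the table into a data frame with one row per category
freq_df <- as.data.frame(freq_tab, stringsAsFactors = FALSE)
names(freq_df) <- c("Type", "Freq")
print(freq_df)
```

Each observation falls into exactly one category, so the frequencies add up to the number of observations.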

Frequency Distributions for Categorical Data – An Example

Home_Market_Value_Type

METHOD 1
Using “dplyr” package, “group_by” and “summarise” functions
[Note: creating a bar plot from the tabulated frequency.]

Plotting Frequency Distributions for Categorical Data – An Example

Home_Market_Value_Type

METHOD 2
Using Base R “table” function
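
A bar plot of a tabulated frequency might look like the sketch below. The data are hypothetical, and the title, axis labels and colour are illustrative choices rather than the slide's own settings.

```r
# Hypothetical stand-in for the Type column of the Home dataset
house_type <- c("Condo", "Landed", "Condo", "HDB", "HDB",
                "Condo", "Landed", "HDB", "HDB")
freq_tab <- table(house_type)

# barplot() draws one bar per category; it returns the bar midpoints
bar_mid <- barplot(freq_tab,
                   main = "Frequency of House Type",
                   xlab = "House Type",
                   ylab = "Frequency",
                   col = "purple")
```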
Frequency Distributions for Numeric Data
• Histogram: A graphical depiction of a frequency distribution for numerical data in the form of a bar chart
• Terminologies:
• Class: a category for grouping data [each bar is a class]
• Frequency: Number of data values in a class [represented by the height of the bar]
• Density: Relative frequency divided by the class width (equal to the relative frequency when the class width is 1)
• Upper class limit: largest value that can go in a class
• Lower class limit: smallest value that can go in a class
• Class width: Difference between the lower class limit of a given class and the lower class limit of the next higher class [e.g. class width = 100]
• Class midpoint: Midpoint of a class
Creating a Histogram
• Plotting histogram using “hist” function
• breaks parameter: specifies the number of bars or the width of each bar (default value is based on Sturges's rule)

Sturges's Rule:
k = 1 + 3.322 log10(n) (k is the number of classes; n is the size of the data)
E.g. for n = 42: k = 1 + 3.322 log10(42) = 1 + 5.39 = 6.39 --> 7
Note: hist() treats this number as a suggestion and rounds the breakpoints to "pretty" values, so the number of bars actually drawn can differ.
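
As a quick check of the arithmetic, Sturges's rule for n = 42 can be evaluated directly. The sketch compares the hand formula with `nclass.Sturges()` from the grDevices package, which computes ceiling(log2(n) + 1); since 3.322 is approximately 1/log10(2), the two forms agree.

```r
n <- 42

# Hand formula from the slide: k = 1 + 3.322 * log10(n)
k_formula <- 1 + 3.322 * log10(n)
k_classes <- ceiling(k_formula)           # round up to a whole number of classes

# Built-in helper: ceiling(log2(n) + 1)
k_builtin <- nclass.Sturges(seq_len(n))

print(c(formula = k_formula, rounded = k_classes, builtin = k_builtin))
```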

☆ Notes on the breaks parameter (the slides are quite confusing on this point):

1) breaks as a single number, e.g. breaks = 6
   1) R calculates the range of the data [the difference between the min and max values in the dataset]
   2) R divides the range of the data by the number of bins specified, which is 6
   3) R creates 6 bins/bars of equal width, starting from the minimum value up to the maximum value
   4) The number of data points that fall within each bin is counted and represents the frequency for that bin
   Caveat: hist() treats a single number as a suggestion only and rounds the breakpoints to "pretty" values, so the number of bins drawn can differ from the number requested.

2) breaks as a vector, e.g. breaks = c(0, 10, 30, ...)
   We are explicitly defining the breakpoints [the exact edges of the bars]. The histogram will have bins/bars at the following intervals: 0-10, 10-30, and so on. The width of each bar corresponds to the distance between consecutive breakpoints [e.g. the 0-10 bar is 10 wide].

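Histograms for Numerical Data

[Note on how data is grouped into each bar: with the default setting of hist() (right = TRUE), bins are right-inclusive, so a value of 10 is counted in the bin 0-10, NOT 10-30. Setting right = FALSE makes the bins left-inclusive, and a value of 10 is then counted in the bin 10-30.]
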
Some rules of thumb:

1. Number of groups - Choose between 5 and 15 groups; more for larger n; the range of each should be equal.
2. Choose the lower limit of the first group (LL) as a whole number smaller than the minimum data value and the upper limit of the last group (UL) as a whole number larger than the maximum data value.
3. Group or bar width = (UL - LL) / Number of groups
   [This formula determines the size of each bar, i.e. the range of the interval covered by each bar.]

Histograms for Numerical Data

The breaks argument can take different forms:
• a single number giving the number of cells for the histogram
• a vector giving the breakpoints between histogram cells, e.g. seq(from, to, by), where by is the bar width
• a function to compute the vector of breakpoints

[Figure: the same histogram drawn with breaks = 2, breaks = 4 and breaks = 6]
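
The three forms of `breaks` can be compared without drawing anything by passing `plot = FALSE`. This is a sketch on made-up house ages (the `age` vector is an assumption, not the course data); the specific break values are illustrative.

```r
# Hypothetical house ages standing in for Home$`House Age`
age <- c(rep(27, 10), rep(28, 12), rep(32, 14), rep(33, 6))

# (a) a single number: a suggestion only, hist() picks "pretty" breakpoints
h_num <- hist(age, breaks = 3, plot = FALSE)

# (b) an explicit vector of breakpoints, built with seq(from, to, by)
h_vec <- hist(age, breaks = seq(26, 34, by = 2), plot = FALSE)

# (c) a function that computes the breakpoints from the data
h_fun <- hist(age,
              breaks = function(x) seq(min(x) - 1, max(x) + 1, by = 1),
              plot = FALSE)

print(h_num$breaks)   # chosen by R
print(h_vec$breaks)   # exactly the vector supplied
print(h_vec$counts)   # frequency in each bin
print(h_fun$breaks)   # result of the function
```

Inspecting `$breaks` and `$counts` on the returned object is a convenient way to see which bins R actually used before plotting.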

Histograms for Numerical Data

Changing the tick marks on x axis with `xaxp` argument
[Allows us to specify the number of intervals between the min and max tick values.]

Without xaxp:

hist(Home$`House Age`,
     main="Histogram of House Age",
     col="purple",
     xlim = c(min(Home$`House Age`)-1, max(Home$`House Age`)+1),
     xlab="House Age")

With xaxp:

hist(Home$`House Age`,
     main="Histogram of House Age",
     col="purple",
     xaxp=c(26,34,8),
     xlim = c(min(Home$`House Age`)-1, max(Home$`House Age`)+1),
     xlab="House Age")

xaxp: A vector of the form c(x1, x2, n) giving the coordinates of the extreme tick marks and the number of intervals between tick-marks
Histogram – to frequency distribution tables

[Note: a histogram can be used to get a frequency distribution table for each group of numeric data. The numeric vector is split up into groups (intervals) of numeric values that can be seen as categorical.]

① Save the histogram as an object:

H1 <- hist(Home$`House Age`,
           main="Histogram of House Age",
           col="purple",
           xlim = c(min(Home$`House Age`)-1, max(Home$`House Age`)+1),
           xlab="House Age")

② H1$breaks is a numeric vector that shows the boundaries of the bins (bars) in the histogram:

> H1$breaks
[1] 27 28 29 30 31 32 33

③ Cut the variable into groups at these breaks and tabulate:

homeagegp <- cut(Home$`House Age`, H1$breaks, include.lowest=TRUE)
table(homeagegp)

[27,28] (28,29] (29,30] (30,31] (31,32] (32,33]
     22       0       0       0      14       6

cut(x, breaks, include.lowest = FALSE, right = TRUE, dig.lab = 3, …)

x: a numeric vector which is to be converted to a factor by cutting
breaks: either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.
include.lowest: logical, indicating if an ‘x[i]’ equal to the lowest (or highest, for right = FALSE) ‘breaks’ value should be included.
right: logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.
dig.lab: integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.

[Notes on the cut syntax: x refers to the numeric vector that you want to cut into intervals; breaks defines how the data will be divided into intervals; include.lowest is a logical value indicating whether the lowest value should be included.]
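
The histogram-to-table steps can be tried end to end on made-up data. In this sketch the `age` vector is hypothetical (not the Home dataset), and `plot = FALSE` is used so that only the bin boundaries are computed.

```r
# Hypothetical house ages standing in for Home$`House Age`
age <- c(rep(27, 10), rep(28, 12), rep(32, 14), rep(33, 6))

# Step 1: compute the histogram bins without drawing the plot
h <- hist(age, plot = FALSE)

# Step 2: reuse the bin boundaries to cut the numeric vector into groups
grp <- cut(age, breaks = h$breaks, include.lowest = TRUE)

# Step 3: tabulate the groups to get the frequency distribution table
freq <- table(grp)
print(freq)
```

Because `cut()` here uses the same boundaries and the same right-closed convention as `hist()`, the tabulated frequencies match `h$counts`.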
Histogram (label and density) [barely went through]
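
A minimal sketch of a labelled density histogram, assuming made-up ages: `labels = TRUE` prints a value above each bar and `freq = FALSE` switches the y axis from counts to densities. The title and colour are illustrative.

```r
# Hypothetical house ages standing in for Home$`House Age`
age <- c(rep(27, 10), rep(28, 12), rep(32, 14), rep(33, 6))

# freq = FALSE plots densities; labels = TRUE annotates each bar
h_den <- hist(age,
              freq = FALSE,
              labels = TRUE,
              main = "Histogram of House Age (density)",
              xlab = "House Age",
              col = "purple")

# Density times bin width gives the relative frequency of each bin
rel_freq <- h_den$density * diff(h_den$breaks)
print(rel_freq)
```

The relative frequencies recovered this way sum to 1, which is what distinguishes a density histogram from a count histogram.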
Cumulative and Relative Frequencies

• Cumulative frequency is the sum of all frequencies up to and including the current point [for a dataset/table]
• Relative frequency is the proportion of observations associated with each value (or group):
  Relative frequency = Frequency of each group / Total number of observations
• Cumulative Relative Frequency is the proportion of the total number of observations that fall at or below the upper limit of each group [basically the sum of all relative frequencies up to the current point].
Cumulative and Relative Frequencies
• Compute Cumulative Frequency, Relative Frequency and Cumulative Relative Frequency

Starting from the `T1` dataframe:

T1.cumfreq <- T1 %>%
  dplyr::mutate(cumfreq = cumsum(Freq),
                relfreq = Freq/sum(Freq),
                cumrelfreq = cumfreq/sum(Freq))

[Note: we use mutate to add these variables to the dataframe:
 cumfreq = cumsum(Freq): the cumulative frequency
 relfreq = Freq/sum(Freq): frequency of the group / total observations
 cumrelfreq = cumfreq/sum(Freq): the cumulative sum of relfreq, i.e. relfreq1 + relfreq2 + ...]
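
The same three columns can be computed in base R. The sketch below uses the House Age frequencies from the table shown earlier (22, 0, 0, 0, 14, 6); the data frame name `T1_demo` is a hypothetical stand-in for `T1`, and base R is used here only so that the example runs without extra packages.

```r
# Frequency table taken from the House Age groups shown earlier
T1_demo <- data.frame(
  homeagegp = c("[27,28]", "(28,29]", "(29,30]", "(30,31]", "(31,32]", "(32,33]"),
  Freq = c(22, 0, 0, 0, 14, 6)
)

# Cumulative frequency: running total of the frequencies
T1_demo$cumfreq <- cumsum(T1_demo$Freq)

# Relative frequency: group frequency / total number of observations
T1_demo$relfreq <- T1_demo$Freq / sum(T1_demo$Freq)

# Cumulative relative frequency: running total of the relative frequencies
T1_demo$cumrelfreq <- cumsum(T1_demo$relfreq)

print(T1_demo)
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative frequency equals 1.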
Plotting Cumulative Frequency (Ogive)

[Creating a plot for cumulative frequency by applying the syntax of hist, plot and lines.]

Starting from the `T1.cumfreq` dataframe:

# create vector of y coordinates to plot
cumfreq1 <- c(0, T1.cumfreq$cumfreq) # start with 0
plot(H1$breaks,
     cumfreq1,
     xlab = "Home Age",
     ylab = "Cumulative Frequency",
     main = "Cumulative Frequency for Home Age")
lines(H1$breaks, cumfreq1)
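
A self-contained version of the ogive, assuming the breaks 27 to 33 and the frequencies from the House Age table shown earlier. Using `type = "o"` draws the points and the connecting lines in one call, which is a stylistic alternative to calling `plot()` and then `lines()`.

```r
# Bin boundaries and frequencies from the House Age table shown earlier
breaks <- 27:33
freq <- c(22, 0, 0, 0, 14, 6)

# y coordinates: start at 0 for the lowest boundary, then the running totals
cumfreq1 <- c(0, cumsum(freq))

# type = "o" overplots points and lines
plot(breaks, cumfreq1,
     type = "o",
     xlab = "Home Age",
     ylab = "Cumulative Frequency",
     main = "Cumulative Frequency for Home Age")
```

The leading 0 is needed because there is one more boundary than there are bins.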
Pareto Analysis

• An Italian economist, Vilfredo Pareto, observed in 1906 that a large proportion of wealth in Italy was owned by a small proportion of people.
• Similarly, businesses often find that a large proportion of sales comes from a small percentage of customers, a large percentage of quality defects stems from just a couple of sources, or a large percentage of inventory value corresponds to a small percentage of items.
• A Pareto analysis involves sorting data and calculating cumulative proportions.

Applying the Pareto Principle

[Table: items sorted by relative frequencies in %, with cumulative relative frequencies in %]

About 80% of the bicycle inventory value comes from 42% (10/24) of items.
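
The sort-then-accumulate step of a Pareto analysis can be sketched as below. The inventory values are invented for illustration and are not the bicycle data from the slide; the 80% threshold is the conventional cut-off.

```r
# Hypothetical inventory values per item (not the bicycle data)
inventory <- data.frame(
  item = paste0("Item", 1:6),
  value = c(50, 400, 30, 250, 20, 250)
)

# Sort from largest to smallest value
pareto <- inventory[order(inventory$value, decreasing = TRUE), ]

# Relative and cumulative relative frequencies in %
pareto$rel_pct <- 100 * pareto$value / sum(pareto$value)
pareto$cum_pct <- cumsum(pareto$rel_pct)

# Smallest number of top items that account for at least 80% of total value
n_items <- which(pareto$cum_pct >= 80)[1]

print(pareto)
print(n_items)
```

In this made-up example, half of the items (3 of 6) account for 90% of the total value.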

Percentiles
• The kth percentile is a value at or below which at least k percent of the observations lie.
• The most common way to compute the kth percentile is to order the data values from smallest to largest and calculate the rank of the kth percentile using the formula: rank = nk/100 + 0.5

Computing Percentiles
• Compute the kth percentile for a variable with sample size n
• Rank of kth percentile = nk/100 + 0.5
• n = 94; k = 90
• For the 90th percentile, the rank is 94(90)/100 + 0.5 = 85.1 (round to 85)
• The 90th percentile is the value of the 85th observation

Now let’s use R to compute the 32nd, 57th and 98th percentiles for Room Size
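
The rank formula can be written as a small helper and compared with R's built-in `quantile()`. The helper name `percentile_by_rank` and the data vector `x` are hypothetical. Note that `quantile()` interpolates between observations by default, so its results may differ slightly from the rank-and-round method.

```r
# Rank method from the slide: rank = n * k / 100 + 0.5, rounded
percentile_by_rank <- function(x, k) {
  x_sorted <- sort(x)
  n <- length(x_sorted)
  rank_k <- n * k / 100 + 0.5
  idx <- min(max(round(rank_k), 1), n)   # keep the index within 1..n
  x_sorted[idx]
}

# Worked example from the slide: n = 94, k = 90
rank_90 <- 94 * 90 / 100 + 0.5

# Hypothetical data with 94 observations
x <- seq(10, by = 2, length.out = 94)
p90 <- percentile_by_rank(x, 90)          # value of the 85th observation

print(rank_90)
print(p90)
print(quantile(x, probs = c(0.32, 0.57, 0.98)))   # built-in, interpolated
```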
Quartiles
• Quartiles break the data into four parts.
• 25th percentile is first quartile, Q1;
• 50th percentile is second quartile, Q2;
• 75th percentile is third quartile, Q3; and
• 100th percentile is fourth quartile, Q4.
• One-fourth of the data fall below the first quartile, one-half
are below the second quartile, and three-fourths are below
the third quartile.

Let’s use R to compute the 4 quartiles for Home Size
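
A minimal sketch with `quantile()`, assuming made-up home sizes (the `home_size` vector is hypothetical, not the course data). The probabilities 0.25, 0.5, 0.75 and 1 correspond to Q1 to Q4.

```r
# Hypothetical home sizes
home_size <- c(1200, 1500, 1800, 2000, 2200, 2500, 2600, 3000, 3400)

# Q1, Q2, Q3 and Q4 are the 25th, 50th, 75th and 100th percentiles
q <- quantile(home_size, probs = c(0.25, 0.5, 0.75, 1))
print(q)
```

Q2 is the median and Q4 is the maximum of the data.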

Contingency Tables
• One of the most basic statistical tools for summarizing categorical data
• A tabular method that displays the number of observations in a data set for different subcategories of two or more categorical variables.
• Contingency tables can accept numerical variables, but the grouping variables must be categorical.
• Subcategories of variables must be mutually exclusive and
exhaustive (i.e. each observation can be classified into only one
subcategory, and, taken together over all subcategories, they
must constitute the complete data set)
Examples of Contingency Tables
[Note: 3 ways to create these tables are shown below.]

① Base R function

Constructing a Contingency table for 2 categorical variables

DATA: Home_Market_Value_Type_R (assigned to HomeTR)
• Count number of units by type and region (two categorical variables)
[Code annotation: the first variable is the row variable and the second is the column variable.]
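
A base R sketch of a two-way table. The `HomeTR_demo` data frame and its category names are invented for illustration; the first argument of `table()` becomes the rows and the second becomes the columns.

```r
# Hypothetical stand-in for the HomeTR dataset
HomeTR_demo <- data.frame(
  Type   = c("Condo", "Condo", "HDB", "HDB", "HDB", "Landed"),
  Region = c("East", "West", "East", "East", "West", "West")
)

# Row variable first, column variable second
tab <- table(HomeTR_demo$Type, HomeTR_demo$Region,
             dnn = c("Type", "Region"))
print(tab)

# Add row and column totals
print(addmargins(tab))
```

Each observation is counted in exactly one cell, so the cells sum to the number of rows in the data.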

② dplyr

Constructing a Contingency table for 2 categorical variables

DATA: Home_Market_Value_Type_R (assigned to HomeTR)

• Count number of units by type and region using dplyr::group_by and dplyr::summarise (or dplyr::count)

tab1 <- HomeTR %>%
  group_by(Type, Region) %>%
  summarise(n=n())

or, more concisely:

tab1 <- HomeTR %>%
  count(Type, Region)
spread() in tidyr (a formatting function)
• Long to Wide dataset
spread() distributes the cells of the former value column across the cells of the new columns and truncates any non-key, non-value columns in a way that prevents duplication. (In current tidyr, spread() is superseded by pivot_wider().)

tab1w <- tab1 %>% spread(key=Region, value=n)

• tab1: dataset
• key=Region: column we want to spread
• value=n: column that contains values to spread against

gather() in tidyr
• Wide to Long dataset (In current tidyr, gather() is superseded by pivot_longer().)

Rev_tab1w <- gather(tab1w, key=Region, value=n, -`Type`)

• tab1w: dataset we want to gather
• key=Region: column name to gather the columns into
• value=n: column name to gather values into
• -`Type`: column that we don’t want to gather
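
For readers without tidyr installed, the long-to-wide and wide-to-long round trip can be approximated in base R. This is a stand-in technique, not the tidyr functions themselves: `xtabs()` builds the wide table and `as.data.frame()` turns it back into long form. The `tab1_demo` data frame is hypothetical.

```r
# Hypothetical long-format counts, one row per Type/Region combination
tab1_demo <- data.frame(
  Type   = c("Condo", "Condo", "HDB", "HDB"),
  Region = c("East", "West", "East", "West"),
  n      = c(1, 1, 2, 1)
)

# Long to wide: Type in rows, Region in columns, n in the cells
wide <- xtabs(n ~ Type + Region, data = tab1_demo)
print(wide)

# Wide to long: one row per cell, with the cell values in column n
long <- as.data.frame(wide, responseName = "n")
print(long)
```

The round trip preserves the counts; only the row order may change.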
③ Using the rpivotTable package → needs to be installed and imported

Constructing Contingency Tables using rpivotTable package

Options: Count, Count Unique Values, List Unique Values, Sum, Integer Sum, Average, Sum over Sum, 80% Upper Bound, 80% Lower Bound, Sum as Fraction of Total, Sum as Fraction of Rows, Sum as Fraction of Columns, Count as Fraction of Total, Count as Fraction of Rows, Count as Fraction of Columns

Reference: Help rpivotTable & https://cran.r-project.org/web/packages/rpivotTable/vignettes/rpivotTableIntroduction.html

Constructing Contingency Tables using rpivotTable package
• Count number of units by type and region.

[Figure labels: Sub-Categories of Region; Sub-Categories of Unit Type]

https://cran.r-project.org/web/packages/rpivotTable/vignettes/rpivotTableIntroduction.html
Constructing Contingency Tables using rpivotTable package
• Percentage of units over total by type and region (by manipulating the aggregator name).

[Figure labels: Sub-Categories of Region; Sub-Categories of Unit Type]
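
Outside the pivot table widget, the same "fraction of total" and "fraction of rows" views can be sketched in base R with `prop.table()`. This is an alternative technique, not the rpivotTable aggregator itself, and the `HomeTR_demo` data are invented.

```r
# Hypothetical stand-in for the HomeTR dataset
HomeTR_demo <- data.frame(
  Type   = c("Condo", "Condo", "HDB", "HDB", "HDB", "Landed"),
  Region = c("East", "West", "East", "East", "West", "West")
)
counts <- table(HomeTR_demo$Type, HomeTR_demo$Region,
                dnn = c("Type", "Region"))

# Count as fraction of total: every cell divided by the grand total
frac_total <- prop.table(counts)

# Count as fraction of rows: every cell divided by its row total
frac_rows <- prop.table(counts, margin = 1)

print(round(100 * frac_total, 1))   # percentages of the total
print(round(100 * frac_rows, 1))    # percentages within each Type
```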

Constructing a Pivot table for 3 categorical vars
• Count number of units by type, region, and sub-region.

[Figure label: Sub-Categories of sub-region]
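
A three-variable count can also be sketched in base R: `table()` accepts any number of categorical variables and `ftable()` flattens the result into a readable layout. The data frame `demo3` and its SubRegion values are hypothetical.

```r
# Hypothetical data with three categorical variables
demo3 <- data.frame(
  Type      = c("Condo", "Condo", "HDB", "HDB", "HDB", "Landed"),
  Region    = c("East", "West", "East", "East", "West", "West"),
  SubRegion = c("E1", "W1", "E1", "E2", "W1", "W2")
)

# Three-way contingency table of counts
tab3 <- table(demo3$Type, demo3$Region, demo3$SubRegion,
              dnn = c("Type", "Region", "SubRegion"))

# Flatten to two dimensions for printing
print(ftable(tab3))
```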
Slicers
• for drilling down to “slice” a PivotTable and display a subset of data
