DOM105 Session 1

Uploaded by

Vidit Dixit

DOM105 2019

Session 1
Reading: SfM Ch.2,3
Categorical and Numerical Data
Categorical data is data that is separated
into groupings or categories for display.
Displays take the form of tables, bar
charts, pie charts, etc.
Numerical data comprises numbers that
have not been separated into categories.
Displays of numerical data include arrays,
frequency distributions, scatter plots, etc.
Both types of data can be displayed using
some types of tables, such as Pivot Tables.
Summary table
Tallies the values of various categories as frequencies and
percentages for each category.
Contingency Table
Cross-tabulates, or tallies jointly, the values
of two or more categorical variables,
allowing the study of patterns. Cells hold
tallies of frequencies or percentages.
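The two table types can be sketched in a few lines of Python. This is only an illustration: the records below are hypothetical (fund type and risk level for eight made-up funds), and the helper names `summary_table` and `contingency_table` are not from the slides.

```python
from collections import Counter

# Hypothetical records: (fund type, risk level) for eight made-up funds.
records = [
    ("Growth", "High"), ("Growth", "Low"), ("Value", "Low"),
    ("Growth", "High"), ("Value", "Low"), ("Value", "High"),
    ("Growth", "High"), ("Value", "Low"),
]

def summary_table(values):
    """Return {category: (frequency, percentage)} for one categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (freq, 100 * freq / total) for cat, freq in counts.items()}

def contingency_table(pairs):
    """Return {(row category, column category): joint frequency}."""
    return Counter(pairs)

# Summary table for the first variable only (fund type).
fund_summary = summary_table(fund for fund, _ in records)

# Contingency table: joint tallies of fund type and risk level.
joint = contingency_table(records)

for (fund, risk), freq in sorted(joint.items()):
    print(f"{fund:<8}{risk:<6}{freq}")
```

Dividing each joint frequency by the grand total, a row total, or a column total would turn the same table into percentages.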
Display Categorical Data – Bar Chart

[Figure: horizontal bar chart "Investor's Portfolio"; categories: Savings, CD, Bonds, Stocks; x-axis: Amount in K$, 0 to 50]
Pie Chart – Investor’s portfolio
[Figure: pie chart of the investor's portfolio: Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
Side by side Bar Chart
Uses sets of bars to show joint responses from two or more
categorical variables.

[Figure: side-by-side bar chart "Comparing Investors"; categories: Savings, Bonds, Stocks; one bar each for Investor A, Investor B, Investor C; x-axis: 0 to 60]

Tabulating Numerical Data: Frequency Distributions

Sort raw data in ascending order:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
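The steps above can be sketched in Python. The function name `frequency_distribution` and its parameters are illustrative, and the sketch assumes each class includes its lower boundary and excludes its upper one ("10 but under 20").

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def frequency_distribution(values, lower, width, n_classes):
    """Count values in classes [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = (v - lower) // width        # index of the class that contains v
        if 0 <= k < n_classes:          # values outside all classes are skipped
            counts[k] += 1
    return counts

data_range = max(data) - min(data)      # 58 - 12

counts = frequency_distribution(data, lower=10, width=10, n_classes=5)
total = sum(counts)
relative = [c / total for c in counts]            # relative frequencies
percentage = [100 * c / total for c in counts]    # percentages
midpoints = [10 + 10 * k + 5 for k in range(5)]   # class midpoints

for k, c in enumerate(counts):
    print(f"{10 + 10 * k} but under {20 + 10 * k}: {c}")
```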
Frequency Distributions, Relative Frequency
Distributions and Percentage Distributions

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Frequency   Relative Frequency   Percentage
10 but under 20        3             .15                15
20 but under 30        6             .30                30
30 but under 40        5             .25                25
40 but under 50        4             .20                20
50 but under 60        2             .10                10
Total                 20            1                  100
© 2002 Prentice-Hall, Inc.
Graphing Numerical Data:
The Histogram

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: histogram of the data; x-axis: class midpoints 15, 25, 35, 45, 55 (class boundaries 10 to 60); y-axis: frequency; bar heights 3, 6, 5, 4, 2; no gaps between bars]
Tabulating Numerical Data:
Cumulative Frequency

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Cumulative Frequency   Cumulative % Frequency
10 but under 20             3                      15
20 but under 30             9                      45
30 but under 40            14                      70
40 but under 50            18                      90
50 but under 60            20                     100
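A cumulative frequency is a running total of the class frequencies. A minimal sketch, assuming the class frequencies 3, 6, 5, 4, 2 from the frequency distribution of the same data:

```python
from itertools import accumulate

counts = [3, 6, 5, 4, 2]   # class frequencies for 10-20, 20-30, ..., 50-60

# Running total of the frequencies, class by class.
cumulative = list(accumulate(counts))

# The last running total is the sample size, so it serves as the denominator.
cumulative_pct = [100 * c / cumulative[-1] for c in cumulative]

print(cumulative)
print(cumulative_pct)
```

The cumulative percentages are the heights an ogive would plot against the upper class boundaries.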
Graphing Numerical Data:
The Ogive (Cumulative % Polygon)

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: ogive; x-axis: class boundaries (not midpoints), 10 to 60; y-axis: cumulative percentage, 0 to 100]

Graphing Two Numerical Variables - Scatter Plot

[Figure: "Mutual Funds Scatter Plot"; x-axis: Net Asset Values, 0 to 40; y-axis: Total Year to Date Return (%), 0 to 40]
Time-Series Plot
Numerical variable on the Y-axis, associated time period on the X-axis.
Errors in visualizing data
Using "chart junk": visual effects that distort
or distract from the data being presented, e.g.,
garish graphics or irrelevant visuals.
Failing to provide a relative basis when
comparing data between groups. For
example, two separate pie charts showing the
operations of two companies do not help if
we are trying to compare the two.
Compressing the vertical axis, e.g., using an axis
going up to 100 when the highest value is 30.
Make sure to include zero on the axes.
Measures of Central Tendency
Most sets of data show a tendency to
group around a central value; this is the
'central tendency'.
The most common measures of central
tendency are the mean, median, and mode.
The mean, also called the arithmetic
mean, is the sum of all values in the
sample divided by the number of values.
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
n is the size of the sample.
Median
• Robust measure of central tendency
• Not affected by extreme values
• In an ordered array, the median is the "middle" number
• The median is the value at rank (n+1)/2 in the ordered array.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the two middle numbers.
Mode
A measure of central tendency
It is the value that occurs most often
in the sample.
Not affected by extreme values
Used for either numerical or
categorical data
There may be no mode if all values
have the same frequency
There may be several modes if more
than one value is tied for the
highest frequency.
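The three measures can be compared with Python's standard `statistics` module. The sketch below uses the ordered array from the Interquartile Range slide; the second array, with its largest value replaced by 210, is a made-up variation that shows how an extreme value moves the mean but not the median or the modes.

```python
from statistics import mean, median, multimode

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

data_mean = mean(data)        # sum of the values divided by n
data_median = median(data)    # value at rank (n + 1) / 2 = 5
data_modes = multimode(data)  # every value tied for the highest frequency

# Hypothetical variation: replace the largest value with an extreme one.
with_outlier = data[:-1] + [210]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)
outlier_modes = multimode(with_outlier)

print(data_mean, data_median, data_modes)
print(outlier_mean, outlier_median, outlier_modes)
```

`multimode` returns a list, so it covers both the single-mode case and ties. When every value occurs equally often it returns all of them, which corresponds to the "no mode" case on the slide.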
Variation and shape of data: Range
• Measure of variation
• Difference between the largest and the smallest observations:
\[ \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \]
• Ignores the way in which data are distributed
• Does not consider how the values cluster between extremes.
Quartiles

Quartiles split ranked data into 4 parts.
The 1st quartile (Q1) splits the lowest 25% of the values from the rest. It is the (n+1)/4 ranked value.

The 3rd quartile (Q3) splits the lowest 75% of the values from the rest. It is the 3(n+1)/4 ranked value.

Q2 is the median.

If the rank ends in a half (2.5th, 7.5th, etc.), the quartile is the
average of the two values on either side. If the rank is any other
fraction, round it to the nearest integer.
Interquartile Range
Measure of variation
Also known as midspread
Spread in the middle 50%
Difference between the first and third quartiles

Not affected by extreme values

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

Interquartile Range = Q3 - Q1 = 17.5 - 12.5 = 5
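The quartile values in this example can be reproduced with a short sketch. It assumes the quartile ranks are (n+1)/4, 2(n+1)/4 and 3(n+1)/4, which matches the (n+1)/2 rank given for the median, and it applies the slides' rule for fractional ranks. The helper names `ranked_value` and `quartiles` are illustrative.

```python
import math

def ranked_value(sorted_values, rank):
    """Value at a 1-based rank, using the rounding rule from the slides."""
    n = len(sorted_values)
    rank = min(max(rank, 1), n)   # assumption: clamp ranks outside 1..n
    whole = math.floor(rank)
    if rank - whole == 0.5:
        # Rank ends in a half: average the two neighboring values.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Any other rank: round to the nearest integer rank.
    return sorted_values[math.floor(rank + 0.5) - 1]

def quartiles(values):
    """Return (Q1, Q2, Q3) using ranks k * (n + 1) / 4 for k = 1, 2, 3."""
    xs = sorted(values)
    n = len(xs)
    return tuple(ranked_value(xs, k * (n + 1) / 4) for k in (1, 2, 3))

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
q1, q2, q3 = quartiles(data)   # ranks 2.5, 5 and 7.5
iqr = q3 - q1

print(q1, q2, q3, iqr)
```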

Percentile
To find the xth percentile, use the same method as for quartiles.
List the data in ascending order.
xth percentile = value at rank (n+1)x/100, where n is the
number of data points.
If the rank is fractional, interpolate linearly between the two
neighboring values.
E.g., the 80th percentile of 30 data points is at rank
31 × 0.8 = 24.8.
Value = 24th data point × 0.2 + 25th data point × 0.8
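The percentile rule can be sketched as a small function. The name `percentile` is illustrative, and clamping ranks that fall outside 1..n is an assumption the slides do not address. The example uses the made-up data 1, 2, ..., 30, so the value at each rank equals the rank itself.

```python
import math

def percentile(values, x):
    """xth percentile: value at rank (n + 1) * x / 100, linearly interpolated."""
    xs = sorted(values)
    n = len(xs)
    rank = (n + 1) * x / 100
    rank = min(max(rank, 1), n)   # assumption: clamp ranks outside 1..n
    lower = math.floor(rank)
    frac = rank - lower           # fractional part of the rank
    if frac == 0:
        return xs[lower - 1]
    # Weight the two neighbors by how close the rank is to each of them.
    return xs[lower - 1] * (1 - frac) + xs[lower] * frac

# Hypothetical data: 30 points with values 1..30.
p80 = percentile(range(1, 31), 80)   # rank 24.8
print(p80)
```

With a rank of 24.8 the 24th value gets weight 0.2 and the 25th gets weight 0.8, as in the example above.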
Variance
Important measure of variation
Shows variation about the mean
Is the average of the squared differences between each element and the mean
Sample variance, where \bar{X} is the sample mean:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
Population variance, where \mu is the population mean:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
Standard Deviation
Most important measure of variation
Shows variation about the mean
Has the same units as the original data
Is the square root of the variance
Sample standard deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
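As a worked check, the sketch below computes both variances directly from the definitions and compares them with Python's standard `statistics` functions. It reuses the ordered array from the Interquartile Range slide; treating that array once as a sample and once as a whole population is only for illustration.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

def sum_of_squares(xs):
    """Sum of squared differences between each value and the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

ss = sum_of_squares(data)        # 80 for this array, up to rounding

sample_var = ss / (len(data) - 1)    # divide by n - 1
population_var = ss / len(data)      # divide by N
sample_sd = math.sqrt(sample_var)
population_sd = math.sqrt(population_var)

# The standard library implements the same four quantities.
print(sample_var, variance(data))
print(population_var, pvariance(data))
print(sample_sd, stdev(data))
print(population_sd, pstdev(data))
```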
Why do we divide by (n-1) for sample variance?
• For the sample variance to be unbiased, the average of the sample
variances over all possible samples from a given population has to
equal the population variance.
• It can be shown mathematically that if the sample variance is
calculated using n instead of n-1, the average variance over all
possible samples is not equal to the population variance.
• Dividing by n-1 instead of n is called Bessel's correction.
• It is only used when the population mean and variance are unknown.
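The unbiasedness claim can be checked exactly on a tiny example. The sketch below assumes a hypothetical population of four values and enumerates every ordered sample of size 2 drawn with replacement (an assumption, so that the draws are independent). Exact fractions avoid rounding error.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]   # hypothetical tiny population
n = 2                       # sample size

def variance(xs, divisor):
    """Sum of squared deviations from the mean of xs, divided by `divisor`."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / divisor

pop_var = variance(population, len(population))

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(variance(s, n) for s in samples) / len(samples)

print(pop_var, avg_with_n_minus_1, avg_with_n)
```

Under these assumptions the average of the n-1 version equals the population variance, while the n version comes out smaller by a factor of (n-1)/n.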
Shape of a Distribution
Describes how data is distributed
Measures of shape
Symmetric or skewed:
Left-skewed (negative): Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-skewed (positive): Mode < Median < Mean
5-number summary and the Box Plot
The 5 numbers: smallest X, Q1, Q2, Q3, largest X
Boxplot
Graphical display of data using 5-number summary

[Figure: box plot marking X smallest, Q1, Median (Q2), Q3 and X largest; axis: 4 to 12]
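The five numbers behind a box plot can be computed with the standard library. This sketch uses `statistics.quantiles` with `method="exclusive"`, which places quartile k at rank k(n+1)/4 and interpolates linearly for fractional ranks, so it can differ from a round-to-nearest rule when a rank is not a whole or half number. The helper name `five_number_summary` is illustrative, and the data is the ordered array from the Interquartile Range slide.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, Q2, Q3, largest) for a collection of numbers."""
    xs = sorted(values)
    q1, q2, q3 = quantiles(xs, n=4, method="exclusive")
    return xs[0], q1, q2, q3, xs[-1]

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
summary = five_number_summary(data)
print(summary)
```

A box plot draws the box from Q1 to Q3, a line at Q2, and whiskers out to the smallest and largest values.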
Distribution Shape and the Boxplot

[Figure: three box plots, labeled Left-Skewed, Symmetric and Right-Skewed, each marking Q1, Q2 and Q3]
Measuring skewness
• Skewness is the measure of asymmetry in a data
distribution. One method of calculating it is the adjusted
Fisher-Pearson coefficient, as follows:
\[ G = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{S} \right)^3 \]
where S is the sample standard deviation.
A symmetrical distribution, like a normal distribution, will
have G = 0. A negative value indicates left-skewed data; a positive
value indicates right-skewed data.
The presence of extreme outliers can distort the value of G, giving
erroneous results.
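A minimal sketch of the skewness calculation, assuming the standard adjusted Fisher-Pearson form G = n / ((n-1)(n-2)) × Σ((Xi − X̄)/S)³ with S the sample standard deviation. The function name `adjusted_skewness` and the three small data sets are made up for illustration.

```python
from statistics import mean, stdev

def adjusted_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient G for a sample."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    m = mean(values)
    s = stdev(values)   # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]     # long tail toward large values
left_skewed = [-10, -1, -1, -1]  # long tail toward small values

g_symmetric = adjusted_skewness(symmetric)
g_right = adjusted_skewness(right_skewed)
g_left = adjusted_skewness(left_skewed)

print(g_symmetric, g_right, g_left)
```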
Measuring kurtosis
Kurtosis is a measure of how 'heavy' the tails of a data set
are, i.e., how many outliers are present, relative to a
normal distribution.

On this scale (excess kurtosis), a normal distribution has
kurtosis = 0. Higher kurtosis indicates a large number of
outliers; lower kurtosis means few outliers.
As with skewness, extreme outliers can distort kurtosis values.
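One way to compute a kurtosis value on this scale is sketched below. It assumes the simple moment-based definition of excess kurtosis, m4 / m2² − 3, where m2 and m4 are the second and fourth central moments; spreadsheet and statistics packages often apply a further small-sample adjustment, so their results can differ. The function name and both data sets are made up.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3

no_tails = [-1, 1, -1, 1]        # every value equally far from the mean
one_outlier = [0] * 9 + [10]     # nine identical values and one far away

k_no_tails = excess_kurtosis(no_tails)
k_outlier = excess_kurtosis(one_outlier)

print(k_no_tails, k_outlier)
```

The data set with no tail values comes out negative, and the one with a single far-away value comes out positive.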
Pitfalls and Ethical Considerations
Data analysis is objective
Should report the summary measures that best meet the
assumptions about the data set
Data interpretation is subjective
Should be done in a fair, neutral and clear manner

Numerical descriptive measures:

Should document both good and bad results
Should be presented in a fair, objective and neutral manner
Should not use inappropriate summary measures to distort facts
