Unit 1 Computational Statistics

The document provides an introduction to statistics, covering key concepts such as data types (categorical and numerical), univariate and bivariate analysis, and measures of central tendency and variability. It also distinguishes between data science and related fields, outlining the roles of various professionals in the domain. Additionally, it includes examples of calculating mean, variance, and standard deviation, emphasizing the importance of these measures in data analysis.

Uploaded by

anseltemp

UNIT-1

Introduction to Statistics
What is Statistics
• The practice or science of collecting,
interpreting and analyzing numerical data in
large quantities, especially for the purpose of
inferring proportions in a whole from those
in a representative sample.
• Collection of methods for planning
experiments, obtaining data and then
organizing, summarizing, presenting,
analyzing, interpreting & drawing conclusions.
IT IS AFFECTED BY OUTLIERS….
Statistical Data: Categorical and Numerical (Discrete, Continuous)

• Categorical data includes categories or groups:

- Car brands- VW, TATA, Suzuki
- Have enrolled for a course- Yes, No
• Numerical data includes discrete and
continuous values:
- No. of vehicles: 0, 1, 2, 3,… (discrete data)
- Weight: 72.23, 68.7, … (continuous data)
• Nominal data: categories that cannot be ordered.
• Ordinal data: groups and categories that follow a strict order.
• The difference between interval and ratio scales is whether they have a true zero, and therefore whether values can dip below zero.
- Interval scales have no true zero and can represent values below zero.
  E.g., you can measure temperature below 0 degrees Celsius, such as -10 degrees.
- Ratio scales have a true zero and never fall below it.
  E.g., height and weight are measured from 0 upwards and never fall below it.
Univariate and Bivariate Analysis
• Univariate data –
- This type of data consists of only one variable.
- The analysis of univariate data is thus the
simplest form of analysis since the information
deals with only one quantity that changes.
- It does not deal with cause-and-effect
relationships; the main purpose of the
analysis is to describe the data and find
patterns that exist within it.
- An example of univariate data is height.
- The description of patterns found in this type
of data can be made by drawing conclusions
using central tendency measures (mean, median
and mode), dispersion or spread of data (range,
minimum, maximum, quartiles, variance and
standard deviation) and by using frequency
distribution tables, histograms, pie charts and bar
charts.
• Bivariate data-
- This type of data involves two different
variables.
- The analysis of this type of data deals with
cause-and-effect relationships and is done to
find out the relationship between the two
variables.
• Multivariate data –
- When the data involves three or more
variables, it is categorized under multivariate.
- It is similar to bivariate data but contains
more than one dependent variable.
- The way to analyze this data depends on the
goals to be achieved.
- Example: a dataset on cholesterol, blood
pressure and weight used to predict heart attack.
Distinction between Data Science & Other
related domains
Mean median mode solution
Task 1: Annual income

Mean $ 189,848.18

Median $ 55,000.00

Mode $ 64,000.00
Task-2:
- Income is an example where the mean can be misleading. The correct
measure to use depends on the research that you are conducting.
- Research on income usually reports the median income instead of the
mean income.
- A few individuals earn much more than the others. These outliers pull
the mean far above the typical value: here the mean ($189,848.18) is more
than three times the median ($55,000.00).
Measure of Asymmetry- SKEWNESS

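• Skewness indicates whether the data is concentrated on one side.
• Skewness describes where most of the data lies.
• Positive/right skew: most values sit on the left and the tail extends to the right; the mean is typically greater than the median.
• Zero/no skew: the distribution is symmetric; the mean and the median coincide.
• Negative/left skew: most values sit on the right and the tail extends to the left; the mean is typically less than the median.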
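One common way to put a number on skewness is the moment coefficient (third central moment divided by the 1.5 power of the second central moment). The slides do not say which formula they use, so the choice of this coefficient, the function name `skewness` and the three small datasets below are all assumptions made for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical datasets, one per case.
right_skewed = [1, 2, 2, 3, 3, 3, 10]          # long tail to the right
symmetric = [1, 2, 3, 4, 5]                    # no skew
left_skewed = [-x for x in right_skewed]       # mirror image: tail to the left

for name, data in [("right", right_skewed), ("none", symmetric), ("left", left_skewed)]:
    print(f"{name:>5} skew: {skewness(data):+.3f}")
```

A positive result corresponds to right skew, a result near zero to no skew, and a negative result to left skew.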
Measures of Variability
• Variance
• Standard Deviation
• Coefficient of variation
Standard Deviation
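• Variance is expressed in squared units, so its values tend to be large and hard to interpret.
• SD is in the same units as the data, so it is much smaller and more meaningful.
• SD is the preferred measure of variability (for a
single dataset), as it is directly interpretable.
• But if we want to compare the variability of two or more datasets,
comparing their SDs directly can be meaningless when the datasets differ in units or scale.
• The coefficient of variation, $CV = \sigma / \mu$ (SD divided by the mean), is unitless and allows that comparison.
The higher the coefficient, the greater the dispersion around the mean.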
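A minimal sketch of why the coefficient of variation is used for comparisons: the same measurements expressed in two different units have different standard deviations but the same coefficient. The data are the dog heights from the worked example in this unit; using the population SD (`statistics.pstdev`) and the helper name `coefficient_of_variation` are assumptions of this sketch.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return statistics.pstdev(values) / statistics.mean(values)


heights_mm = [600, 470, 170, 430, 300]
heights_cm = [h / 10 for h in heights_mm]  # same heights, different unit

sd_mm = statistics.pstdev(heights_mm)
sd_cm = statistics.pstdev(heights_cm)
cv_mm = coefficient_of_variation(heights_mm)
cv_cm = coefficient_of_variation(heights_cm)

print(f"SD in mm: {sd_mm:.2f}, SD in cm: {sd_cm:.2f}")   # differ by a factor of 10
print(f"CV in mm: {cv_mm:.4f}, CV in cm: {cv_cm:.4f}")   # identical
```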
Harmonic Mean
• Harmonic mean is the reciprocal of the
arithmetic mean of the reciprocals of the
values.
• It is calculated by dividing the number of
observations by the sum of reciprocal of each
number in the series.
• So, if x1, x2, x3, …, xn are observations of any
variable X, then $H.M. = \dfrac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$
Use of Harmonic mean
• It is used to find the average of classes or
groups in a frequency distribution.
• It gives equal weight to each data point.
• E.g., four categories of typists take 4, 5, 8 and
10 minutes respectively to type a letter. Find
the average time taken to type the letter.
• H.M. of any frequency distribution = $\dfrac{N}{\sum_i f_i / x_i}$, where $f_i$ is the frequency of value $x_i$ and $N = \sum_i f_i$.
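The typist exercise can be worked through with a short Python sketch. The function name `harmonic_mean` and its optional `freqs` argument are illustrative choices, not something from the slides; with no frequencies given, every value counts once.

```python
import statistics


def harmonic_mean(values, freqs=None):
    """N divided by the sum of f/x; every frequency defaults to 1."""
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    return total / sum(f / x for x, f in zip(values, freqs))


minutes_per_letter = [4, 5, 8, 10]

hm = harmonic_mean(minutes_per_letter)
print(f"Harmonic mean: {hm:.2f} minutes")

# Cross-check against the standard library implementation.
print(f"statistics.harmonic_mean: {statistics.harmonic_mean(minutes_per_letter):.2f}")
```

Here the reciprocals sum to 27/40, so the harmonic mean is 4 ÷ (27/40) = 160/27, a little under 6 minutes.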

Mean, variance, standard deviation: notation
• Mean (or average) is denoted by μ if working with a population, x̄ if working with samples
• Variance is denoted by σ² (for population), s² (for sample)
• Standard deviation is denoted by σ_X or σ (for population), s_X or s (for sample)
Mean: the simple average of the given data values.

●Example: 4, 5, 9, 3, 15, 6

●Mean x̄ = (4 + 5 + 9 + 3 + 15 + 6) / 6
= 42/6
= 7
Variance: a measure of how data-
points differ from the mean
● Marks of Student A : 30, 50, 70, 100, 100
● Marks of Student B: 70, 70, 70, 70, 70
●The means (averages) of the two students' marks are:
○Marks of Student A : mean = 70
○Marks of Student B : mean = 70
●But we know that the two datasets are not
identical!
●We want a way to represent the difference between
these two datasets numerically.
●Variance shows how they differ.
How to calculate variance?

●If we conceptualize the spread of a distribution as the extent to which the
values in the distribution differ from the mean and from each other, then a
reasonable measure of spread might be the average deviation, or difference of
the values from the mean.
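● Because the deviations from the mean always sum to zero, each deviation is squared before averaging. The average of the squared deviations about the mean is called the variance.

● For population variance: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

● For sample variance: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$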

Example 1- Variance (Student A)
No.     Score X    X − X̄              (X − X̄)²
1       30         30 − 70 = −40      1600
2       50         50 − 70 = −20      400
3       70         70 − 70 = 0        0
4       100        100 − 70 = 30      900
5       100        100 − 70 = 30      900
Total   350                           3800

The mean is 350/5 = 70.
Variance = 3800/5 = 760
Example 1- Variance (Student B)
No.     Score X    X − X̄             (X − X̄)²
1       70         70 − 70 = 0       0
2       70         70 − 70 = 0       0
3       70         70 − 70 = 0       0
4       70         70 − 70 = 0       0
5       70         70 − 70 = 0       0
Total   350                          0

Variance = 0/5 = 0
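The two deviation tables for Student A and Student B can be reproduced with a small sketch. The helper name `deviation_table` is hypothetical; it divides by the number of scores (population variance), as the slides do.

```python
def deviation_table(scores):
    """Return the mean, one (score, deviation, squared deviation) row per score,
    and the population variance (sum of squared deviations divided by n)."""
    mean = sum(scores) / len(scores)
    rows = [(x, x - mean, (x - mean) ** 2) for x in scores]
    variance = sum(squared for _, _, squared in rows) / len(scores)
    return mean, rows, variance


student_a = [30, 50, 70, 100, 100]
student_b = [70, 70, 70, 70, 70]

for name, scores in (("A", student_a), ("B", student_b)):
    mean, rows, variance = deviation_table(scores)
    print(f"Student {name}: mean = {mean:g}")
    for score, deviation, squared in rows:
        print(f"  {score:>4} {deviation:>6g} {squared:>6g}")
    print(f"  variance = {variance:g}")

variance_a = deviation_table(student_a)[2]
variance_b = deviation_table(student_b)[2]
```

Both students have the same mean, but only Student A's marks are spread out, and the variance captures that difference.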
Example 2- Variance
Drive    Mark    Mathew
1        28      27
2        22      27
3        21      28
4        26      6
5        18      27
Which driver was more consistent?
Example 2- Variance (Mark)
Drive    Mark's Score X    X − X̄    (X − X̄)²
1        28                5        25
2        22                −1       1
3        21                −2       4
4        26                3        9
5        18                −5       25
Totals   115                        64

X̄ = (28 + 22 + 21 + 26 + 18)/5 = 23

Example 2- Variance (Mathew)
Drive    Mathew's Score X    X − X̄    (X − X̄)²
1        27                  4        16
2        27                  4        16
3        28                  5        25
4        6                   −17      289
5        27                  4        16
Totals   115                          362

X̄ = 115/5 = 23
Mark's Variance = 64 / 5 = 12.8
Mathew's Variance = 362 / 5 = 72.4

Conclusion: Mark has a lower variance; therefore, he is more consistent.
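The comparison can be checked with Python's standard `statistics` module. This sketch also prints the sample variance (dividing by n − 1) next to the population variance used in the slides (dividing by n), to show that the conclusion does not depend on that choice here; the variable names are illustrative.

```python
import statistics

scores = {
    "Mark": [28, 22, 21, 26, 18],
    "Mathew": [27, 27, 28, 6, 27],
}

population_variance = {}
sample_variance = {}
for driver, values in scores.items():
    population_variance[driver] = statistics.pvariance(values)  # divides by n
    sample_variance[driver] = statistics.variance(values)       # divides by n - 1
    print(f"{driver}: population variance = {population_variance[driver]:.1f}, "
          f"sample variance = {sample_variance[driver]:.1f}")

# The driver with the smallest variance is the more consistent one.
more_consistent = min(population_variance, key=population_variance.get)
print(f"More consistent driver: {more_consistent}")
```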
Standard Deviation: a measure of
variation of scores about the mean
●We can think of standard deviation as the
average distance to the mean.
●Higher standard deviation indicates higher
spread, less consistency, and less clustering.
●Sample standard deviation: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

●Population standard deviation: $\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
Example: Standard Deviation
Drive    Mark's Score X    X − X̄    (X − X̄)²
1        28                5        25
2        22                −1       1
3        21                −2       4
4        26                3        9
5        18                −5       25
Totals   115                        64

Mark's Variance = 64 / 5 = 12.8
Mark's Standard Deviation = √12.8 ≈ 3.58
Example- Variance & Standard
Deviation
●You have just measured the heights of your dogs (in mm)
● The heights (at the shoulders) are: 600mm, 470mm,
170mm, 430mm and 300mm.
●Find out the Mean, the Variance, and the Standard Deviation.
●Your first step is to find the Mean:
● Mean = (600 + 470 + 170 + 430 + 300) / 5
= 1970 / 5
= 394
●Now we calculate each dog's difference from the
Mean: 600 − 394 = 206, 470 − 394 = 76, 170 − 394 = −224,
430 − 394 = 36, 300 − 394 = −94
●To calculate the Variance, take each difference, square it, and
then average the result:
● Variance

$\sigma^2 = \dfrac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5}$
$= \dfrac{42436 + 5776 + 50176 + 1296 + 8836}{5}$
$= \dfrac{108520}{5}$
$= 21704$

● So the Variance σ² is 21,704

●And the Standard Deviation is just the square root of the Variance, so:
● Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
●The good thing about the Standard Deviation is that it is useful. Now we
can show which heights are within one Standard Deviation (147mm) of
the Mean, that is, between 394 − 147 = 247mm and 394 + 147 = 541mm.
The 470mm, 430mm and 300mm dogs fall inside this range; the 600mm dog
is above it and the 170mm dog is below it.

● So, using the Standard Deviation we have a "standard" way
of knowing what is normal, and what is extra large
or extra small.
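The whole dog-height example, from mean to the "within one standard deviation" check, can be sketched as follows. The variable names are illustrative, and the population formulas (dividing by the number of dogs) are used to match the slides.

```python
import math

heights_mm = [600, 470, 170, 430, 300]

mean = sum(heights_mm) / len(heights_mm)
variance = sum((h - mean) ** 2 for h in heights_mm) / len(heights_mm)
std_dev = math.sqrt(variance)

# Range covering one standard deviation either side of the mean.
lower, upper = mean - std_dev, mean + std_dev

within = [h for h in heights_mm if lower <= h <= upper]
extra_large = [h for h in heights_mm if h > upper]
extra_small = [h for h in heights_mm if h < lower]

print(f"Mean = {mean:g} mm, variance = {variance:g}, SD = {std_dev:.2f} mm")
print(f"Within one SD ({lower:.0f} to {upper:.0f} mm): {within}")
print(f"Extra large: {extra_large}, extra small: {extra_small}")
```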
Home Assignment: Database of Real Estate Company
• We need all statistical properties of the data.
• Compute measures of central tendency and
variability and comment on each value.
• Plot Line, Scatter, Box plots, Histogram for
insightful use-cases such as a scatter plot for
age vs price.
Defining Data Science and Big Data
• Data science is an umbrella term that
encompasses data analytics, data mining,
machine learning, and several other related
disciplines.
• It includes collection, ingestion, retrieval and
transformation of large amounts of data
(collectively known as big data).
• The best way to describe data science is via a
Venn diagram created by Drew Conway in
2010:
Data Science Roles
• Data Scientist: Understands data from a specific
business point of view, establishes experimental setup
and provides accurate predictions and insights that
can be used to power critical business decisions.
• Data Analyst: Takes a technical role in
developing, implementing, and maintaining analytic
systems.
• Business Analyst: Responsible for using data
to drive business decisions.
• Statistician: Responsible for creating data-driven
surveys, opinion polls, and questionnaires and
interpreting them.
• Data and Analytics Manager: Plays the role of
assigning duties and operations to the data
science team.
• Database Administrator
• Data Engineer: Data Engineer is responsible
for transforming data into an easily analyzable
format.
• Data Architect: The role of a Data Architect is
to integrate, protect, maintain and expand the
data sources of an organization.
