
Unit 4 Part 1

The document provides information on descriptive statistics and probability theory. Descriptive statistics summarize and describe characteristics of data through measures of central tendency, dispersion, symmetry and peakedness. Probability theory studies phenomena that have uncertain outcomes and aims to quantify that uncertainty. It distinguishes between deterministic, random and haphazard phenomena. When a finite sample space has equally likely outcomes, the probability of an event is the number of favorable outcomes divided by the total number of outcomes.

Uploaded by Girraj Dohare


Descriptive Statistics for Data Sciences
What is statistics?
● It is the collection, organization, analysis and interpretation of data.
● Statistics is mainly used to draw numerical conclusions.
● For example, if someone asks how many people watch YouTube, we cannot simply answer "many people are watching YouTube"; we have to answer in numerical terms that carry more meaning. We can say that there are 2 billion+ monthly active users and that users spend a daily average of 18 minutes. This is the numerical way to answer such questions, and statistics is the medium used to make such inferences.
Statistics includes:
● Design of experiments: planning how data are collected in order to understand the characteristics of interest
● Sampling: selecting a subset (sample) of a population to study
● Descriptive statistics: summarization of data
● Inferential statistics: drawing conclusions about a population from sample data, e.g. by hypothesis testing
● Probability theory: quantifying the likelihood of uncertain outcomes
Main statistical methods
● Descriptive statistics uses tools such as the mean and standard deviation on a sample to summarize data.
● Inferential statistics, on the other hand, looks at data that are subject to random variation and draws conclusions from them.
Descriptive statistics is distinguished from inferential statistics by its aim to summarize the sample rather than use the data to learn more about the population.
Descriptive statistics
● A raw dataset is difficult to describe.
● Descriptive statistics describe the dataset in a much simpler manner through:
– Measures of central tendency (mean, median, mode)
– Measures of spread (range, quartiles, percentiles, absolute deviation, variance and standard deviation)
– Measures of symmetry (skewness)
– Measures of peakedness (kurtosis)
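The spread, symmetry and peakedness measures listed above can be computed in a few lines. The sketch below assumes Python 3.8+ (for `statistics.fmean` and `statistics.quantiles`); the helper name `spread_and_shape` and the sample data are hypothetical. The standard `statistics` module has no skewness or kurtosis functions, so these are computed here from central moments using the simple (population) moment definitions; other tools may apply small-sample corrections and report slightly different values.

```python
import statistics as s


def spread_and_shape(data):
    """Spread, symmetry and peakedness measures for a list of numbers.

    Requires at least two distinct values. Skewness and kurtosis use the
    simple moment definitions (population moments, no small-sample correction).
    """
    n = len(data)
    mean = s.fmean(data)
    deviations = [x - mean for x in data]
    m2 = sum(d ** 2 for d in deviations) / n  # 2nd central moment
    m3 = sum(d ** 3 for d in deviations) / n  # 3rd central moment
    m4 = sum(d ** 4 for d in deviations) / n  # 4th central moment
    q1, q2, q3 = s.quantiles(data, n=4)       # quartiles ('exclusive' method by default)
    return {
        "range": max(data) - min(data),
        "quartiles": (q1, q2, q3),
        "iqr": q3 - q1,                                   # interquartile range
        "mean_abs_dev": sum(abs(d) for d in deviations) / n,
        "variance": s.variance(data),                     # sample variance (divides by n - 1)
        "stdev": s.stdev(data),                           # sample standard deviation
        "skewness": m3 / m2 ** 1.5,                       # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,              # 0 for a normal distribution
    }


print(spread_and_shape([2, 4, 4, 4, 5, 5, 7, 9]))
```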
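Descriptive statistics using Python
● import statistics as s
● s.mean(collection): arithmetic mean
● s.mode(collection): most common value
● s.median(collection): middle value (average of the two middle values for an even count)
● s.harmonic_mean(collection): harmonic mean
● s.median_low(collection): low median (the smaller of the two middle values)
● s.median_high(collection): high median (the larger of the two middle values)
● s.variance(collection): sample variance (divides by n − 1)
● s.stdev(collection): sample standard deviation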
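Applied to a small dataset, the functions listed above can be compared side by side. This is a minimal sketch with hypothetical sample data; `s.pvariance` and `s.pstdev` (population versions, dividing by n) are added from the same module only to contrast them with the sample versions.

```python
import statistics as s

# Hypothetical sample data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": s.mean(data),
    "mode": s.mode(data),
    "median": s.median(data),            # even count: average of the two middle values
    "median_low": s.median_low(data),
    "median_high": s.median_high(data),
    "harmonic_mean": s.harmonic_mean(data),
    "variance": s.variance(data),        # sample variance, divides by n - 1
    "stdev": s.stdev(data),              # sample standard deviation
    "pvariance": s.pvariance(data),      # population variance, divides by n
    "pstdev": s.pstdev(data),            # population standard deviation
}

for name, value in summary.items():
    print(f"{name:>13}: {value:.4f}")
```

For this data the median (4.5) falls between the low median (4) and the high median (5), and the sample variance (32/7) is larger than the population variance (4) because it divides by n − 1 instead of n.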
Probability Theory
Phenomena
[Diagram: phenomena divide into deterministic and non-deterministic]
Deterministic Phenomena
● There exists a mathematical model that allows “perfect” prediction of the phenomenon's outcome.
● Many examples exist in physics and chemistry (the exact sciences).
Non-deterministic Phenomena
● No mathematical model exists that allows “perfect” prediction of the phenomenon's outcome.
Non-deterministic Phenomena
● Non-deterministic phenomena may be divided into two groups:

1. Random phenomena
– The outcomes cannot be predicted, but in the long run they exhibit statistical regularity.

2. Haphazard phenomena
– The outcomes are unpredictable, and they exhibit no statistical regularity in the long run.
[Diagram: phenomena divide into deterministic and non-deterministic; non-deterministic phenomena divide into random and haphazard]
Random Phenomena

Examples
1. Tossing a coin – outcomes S = {Head, Tail}
We cannot predict on any single toss whether the outcome will be Head or Tail.
In the long run, we can predict that heads will occur 50% of the time and tails will occur 50% of the time.
2. Rolling a die – outcomes
S = {1, 2, 3, 4, 5, 6}

We cannot predict the outcome of a single roll, but in the long run one can determine that each outcome will occur 1/6 of the time.
This follows from symmetry: each side is the same, so no side should occur more frequently than another in the long run. If the die is not balanced, this may not be true.
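The long-run statistical regularity described above can be seen in a simulation. This sketch assumes a perfectly balanced die modelled by Python's pseudo-random generator; the helper name `relative_frequencies` and the roll counts are hypothetical. As the number of rolls grows, each face's relative frequency settles near 1/6 ≈ 0.167, even though no single roll can be predicted.

```python
import random
from collections import Counter


def relative_frequencies(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times; return each face's relative frequency."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Few rolls give uneven frequencies; many rolls approach 1/6 for every face
for n in (60, 600, 60_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(f, 3) for face, f in freqs.items()})
```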
Terminology
● The sample space, S, of a random phenomenon is the set of all possible outcomes.

Examples
1. Tossing a coin – outcomes S = {Head, Tail}
2. Rolling a die – outcomes
S = {1, 2, 3, 4, 5, 6}
Terminology
● An event, E, is any subset of the sample space S, i.e. any set of outcomes (not necessarily all outcomes) of the random phenomenon.
[Venn diagram: event E drawn as a region inside the sample space S]
Examples

1. Rolling a die – outcomes

S = {1, 2, 3, 4, 5, 6}
E = the event that an even number is rolled
= {2, 4, 6}
Special Events

The null event (the empty event), ∅:

$$\emptyset = \{\ \} = \text{the event that contains no outcomes}$$

The entire event (the sample space), S:
S = the event that contains all outcomes
The empty event, ∅, never occurs.
The entire event, S, always occurs.
Set operations on Events
Union
Let A and B be two events. Then the union of A and B is the event (denoted by A ∪ B) defined by:

$$A \cup B = \{e \mid e \text{ belongs to } A \text{ or } e \text{ belongs to } B\}$$

The event A ∪ B occurs if the event A occurs or the event B occurs (or both).

[Venn diagram: the union A ∪ B shaded]
Intersection

Let A and B be two events. Then the intersection of A and B is the event (denoted by A ∩ B) defined by:

$$A \cap B = \{e \mid e \text{ belongs to } A \text{ and } e \text{ belongs to } B\}$$

The event A ∩ B occurs if the event A occurs and the event B occurs.

[Venn diagram: the intersection A ∩ B shaded]
Complement

Let A be any event. Then the complement of A (denoted by Ā) is defined by:

$$\bar{A} = \{e \mid e \text{ does not belong to } A\}$$

The event Ā occurs if the event A does not occur.

[Venn diagram: the complement Ā shaded outside A]
Definition: mutually exclusive
Two events A and B are called mutually exclusive if:

$$A \cap B = \emptyset$$

[Venn diagram: disjoint events A and B]
If two events A and B are mutually exclusive, then:

1. They have no outcomes in common.

2. They can't occur at the same time: the outcome of the random experiment cannot belong to both A and B.
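Because events are sets of outcomes, the set operations and the mutually-exclusive test above map directly onto Python's built-in `set` type. This sketch uses the die sample space and the even-number event from the examples; the second event B ("a number greater than 3") is hypothetical, added only for illustration.

```python
# Sample space for one roll of a die, and two events represented as Python sets
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}          # an even number is rolled
B = {4, 5, 6}          # hypothetical event: a number greater than 3 is rolled

union = A | B          # A ∪ B: outcomes in A or in B (or both)
intersection = A & B   # A ∩ B: outcomes in both A and B
complement_A = S - A   # complement of A: outcomes in S that are not in A

print("A ∪ B =", sorted(union))
print("A ∩ B =", sorted(intersection))
print("complement of A =", sorted(complement_A))

# Mutually exclusive means the intersection is empty, i.e. the sets are disjoint
print("A, complement of A mutually exclusive?", A.isdisjoint(complement_A))
print("A, B mutually exclusive?", A.isdisjoint(B))
```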
Definition: probability of an event E
Suppose that the sample space S = {o1, o2, o3, … oN} has a finite number, N, of outcomes.
Also, each of the outcomes is equally likely (because of symmetry).
Then for any event E

$$P[E] = \frac{n(E)}{n(S)} = \frac{n(E)}{N} = \frac{\text{no. of outcomes in } E}{\text{total no. of outcomes}}$$

Note: the symbol n(A) = no. of elements of A.
This definition of P[E] applies only to the special case when

1. the sample space has a finite number of outcomes, and
2. each outcome is equally probable.
If this is not true, a more general definition of probability is required.
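The counting definition P[E] = n(E)/n(S) can be written as a small function. This is a sketch that only applies under the two conditions above (finite sample space, equally likely outcomes); the name `classical_probability` is hypothetical, and `fractions.Fraction` is used so the result is exact rather than a rounded float.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P[E] = n(E) / n(S); valid only for a finite sample space
    whose outcomes are all equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}  # an even number is rolled
print(classical_probability(E, S))  # 1/2
```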
Rule: the additive rule
(Mutually exclusive events)
$$P[A \cup B] = P[A] + P[B]$$
i.e.
$$P[A \text{ or } B] = P[A] + P[B]$$

if $A \cap B = \emptyset$
(A and B mutually exclusive)
Rule: the additive rule
(In general)

$$P[A \cup B] = P[A] + P[B] - P[A \cap B]$$

or
$$P[A \text{ or } B] = P[A] + P[B] - P[A \text{ and } B]$$
Logic
[Venn diagram: overlapping events A and B]

When P[A] is added to P[B], the outcomes in A ∩ B are counted twice;

hence
$$P[A \cup B] = P[A] + P[B] - P[A \cap B]$$
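The double-counting argument above can be checked by counting outcomes directly. This sketch assumes a fair die with equally likely faces; the events "even number" and "greater than 3" are hypothetical examples that overlap in {4, 6}, so simply adding P[A] and P[B] overcounts.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def P(event):
    """Equally likely outcomes: P[E] = n(E) / n(S)."""
    return Fraction(len(event), len(S))


A = {2, 4, 6}   # even number
B = {4, 5, 6}   # number greater than 3

naive = P(A) + P(B)                 # counts the outcomes 4 and 6 twice
lhs = P(A | B)                      # P[A ∪ B] counted directly
rhs = P(A) + P(B) - P(A & B)        # additive rule removes A ∩ B once
print(naive, lhs, rhs)              # 1 2/3 2/3
```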

Example:
Saskatoon and Moncton are two of the cities competing for the World University Games. (There are also many others.) The organizers are narrowing the competition to the final 5 cities. There is a 20% chance that Saskatoon will be amongst the final 5, a 35% chance that Moncton will be amongst the final 5, and an 8% chance that both Saskatoon and Moncton will be amongst the final 5. What is the probability that Saskatoon or Moncton will be amongst the final 5?
Solution:
Let A = the event that Saskatoon is amongst the final 5.
Let B = the event that Moncton is amongst the final 5.
Given P[A] = 0.20, P[B] = 0.35, and P[A ∩ B] = 0.08.
What is P[A ∪ B]?
Note: “and” ≡ ∩, “or” ≡ ∪.
$$P[A \cup B] = P[A] + P[B] - P[A \cap B] = 0.20 + 0.35 - 0.08 = 0.47$$
Rule for complements

$$P[\bar{A}] = 1 - P[A]$$

or
$$P[\text{not } A] = 1 - P[A]$$
Logic:
A and Ā are mutually exclusive,
and $S = A \cup \bar{A}$.

[Venn diagram: A and its complement Ā together making up S]

Thus $1 = P[S] = P[A] + P[\bar{A}]$
and $P[\bar{A}] = 1 - P[A]$.
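The complement rule is often the easier route when the complement is simpler to count. This sketch assumes two rolls of a fair die (36 equally likely ordered pairs) and the hypothetical event "at least one six": counting the rolls with no six and subtracting from 1 gives the same answer as counting the event directly.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a fair die: 36 equally likely ordered pairs
S = set(product(range(1, 7), repeat=2))
A = {roll for roll in S if 6 in roll}   # at least one six
not_A = S - A                            # no six on either roll

p_A_direct = Fraction(len(A), len(S))
p_A_via_complement = 1 - Fraction(len(not_A), len(S))   # P[A] = 1 - P[complement of A]
print(p_A_direct, p_A_via_complement)   # 11/36 11/36
```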
Conditional Probability
● Frequently, before observing the outcome of a random experiment, you are given information regarding the outcome.
● How should this information be used in predicting the outcome?
● Namely, how should probabilities be adjusted to take this information into account?
● Usually the information is given in the following form: you are told that the outcome belongs to a given event (i.e. you are told that a certain event has occurred).
Definition
Suppose that we are interested in computing the probability of event A and we have been told that event B has occurred.
Then the conditional probability of A given B is defined to be:

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} \quad \text{if } P[B] \neq 0$$
Rationale:
If we're told that event B has occurred, then the sample space is restricted to B.
The probabilities within B have to be normalized; this is achieved by dividing by P[B].
The event A can now only occur if the outcome is in A ∩ B.
Hence the new probability of A is:

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$
[Venn diagram: A ∩ B inside the restricted sample space B]
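The "restricted sample space" rationale can be checked by counting. This sketch assumes two rolls of a fair die; the events A ("the total is 8") and B ("the first roll is even") are hypothetical examples. Dividing P[A ∩ B] by P[B] gives the same value as counting A's outcomes inside the reduced sample space B.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # two rolls of a fair die, 36 outcomes
A = {(x, y) for (x, y) in S if x + y == 8}   # the total is 8
B = {(x, y) for (x, y) in S if x % 2 == 0}   # the first roll is even


def P(event):
    """Equally likely outcomes: P[E] = n(E) / n(S)."""
    return Fraction(len(event), len(S))


by_definition = P(A & B) / P(B)                 # P[A|B] = P[A ∩ B] / P[B]
by_restriction = Fraction(len(A & B), len(B))   # count A inside the reduced space B
print(P(A), by_definition, by_restriction)      # 5/36 1/6 1/6
```

Knowing that B occurred changes the probability of A from 5/36 to 1/6.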
An Example
The Academy Awards are soon to be shown.
For a specific married couple, the probability that the husband watches the show is 80%, the probability that his wife watches the show is 65%, and the probability that they both watch the show is 60%.
If the husband is watching the show, what is the probability that his wife is also watching the show?
Solution:
Let B = the event that the husband watches the show.
P[B] = 0.80
Let A = the event that his wife watches the show.
P[A] = 0.65 and P[A ∩ B] = 0.60
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} = \frac{0.60}{0.80} = 0.75$$
Definition
Two events A and B are called independent if

$$P[A \cap B] = P[A]\,P[B]$$
Note: if $P[B] \neq 0$ and $P[A] \neq 0$, then
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]} = \frac{P[A]\,P[B]}{P[B]} = P[A]$$
and
$$P[B \mid A] = \frac{P[A \cap B]}{P[A]} = \frac{P[A]\,P[B]}{P[A]} = P[B]$$
Thus, in the case of independence, the conditional probability of an event is not affected by knowledge of the other event.
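The definition P[A ∩ B] = P[A] P[B] can be tested directly on a finite, equally likely sample space. This sketch assumes two rolls of a fair die; the helper `independent` and the three events are hypothetical. Exact `Fraction` arithmetic is used so the equality test is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two rolls of a fair die


def P(event):
    """Equally likely outcomes: P[E] = n(E) / n(S)."""
    return Fraction(len(event), len(S))


def independent(A, B):
    """A and B are independent exactly when P[A ∩ B] = P[A] P[B]."""
    return P(A & B) == P(A) * P(B)


first_even = {r for r in S if r[0] % 2 == 0}
second_six = {r for r in S if r[1] == 6}
total_eight = {r for r in S if sum(r) == 8}

print(independent(first_even, second_six))   # True: the rolls do not influence each other
print(independent(first_even, total_eight))  # False: the first roll affects the total
```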
Difference between independence and mutual exclusivity
Mutually exclusive events
Two mutually exclusive events are independent only in the special case where $P[A] = 0$ or $P[B] = 0$ (so that $P[A]\,P[B] = 0 = P[A \cap B]$).

[Venn diagram: disjoint events A and B]
Otherwise, mutually exclusive events are highly dependent: A and B cannot occur simultaneously, so if one event occurs, the other event does not occur.
Independent events
$$P[A \cap B] = P[A]\,P[B]$$

or
$$\frac{P[A \cap B]}{P[B]} = P[A] = \frac{P[A]}{P[S]}$$
[Venn diagram: overlapping events A and B inside the sample space S]
The probability of A as a fraction of B (i.e. P[A ∩ B]/P[B]) is the same as the probability of A as a fraction of the entire sample space S (i.e. P[A]/P[S]).
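The contrast above can be made concrete with two mutually exclusive events that both have positive probability. This sketch assumes one roll of a fair die; the events {1, 2} and {5, 6} are hypothetical examples. Their intersection has probability 0 while P[A] P[B] = 1/9, so they fail the independence test, and learning that B occurred drops the probability of A to 0.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {1, 2}               # roll 1 or 2
B = {5, 6}               # roll 5 or 6: no outcomes in common with A


def P(event):
    """Equally likely outcomes: P[E] = n(E) / n(S)."""
    return Fraction(len(event), len(S))


print(A.isdisjoint(B))          # True: mutually exclusive
print(P(A & B), P(A) * P(B))    # 0 1/9, so not independent
print(P(A & B) / P(B), P(A))    # P[A|B] = 0, although P[A] = 1/3
```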
The multiplicative rule of probability
$$P[A \cap B] = \begin{cases} P[A]\,P[B \mid A] & \text{if } P[A] \neq 0 \\ P[B]\,P[A \mid B] & \text{if } P[B] \neq 0 \end{cases}$$

and
$$P[A \cap B] = P[A]\,P[B]$$

if A and B are independent.
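A common use of the multiplicative rule is sequential drawing without replacement. This sketch assumes a standard 52-card deck encoded as hypothetical (rank, suit) tuples with rank 1 for an ace. It computes the probability that two cards drawn in succession are both aces as P[A] P[B | A] and checks the result by enumerating every ordered pair of distinct cards.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding of a 52-card deck: 13 ranks x 4 suits (rank 1 = ace)
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]

# Multiplicative rule: P[A ∩ B] = P[A] P[B | A]
p_first_ace = Fraction(4, 52)               # P[A]: the first card is an ace
p_second_ace_given_first = Fraction(3, 51)  # P[B|A]: 3 aces remain among 51 cards
by_rule = p_first_ace * p_second_ace_given_first

# Check by enumerating all ordered pairs of distinct cards (52 * 51 of them)
pairs = list(permutations(deck, 2))
both_aces = sum(1 for c1, c2 in pairs if c1[0] == 1 and c2[0] == 1)
by_counting = Fraction(both_aces, len(pairs))

print(by_rule, by_counting)  # 1/221 1/221
```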

Random Variables

• In an experiment, a measurement is usually denoted by a variable such as X.

• In a random experiment, a variable whose measured value can change (from one replicate of the experiment to another) is referred to as a random variable.
Probability
● A probability is usually expressed in terms of a random variable.
• For the part length example, X denotes the part length, and the probability statement can be written in either of the following forms:

$$P[X \in [10.8, 11.2]] = 0.25 \quad \text{or} \quad P[10.8 \le X \le 11.2] = 0.25$$

● Both equations state that the probability that the random variable X assumes a value in [10.8, 11.2] is 0.25.
Probability Properties
Continuous Random Variables
Probability Density Function
• The probability distribution, or simply distribution, of a random variable X is a description of the set of probabilities associated with the possible values of X.
Cumulative Distribution Function
Mean and Variance
Normal Distribution
Undoubtedly, the most widely used model for
the distribution of a random variable is a
normal distribution.

• Central limit theorem

• Gaussian distribution
Normal Distribution
Central Limit Theorem
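The central limit theorem says that the mean of n independent draws from a distribution with mean μ and standard deviation σ is approximately normal with mean μ and standard deviation σ/√n when n is large. The sketch below illustrates this with `statistics.NormalDist` (Python 3.8+), assuming fair-die rolls as the underlying, clearly non-normal distribution (μ = 3.5, σ² = 35/12); the helper `sample_means`, the sample size of 30 and the 5000 repetitions are arbitrary choices for illustration.

```python
import math
import random
import statistics as s
from statistics import NormalDist


def sample_means(sample_size, n_samples, seed=0):
    """Means of n_samples independent samples, each of sample_size fair-die rolls."""
    rng = random.Random(seed)
    return [s.fmean([rng.randint(1, 6) for _ in range(sample_size)])
            for _ in range(n_samples)]


mu, sigma = 3.5, math.sqrt(35 / 12)  # mean and standard deviation of one fair-die roll
n = 30
means = sample_means(sample_size=n, n_samples=5000)

# CLT approximation for the sample mean: Normal(mu, sigma / sqrt(n))
clt = NormalDist(mu, sigma / math.sqrt(n))
observed = NormalDist.from_samples(means)  # normal fitted to the simulated means

print(f"CLT:       mean={clt.mean:.3f}  stdev={clt.stdev:.3f}")
print(f"simulated: mean={observed.mean:.3f}  stdev={observed.stdev:.3f}")

# Share of sample means within one standard error of mu, vs. about 0.683 for a normal
lo, hi = mu - clt.stdev, mu + clt.stdev
within = sum(lo <= m <= hi for m in means) / len(means)
print(f"within 1 SE: simulated={within:.3f}  normal={clt.cdf(hi) - clt.cdf(lo):.3f}")
```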
Discrete Random Variables
● Only measurements at discrete points are possible.

Probability Mass Function
Cumulative Distribution Function
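For a discrete random variable, the probability mass function f(x) = P[X = x] lists the probability of each value, and the cumulative distribution function F(x) = P[X ≤ x] sums the PMF over all values up to x. This sketch assumes X is the face shown by one roll of a fair die; the names `pmf` and `cdf` are hypothetical. It also computes the mean E[X] = Σ x f(x) and variance V(X) = Σ (x − μ)² f(x) with exact fractions.

```python
from fractions import Fraction

# PMF of X = the face shown by one roll of a fair die: f(x) = P[X = x] = 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x):
    """F(x) = P[X <= x]: the sum of the PMF over all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


mean = sum(x * p for x, p in pmf.items())                    # E[X]
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # V(X) = E[(X - mean)^2]

print([str(cdf(x)) for x in range(0, 7)])  # ['0', '1/6', '1/3', '1/2', '2/3', '5/6', '1']
print(mean, variance)                      # 7/2 35/12
```

The CDF is a step function: it stays constant between the possible values and jumps by f(x) at each one, so F(3.5) equals F(3) = 1/2.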
