
Lecture-6: Introduction To Data Science

This document provides an introduction to statistics and probability concepts in data science. It discusses descriptive statistics such as measures of central tendency (mean, median, mode) and variability (range, interquartile range, box plots). Probability topics covered include basic probability, probability distributions like the normal distribution, and probability types like marginal, joint, and conditional probability. Key concepts in inferential statistics like the central limit theorem are also introduced.

Uploaded by

Saif Ali Khan

Lecture 6

Introduction to Data Science:

Basics of Statistics and Probability


Statistics

• Types:
–  descriptive statistics 
–  inferential statistics

Descriptive Statistics: Variability (Spread): First Quartile and Third Quartile
• The lower half of a data set is the set of all values that are to
the left of the median value when the data has been put into
increasing order.
• The upper half of a data set is the set of all values that are to
the right of the median value when the data has been put into
increasing order.
• The first quartile, denoted by Q1, is the median of the lower
half of the data set. This means that about 25% of the numbers
in the data set lie below Q1 and about 75% lie above Q1.
• The third quartile, denoted by Q3, is the median of the upper
half of the data set. This means that about 75% of the numbers
in the data set lie below Q3 and about 25% lie above Q3.
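The median-of-halves definition above can be sketched in Python as follows. This is a minimal illustration, not part of the lecture: the function name `quartiles` and the sample data are hypothetical, and it assumes the convention that the median itself is left out of both halves when the number of values is odd. Other conventions, such as the interpolation used by many libraries, can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves definition.

    Assumes at least two values; with an odd count the middle value
    is excluded from both halves (an assumed convention).
    """
    data = sorted(values)        # put the data in increasing order
    n = len(data)
    half = n // 2
    lower = data[:half]          # values to the left of the median
    upper = data[n - half:]      # values to the right of the median
    return median(lower), median(data), median(upper)

sample = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical data
q1, med, q3 = quartiles(sample)
print(q1, med, q3)   # prints: 4.0 6.0 9.5
```

Here the lower half is 2, 4, 4, 5 and the upper half is 7, 9, 10, 12, so Q1 = 4 and Q3 = 9.5.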
Source: http://web.mnstate.edu/peil/MDEV102/U4/S36/S363.html
Source: https://www.slideshare.net/Sazedur92/measures-of-dispersion-73562437
Box-and-whisker plots

• A box-and-whisker plot is constructed from five key values.
• The key values are called a five-number summary, which
consists of the minimum, first quartile, median,
third quartile, and maximum.

Box-and-whisker plots
Five-Number Summary
Definitions:
• The minimum value of a data set is the least value in the set.
• The maximum value of a data set is the greatest value in the
set.
• The range of a data set is the distance between the maximum
and minimum values. To compute the range of a data set, we
subtract the minimum from the maximum:
range = maximum – minimum.
• The interquartile range of a data set is the distance between
the first and third quartiles:
interquartile range = Q3 – Q1.
Box-and-whisker plots

• Example 1: Draw a box-and-whisker plot for the data set {3,
7, 8, 5, 12, 14, 21, 13, 18}.
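The numbers needed for Example 1 can be worked out with a short Python sketch. The helper name `five_number_summary` is hypothetical, and the quartiles use the median-of-halves definition with the median excluded from both halves for an odd count, which is an assumed convention. The box would then span Q1 to Q3 with a line at the median, and the whiskers would extend to the minimum and maximum.

```python
from statistics import median

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (median-of-halves quartiles)."""
    data = sorted(values)            # increasing order
    n = len(data)
    half = n // 2
    return {
        "minimum": data[0],
        "Q1": median(data[:half]),        # median of the lower half
        "median": median(data),
        "Q3": median(data[n - half:]),    # median of the upper half
        "maximum": data[-1],
    }

example_1 = [3, 7, 8, 5, 12, 14, 21, 13, 18]   # data set from Example 1
summary = five_number_summary(example_1)
data_range = summary["maximum"] - summary["minimum"]   # range = maximum - minimum
iqr = summary["Q3"] - summary["Q1"]                    # IQR = Q3 - Q1
print(summary)
print("range =", data_range, "IQR =", iqr)
```

Sorted, the data are 3, 5, 7, 8, 12, 13, 14, 18, 21, so the summary is minimum 3, Q1 6, median 12, Q3 16, maximum 21, with range 18 and interquartile range 10.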

Probability

• Probability and statistics are interconnected fields of
mathematics. Probability is a mathematical method
used for statistical analysis. Both deal with analyzing the
relative frequency of events.

Source: Statistics And Probability Tutorial | Statistics And Probability for Data Science | Edureka
Disjoint (Mutually Exclusive) Events

• Disjoint events: Two events, A and B, are disjoint if they have
no outcomes in common.
• Union of two events: The union of A and B consists of the
outcomes that are in A or in B (or in both), denoted by A ∪ B.
• For the union of two events, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
• If the events are disjoint, then P(A ∩ B) = 0, so P(A ∪ B) =
P(A) + P(B).
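The addition rule can be checked on a small example. The sketch below assumes a single fair six-sided die with equally likely outcomes; the sample space, the events, and the helper `prob` are hypothetical illustrations rather than material from the slides.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

die = {1, 2, 3, 4, 5, 6}   # hypothetical sample space: one fair die
A = {2, 4, 6}              # the roll is even
B = {5, 6}                 # the roll is greater than 4
C = {1, 2}                 # the roll is less than 3 (disjoint from B)

# General rule: P(A or B) = P(A) + P(B) - P(A and B)
union_direct = prob(A | B, die)
union_by_rule = prob(A, die) + prob(B, die) - prob(A & B, die)

# Disjoint case: the intersection is empty, so the rule reduces to a sum
disjoint_direct = prob(C | B, die)
disjoint_by_sum = prob(C, die) + prob(B, die)

print(union_direct, union_by_rule)        # prints: 2/3 2/3
print(disjoint_direct, disjoint_by_sum)   # prints: 2/3 2/3
```

A and B share the outcome 6, so its probability is subtracted once to avoid double counting. C and B share nothing, so no correction is needed.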

Probability Distribution

• Probability density function
• Normal distribution
• Central limit theorem

Probability Density Function

Normal distribution

X is a normal random variable
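For reference, a normal random variable X with mean μ and standard deviation σ has the standard textbook density shown below. The symbols μ, σ, and f are notation assumed here; the formula on the original slide did not survive extraction, so this is the usual form rather than a transcription.

```latex
X \sim \mathcal{N}(\mu, \sigma^2), \qquad
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\quad -\infty < x < \infty
```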

Normal distribution

All normal distributions, regardless of the mean and variance,
share certain characteristics.
These include:
• Symmetry
• Unimodality (a single most common value)
• A continuous range from –∞ to +∞ (negative infinity to positive
infinity)
• A total area under the curve of 1
• A common value for the mean, median, and mode

Normal distribution

The empirical rule states that for any normal distribution:


• About 68% of the data will fall within one standard deviation of
the mean
• About 95% of the data will fall within two standard deviations
of the mean
• About 99.7% of the data will fall within three standard deviations
of the mean
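The three percentages in the empirical rule can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_within` and the default parameters are hypothetical choices for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Percentage of the distribution within k standard deviations of the mean
    print(k, round(100 * prob_within(k), 2))
```

The results are roughly 68.27%, 95.45%, and 99.73%. They do not depend on the particular mean and standard deviation, which is why the rule holds for any normal distribution.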

Normal distribution

[Figure: Percent of data falling into specified ranges of the normal distribution]
Types of Probability

Three important types of probability:
• Marginal probability
• Joint probability
• Conditional probability
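The three types can be illustrated with one small table of counts. The data below are entirely hypothetical (100 days cross-classified by weather and arrival), and the helper names are chosen only for this sketch. The standard definitions are used: a joint probability concerns two events occurring together, a marginal probability concerns one event regardless of the other, and a conditional probability is the joint probability divided by the marginal probability of the condition.

```python
from fractions import Fraction

# Hypothetical counts for 100 days: (weather, arrival) -> number of days
counts = {
    ("rain", "late"): 12,
    ("rain", "on_time"): 18,
    ("no_rain", "late"): 7,
    ("no_rain", "on_time"): 63,
}
total = sum(counts.values())

def joint(weather, arrival):
    """P(weather and arrival): both events occur together."""
    return Fraction(counts[(weather, arrival)], total)

def marginal_weather(weather):
    """P(weather): sum the joint probabilities over every arrival outcome."""
    return sum((joint(w, a) for (w, a) in counts if w == weather), Fraction(0))

def conditional(arrival, given_weather):
    """P(arrival | weather) = P(weather and arrival) / P(weather)."""
    return joint(given_weather, arrival) / marginal_weather(given_weather)

print(marginal_weather("rain"))      # prints: 3/10
print(joint("rain", "late"))         # prints: 3/25
print(conditional("late", "rain"))   # prints: 2/5
```

In this made-up table, 30 of the 100 days are rainy and 12 of those 30 are late, so the conditional probability of being late given rain is 12/30 = 2/5.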
