Unit 3 DS

Descriptive statistics summarize data sets through measures of central tendency (mean, median, mode) and measures of variability (spread). Techniques such as box plots, pivot tables, heat maps, and correlation statistics are used to visualize and analyze data. Additionally, concepts like variance, covariance, and regression are essential for understanding relationships between variables.

Uploaded by

kvrsbabu2004


DESCRIPTIVE STATISTICS

• Descriptive statistics are brief informational coefficients that summarize a given data set, which can represent either an entire population or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread).
MEASURES OF CENTRAL TENDENCY
• Measures of central tendency are values that describe a data set by identifying the central position of the data. There are three main measures of central tendency: mean, median and mode.
• Mean: the sum of all observations divided by the total number of observations.
• Median: the middle value of the ordered data set (the average of the two middle values when the number of observations is even).
• Mode: the most frequently occurring value in a data set.
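The three definitions above can be checked with a small sketch using Python's standard `statistics` module. The data values are hypothetical; the list has an even number of items, so the median is the average of the two middle values.

```python
import statistics

# Hypothetical observations, deliberately unsorted
data = [3, 7, 3, 10, 2, 5]

mean = sum(data) / len(data)        # sum of observations / number of observations
median = statistics.median(data)    # middle of sorted data: (3 + 5) / 2
mode = statistics.mode(data)        # most frequent value

print(mean, median, mode)           # 5.0 4.0 3
```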
MEASURES OF VARIATION

• Measures of variation are used to extract meaningful information about how spread out a set of data is. Variability tells us, for example, how far data items lie from each other and how far they lie from the center of the distribution.

• Quartiles and percentiles are measures of variation that describe how spread out the data are.
Quartiles and percentiles are both types of quantiles: quartiles split the ordered data into four equal parts, and percentiles split it into one hundred.
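A minimal sketch of quartiles and percentiles with NumPy, using a hypothetical data set. It relies on `np.percentile` with NumPy's default linear interpolation; other tools (Excel, some textbooks) use slightly different quartile definitions, so their results can differ for the same data.

```python
import numpy as np

# Hypothetical data set
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Quartiles are the 25th, 50th and 75th percentiles
q1, q2, q3 = np.percentile(data, [25, 50, 75])
p90 = np.percentile(data, 90)   # 90th percentile (interpolated between 8 and 9)
iqr = q3 - q1                   # interquartile range: spread of the middle half

print(q1, q2, q3, p90, iqr)
```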
Exploratory Data Analytics: Descriptive Statistics
Exploratory Data Analysis of the Mean
Standard Deviation

1. A standard deviation (σ) is a measure of how dispersed the data are in relation to the mean. A low standard deviation means the data are clustered around the mean, and a high standard deviation indicates that the data are more spread out.
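To illustrate the low/high contrast, here is a sketch with two hypothetical data sets that share the same mean (50) but differ in spread. It uses `statistics.pstdev` (population standard deviation); `statistics.stdev` would give the sample version, which divides by n - 1.

```python
import statistics

# Two hypothetical data sets, both with mean 50
clustered = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

sd_low = statistics.pstdev(clustered)     # sqrt(10 / 5) = sqrt(2)
sd_high = statistics.pstdev(spread_out)   # sqrt(1000 / 5) = sqrt(200)

print(round(sd_low, 3), round(sd_high, 3))
```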

BOX PLOTS

A box plot (or boxplot) is a method for graphically demonstrating the locality, spread and skewness of groups of numerical data through their quartiles. In addition to the box, a box plot can have lines (whiskers) extending from the box to indicate variability outside the upper and lower quartiles; for this reason the plot is also called a box-and-whisker plot or box-and-whisker diagram.

BOX PLOT :-
Box plot: a type of chart that depicts a group of numerical data through its quartiles. It is a simple way to visualize the shape of our data, and it makes comparing characteristics of data between categories very easy.
CODE :

import matplotlib.pyplot as plt

# Four groups of numerical data (20 values each)
value1 = [82,76,24,40,67,62,75,78,71,32,98,89,78,67,72,82,87,66,56,52]
value2 = [62,5,91,25,36,32,96,95,3,90,95,32,27,55,100,15,71,11,37,21]
value3 = [23,89,12,78,72,89,25,69,68,86,19,49,15,16,16,75,65,31,25,52]
value4 = [59,73,70,16,81,61,88,98,10,87,29,72,16,23,72,88,78,99,75,30]
box_plot_data = [value1, value2, value3, value4]

# Draw one box (quartiles plus whiskers) per group, side by side
plt.boxplot(box_plot_data)
plt.show()
RESULT OF CODE : [figure: box plots of value1 to value4]
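The numbers behind one box can be computed by hand. The sketch below assumes the common 1.5 × IQR whisker rule (matplotlib's `boxplot` default is `whis=1.5`) and NumPy's default percentile interpolation. It applies them to `value1` from the box plot example: the whiskers end at the most extreme points inside the fences, and anything beyond is drawn as an outlier.

```python
import numpy as np

# value1 from the box plot example
value1 = [82,76,24,40,67,62,75,78,71,32,98,89,78,67,72,82,87,66,56,52]
data = np.array(value1)

# The box: first quartile, median, third quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme data points still inside the fences
low_whisker = data[data >= low_fence].min()
high_whisker = data[data <= high_fence].max()

# Points outside the fences are plotted individually as outliers
outliers = sorted(data[(data < low_fence) | (data > high_fence)].tolist())

print(q1, median, q3, low_whisker, high_whisker, outliers)
```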
Pivot Table
• Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract meaningful summaries from a large, detailed data set.

Insert a Pivot Table

To insert a pivot table, execute the following steps.
1. Click any single cell inside the data set.
2. On the Insert tab, in the Tables group, click PivotTable.
3. Click OK.
PIVOT TABLES :

• Pivot Tables: A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way.
CODE :-
import pandas as pd

# Sales records: one row per person per quarter
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E',
                   'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
        'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60,
                  1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
        'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
        'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US',
                    'US', 'Japan', 'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US']}
df = pd.DataFrame(data)

# Group the rows by person and add up each person's sales
pivot = df.pivot_table(index=['person'], values=['sales'], aggfunc='sum')
print(pivot)
RESULT OF CODE :- total sales per person: A 4000, B 2600, C 2100, D 1500, E 1210.
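A pivot table can also spread a second grouping across the columns. Here is a sketch using the same sales records (the country column is left out because it is not needed): `columns='quarter'` turns each quarter into its own column, so every cell holds one person's total for one quarter.

```python
import pandas as pd

# Same sales records as in the pivot table example (country column omitted)
data = {'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E',
                   'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
        'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60,
                  1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
        'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]}
df = pd.DataFrame(data)

# Rows = person, columns = quarter, cells = total sales in that quarter
by_quarter = df.pivot_table(index='person', columns='quarter',
                            values='sales', aggfunc='sum')
print(by_quarter)
```

Summing each row of `by_quarter` gives back the per-person totals from the first pivot table.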
HEAT MAP

• A heat map (or heatmap) is a data visualization technique that shows the magnitude of a phenomenon as color in two dimensions. The variation in color may be by hue or intensity, giving obvious visual cues to the reader about how the phenomenon is clustered or varies over space.
HEAT MAPS :-
• A heatmap is a two-dimensional graphical representation of data where the individual values contained in a matrix are represented as colours.
CODE :-
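from pandas import DataFrame
import matplotlib.pyplot as plt

# Each inner list is one row; lists (not sets) keep the values in order
data = [[2, 3, 4, 1], [6, 3, 5, 2], [6, 3, 5, 4], [3, 7, 5, 4], [2, 8, 1, 5]]
Index = ['I1', 'I2', 'I3', 'I4', 'I5']
Cols = ['C1', 'C2', 'C3', 'C4']
df = DataFrame(data, index=Index, columns=Cols)

# pcolor draws every cell of the matrix as a coloured rectangle
plt.pcolor(df)
plt.show()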
RESULT OF CODE :- [figure: heat map of the 5 × 4 DataFrame]
CORRELATION STATISTICS

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense "correlation" may indicate any type of association, in statistics it normally refers to the degree to which a pair of variables is linearly related.
CORRELATION :-

• A correlation coefficient is a number between -1 and 1 that tells you the strength and direction of a relationship between variables.
• Correlation coefficients quantify the association between variables or features of a data set. These statistics are of high importance for science and technology, and Python has good tools for calculating them: the SciPy, NumPy and pandas correlation methods are fast, comprehensive and well documented.
CODE :-

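import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# data must be a DataFrame of numeric columns; these values are only a placeholder
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 4, 5, 4, 5], 'z': [5, 3, 4, 2, 1]})

corrmat = data.corr()                     # pairwise Pearson correlation matrix
fig, ax = plt.subplots(figsize=(9, 8))    # subplots() returns a (figure, axes) pair
sns.heatmap(corrmat, ax=ax, cmap="YlGnBu", linewidths=0.1)
plt.show()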
RESULT OF CODE :- [figure: heat map of the correlation matrix]
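The Pearson coefficient that `corr()` computes by default is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it directly with a hypothetical helper `pearson` on made-up data and cross-checks it against `np.corrcoef`. It shows the two extremes (+1 and -1) and one intermediate case.

```python
import numpy as np

# Hypothetical paired observations
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)   # rises exactly linearly with x
z = np.array([10, 8, 6, 4, 2], dtype=float)   # falls exactly linearly with x
w = np.array([2, 4, 5, 4, 5], dtype=float)    # rises with x, but not exactly

def pearson(a, b):
    # Sum of products of deviations, divided by sqrt of product of sums of squares
    da, db = a - a.mean(), b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

r_xy = pearson(x, y)              # +1: perfect positive linear relationship
r_xz = pearson(x, z)              # -1: perfect negative linear relationship
r_xw = pearson(x, w)              # between 0 and 1: positive but imperfect
r_np = np.corrcoef(x, w)[0, 1]    # NumPy's result for the same pair

print(r_xy, r_xz, round(r_xw, 4), round(r_np, 4))
```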
Random Variable
• A random variable is a variable whose value is unknown, or a function that assigns values to each of an experiment's outcomes.
• Random variables are most commonly used in probability and statistics, where they quantify outcomes.
• Risk analysts use random variables to estimate the probability of an adverse event occurring.
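To make "a function that assigns values to each outcome" concrete, here is a sketch that assumes the experiment is two tosses of a fair coin. The random variable X counts the heads; listing the outcomes and their equal probabilities gives X's probability distribution and its expected value.

```python
from fractions import Fraction
from itertools import product

# Sample space of the experiment: two tosses of a fair coin
outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT

def X(outcome):
    # The random variable: maps each outcome to a number (count of heads)
    return outcome.count("H")

# Probability distribution of X, assuming all outcomes are equally likely
dist = {}
for o in outcomes:
    dist[X(o)] = dist.get(X(o), 0) + Fraction(1, len(outcomes))

# Expected value: sum of value * probability
expected = sum(value * p for value, p in dist.items())

print(dist, expected)
```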
Variance
• Variance is a measure of how data points differ from the mean. In layman's terms, variance measures how far a set of numbers is spread out from its mean (average) value.
• Variance is the expected value of the squared deviation from the mean. The standard deviation of a data set is the square root of its variance.
• The larger the variance, the more the data are scattered around the mean; the smaller the variance, the less scattered the data are. This is why variance is called a measure of the spread of data around the mean.
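A sketch of variance computed step by step on a hypothetical data set. First square each deviation from the mean, then average. The population version divides by n and the sample version by n - 1. The results are cross-checked against `statistics.pvariance` and `statistics.variance`.

```python
import statistics

# Hypothetical data set
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                       # 40 / 8 = 5.0
squared_devs = [(x - mean) ** 2 for x in data]     # squared deviations from the mean
pop_var = sum(squared_devs) / len(data)            # population variance (divide by n)
sample_var = sum(squared_devs) / (len(data) - 1)   # sample variance (divide by n - 1)
std = pop_var ** 0.5                               # standard deviation = sqrt(variance)

print(pop_var, round(sample_var, 4), std)
```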
COVARIANCE

• Covariance is a measure of the relationship between two random variables: it describes the extent to which they change together, that is, whether a change in one variable tends to be accompanied by a change in the same or the opposite direction in the other. Covariance depends on the scale of the variables: $\operatorname{Cov}(aX + b, cY + d) = ac\,\operatorname{Cov}(X, Y)$. Its unit is the product of the units of the two variables.
• Covariance can take both positive and negative values. Based on this, there are two types:
1. Positive covariance: the variables tend to move in the same direction.
2. Negative covariance: the variables tend to move in opposite directions.
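A sketch of sample covariance on hypothetical paired data. It averages the products of deviations from the means, with n - 1 in the denominator. One pair moves together (positive covariance) and one moves in opposite directions (negative covariance). The result is cross-checked against `np.cov`, which also divides by n - 1 by default. Multiplying x by 100 multiplies the covariance by 100, which is why covariance carries the units of both variables.

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 5.0, 4.0, 5.0])     # tends to rise as x rises
y_down = np.array([9.0, 7.0, 6.0, 3.0, 1.0])   # tends to fall as x rises

def sample_cov(a, b):
    # Sum of products of deviations from the means, divided by n - 1
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

c_pos = sample_cov(x, y_up)       # positive covariance
c_neg = sample_cov(x, y_down)     # negative covariance
c_np = np.cov(x, y_up)[0, 1]      # off-diagonal entry of NumPy's covariance matrix

print(c_pos, c_neg, c_np)
```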
Correlation and Linear Transformations of a Random Variable

A linear rescaling is a transformation of the form $g(u) = a + bu$. A linear rescaling of a random variable does not change the basic shape of its distribution, only its location and scale (the range of possible values). A linear rescaling transforms the mean in the same way the individual values are transformed: $E[a + bX] = a + b\,E[X]$.
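A sketch of a linear rescaling, using the Celsius-to-Fahrenheit conversion $g(u) = 32 + 1.8u$ on hypothetical temperatures. It checks three effects. The mean transforms exactly like the values. The standard deviation is multiplied by |b| and is unaffected by the shift a. Because b > 0 here, the correlation with a second (hypothetical) variable is unchanged.

```python
import numpy as np

# Hypothetical temperatures in degrees Celsius, and a second paired variable
celsius = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
other = np.array([3.0, 5.0, 4.0, 8.0, 9.0])

# Linear rescaling g(u) = a + b*u (Celsius to Fahrenheit)
a, b = 32.0, 1.8
fahrenheit = a + b * celsius

# Mean: transformed the same way as the individual values
mean_check = np.isclose(fahrenheit.mean(), a + b * celsius.mean())
# Standard deviation: scaled by |b|; the shift a drops out
sd_check = np.isclose(fahrenheit.std(), abs(b) * celsius.std())
# Correlation: unchanged by a linear rescaling with b > 0
corr_check = np.isclose(np.corrcoef(celsius, other)[0, 1],
                        np.corrcoef(fahrenheit, other)[0, 1])

print(fahrenheit.mean(), mean_check, sd_check, corr_check)
```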
REGRESSION
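A minimal sketch of simple linear regression, assuming an ordinary least-squares fit of a straight line y = intercept + slope·x. The data values are hypothetical. The slope is the covariance of x and y divided by the variance of x, and the line passes through the point (mean of x, mean of y). The result is cross-checked with NumPy's degree-1 `polyfit`.

```python
import numpy as np

# Hypothetical data lying close to the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Least squares: slope = cov(x, y) / var(x); line passes through (mean x, mean y)
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Cross-check: polyfit returns coefficients from highest degree down
slope_np, intercept_np = np.polyfit(x, y, 1)

predicted = intercept + slope * x   # fitted values on the regression line
print(round(slope, 4), round(intercept, 4))
```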

THANK YOU
