
Unit 3 DS

Descriptive statistics summarize data sets through measures of central tendency (mean, median, mode) and measures of variability (spread). Techniques such as box plots, pivot tables, heat maps, and correlation statistics are used to visualize and analyze data. Additionally, concepts like variance, covariance, and regression are essential for understanding relationships between variables.

Uploaded by

kvrsbabu2004

DESCRIPTIVE STATISTICS

Descriptive statistics are brief informational coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of a population. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread).
MEASURES OF CENTRAL TENDENCY
• Measures of central tendency are values that describe a data set by identifying the central position of the data. There are three main measures of central tendency: mean, median, and mode.
• Mean: the sum of all observations divided by the total number of observations.
• Median: the middle value in an ordered set. When the number of observations is even, it is the average of the two middle values.
• Mode: the most frequently occurring value in a data set.
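The three measures can be checked on a small list of numbers. The sketch below uses Python's standard `statistics` module; the `scores` list is made-up data chosen only for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # 3 occurs twice, more than any other value

print(mean, median, mode)
```

Because the list has an even number of values, the median falls between the two middle observations rather than on one of them.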
MEASURE OF VARIATION

• A measure of variation is a way to extract meaningful information from a data set by describing how spread out its values are. Variability provides a lot of information about the data:
• It shows how far data items lie from each other.
• It shows how far data items lie from the center of the distribution.


• Quartiles and percentiles are measures of variation that describe how spread out the data is. Both are types of quantiles: quartiles split the ordered data into four equal parts, and percentiles split it into one hundred.
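As a rough illustration, quartiles and a percentile can be computed with `statistics.quantiles` from the Python standard library. The data list is hypothetical, and `method="inclusive"` is one of several interpolation conventions, so other tools may give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
# method="inclusive" treats the data as covering the whole range of values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=100 returns 99 cut points; index 89 is the 90th percentile.
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]

print(q1, q2, q3, p90)
```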
Exploratory data analytics: descriptive statistics
Exploratory Data Analysis of Mean
Standard Deviation

• A standard deviation (or σ) is a measure of how dispersed the data is in relation to the mean. A low standard deviation means the data are clustered around the mean, and a high standard deviation indicates the data are more spread out.
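A minimal sketch of this contrast, assuming two made-up lists that share the same mean: `statistics.pstdev` computes the population standard deviation, and the list whose values sit farther from the mean gets the larger result.

```python
import statistics

clustered = [4, 5, 5, 5, 6]  # values close to the mean of 5
spread = [1, 3, 5, 7, 9]     # same mean of 5, values farther away

sd_clustered = statistics.pstdev(clustered)
sd_spread = statistics.pstdev(spread)

print(sd_clustered, sd_spread)
```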


BOX PLOTS

• A box plot (or boxplot) is a method for graphically showing the locality, spread, and skewness of groups of numerical data through their quartiles. In addition to the box, a box plot can have lines extending from the box that indicate variability outside the upper and lower quartiles. For this reason the plot is also called a box-and-whisker plot or box-and-whisker diagram.
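The numbers behind a box plot can be sketched without drawing anything. The example below assumes the common convention of placing fences 1.5 × IQR beyond the quartiles (this is also the default whisker setting in matplotlib); the data list and variable names are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical sample with one extreme value

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Assumed convention: fences sit 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Points beyond the fences are drawn individually as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

Here the value 30 lies above the upper fence, so it would appear as a separate point rather than stretching the whisker.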


BOX PLOT :-
BOX PLOT: IT IS A TYPE OF CHART THAT DEPICTS A GROUP OF
NUMERICAL DATA THROUGH THEIR QUARTILES. IT IS A SIMPLE WAY
TO VISUALIZE THE SHAPE OF OUR DATA. IT MAKES COMPARING
CHARACTERISTICS OF DATA BETWEEN CATEGORIES VERY EASY.
CODE :

import matplotlib.pyplot as plt

# Four groups of 20 numerical values each
value1 = [82, 76, 24, 40, 67, 62, 75, 78, 71, 32, 98, 89, 78, 67, 72, 82, 87, 66, 56, 52]
value2 = [62, 5, 91, 25, 36, 32, 96, 95, 3, 90, 95, 32, 27, 55, 100, 15, 71, 11, 37, 21]
value3 = [23, 89, 12, 78, 72, 89, 25, 69, 68, 86, 19, 49, 15, 16, 16, 75, 65, 31, 25, 52]
value4 = [59, 73, 70, 16, 81, 61, 88, 98, 10, 87, 29, 72, 16, 23, 72, 88, 78, 99, 75, 30]

box_plot_data = [value1, value2, value3, value4]

# Draw one box per group, side by side
plt.boxplot(box_plot_data)
plt.show()
RESULT OF CODE :
Pivot Table
• Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract the significance from a large, detailed data set.

Insert a Pivot Table

To insert a pivot table, execute the following steps.
1. Click any single cell inside the data set.
2. On the Insert tab, in the Tables group, click PivotTable.
3. Click OK.
PIVOT TABLES :

• A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way.
CODE :-
import pandas as pd

# One row per sale: who sold, how much, in which quarter and country
data = {
    'person': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E',
               'A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D', 'E'],
    'sales': [1000, 300, 400, 500, 800, 1000, 500, 700, 50, 60,
              1000, 900, 750, 200, 300, 1000, 900, 250, 750, 50],
    'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
                3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
    'country': ['US', 'Japan', 'Brazil', 'UK', 'US', 'Brazil', 'Japan', 'Brazil', 'US', 'US',
                'US', 'Japan', 'Brazil', 'UK', 'Brazil', 'Japan', 'Japan', 'Brazil', 'UK', 'US'],
}
df = pd.DataFrame(data)
# Total sales per person
pivot = df.pivot_table(index=['person'], values=['sales'], aggfunc='sum')
print(pivot)
RESULT OF CODE :-
HEAT MAP

• A heat map (or heatmap) is a data visualization technique that shows the magnitude of a phenomenon as color in two dimensions. The variation in color may be by hue or intensity, giving obvious visual cues to the reader about how the phenomenon is clustered or varies over space.
HEAT MAPS :-
• A heatmap is a two-dimensional graphical representation of data where the individual values contained in a matrix are represented as colours.
CODE :-
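from pandas import DataFrame
import matplotlib.pyplot as plt

# Five rows of four values each; lists (not sets) keep the order and any duplicates
data = [[2, 3, 4, 1], [6, 3, 5, 2], [6, 3, 5, 4], [3, 7, 5, 4], [2, 8, 1, 5]]
index = ['I1', 'I2', 'I3', 'I4', 'I5']
cols = ['C1', 'C2', 'C3', 'C4']
df = DataFrame(data, index=index, columns=cols)

# Draw each cell of the table as a colored rectangle
plt.pcolor(df)
plt.show()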
RESULT OF CODE :-
CORRELATION STATISTICS

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense "correlation" may indicate any type of association, in statistics it normally refers to the degree to which a pair of variables are linearly related.
CORRELATION :-

• A correlation coefficient is a number between -1 and 1 that tells you the strength and direction of a relationship between variables.
• Correlation coefficients quantify the association between variables or features of a dataset. These statistics are of high importance for science and technology, and Python has great tools that you can use to calculate them. SciPy, NumPy, and Pandas correlation methods are fast, comprehensive, and well-documented.
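To make the range from -1 to 1 concrete, here is a minimal pure-Python sketch of the Pearson correlation coefficient. The helper `pearson_r` and the three data lists are hypothetical names introduced for illustration; in practice the library methods mentioned above would normally be used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)

hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]    # rises in step with hours
score_down = [10, 8, 6, 4, 2]  # falls in step with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(r_up, r_down)
```

A perfectly increasing linear relationship gives a coefficient at the top of the range, and a perfectly decreasing one gives a coefficient at the bottom.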
CODE :-

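import matplotlib.pyplot as plt
import seaborn as sns

# `data` is assumed to be a pandas DataFrame with numeric columns
corrmat = data.corr()

# plt.subplots returns (figure, axes); the heatmap is drawn on the axes
fig, ax = plt.subplots(figsize=(9, 8))
sns.heatmap(corrmat, ax=ax, cmap="YlGnBu", linewidths=0.1)
plt.show()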
RESULT OF CODE :-
Random Variable
• A random variable is a variable whose value is unknown, or a function that assigns values to each of an experiment's outcomes.
• Random variables are most commonly used in probability and statistics, where they quantify outcomes.
• Risk analysts use random variables to estimate the probability of an adverse event occurring.
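One way to picture a random variable is as a function on the outcomes of an experiment. The sketch below assumes a single roll of a fair six-sided die; `pmf` and `is_even` are hypothetical names, and exact fractions are used so the probabilities are not rounded.

```python
from fractions import Fraction

# Probability of each outcome of one roll of a fair six-sided die
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def is_even(face):
    """A random variable: maps each outcome to 1 (even) or 0 (odd)."""
    return 1 if face % 2 == 0 else 0

# Expected value of the face shown: each value weighted by its probability
expected_face = sum(face * p for face, p in pmf.items())

# Probabilities of events defined through the outcomes
prob_even = sum(p for face, p in pmf.items() if is_even(face) == 1)
prob_at_least_five = sum(p for face, p in pmf.items() if face >= 5)

print(expected_face, prob_even, prob_at_least_five)
```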
Variance
• Variance is a measure of how data points differ from the mean. In layman's terms, variance measures how far a set of numbers is spread out from its mean (average) value.
• Variance is the expected value of the squared deviation from the mean. The standard deviation is the square root of the variance, so the two always rise and fall together.
• The higher the variance, the more scattered the data is around its mean; the lower the variance, the less scattered it is. For this reason, variance is called a measure of the spread of data around the mean.
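The definition can be followed step by step on a short list. This sketch uses made-up data and shows both common conventions: dividing by n (population variance) and by n - 1 (sample variance).

```python
import statistics

data = [1, 2, 3, 4, 5]  # hypothetical sample
mean = sum(data) / len(data)

# Squared distance of each point from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance, sample_variance)
```

The standard library functions `statistics.pvariance` and `statistics.variance` implement the same two conventions.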
COVARIANCE

• Covariance is a measure of the relationship between two random variables and of the extent to which they change together. In other words, it describes whether a change in one variable tends to be accompanied by a change in the other. Covariance is measured in units obtained by multiplying the units of the two variables.
• Covariance can have both positive and negative values. Based on this, it has two types:
1. Positive covariance: the two variables tend to move in the same direction.
2. Negative covariance: the two variables tend to move in opposite directions.
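Both types can be seen in a small sketch. The function `sample_covariance` and the three lists are hypothetical, and the n - 1 divisor is an assumed convention (the sample covariance).

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

temperature = [1, 2, 3, 4, 5]
ice_cream_sales = [2, 4, 6, 8, 10]  # rises with temperature
heating_cost = [10, 8, 6, 4, 2]     # falls as temperature rises

cov_positive = sample_covariance(temperature, ice_cream_sales)
cov_negative = sample_covariance(temperature, heating_cost)
print(cov_positive, cov_negative)
```

Unlike a correlation coefficient, the size of these values depends on the units of the two variables, so only the sign is easy to interpret on its own.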
Correlation Linear Transformations of Random Variable

A linear rescaling is a transformation of the form g(u) = a + bu. A linear rescaling of a random variable does not change the basic shape of its distribution, just the range of possible values. A linear rescaling transforms the mean in the same way the individual values are transformed.
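The effect on the mean can be verified numerically. In this sketch the constants `a = 10` and `b = 2` and the data list are arbitrary choices for illustration; the standard deviation check is an added observation that the spread scales by |b| and ignores the shift a.

```python
import math
import statistics

a, b = 10, 2            # hypothetical rescaling constants
data = [1, 2, 3, 4, 5]  # hypothetical sample

def g(u):
    """Linear rescaling g(u) = a + b*u."""
    return a + b * u

rescaled = [g(u) for u in data]

mean_before = statistics.mean(data)
mean_after = statistics.mean(rescaled)

sd_before = statistics.pstdev(data)
sd_after = statistics.pstdev(rescaled)

print(mean_after, g(mean_before))       # the mean is transformed like any single value
print(sd_after, abs(b) * sd_before)     # the spread is multiplied by |b| only
```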
REGRESSION

THANK YOU
