Python Pandas - Descriptive Statistics



Descriptive statistics are essential tools in data analysis, offering a way to summarize and understand your data. In Python's Pandas library, there are numerous methods available for computing descriptive statistics on Series and DataFrame objects.

These methods provide various aggregations like sum(), mean(), and quantile(), as well as operations like cumsum() and cumprod() that return an object of the same size.
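
As a minimal sketch of that difference, the example below applies one aggregation and one cumulative method to the same small DataFrame. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

totals = df.sum()       # aggregation: a Series with one entry per column
running = df.cumsum()   # cumulative: a DataFrame with the same shape as df

print(totals.to_dict())   # {'a': 6, 'b': 60}
print(running.shape)      # (3, 2)
```

The aggregation collapses each column to a single value, while the cumulative method keeps every row.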

In this tutorial, we will discuss some of the most commonly used descriptive statistics functions in Pandas, applied to both Series and DataFrame objects. These methods can be grouped by functionality into categories such as Aggregation Functions, Cumulative Functions, and more.

Aggregation Functions

Aggregation functions produce a single value from a series of data, providing a concise summary of your dataset. Here are some key aggregation functions −

Sr.No.  Method      Description
1       mean()      Returns the mean of the values over the requested axis.
2       sum()       Returns the sum of the values over the requested axis.
3       median()    Returns the median of the values over the requested axis.
4       min()       Returns the minimum of the values over the requested axis.
5       max()       Returns the maximum of the values over the requested axis.
6       count()     Returns the number of non-NA/null observations in the given object.
7       quantile()  Returns the value at the given quantile(s).
8       mode()      Returns the mode(s) of the values along the selected axis; there can be more than one.
9       var()       Returns the unbiased variance over the requested axis.
10      kurt()      Returns the unbiased kurtosis over the requested axis.
11      skew()      Returns the unbiased skew over the requested axis.
12      sem()       Returns the unbiased standard error of the mean over the requested axis.
13      corr()      Computes the correlation with another Series (or pairwise between DataFrame columns), excluding missing values.
14      cov()       Computes the covariance with another Series (or pairwise between DataFrame columns), excluding NA/null values.
15      autocorr()  Computes the lag-N autocorrelation of a Series.
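
The following sketch applies several of these aggregation methods to a small DataFrame. The column names and values are hypothetical, chosen only to make the results easy to check by hand.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "score": [3.5, 4.0, 2.5, 4.0],
})

col_means = df.mean()            # one value per column
col_medians = df.median()
row_sums = df.sum(axis=1)        # axis=1 aggregates across columns, one value per row
n_valid = df.count()             # non-null observations per column

q75 = df["age"].quantile(0.75)   # 75th percentile (linear interpolation by default)
score_mode = df["score"].mode()  # a Series, because several modes are possible
sem_age = df["age"].sem()        # standard error of the mean
age_score_corr = df["age"].corr(df["score"])  # correlation between two Series

print(col_means.to_dict())     # {'age': 38.75, 'score': 3.5}
print(col_medians.to_dict())   # {'age': 39.5, 'score': 3.75}
print(score_mode.tolist())     # [4.0]
```

Note that mode() returns a Series rather than a scalar, and that passing axis=1 switches the aggregation from per-column to per-row.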

Cumulative Functions

Cumulative functions provide running totals or products and maintain the same shape as the input data. These are useful in time series analysis or for understanding trends −

Sr.No.  Method     Description
1       cumsum()   Returns the cumulative sum over a DataFrame or Series axis.
2       cumprod()  Returns the cumulative product over a DataFrame or Series axis.
3       cummax()   Returns the cumulative maximum over a DataFrame or Series axis.
4       cummin()   Returns the cumulative minimum over a DataFrame or Series axis.
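
Here is a short sketch of the four cumulative methods on a Series. The values are arbitrary sample data.

```python
import pandas as pd

# Arbitrary sample values, used only for illustration
s = pd.Series([3, 1, 4, 1, 5])

run_sum = s.cumsum()    # running total
run_prod = s.cumprod()  # running product
run_max = s.cummax()    # largest value seen so far
run_min = s.cummin()    # smallest value seen so far

print(run_sum.tolist())   # [3, 4, 8, 9, 14]
print(run_prod.tolist())  # [3, 3, 12, 12, 60]
print(run_max.tolist())   # [3, 3, 4, 4, 5]
print(run_min.tolist())   # [3, 1, 1, 1, 1]
```

Each result has the same length and index as the input Series.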

Boolean Functions

Boolean functions return boolean values based on logical operations across the Series −

Sr.No.  Method     Description
1       all()      Returns True if all elements are True, potentially along an axis.
2       any()      Returns True if any element is True, potentially along an axis.
3       between()  Returns a boolean Series that is True for each element lying between the left and right bounds (inclusive by default).
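
The sketch below shows these boolean methods on a Series of sample numbers. The thresholds are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Arbitrary sample values, used only for illustration
s = pd.Series([5, 12, 18, 25])

all_positive = (s > 0).all()   # True only if every comparison is True
any_over_20 = (s > 20).any()   # True if at least one comparison is True
in_range = s.between(10, 20)   # element-wise check; both bounds are inclusive by default

print(all_positive)        # True
print(any_over_20)         # True
print(in_range.tolist())   # [False, True, True, False]
```

all() and any() reduce a boolean Series to a single value, whereas between() returns one boolean per element.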

Transformation Functions

Transformation functions derive a new value for each element of the Series, often by comparing it with other elements, and return a transformed Series of the same length −

Sr.No.  Method        Description
1       diff()        Computes the difference between elements in the object, over the specified number of periods.
2       pct_change()  Computes the fractional change between the current and a prior element.
3       rank()        Computes the rank of values in the given object.
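
As a rough illustration, the following sketch applies the three methods to a Series of made-up prices. The first element of diff() and pct_change() is NaN because it has no prior element to compare with.

```python
import pandas as pd

# Made-up price values, used only for illustration
s = pd.Series([100.0, 110.0, 99.0, 99.0])

delta = s.diff()         # difference from the previous element
change = s.pct_change()  # fractional change from the previous element (0.10 means +10%)
ranks = s.rank()         # ties share the average of their ranks by default

print(delta.tolist())    # [nan, 10.0, -11.0, 0.0]
print(ranks.tolist())    # [3.0, 4.0, 1.5, 1.5]
```

Note that pct_change() returns a fraction rather than a value already multiplied by 100.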

Index Related Functions

These functions either return the index label at which an extreme value occurs or summarize the distinct values in a Series −

Sr.No.  Method          Description
1       idxmax()        Returns the index of the first occurrence of the maximum value.
2       idxmin()        Returns the index of the first occurrence of the minimum value.
3       value_counts()  Returns a Series containing counts of unique values.
4       unique()        Returns an array of the unique values in the Series.
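
The sketch below uses a Series with string labels so that the returned index labels are easy to tell apart from the values. The labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical labelled data, used only for illustration
s = pd.Series([7, 2, 9, 2, 9], index=["a", "b", "c", "d", "e"])

label_of_max = s.idxmax()   # label of the first occurrence of the maximum
label_of_min = s.idxmin()   # label of the first occurrence of the minimum
counts = s.value_counts()   # Series indexed by the distinct values
distinct = s.unique()       # NumPy array, in order of first appearance

print(label_of_max)         # c
print(label_of_min)         # b
print(counts.loc[9])        # 2
print(distinct.tolist())    # [7, 2, 9]
```

Both 9 and 2 occur twice, and idxmax() and idxmin() report only the first matching label.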

Statistical Functions

These functions compute statistical metrics or apply simple numeric adjustments to the Series data −

Sr.No.  Method      Description
1       nunique()   Returns the number of unique values in the given object.
2       std()       Returns the standard deviation of the Series values.
3       abs()       Returns a Series/DataFrame with the absolute numeric value of each element.
4       clip()      Trims values at the input thresholds, setting values outside the bounds to the nearest boundary.
5       round()     Rounds each value in the given object to the specified number of decimals.
6       prod()      Returns the product of the given object's elements.
7       describe()  Generates descriptive statistics of the given object.
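
To round things off, here is a minimal sketch that exercises these methods on small sample Series. The values and the clip thresholds are arbitrary.

```python
import pandas as pd

# Arbitrary sample values, used only for illustration
s = pd.Series([-2.0, 1.5, 4.0, 1.5])

n_distinct = s.nunique()             # number of distinct values
spread = s.std()                     # sample standard deviation (ddof=1 by default)
magnitudes = s.abs()                 # absolute value of each element
clipped = s.clip(lower=-1, upper=3)  # values outside [-1, 3] are set to the nearest bound
product = s.prod()                   # product of all elements
rounded = pd.Series([1.234, 5.678]).round(1)
summary = s.describe()               # count, mean, std, min, quartiles, max

print(n_distinct)            # 3
print(magnitudes.tolist())   # [2.0, 1.5, 4.0, 1.5]
print(clipped.tolist())      # [-1.0, 1.5, 3.0, 1.5]
print(product)               # -18.0
print(summary)
```

For numeric data, describe() bundles several of the aggregations covered earlier into a single Series, which makes it a convenient first look at a dataset.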
