EDA Unit IV
Descriptive Statistics: Distribution function, Measures of central tendency, Measures of dispersion, Types of kurtosis, Calculating
percentiles, Quartiles, Grouping Datasets, Correlation, Understanding univariate, bivariate, multivariate analysis, Time Series Analysis
Sample Experiments:
Descriptive statistics is a field of statistics that involves summarizing and describing the characteristics of a dataset. This unit
covers essential concepts like distribution functions, central tendency, dispersion, kurtosis, percentiles, and various analysis
techniques, including time series analysis.
1. Distribution Function
o A distribution function (or cumulative distribution function) gives the probability that a random variable takes on
a value less than or equal to a specified value. It’s fundamental for understanding how values are spread across
a dataset.
o Syntax/Example:
In Python, the CDF of a normal distribution can be computed with scipy.stats.norm.cdf(x,
loc=mean, scale=stddev).
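As a minimal sketch of the call above, the following evaluates the normal CDF at a few points. The mean of 0 and standard deviation of 1 are illustrative assumptions, not values from these notes.

```python
from scipy.stats import norm

mean, stddev = 0.0, 1.0  # assumed parameters: the standard normal distribution

# P(X <= x) for a few values of x
for x in (-1.0, 0.0, 1.96):
    p = norm.cdf(x, loc=mean, scale=stddev)
    print(f"P(X <= {x}) = {p:.4f}")

# The probability of landing in an interval is the difference of two CDF values
p_within_one_sd = (norm.cdf(1.0, loc=mean, scale=stddev)
                   - norm.cdf(-1.0, loc=mean, scale=stddev))
print(f"P(-1 <= X <= 1) = {p_within_one_sd:.4f}")
```

The last line illustrates why the CDF is useful in practice: any interval probability follows from two evaluations of the same function.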
3. Measures of Dispersion
Standard Deviation: The square root of the variance. It measures the typical spread of values around the mean, in the same units as the data.
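The relationship between variance and standard deviation can be checked with Python's built-in statistics module. This is a small sketch; the list of values is hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = statistics.mean(values)
sample_variance = statistics.variance(values)  # divides by n - 1
sample_std = statistics.stdev(values)          # square root of the sample variance
population_std = statistics.pstdev(values)     # divides by n instead of n - 1

print(f"Mean: {mean}")
print(f"Sample variance: {sample_variance:.3f}")
print(f"Sample standard deviation: {sample_std:.3f}")
print(f"Population standard deviation: {population_std:.3f}")

# The standard deviation is the square root of the variance
print(math.isclose(sample_std, math.sqrt(sample_variance)))
```

Note that pandas' var() and std() also use the n - 1 (sample) form by default.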
4. Types of Kurtosis
5. Percentiles and Quartiles
o Quartiles: Divide the data into four equal parts. The difference between the third and first quartiles (Q3 - Q1) is called the
Interquartile Range (IQR).
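A short sketch of computing quartiles, the IQR, and an arbitrary percentile with numpy.percentile. The data is hypothetical, and NumPy's default linear interpolation between data points is assumed.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data

# First, second (median), and third quartiles
q1, q2, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Any other percentile is computed the same way
p90 = np.percentile(values, 90)

print(f"Q1: {q1}, Median: {q2}, Q3: {q3}")
print(f"IQR: {iqr}")
print(f"90th percentile: {p90}")
```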
6. Grouping Datasets
o Grouping involves aggregating data based on a certain criterion to summarize it better, like summing or
averaging data by groups.
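Grouping as described above might look like the following in pandas. The table, its column names (region, amount), and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'East'],
    'amount': [100, 150, 200, 50, 300],
})

# Group rows by region, then sum and average the amounts within each group
summary = sales.groupby('region')['amount'].agg(['sum', 'mean'])
print(summary)
```

Each row of summary corresponds to one group, so five records collapse to three rows here.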
7. Correlation
o A statistical measure that expresses the extent to which two variables are linearly related.
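As an illustration, the Pearson correlation coefficient can be computed with numpy.corrcoef. The sketch below reuses the Age and Price values from the automobile example in the experiments of this unit; choosing NumPy rather than pandas for this is an assumption made for brevity.

```python
import numpy as np

# Age and Price values from the automobile example in this unit
age = np.array([1, 2, 3, 4, 5])
price = np.array([20000, 18000, 15000, 12000, 10000])

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(age, price)[0, 1]
print(f"Pearson r: {r:.3f}")
```

A value close to -1 indicates a strong negative linear relationship: price falls steadily as age increases.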
8. Univariate, Bivariate, and Multivariate Analysis
o Univariate Analysis: Analyzes a single variable. Common tools include frequency distributions, histograms, and
summary statistics.
o Bivariate Analysis: Analyzes the relationship between two variables, often using scatter plots or correlation coefficients.
o Multivariate Analysis: Analyzes more than two variables simultaneously, commonly using regression
analysis or principal component analysis (PCA).
9. Time Series Analysis
o Time series analysis is used to analyze data points collected or recorded at specific time intervals.
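One common first step in time series analysis is smoothing with a rolling mean to expose the trend. The sketch below assumes a hypothetical daily power series; the dates, values, and 3-day window are illustrative only.

```python
import pandas as pd

# Hypothetical daily power readings indexed by date
dates = pd.date_range('2024-01-01', periods=10, freq='D')
power = pd.Series([10, 12, 11, 13, 15, 14, 16, 18, 17, 19],
                  index=dates, name='Power')

# A 3-day rolling mean smooths short-term fluctuations to reveal the trend;
# the first two entries are NaN because a full window is not yet available
rolling_mean = power.rolling(window=3).mean()
print(rolling_mean)
```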
a)Uniform Distribution
Uniform distribution is a type of distribution in which every outcome has an equal
chance of occurring.
Sample Code:
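import numpy as np
import matplotlib.pyplot as plt
data_uniform = np.random.uniform(low=0, high=1, size=1000)  # illustrative parameters; the original line is missing
plt.hist(data_uniform, bins=30)
plt.title('Uniform Distribution')
plt.show()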
Output Example: A histogram where the probability is uniformly spread across the x-axis.
b) Normal Distribution
Normal distribution follows the classic bell curve.
Sample Code:
data_normal = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(data_normal, bins=30)  # draw the samples as a histogram
plt.title('Normal Distribution')
plt.show()
Output Example: A bell-shaped curve with the mean around 0 and standard deviation of 1.
c) Gamma Distribution
The Gamma distribution is a two-parameter family of continuous probability distributions.
Sample Code:
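data_gamma = np.random.gamma(shape=2, scale=1, size=1000)  # illustrative parameters; the original line is missing
plt.hist(data_gamma, bins=30)
plt.title('Gamma Distribution')
plt.show()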
d) Exponential Distribution
Exponential distribution is often used to model time between events in a Poisson process.
Sample Code:
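data_exponential = np.random.exponential(scale=1, size=1000)  # illustrative parameters; the original line is missing
plt.hist(data_exponential, bins=30)
plt.title('Exponential Distribution')
plt.show()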
Output Example: A distribution where values are skewed toward the lower end.
e) Poisson Distribution
The Poisson distribution is used to describe the number of events occurring in a fixed interval of time or space.
Sample Code:
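data_poisson = np.random.poisson(lam=3, size=1000)  # lam=3 assumed from the output description; the original line is missing
plt.hist(data_poisson, bins=30)
plt.title('Poisson Distribution')
plt.show()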
Output Example: A distribution where the frequency of occurrences is concentrated around a certain value (e.g., 3).
f) Binomial Distribution
The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials.
Sample Code:
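data_binomial = np.random.binomial(n=10, p=0.5, size=1000)  # illustrative parameters; the original line is missing
plt.hist(data_binomial, bins=30)
plt.title('Binomial Distribution')
plt.show()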
Output Example: A distribution where outcomes are centered around the expected number of successes.
Steps:
1. Remove duplicates
Sample Code:
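import pandas as pd
# Create DataFrame (the 'Age' column used below, and any other columns, are missing from the source)
data = {'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)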
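The duplicate-removal step named above can be sketched with pandas' drop_duplicates. The Name column and its values are hypothetical, added only so that whole rows repeat; the City values are the ones from the sample code.

```python
import pandas as pd

# Hypothetical records; the last two rows repeat the first two exactly
people = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Cara', 'Asha', 'Ben'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Los Angeles'],
})

# Keep the first occurrence of each row and renumber the index
deduplicated = people.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```

By default drop_duplicates compares entire rows; its subset parameter restricts the comparison to selected columns.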
a) Mean
mean = df['Age'].mean()
print(f"Mean: {mean}")
output:
Mean: 26.5
b) Median
median = df['Age'].median()
print(f"Median: {median}")
output:
Median: 26.5
c) Mode
mode = df['Age'].mode()[0]
print(f"Mode: {mode}")
output:
Mode: 25.0
a) Variance
variance = df['Age'].var()
print(f"Variance: {variance}")
output:
Variance: 7.25
b) Standard Deviation
std_dev = df['Age'].std()
print(f"Standard Deviation: {std_dev:.2f}")
output:
Standard Deviation: 2.69
c) Skewness
from scipy.stats import skew
skewness = skew(df['Age'])
print(f"Skewness: {skewness}")
output:
Skewness: 0.00
d) Kurtosis
from scipy.stats import kurtosis
kurt = kurtosis(df['Age'])  # excess kurtosis (a normal distribution gives 0)
print(f"Kurtosis: {kurt}")
output:
Kurtosis: -1.3
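The three standard types of kurtosis are mesokurtic (excess kurtosis near 0, like the normal distribution), leptokurtic (positive, heavier tails), and platykurtic (negative, lighter tails). The sketch below illustrates them with simulated samples; the choice of the Laplace and uniform distributions, the sample size, and the seed are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 200_000

samples = {
    'normal (mesokurtic)': rng.normal(size=n),
    'Laplace (leptokurtic)': rng.laplace(size=n),
    'uniform (platykurtic)': rng.uniform(size=n),
}

# scipy's kurtosis() returns excess kurtosis by default (fisher=True)
excess = {name: kurtosis(x) for name, x in samples.items()}
for name, value in excess.items():
    print(f"{name}: {value:.2f}")
```

A negative value such as the -1.3 printed above therefore points to a platykurtic (flat, light-tailed) sample.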
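a) Percentiles
Q1 = df['Age'].quantile(0.25)  # 25th percentile
Q3 = df['Age'].quantile(0.75)  # 75th percentile
IQR = Q3 - Q1
print(f"IQR: {IQR}")
# Box Plot
plt.boxplot(df['Age'])
plt.title('Box Plot of Age')
plt.show()
Output Example:
IQR: 3.0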
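a) Bivariate Analysis (e.g., Age, Price)
import pandas as pd
import matplotlib.pyplot as plt
data = {'Age': [1, 2, 3, 4, 5], 'Price': [20000, 18000, 15000, 12000, 10000]}
df_automobile = pd.DataFrame(data)
# Scatter plot
plt.scatter(df_automobile['Age'], df_automobile['Price'])
plt.xlabel('Age')
plt.ylabel('Price')
plt.show()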
b) Multivariate Analysis (e.g., Age, Price, Engine Size)
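import pandas as pd
import seaborn as sns  # assumed: the pairplot call is missing from the source
import matplotlib.pyplot as plt
# data must also contain an 'Engine Size' column; its values are missing from the source
df_automobile = pd.DataFrame(data)
# Pairplot
sns.pairplot(df_automobile)
plt.show()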
Steps:
Sample Code:
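import pandas as pd
import matplotlib.pyplot as plt
data = {
    # ... the 'Date' and 'Power' values are missing from the source
}
df_power = pd.DataFrame(data)
df_power['Date'] = pd.to_datetime(df_power['Date'])  # parse strings into datetimes
df_power.set_index('Date', inplace=True)             # index the series by date
df_power['Power'].plot(figsize=(10, 5))
plt.xlabel('Date')
plt.ylabel('Power')
plt.show()
Output Example: A time series plot showing power consumption over time.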
Sample Experiments
o Calculate mean, median, and mode using Python libraries like numpy or statistics.
o Use scatter plots for bivariate analysis and techniques like regression for multivariate analysis.
o Analyze the time series data for trends, seasonality, and forecasting.
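For the first experiment in the list above, a minimal sketch using the statistics module, with NumPy shown for comparison. The ages list is hypothetical.

```python
import statistics
import numpy as np

ages = [23, 25, 25, 27, 30]  # hypothetical values

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)  # most frequent value

print(f"Mean: {mean_age}")
print(f"Median: {median_age}")
print(f"Mode: {mode_age}")

# NumPy provides mean and median, but has no mode function of its own
print(f"NumPy mean: {np.mean(ages)}, NumPy median: {np.median(ages)}")
```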