
Stats Lab (4-6)

The document outlines Python programs for calculating measures of central tendency and dispersion, including mean, median, mode, variance and standard deviation. It also describes cross-validation techniques for measuring RMSE, MAE and R², and displays the Normal, Binomial, Poisson and Bernoulli distributions. The numpy, scipy, scikit-learn and matplotlib libraries are used to perform these statistical analyses.

Uploaded by Sai Kishan .s
Copyright © All Rights Reserved

4. Program to measure central tendency and measures of dispersion: mean, median, mode, standard deviation, variance, mean deviation and quartile deviation for a frequency distribution/data.

In statistics, measures of central tendency and measures of dispersion are the basic tools for summarizing data. Central tendency gives a single value that represents the whole dataset, while measures of dispersion describe how spread out, or variable, the data are. Below, we compute these statistics in Python.

Required Libraries
To perform these calculations, we will utilize the numpy and scipy libraries. If you
haven't installed these libraries yet, you can do so using pip:

pip install numpy scipy

Sample Data
Let's assume we have a frequency distribution represented as a list of tuples, where
each tuple contains a value and its corresponding frequency. For example:

data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]

Calculating Measures of Central Tendency

Mean: The mean is the sum of each value multiplied by its frequency, divided by the total frequency.

Median: The median is the middle value when the data is sorted. If the number of
observations is even, it is the average of the two middle values.

Mode: The mode is the value that appears most frequently in the dataset.
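The three definitions above can also be applied directly to the (value, frequency) pairs, without first expanding them into individual observations. The sketch below assumes the values are listed in ascending order; the median is located with cumulative frequencies, and value_at is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

# Frequency distribution from the example: (value, frequency) pairs,
# assumed to be listed in ascending order of value
data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]
values = np.array([v for v, _ in data], dtype=float)
freqs = np.array([f for _, f in data])

N = freqs.sum()                          # total frequency
mean = np.sum(values * freqs) / N        # sum(f * x) / sum(f)

cum = np.cumsum(freqs)                   # cumulative frequencies: 5, 15, 30, 50, 60


def value_at(position):
    """Hypothetical helper: value of the observation at a 1-based position in the sorted data."""
    return values[np.searchsorted(cum, position)]


# Median: middle observation if N is odd, average of the two middle ones if N is even
if N % 2 == 1:
    median = value_at((N + 1) // 2)
else:
    median = (value_at(N // 2) + value_at(N // 2 + 1)) / 2

mode = values[np.argmax(freqs)]          # value with the highest frequency

print(f"N = {N}, mean = {mean:.4f}, median = {median}, mode = {mode}")
```

For the example data N = 60, so the median averages the 30th and 31st observations (3 and 4), giving 3.5; the mean is 200/60 ≈ 3.333 and the mode is 4.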

Calculating Measures of Dispersion


Variance: The variance is the average of the squared deviations from the mean; it measures how far the values are spread out from their average.

Standard Deviation: The standard deviation is the square root of the variance. It is expressed in the same units as the data and gives a typical distance of the observations from the mean.

Mean Deviation: This is the average of the absolute deviations from the mean.

Quartile Deviation: This is half the difference between the first quartile (Q1) and
the third quartile (Q3).
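The dispersion measures can likewise be computed from the frequency table by weighting each deviation with its frequency. This sketch assumes population variance (dividing by N), which matches np.var's default ddof=0, and also shows the sample version that divides by N - 1. Quartiles are taken with numpy's default linear interpolation on the expanded data; textbook grouped-data quartile formulas may give slightly different Q1 and Q3.

```python
import numpy as np

# Same example frequency distribution: (value, frequency) pairs
data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]
values = np.array([v for v, _ in data], dtype=float)
freqs = np.array([f for _, f in data])
N = freqs.sum()
mean = np.sum(freqs * values) / N

# Population variance: sum(f * (x - mean)^2) / N  (np.var with the default ddof=0)
variance = np.sum(freqs * (values - mean) ** 2) / N
# Sample variance divides by N - 1 instead (np.var with ddof=1)
sample_variance = np.sum(freqs * (values - mean) ** 2) / (N - 1)
std_deviation = np.sqrt(variance)

# Mean deviation about the mean: sum(f * |x - mean|) / N
mean_deviation = np.sum(freqs * np.abs(values - mean)) / N

# Quartile deviation: (Q3 - Q1) / 2, quartiles from the expanded observations
expanded = np.repeat(values, freqs)
Q1, Q3 = np.percentile(expanded, [25, 75])
quartile_deviation = (Q3 - Q1) / 2

print(f"Variance: {variance:.4f} (sample: {sample_variance:.4f})")
print(f"Standard Deviation: {std_deviation:.4f}")
print(f"Mean Deviation: {mean_deviation:.4f}")
print(f"Quartile Deviation: {quartile_deviation:.4f}")
```

For the example data this gives a population variance of 25/18 ≈ 1.389, a mean deviation of exactly 1 and a quartile deviation of (4 - 2.75)/2 = 0.625.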

Implementation

import numpy as np
from scipy import stats

# Sample frequency distribution: (value, frequency) pairs
data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]

# Expand the data: repeat each value as many times as its frequency
expanded_data = []
for value, frequency in data:
    expanded_data.extend([value] * frequency)

# Convert to a numpy array for calculations
expanded_data = np.array(expanded_data)

# Central Tendency
mean = np.mean(expanded_data)
median = np.median(expanded_data)
# keepdims=False (SciPy >= 1.9) makes stats.mode return a scalar mode;
# the older stats.mode(...)[0][0] indexing raises an IndexError on SciPy >= 1.11
mode = stats.mode(expanded_data, keepdims=False).mode

# Measures of Dispersion
variance = np.var(expanded_data)        # population variance (ddof=0); use ddof=1 for sample variance
std_deviation = np.std(expanded_data)   # population standard deviation
mean_deviation = np.mean(np.abs(expanded_data - mean))   # mean absolute deviation about the mean

# Quartiles (numpy's default linear interpolation)
Q1 = np.percentile(expanded_data, 25)
Q3 = np.percentile(expanded_data, 75)
quartile_deviation = (Q3 - Q1) / 2

# Displaying the results
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_deviation}")
print(f"Mean Deviation: {mean_deviation}")
print(f"Quartile Deviation: {quartile_deviation}")

Conclusion
This program computes the measures of central tendency and dispersion for a frequency distribution by expanding it into individual observations and applying numpy and scipy functions to the result. The central tendency measures (mean, median, mode) locate the centre of the data, while the dispersion measures (variance, standard deviation, mean deviation, quartile deviation) describe how widely the observations are spread around that centre.

5. Program to perform cross-validation on a given dataset to measure Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and R² (coefficient of determination) using the validation-set, leave-one-out cross-validation (LOOCV) and k-fold cross-validation approaches.

import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate synthetic data (random_state makes the results reproducible)
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Initialize model
model = LinearRegression()


def print_metrics(name, y_true, y_pred):
    # Report RMSE, MAE and R-squared for true values and predictions
    print(f"{name} Metrics:")
    print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
    print("MAE:", mean_absolute_error(y_true, y_pred))
    print("R-squared:", r2_score(y_true, y_pred))


# Validation Set Approach: train on 80% of the data, evaluate on the held-out 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print_metrics("Validation Set", y_test, model.predict(X_test))

# K-Fold Cross Validation: each sample is predicted once by a model that did
# not see it; metrics are computed over all out-of-fold predictions
kf = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_predictions = np.empty_like(y)
for train_index, test_index in kf.split(X):
    model.fit(X[train_index], y[train_index])
    kfold_predictions[test_index] = model.predict(X[test_index])
print_metrics("K-Fold", y, kfold_predictions)

# Leave-One-Out Cross Validation: each test set holds a single sample, on which
# R-squared is undefined, so pool all n predictions before computing the metrics
loo = LeaveOneOut()
loo_predictions = np.empty_like(y)
for train_index, test_index in loo.split(X):
    model.fit(X[train_index], y[train_index])
    loo_predictions[test_index] = model.predict(X[test_index])
print_metrics("LOOCV", y, loo_predictions)
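scikit-learn also provides helpers that run these cross-validation loops internally. The sketch below assumes a reasonably recent scikit-learn (the neg_root_mean_squared_error scorer is not available in very old releases) and uses random_state=0 as an arbitrary seed. For k-fold it averages per-fold scores with cross_validate; for LOOCV it pools the out-of-sample predictions with cross_val_predict, because R² cannot be computed on a single held-out sample. Averaging per-fold scores and scoring pooled predictions are both common conventions and give slightly different numbers.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict, cross_validate

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=0)
model = LinearRegression()

# k-fold: one score per fold, averaged. scikit-learn scorers follow the
# "greater is better" convention, so error metrics are returned negated.
cv = cross_validate(
    model, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring=("neg_root_mean_squared_error", "neg_mean_absolute_error", "r2"),
)
kfold_rmse = -cv["test_neg_root_mean_squared_error"].mean()
kfold_mae = -cv["test_neg_mean_absolute_error"].mean()
kfold_r2 = cv["test_r2"].mean()
print(f"K-Fold  RMSE={kfold_rmse:.3f}  MAE={kfold_mae:.3f}  R2={kfold_r2:.3f}")

# LOOCV: collect all n out-of-sample predictions, then score them together
loo_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
loo_rmse = np.sqrt(mean_squared_error(y, loo_pred))
loo_mae = mean_absolute_error(y, loo_pred)
loo_r2 = r2_score(y, loo_pred)
print(f"LOOCV   RMSE={loo_rmse:.3f}  MAE={loo_mae:.3f}  R2={loo_r2:.3f}")
```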

6. Program to display Normal, Binomial, Poisson and Bernoulli distributions for a given frequency distribution and analyze the results.

import numpy as np
import matplotlib.pyplot as plt
from math import comb, exp, factorial

# Function to calculate the normal probability density function (PDF)
def normal_distribution(x, mu, sigma):
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Function to calculate the binomial probability of k successes in n trials
def binomial_distribution(n, p, k):
    return comb(n, k) * (p ** k) * ((1 - p) ** (n - k))

# Function to calculate the Poisson probability of k events with mean rate lmbda
def poisson_distribution(lmbda, k):
    return (lmbda ** k * exp(-lmbda)) / factorial(k)

# Function to calculate the Bernoulli probability of outcome k (0 = failure, 1 = success)
def bernoulli_distribution(p, k):
    return p ** k * (1 - p) ** (1 - k)

# Parameters
mu = 0
sigma = 1
n = 10
p = 0.5
lmbda = 3

# X values for normal distribution
x = np.linspace(-5, 5, 100)
normal_y = normal_distribution(x, mu, sigma)

# X values for binomial distribution
k_values = np.arange(0, n + 1)
binomial_y = [binomial_distribution(n, p, k) for k in k_values]

# X values for Poisson distribution
poisson_k_values = np.arange(0, 15)
poisson_y = [poisson_distribution(lmbda, k) for k in poisson_k_values]

# X values for Bernoulli distribution
bernoulli_k_values = [0, 1]
bernoulli_y = [bernoulli_distribution(p, k) for k in bernoulli_k_values]

# Plotting
plt.figure(figsize=(12, 8))

# Normal Distribution
plt.subplot(2, 2, 1)
plt.plot(x, normal_y, label='Normal Distribution', color='blue')
plt.title('Normal Distribution')
plt.xlabel('X')
plt.ylabel('Probability Density')
plt.grid()

# Binomial Distribution
plt.subplot(2, 2, 2)
plt.bar(k_values, binomial_y, label='Binomial Distribution', color='orange')
plt.title('Binomial Distribution')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.grid()

# Poisson Distribution
plt.subplot(2, 2, 3)
plt.bar(poisson_k_values, poisson_y, label='Poisson Distribution',
color='green')
plt.title('Poisson Distribution')
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.grid()

# Bernoulli Distribution
plt.subplot(2, 2, 4)
plt.bar(bernoulli_k_values, bernoulli_y, label='Bernoulli Distribution',
color='red')
plt.title('Bernoulli Distribution')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.xticks(bernoulli_k_values)
plt.grid()

plt.tight_layout()
plt.show()
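The hand-written formulas can be checked against the corresponding distributions in scipy.stats (norm, binom, poisson and bernoulli). This sketch repeats the four functions so it runs on its own and compares them on the same supports used in the plots, assuming the same parameters (mu = 0, sigma = 1, n = 10, p = 0.5, lmbda = 3).

```python
import numpy as np
from math import comb, exp, factorial
from scipy import stats

# Hand-written formulas, repeated here so the block is self-contained
def normal_distribution(x, mu, sigma):
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def binomial_distribution(n, p, k):
    return comb(n, k) * (p ** k) * ((1 - p) ** (n - k))

def poisson_distribution(lmbda, k):
    return (lmbda ** k * exp(-lmbda)) / factorial(k)

def bernoulli_distribution(p, k):
    return p ** k * (1 - p) ** (1 - k)

mu, sigma, n, p, lmbda = 0, 1, 10, 0.5, 3
x = np.linspace(-5, 5, 100)

# Compare each formula with scipy.stats on the same support
checks = {
    "normal": np.allclose(normal_distribution(x, mu, sigma),
                          stats.norm.pdf(x, loc=mu, scale=sigma)),
    "binomial": np.allclose([binomial_distribution(n, p, k) for k in range(n + 1)],
                            stats.binom.pmf(np.arange(n + 1), n, p)),
    "poisson": np.allclose([poisson_distribution(lmbda, k) for k in range(15)],
                           stats.poisson.pmf(np.arange(15), lmbda)),
    "bernoulli": np.allclose([bernoulli_distribution(p, k) for k in (0, 1)],
                             stats.bernoulli.pmf([0, 1], p)),
}
print(checks)
```

Each entry in checks should be True; a False would point to a mistake in the corresponding formula.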

Conclusion
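Normal Distribution: The function normal_distribution evaluates the probability density function (PDF) at 100 evenly spaced x values between -5 and 5. With mu = 0 and sigma = 1 the curve is the symmetric, bell-shaped standard normal density centred at 0.
Binomial Distribution: The function binomial_distribution calculates the probability of each number of successes k = 0, ..., 10. With n = 10 and p = 0.5 the distribution is symmetric about its mean np = 5.
Poisson Distribution: The function poisson_distribution computes the probability of k = 0, ..., 14 events. With lmbda = 3 the distribution is right-skewed, its mean and variance both equal 3, and k = 2 and k = 3 are equally likely.
Bernoulli Distribution: The function bernoulli_distribution calculates the probabilities of the two outcomes, failure (0) and success (1); with p = 0.5 both equal 0.5.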
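One way to analyze the plotted results numerically is to check that each probability table has the mean and variance its formula predicts: np and np(1 - p) for the binomial, lmbda for both the Poisson mean and variance, and p and p(1 - p) for the Bernoulli. The sketch below assumes the same parameters as the program (n = 10, p = 0.5, lmbda = 3), takes the probabilities from scipy.stats, and uses a hypothetical helper pmf_moments. The Poisson support is truncated at 14 as in the plot, so its moments match the theory only up to a very small tail error.

```python
import numpy as np
from scipy import stats

n, p, lmbda = 10, 0.5, 3   # same parameters as the program above


def pmf_moments(k, probs):
    """Hypothetical helper: mean and variance of a discrete distribution from its support and probabilities."""
    k, probs = np.asarray(k, dtype=float), np.asarray(probs, dtype=float)
    mean = np.sum(k * probs)
    var = np.sum((k - mean) ** 2 * probs)
    return mean, var


k_binom = np.arange(n + 1)
k_pois = np.arange(15)          # support truncated at 14, as in the plot
k_bern = np.array([0, 1])

# name -> ((observed mean, observed variance), (theoretical mean, theoretical variance))
results = {
    "binomial": (pmf_moments(k_binom, stats.binom.pmf(k_binom, n, p)), (n * p, n * p * (1 - p))),
    "poisson": (pmf_moments(k_pois, stats.poisson.pmf(k_pois, lmbda)), (lmbda, lmbda)),
    "bernoulli": (pmf_moments(k_bern, stats.bernoulli.pmf(k_bern, p)), (p, p * (1 - p))),
}
for name, ((mean, var), (mean_th, var_th)) in results.items():
    print(f"{name}: mean = {mean:.4f} (theory {mean_th}), variance = {var:.4f} (theory {var_th})")
```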
