Stats Lab (4-6)
Required Libraries
To perform these calculations, we will use the numpy and scipy libraries. If you
haven't installed them yet, you can do so with pip:
pip install numpy scipy
Sample Data
Let's assume we have a frequency distribution represented as a list of tuples, where
each tuple contains a value and its corresponding frequency. For example:
data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]
Mean: The mean is the sum of each value multiplied by its frequency, divided by
the total frequency.
Median: The median is the middle value when the data is sorted. If the number of
observations is even, it is the average of the two middle values.
Mode: The mode is the value that appears most frequently in the dataset.
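Variance and Standard Deviation: The variance is the average of the squared
deviations from the mean; the standard deviation is its square root.
Mean Deviation: This is the average of the absolute deviations from the mean.
Quartile Deviation: This is half the difference between the third quartile (Q3)
and the first quartile (Q1), that is, (Q3 - Q1) / 2.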
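For grouped data, these definitions can be written with x_i for the distinct values, f_i for their frequencies and N for the total frequency. Applied by hand to the sample data, they give the values below. Here x_(k) denotes the k-th smallest observation: the cumulative frequencies are 5, 15, 30, 50 and 60, so the 30th observation is 3 and the 31st is 4. The quartiles follow NumPy's default linear-interpolation convention; other quartile conventions give slightly different Q1 and Q3.

```latex
\begin{aligned}
N &= \sum_i f_i = 5 + 10 + 15 + 20 + 10 = 60 \\
\bar{x} &= \frac{\sum_i f_i x_i}{N} = \frac{5 + 20 + 45 + 80 + 50}{60} = \frac{200}{60} \approx 3.333 \\
\text{Median} &= \frac{x_{(30)} + x_{(31)}}{2} = \frac{3 + 4}{2} = 3.5 \\
\text{Mode} &= 4 \quad (\text{largest frequency}, f = 20) \\
\sigma^2 &= \frac{\sum_i f_i x_i^2}{N} - \bar{x}^2 = \frac{750}{60} - \frac{100}{9} = \frac{25}{18} \approx 1.389, \qquad \sigma \approx 1.179 \\
\text{MD} &= \frac{\sum_i f_i \lvert x_i - \bar{x} \rvert}{N}
  = \frac{1}{60}\left(5 \cdot \tfrac{7}{3} + 10 \cdot \tfrac{4}{3} + 15 \cdot \tfrac{1}{3} + 20 \cdot \tfrac{2}{3} + 10 \cdot \tfrac{5}{3}\right)
  = \frac{60}{60} = 1 \\
\text{QD} &= \frac{Q_3 - Q_1}{2} = \frac{4 - 2.75}{2} = 0.625
\end{aligned}
```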
Implementation
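import numpy as np
from scipy import stats
# Frequency distribution as (value, frequency) pairs
data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]
values = np.array([v for v, _ in data])
freqs = np.array([f for _, f in data])
# Repeat each value by its frequency to recover the individual observations
expanded_data = np.repeat(values, freqs)
# Central Tendency
mean = np.mean(expanded_data)
median = np.median(expanded_data)
# keepdims=False returns a scalar mode (the argument needs SciPy >= 1.9)
mode = stats.mode(expanded_data, keepdims=False).mode
# Measures of Dispersion (population formulas, ddof=0)
variance = np.var(expanded_data)
std_deviation = np.std(expanded_data)
mean_deviation = np.mean(np.abs(expanded_data - mean))
# Quartiles (NumPy's default linear interpolation)
Q1 = np.percentile(expanded_data, 25)
Q3 = np.percentile(expanded_data, 75)
quartile_deviation = (Q3 - Q1) / 2
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Variance:", variance, "Standard deviation:", std_deviation)
print("Mean deviation:", mean_deviation)
print("Quartile deviation:", quartile_deviation)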
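As a cross-check, most of the same measures can be computed directly from the (value, frequency) pairs, without expanding the data, by weighting each distinct value by its frequency. This is a minimal sketch: the helper `nth_value` is a hypothetical name introduced here, and quartiles are left out because their value depends on the interpolation convention.

```python
import numpy as np

# (value, frequency) pairs from the sample data
data = [(1, 5), (2, 10), (3, 15), (4, 20), (5, 10)]
values = np.array([v for v, _ in data], dtype=float)
freqs = np.array([f for _, f in data])
N = freqs.sum()

# Weighted mean: sum(f * x) / sum(f)
mean = np.average(values, weights=freqs)
# Population variance and mean deviation, each weighted by frequency
variance = np.average((values - mean) ** 2, weights=freqs)
std_deviation = np.sqrt(variance)
mean_deviation = np.average(np.abs(values - mean), weights=freqs)
# Mode: the value with the largest frequency (the first one on ties)
mode = values[np.argmax(freqs)]

def nth_value(k):
    """Hypothetical helper: k-th smallest observation (1-based) via cumulative frequencies."""
    cumulative = np.cumsum(freqs)
    # First position whose cumulative frequency reaches k
    return values[np.searchsorted(cumulative, k)]

# Median: the middle observation, or the average of the two middle ones when N is even
if N % 2 == 1:
    median = nth_value((N + 1) // 2)
else:
    median = (nth_value(N // 2) + nth_value(N // 2 + 1)) / 2

print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Variance:", variance, "Mean deviation:", mean_deviation)
```

Weighting avoids building an array of N observations, which matters when the frequencies are large.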
Conclusion
This program calculates the central tendency (mean, median, mode) and the measures
of dispersion (variance, standard deviation, mean deviation, quartile deviation) of a
frequency distribution. Expanding the distribution into individual observations lets
numpy and scipy compute each measure with a single function call. Together, these
measures summarize where the data are centered and how widely they are spread.
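import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import KFold, LeaveOneOut
# Placeholder data (assumption): the lab's own X and y are not shown,
# so a synthetic regression problem stands in for them
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)
# Initialize model
model = LinearRegression()
def evaluate(title, splitter):
    # Fit on each training fold and predict its held-out fold, so every
    # sample gets exactly one prediction from a model that never saw it
    predictions = np.empty_like(y)
    for train_idx, test_idx in splitter.split(X):
        model.fit(X[train_idx], y[train_idx])
        predictions[test_idx] = model.predict(X[test_idx])
    # Score the pooled predictions (per-fold R-squared is undefined for LOOCV)
    print(title)
    print("RMSE:", np.sqrt(mean_squared_error(y, predictions)))
    print("MAE:", mean_absolute_error(y, predictions))
    print("R-squared:", r2_score(y, predictions))
evaluate("K-Fold Metrics:", KFold(n_splits=5, shuffle=True, random_state=42))  # 5 folds: assumed
evaluate("LOOCV Metrics:", LeaveOneOut())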
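RMSE, MAE and R-squared can also be computed straight from their definitions, which makes clear what each printed number measures. The helper `regression_metrics` below is a hypothetical name, and the three-point example uses illustrative numbers, not lab results.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return (RMSE, MAE, R-squared) from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))         # square root of the mean squared error
    mae = np.mean(np.abs(residuals))                # mean absolute error
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    r2 = 1 - ss_res / ss_tot                        # coefficient of determination
    return rmse, mae, r2

# Residuals are 1, 0, -1: RMSE = sqrt(2/3), MAE = 2/3, R-squared = 1 - 2/8 = 0.75
rmse, mae, r2 = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print("RMSE:", rmse, "MAE:", mae, "R-squared:", r2)
```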
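import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Distribution functions (sketch: the lab's own definitions are not shown,
# so these wrap scipy.stats with assumed signatures)
def normal_distribution(x, mu, sigma):
    return stats.norm.pdf(x, loc=mu, scale=sigma)
def binomial_distribution(k, n, p):
    return stats.binom.pmf(k, n, p)
def poisson_distribution(k, lmbda):
    return stats.poisson.pmf(k, lmbda)
def bernoulli_distribution(k, p):
    return stats.bernoulli.pmf(k, p)
# Parameters
mu = 0       # normal mean
sigma = 1    # normal standard deviation
n = 10       # binomial number of trials
p = 0.5      # success probability (binomial and Bernoulli)
lmbda = 3    # Poisson rate ("lambda" is a Python keyword, hence lmbda)
# Evaluation grids (assumed ranges): +/-4 sigma for the normal, 0..14 Poisson events
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
normal_y = normal_distribution(x, mu, sigma)
k_values = np.arange(0, n + 1)
binomial_y = binomial_distribution(k_values, n, p)
poisson_k_values = np.arange(0, 15)
poisson_y = poisson_distribution(poisson_k_values, lmbda)
bernoulli_k_values = np.array([0, 1])
bernoulli_y = bernoulli_distribution(bernoulli_k_values, p)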
# Plotting
plt.figure(figsize=(12, 8))
# Normal Distribution
plt.subplot(2, 2, 1)
plt.plot(x, normal_y, label='Normal Distribution', color='blue')
plt.title('Normal Distribution')
plt.xlabel('X')
plt.ylabel('Probability Density')
plt.grid()
# Binomial Distribution
plt.subplot(2, 2, 2)
plt.bar(k_values, binomial_y, label='Binomial Distribution', color='orange')
plt.title('Binomial Distribution')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.grid()
# Poisson Distribution
plt.subplot(2, 2, 3)
plt.bar(poisson_k_values, poisson_y, label='Poisson Distribution', color='green')
plt.title('Poisson Distribution')
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.grid()
# Bernoulli Distribution
plt.subplot(2, 2, 4)
plt.bar(bernoulli_k_values, bernoulli_y, label='Bernoulli Distribution', color='red')
plt.title('Bernoulli Distribution')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.xticks(bernoulli_k_values)
plt.grid()
plt.tight_layout()
plt.show()
Conclusion
Normal Distribution: The function normal_distribution computes the probability
density function (PDF) over a range of x values.
Binomial Distribution: The function binomial_distribution calculates the
probability of each number of successes k = 0, 1, ..., n in n trials.
Poisson Distribution: The function poisson_distribution computes the
probability of each number of events for a given average rate lmbda.
Bernoulli Distribution: The function bernoulli_distribution calculates the
probabilities of the two outcomes, failure (0) and success (1).
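The lab's own definitions of these four functions are not shown. As a minimal sketch under assumed signatures, each can be written from its textbook formula using only the standard library (math.comb needs Python 3.8 or later). The scipy.stats calls norm.pdf, binom.pmf, poisson.pmf and bernoulli.pmf compute the same quantities.

```python
import math

def normal_distribution(x, mu=0.0, sigma=1.0):
    """Normal PDF: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def binomial_distribution(k, n, p):
    """P(X = k): C(n, k) * p^k * (1 - p)^(n - k) for k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_distribution(k, lmbda):
    """P(X = k): lmbda^k * e^(-lmbda) / k! for k events at average rate lmbda."""
    return lmbda ** k * math.exp(-lmbda) / math.factorial(k)

def bernoulli_distribution(k, p):
    """P(X = k) for one trial: p for success (k = 1), 1 - p for failure (k = 0)."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0  # outcomes other than 0 and 1 are impossible

# Using the lab's parameters: mu = 0, sigma = 1, n = 10, p = 0.5, lmbda = 3
print(normal_distribution(0.0))           # peak of the standard normal, about 0.3989
print(binomial_distribution(5, 10, 0.5))  # 252 / 1024 = 0.24609375
print(poisson_distribution(3, 3))         # 27 * e^-3 / 6, about 0.2240
print(bernoulli_distribution(1, 0.5))     # 0.5
```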