SH Assignment

The document contains a Python script that analyzes two datasets, 'z' and 'y', using various statistical methods including histograms, cumulative frequency distributions, and box plots. It calculates key statistics such as mean, variance, skewness, and correlation coefficient, and also evaluates specific fractions of data based on defined criteria. Additionally, it estimates the area of a site cleaned up based on a critical concentration threshold.

January 20, 2025

[11]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skew
import pandas as pd

# Paired samples z and y, indexed by sample number n
data = {
    'n': np.arange(1, 21),
    'z': [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
          1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61],
    'y': [1.3, 17.02, 19.74, 12.01, 0.66, 1.8, 15.91, 0.62, 2.15, 2.07,
          4.68, 2.74, 11.72, 0.24, 2.3, 0.52, 5.67, 3.17, 5.92, 5.03],
}

z = np.array(data['z'])
y = np.array(data['y'])

[2]: # Question 1
# Bin edges 0, 5, 10, 15: np.arange(0, 15, 5) stops at 10 and would drop z = 10.27
plt.hist(z, bins=np.arange(0, 20, 5), edgecolor='black', alpha=0.7)
plt.title('Histogram of z')
plt.xlabel('Value Ranges')
plt.ylabel('Frequency')
plt.grid(axis='y')
plt.show()
fraction = np.sum((z >= 5) & (z < 10)) / len(z)
print(f"Fraction of data with z-values between 5 and 10: {fraction:.2f}")

Fraction of data with z-values between 5 and 10: 0.25
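
The class counts behind the histogram can be cross-checked without plotting. The following is a minimal standard-library sketch, assuming half-open classes [0, 5), [5, 10), [10, 15); the helper name `class_counts` is hypothetical and not part of the original notebook.

```python
# Tally z-values into half-open classes [lo, hi) as a dependency-free cross-check.
z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]


def class_counts(values, edges):
    """Count the values falling in each interval [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


edges = [0, 5, 10, 15]
counts = class_counts(z, edges)
fractions = [c / len(z) for c in counts]

for lo, hi, c, f in zip(edges, edges[1:], counts, fractions):
    print(f"[{lo}, {hi}): count={c}, fraction={f:.2f}")
```

The middle class reproduces the fraction 0.25 reported above. One value (10.27) lies above 10, so it is only counted when the class edges extend past 10.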

[12]: # Question 2
z_sorted = np.sort(z)
y_sorted = np.sort(y)
z_cumulative = np.cumsum(np.ones_like(z_sorted)) / len(z_sorted)
y_cumulative = np.cumsum(np.ones_like(y_sorted)) / len(y_sorted)

print("Cumulative Frequency Distribution of z:")
print(pd.DataFrame({'z': z_sorted, 'Cumulative Frequency': z_cumulative}))

print("\nCumulative Frequency Distribution of y:")
print(pd.DataFrame({'y': y_sorted, 'Cumulative Frequency': y_cumulative}))

Cumulative Frequency Distribution of z:
        z  Cumulative Frequency
0    0.09                  0.05
1    0.28                  0.10
2    0.62                  0.15
3    0.71                  0.20
4    0.96                  0.25
5    0.99                  0.30
6    1.34                  0.35
7    1.66                  0.40
8    1.70                  0.45
9    2.47                  0.50
10   2.55                  0.55
11   2.96                  0.60
12   2.99                  0.65
13   3.61                  0.70
14   5.54                  0.75
15   6.26                  0.80
16   7.56                  0.85
17   7.92                  0.90
18   8.71                  0.95
19  10.27                  1.00

Cumulative Frequency Distribution of y:
        y  Cumulative Frequency
0    0.24                  0.05
1    0.52                  0.10
2    0.62                  0.15
3    0.66                  0.20
4    1.30                  0.25
5    1.80                  0.30
6    2.07                  0.35
7    2.15                  0.40
8    2.30                  0.45
9    2.74                  0.50
10   3.17                  0.55
11   4.68                  0.60
12   5.03                  0.65
13   5.67                  0.70
14   5.92                  0.75
15  11.72                  0.80
16  12.01                  0.85
17  15.91                  0.90
18  17.02                  0.95
19  19.74                  1.00
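
The tables above list the cumulative frequency at each sorted sample. To read the distribution at an arbitrary value, one option is an empirical CDF lookup. This is a minimal sketch using only the standard library; the function name `ecdf` is an assumption made for illustration and does not appear in the notebook.

```python
from bisect import bisect_right

z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]
y = [1.3, 17.02, 19.74, 12.01, 0.66, 1.8, 15.91, 0.62, 2.15, 2.07,
     4.68, 2.74, 11.72, 0.24, 2.3, 0.52, 5.67, 3.17, 5.92, 5.03]


def ecdf(values, x):
    """Empirical CDF: fraction of the values that are less than or equal to x."""
    ordered = sorted(values)
    # bisect_right returns how many sorted entries are <= x
    return bisect_right(ordered, x) / len(ordered)


print(f"F_z(5)  = {ecdf(z, 5):.2f}")
print(f"F_y(10) = {ecdf(y, 10):.2f}")
```

Evaluated at a tabulated sample value, `ecdf` returns the same cumulative frequency as the corresponding table row.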

[26]: # Question 3
def calculate_statistics(data):
    mean = np.mean(data)
    variance = np.var(data, ddof=1)  # sample variance (divides by n - 1)
    skewness = skew(data)  # scipy default: biased (population) skewness
    quantiles = np.quantile(data, [0.25, 0.5, 0.75])  # Q1, median, Q3
    iqr = quantiles[2] - quantiles[0]
    return mean, variance, skewness, quantiles, quantiles[1], iqr

z_stats = calculate_statistics(z)
y_stats = calculate_statistics(y)

print("\nStatistics for z:")
print(f"Mean: {z_stats[0]:.2f}, Variance: {z_stats[1]:.2f}, Skewness: {z_stats[2]:.2f}")
print(f"Quantiles: {z_stats[3]}, Median: {z_stats[4]}, Interquartile Range: {z_stats[5]:.2f}")

print("\nStatistics for y:")
print(f"Mean: {y_stats[0]:.2f}, Variance: {y_stats[1]:.2f}, Skewness: {y_stats[2]:.2f}")
print(f"Quantiles: {y_stats[3]}, Median: {y_stats[4]}, Interquartile Range: {y_stats[5]:.2f}")

Statistics for z:
Mean: 3.46, Variance: 9.76, Skewness: 0.85
Quantiles: [0.9825 2.51 5.72 ], Median: 2.51, Interquartile Range: 4.74

Statistics for y:
Mean: 5.76, Variance: 36.94, Skewness: 1.14
Quantiles: [1.675 2.955 7.37 ], Median: 2.955, Interquartile Range: 5.70
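
The variance and skewness above use different normalizations: the variance divides by n - 1 (`ddof=1`), while `scipy.stats.skew` with its default settings returns the biased moment ratio m3 / m2^1.5, where both central moments divide by n. The sketch below recomputes both for z from their definitions using plain Python; the helper names are hypothetical.

```python
z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]


def sample_variance(values):
    """Unbiased sample variance: squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def moment_skewness(values):
    """Biased skewness g1 = m3 / m2**1.5, with central moments divided by n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print(f"Variance of z: {sample_variance(z):.2f}")
print(f"Skewness of z: {moment_skewness(z):.2f}")
```

A positive skewness is consistent with the mean (3.46) lying above the median (2.51): a few large values pull the mean upward.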

[15]: # Question 4
# Matplotlib 3.9 renamed boxplot()'s 'labels' parameter to 'tick_labels';
# on older Matplotlib versions, pass labels=['z', 'y'] instead.
plt.boxplot([z, y], tick_labels=['z', 'y'], showmeans=True)
plt.title('Box-and-Whisker Plot of z and y')
plt.ylabel('Values')
plt.grid(axis='y')
plt.show()
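
With Matplotlib's default `whis=1.5`, the whiskers reach the most extreme samples within 1.5 times the interquartile range of the box, and anything beyond is drawn as an individual point. The following is a rough standard-library sketch of that rule, assuming linearly interpolated quartiles (the same rule as `np.quantile`'s default); `quantile` and `fliers` are hypothetical helper names.

```python
z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]
y = [1.3, 17.02, 19.74, 12.01, 0.66, 1.8, 15.91, 0.62, 2.15, 2.07,
     4.68, 2.74, 11.72, 0.24, 2.3, 0.52, 5.67, 3.17, 5.92, 5.03]


def quantile(values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    if lower + 1 >= len(ordered):
        return ordered[-1]
    frac = pos - lower
    return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])


def fliers(values, whis=1.5):
    """Values lying more than whis * IQR below Q1 or above Q3."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in sorted(values) if v < low_fence or v > high_fence]


print("z points beyond the whiskers:", fliers(z))
print("y points beyond the whiskers:", fliers(y))
```

Under these assumptions z has no points beyond its whiskers, while the two largest y-values fall outside the upper fence.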

[22]: # Question 5
z_mean = np.mean(z)
y_mean = np.mean(y)

covariance = np.sum((z - z_mean) * (y - y_mean)) / (len(z) - 1)
std_z = np.std(z, ddof=1)
std_y = np.std(y, ddof=1)
correlation_coefficient = covariance / (std_z * std_y)

print(f"Correlation coefficient between z and y: {correlation_coefficient:.2f}")

Correlation coefficient between z and y: 0.67
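
Because the covariance and both standard deviations use the same n - 1 divisor, that factor cancels and the coefficient reduces to a ratio of sums of deviations. The sketch below is a plain-Python restatement of that identity, not the notebook's own code; `pearson_r` is a hypothetical name.

```python
from math import sqrt

z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]
y = [1.3, 17.02, 19.74, 12.01, 0.66, 1.8, 15.91, 0.62, 2.15, 2.07,
     4.68, 2.74, 11.72, 0.24, 2.3, 0.52, 5.67, 3.17, 5.92, 5.03]


def pearson_r(xs, ys):
    """Pearson correlation: sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


print(f"Correlation coefficient between z and y: {pearson_r(z, y):.2f}")
```

With NumPy available, `np.corrcoef(z, y)[0, 1]` is a common shortcut for the same quantity.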

[24]: # Question 6
critical_concentration = 5
site_area = 8000
fraction_below_critical = np.sum(z < critical_concentration) / len(z)
cleanup_area = fraction_below_critical * site_area
print(f"Approximate area of the site cleaned up: {cleanup_area:.2f} m²")

Approximate area of the site cleaned up: 5600.00 m²
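
The cell above scales the site area by the fraction of samples below the critical concentration. Whether the cleaned-up area corresponds to that fraction or to its complement depends on the wording of the question, which is not reproduced in this document. As a hedged sketch, assuming each sample represents an equal share of the site, both areas can be computed side by side:

```python
z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]

critical_concentration = 5
site_area = 8000  # m², the value used in the notebook

# Assumption: every sample stands for an equal share of the site area.
n_below = sum(1 for v in z if v < critical_concentration)
n_at_or_above = len(z) - n_below

area_below = n_below * site_area / len(z)
area_at_or_above = n_at_or_above * site_area / len(z)

print(f"Area with z below the threshold:       {area_below:.2f} m²")
print(f"Area with z at or above the threshold: {area_at_or_above:.2f} m²")
```

The two areas always sum to the full site area, so either one determines the other.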

[27]: # Question 7
fraction = np.sum((z < 5) & (y < 10)) / len(z)
print(f"Fraction of data with z < 5 and y < 10: {fraction:.2f}")

Fraction of data with z < 5 and y < 10: 0.65

[10]: # Question 8
fraction_z_less_5_or_y_less_10 = np.sum((z < 5) | (y < 10)) / len(z)
print(f"Fraction of data with z < 5 or y < 10: {fraction_z_less_5_or_y_less_10:.2f}")

Fraction of data with z < 5 or y < 10: 0.80
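
The results of Questions 7 and 8 are linked by the inclusion-exclusion rule: the count for "A or B" equals count(A) + count(B) - count(A and B). The following minimal check works on integer counts to avoid rounding; the variable names are illustrative only.

```python
z = [1.7, 6.26, 7.56, 7.92, 0.96, 2.47, 2.55, 0.28, 1.34, 0.71,
     1.66, 2.99, 8.71, 0.09, 0.62, 0.99, 10.27, 2.96, 5.54, 3.61]
y = [1.3, 17.02, 19.74, 12.01, 0.66, 1.8, 15.91, 0.62, 2.15, 2.07,
     4.68, 2.74, 11.72, 0.24, 2.3, 0.52, 5.67, 3.17, 5.92, 5.03]

n = len(z)
n_z = sum(1 for a in z if a < 5)                           # samples with z < 5
n_y = sum(1 for b in y if b < 10)                          # samples with y < 10
n_and = sum(1 for a, b in zip(z, y) if a < 5 and b < 10)   # both conditions
n_or = sum(1 for a, b in zip(z, y) if a < 5 or b < 10)     # at least one

print(f"z < 5: {n_z}/{n}, y < 10: {n_y}/{n}")
print(f"and: {n_and}/{n} = {n_and / n:.2f}, or: {n_or}/{n} = {n_or / n:.2f}")
print("inclusion-exclusion holds:", n_or == n_z + n_y - n_and)
```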
