
Mltheory 2

The document presents an assignment involving data analysis of power consumption across three zones using Python libraries such as NumPy, Pandas, Matplotlib, and Seaborn. It includes the Central Limit Theorem explanation, visualizations of energy consumption distributions for each zone, and a resampling exercise for Zone 3 to observe the distribution of sample means. The analysis aims to explore normality and the effects of sampling on energy consumption data.

Uploaded by adarshhalse45
Copyright © All Rights Reserved

Adarsh_Halse_22104B0020_ML_ASSIGNMENT_2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the power consumption readings (one row per 10-minute interval)
data=pd.read_csv('powerconsumption_3.csv')

data

                  Datetime  Temperature  Humidity  WindSpeed  GeneralDiffuseFlows  DiffuseFlows  PowerConsumption_Zone1  PowerConsumption_Zone2  PowerConsumption_Zone3
0      2017-01-01 00:00:00        6.559      73.8      0.083                0.051         0.119             34055.69620             16128.87538             20240.96386
1      2017-01-01 00:10:00        6.414      74.5      0.083                0.070         0.085             29814.68354             19375.07599             20131.08434
2      2017-01-01 00:20:00        6.313      74.5      0.080                0.062         0.100             29128.10127             19006.68693             19668.43373
3      2017-01-01 00:30:00        6.121      75.0      0.083                0.091         0.096             28228.86076             18361.09422             18899.27711
4      2017-01-01 00:40:00        5.921      75.7      0.081                0.048         0.085             27335.69620             17872.34043             18442.40964
...                    ...          ...       ...        ...                  ...           ...                     ...                     ...                     ...
52411  2017-12-30 23:10:00        7.010      72.4      0.080                0.040         0.096             31160.45627             26857.31820             14780.31212
52412  2017-12-30 23:20:00        6.947      72.6      0.082                0.051         0.093             30430.41825             26124.57809             14428.81152
52413  2017-12-30 23:30:00        6.900      72.8      0.086                0.084         0.074             29590.87452             25277.69254             13806.48259
52414  2017-12-30 23:40:00        6.758      73.0      0.080                0.066         0.089             28958.17490             24692.23688             13512.60504
52415  2017-12-30 23:50:00        6.580      74.1      0.081                0.062         0.111             28349.80989             24055.23167             13345.49820

[52416 rows x 9 columns]
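A minimal sketch of a slightly more explicit loading step, assuming the CSV has the columns shown in the output above. `parse_dates` turns `Datetime` into real timestamps instead of strings, and `describe()` gives a quick numeric summary of the three zones before any plotting. The helper names `load_power_data` and `summarize_zones` are hypothetical, not part of the original assignment.

```python
import pandas as pd

# Zone column names as they appear in the printed DataFrame.
ZONE_COLUMNS = [
    "PowerConsumption_Zone1",
    "PowerConsumption_Zone2",
    "PowerConsumption_Zone3",
]


def load_power_data(path_or_buffer):
    """Read the CSV and parse the Datetime column as timestamps (hypothetical helper)."""
    return pd.read_csv(path_or_buffer, parse_dates=["Datetime"])


def summarize_zones(df):
    """Count, mean, std, min, quartiles and max of each zone's consumption."""
    return df[ZONE_COLUMNS].describe()


# Usage (assumes powerconsumption_3.csv is in the working directory):
# data = load_power_data("powerconsumption_3.csv")
# print(data.dtypes)
# print(summarize_zones(data))
```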

# Q.1] State the Central Limit Theorem.

# => The Central Limit Theorem (CLT) states that if X1, X2, ..., Xn are independent and
# identically distributed random variables with mean mu and finite variance sigma^2,
# then the sampling distribution of the sample mean approaches a normal distribution
# with mean mu and variance sigma^2/n as the sample size n increases.
# This holds regardless of the shape of the original population distribution,
# even if the population itself is not normally distributed.
# The theorem is fundamental in probability and statistics because it justifies
# using the normal distribution in inferential statistics, for example in
# hypothesis testing and confidence interval estimation.
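The same statement in symbols, in its classical i.i.d. form (often called the Lindeberg–Lévy CLT). Here n is the number of observations in each sample; in the Zone 3 resampling exercise, n = 50.

```latex
Let $X_1, \dots, X_n$ be i.i.d. with $\mathbb{E}[X_i] = \mu$ and
$0 < \operatorname{Var}(X_i) = \sigma^2 < \infty$, and let
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then
\[
  \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)
  \quad \text{as } n \to \infty,
\]
so for large $n$, approximately $\bar{X}_n \sim \mathcal{N}\!\left(\mu, \sigma^2 / n\right)$.
```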

# Q.2] Plot the distribution of energy consumption for all the zones and check for normality.
sns.kdeplot(data['PowerConsumption_Zone1'], color='yellow', fill=True)
plt.title("Zone 1: kde distribution")
plt.show()

[Figure: Zone 1 KDE plot]


sns.histplot(data['PowerConsumption_Zone1'], color='red', fill=True)
plt.title("Zone 1: original histplot")
plt.show()

[Figure: Zone 1 histogram]


# Passing the Series as x= draws a horizontal box plot
sns.boxplot(x=data['PowerConsumption_Zone1'], color='violet')
plt.title("Zone 1: original data boxplot")
plt.show()

[Figure: Zone 1 box plot]


sns.kdeplot(data['PowerConsumption_Zone2'], color='blue', fill=True)
plt.title("Zone 2: original data distribution")
plt.show()

[Figure: Zone 2 KDE plot]


sns.histplot(data['PowerConsumption_Zone2'], color='green', fill=True)
plt.title("Zone 2: original histplot")
plt.show()

[Figure: Zone 2 histogram]


sns.boxplot(x=data['PowerConsumption_Zone2'], color='orange')
plt.title("Zone 2: original data boxplot")
plt.show()

[Figure: Zone 2 box plot]


sns.kdeplot(data['PowerConsumption_Zone3'], color='grey', fill=True)
plt.title("Zone 3: original data distribution")
plt.show()

[Figure: Zone 3 KDE plot]


sns.histplot(data['PowerConsumption_Zone3'], color='indigo', fill=True)
plt.title("Zone 3: original histplot")
plt.show()

[Figure: Zone 3 histogram]


sns.boxplot(x=data['PowerConsumption_Zone3'], color='maroon')
plt.title("Zone 3: original data boxplot")
plt.show()

[Figure: Zone 3 box plot]
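The plots above give only a visual impression of normality. A minimal sketch of a numeric companion check, assuming the `data` DataFrame loaded earlier: pandas' `Series.skew()` and `Series.kurt()` return sample skewness and excess kurtosis, both of which are 0 for a normal distribution. This descriptive check is swapped in for a formal test; with about 52,000 rows, formal tests such as Shapiro-Wilk tend to reject normality even for small departures, so the size of these statistics is usually more informative than a p-value. The helper name `normality_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def normality_summary(values):
    """Return sample skewness and excess kurtosis of a 1-D array or Series.

    Both are 0 for a normal distribution: skewness measures asymmetry,
    excess kurtosis measures heavier (>0) or lighter (<0) tails.
    """
    s = pd.Series(np.asarray(values, dtype=float)).dropna()
    return {"skewness": s.skew(), "excess_kurtosis": s.kurt()}


# Usage on the notebook's DataFrame (assumes `data` is already loaded):
# for zone in ["PowerConsumption_Zone1", "PowerConsumption_Zone2",
#              "PowerConsumption_Zone3"]:
#     print(zone, normality_summary(data[zone]))
```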


# Q.3] For Zone 3, resample the energy consumption data and plot the sampling
# distribution of the mean. What distribution do you observe?

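sample_size=50      # observations in each resample
total_sample=10000  # number of resamples
# Draw with replacement (the np.random.choice default) into a (10000, 50) array
sample=np.random.choice(data['PowerConsumption_Zone3'],
                        size=(total_sample,sample_size))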

sample.shape

(10000, 50)

sample_mean1=np.mean(sample,axis=1)  # 10000 means, one per resample of size 50
sample_mean2=np.mean(sample,axis=0)  # 50 means, each taken over 10000 draws
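The difference between the two "types" of mean comes down to which axis is averaged. A toy 2 × 3 array (a hypothetical stand-in for the 10000 × 50 `sample`) makes it visible: `axis=1` averages across each row, giving one mean per resample of size 50, which is the sampling distribution the CLT describes; `axis=0` averages down each column, giving only 50 means, each over 10000 draws, so that distribution has far fewer points and a much narrower spread.

```python
import numpy as np

# Toy stand-in for `sample`: 2 resamples (rows) of size 3 (columns).
toy = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

row_means = np.mean(toy, axis=1)  # one mean per resample, like sample_mean1
col_means = np.mean(toy, axis=0)  # one mean per column position, like sample_mean2

print(row_means)  # [2. 5.]
print(col_means)  # [2.5 3.5 4.5]
```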

# KDE of the two kinds of sample means
plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
sns.kdeplot(sample_mean1,color='yellow',fill=True)
# plt.title is a function: call it; assigning to it (plt.title = ...) overwrites it
plt.title("mean by first type")
plt.subplot(1,2,2)
sns.kdeplot(sample_mean2,color='blue',fill=True)
plt.title("mean by second type")
plt.show()

# Histograms of the two kinds of sample means
plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
sns.histplot(sample_mean1,color='red',fill=True)
plt.title("mean by first type")
plt.subplot(1,2,2)
sns.histplot(sample_mean2,color='grey',fill=True)
plt.title("mean by second type")
plt.show()

# Box plots (a new figure, so they do not draw over the histograms)
plt.figure(figsize=(10,6))
plt.subplot(1,2,1)
sns.boxplot(x=sample_mean1,color='pink')
plt.title("mean by first type")
plt.subplot(1,2,2)
sns.boxplot(x=sample_mean2,color='yellow')
plt.title("mean by second type")
plt.show()
#median
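To answer "what distribution do you observe" quantitatively, the resampled means can be compared with the CLT prediction: centre μ and standard error σ/√n, where σ is the standard deviation of the Zone 3 column computed with `ddof=0`, because resampling with replacement treats the column itself as the population. A minimal sketch, assuming the `data` DataFrame from above; `resample_means` and `clt_prediction` are hypothetical helper names, and `np.random.default_rng` replaces the legacy `np.random.choice` so that results can be seeded.

```python
import numpy as np


def resample_means(population, sample_size=50, total_sample=10_000, seed=None):
    """Draw `total_sample` resamples of `sample_size` values with replacement
    and return the mean of each resample (like sample_mean1)."""
    rng = np.random.default_rng(seed)
    population = np.asarray(population, dtype=float)
    samples = rng.choice(population, size=(total_sample, sample_size), replace=True)
    return samples.mean(axis=1)


def clt_prediction(population, sample_size):
    """Mean and standard error of the sample mean predicted by the CLT."""
    population = np.asarray(population, dtype=float)
    return population.mean(), population.std() / np.sqrt(sample_size)


# Usage on Zone 3 (assumes `data` is already loaded):
# zone3 = data['PowerConsumption_Zone3']
# means = resample_means(zone3, sample_size=50, total_sample=10_000, seed=42)
# mu, se = clt_prediction(zone3, 50)
# print(means.mean(), mu)       # centre of the sampling distribution vs. prediction
# print(means.std(ddof=1), se)  # spread of the sampling distribution vs. prediction
```

If the histogram of means looks bell-shaped and both pairs of numbers agree closely, the observation is consistent with the CLT: the sampling distribution of the mean is approximately normal even when the Zone 3 consumption itself is not.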
