Load Prediction With 20 Models
Prediction
Inspiration
About Dataset
Date: timestamp of each observation, recorded continuously at 15-minute intervals throughout 2018
import warnings
warnings.filterwarnings("ignore")

# Core libraries used throughout the notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Color Palettes
colors = ["#bfd3e6", "#9b5b4f", "#4e4151", "#dbba78", "#bb9c55", "#909195", "#dc1e1e", "#a02933", "#716807", "#717cb4"]
sns.palplot(sns.color_palette(colors))
# Default theme
sns.set_theme(palette='tab10',
              font='Comic Sans MS',
              font_scale=1.5,
              rc=None)
import matplotlib
matplotlib.rcParams.update({'font.size': 15})
plt.style.use('dark_background')
plt.rcParams["axes.grid"] = False
df = pd.read_csv("/kaggle/input/steel-industry-energy-consumption/Steel_industry_data.csv")
df.head().style.background_gradient(cmap='copper').set_precision(2)
   date              Usage_kWh  Lagging_Current_Reactive.Power_kVarh  Leading_Current_Reactive_Power_kVarh  CO2(tCO2)  Lagging_Current_Power_Factor
0  01/01/2018 00:15       3.17                                   2.95                                   0.00       0.00                         73.21
1  01/01/2018 00:30       4.00                                   4.46                                   0.00       0.00                         66.77
2  01/01/2018 00:45       3.24                                   3.28                                   0.00       0.00                         70.28
3  01/01/2018 01:00       3.31                                   3.56                                   0.00       0.00                         68.09
4  01/01/2018 01:15       3.82                                   4.50                                   0.00       0.00                         64.72
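The date column holds day-first strings at 15-minute resolution; parsing it explicitly avoids day/month ambiguity. A minimal sketch, not part of the original notebook:
# Illustrative only: parse the day-first timestamps without modifying df
dates = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M')
print(dates.min(), dates.max())  # quick sanity check on the covered period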
corr = df.corr()
corr.style.background_gradient(cmap='copper').set_precision(2)
df1 = df.copy()
# Correlation with the response variable (Usage_kWh)
X = df1.drop(['Usage_kWh'], axis=1)
y = df1['Usage_kWh']
X.corrwith(y).plot.bar(
    figsize=(16, 5), title="Correlation with Steel Plant Load Distribution", fontsize=15,
    rot=45, grid=False)
plt.show()
class color:
    BOLD = '\033[1m'
    END = '\033[0m'
print(f"\033[94m\033[1m")
print(color.BOLD + 'Missing values - Percentage: \n' + color.END)
print(f"\033[91m\033[1m")
print(round(df.isnull().mean() * 100, 2))
Missing values - Percentage:
date 0.0
Usage_kWh 0.0
Lagging_Current_Reactive.Power_kVarh 0.0
Leading_Current_Reactive_Power_kVarh 0.0
CO2(tCO2) 0.0
Lagging_Current_Power_Factor 0.0
Leading_Current_Power_Factor 0.0
NSM 0.0
WeekStatus 0.0
Day_of_week 0.0
Load_Type 0.0
dtype: float64
cat = df.select_dtypes(include='object').columns.tolist()
for c in cat:
    print(df[c].value_counts())
    print('=' * 35)
date
01/01/2018 00:15 1
01/09/2018 08:45 1
01/09/2018 07:15 1
01/09/2018 07:30 1
01/09/2018 07:45 1
..
02/05/2018 14:45 1
02/05/2018 14:30 1
02/05/2018 14:15 1
02/05/2018 14:00 1
31/12/2018 00:00 1
Name: date, Length: 35040, dtype: int64
===================================
WeekStatus
Weekday 25056
Weekend 9984
Name: WeekStatus, dtype: int64
===================================
Day_of_week
Monday 5088
Tuesday 4992
Wednesday 4992
Thursday 4992
Friday 4992
Saturday 4992
Sunday 4992
Name: Day_of_week, dtype: int64
===================================
Load_Type
Light_Load 18072
Medium_Load 9696
Maximum_Load 7272
Name: Load_Type, dtype: int64
===================================
   date              Usage_kWh  Lagging_Current_Reactive.Power_kVarh  Leading_Current_Reactive_Power_kVarh  CO2(tCO2)  Lagging_Current_Power_Factor  Leading_Current_Power_Factor   NSM
0  01/01/2018 00:15       3.17                                   2.95                                    0.0        0.0                         73.21                         100.0   900
1  01/01/2018 00:30       4.00                                   4.46                                    0.0        0.0                         66.77                         100.0  1800
2  01/01/2018 00:45       3.24                                   3.28                                    0.0        0.0                         70.28                         100.0  2700
3  01/01/2018 01:00       3.31                                   3.56                                    0.0        0.0                         68.09                         100.0  3600
4  01/01/2018 01:15       3.82                                   4.50                                    0.0        0.0                         64.72                         100.0  4500
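Later cells reference shortened column names (e.g. Lagging_Reactive_Power_kVarh); the renaming cell itself is not shown, but it presumably looked something like this sketch:
# Assumed renaming step (not shown in the original): later cells use these shorter names
df = df.rename(columns={
    'Lagging_Current_Reactive.Power_kVarh': 'Lagging_Reactive_Power_kVarh',
    'Leading_Current_Reactive_Power_kVarh': 'Leading_Reactive_Power_kVarh',
    'Lagging_Current_Power_Factor': 'Lagging_Power_Factor',
    'Leading_Current_Power_Factor': 'Leading_Power_Factor',
})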
plt.figure(figsize=(18,4))
color = plt.cm.copper(np.linspace(0, 1, 10))
df.groupby(['WeekStatus','Day_of_week'])['Usage_kWh'].count().plot(kind='bar', width=.4,color=color);
plt.xticks(rotation=45);
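The scatter plots below draw on axes ax2 through ax8, but the cell that creates the figure is not shown; a minimal sketch of the assumed setup:
# Assumed figure setup (not shown in the original): a 2x4 grid providing ax1..ax8; figsize is arbitrary
fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8)) = plt.subplots(2, 4, figsize=(24, 10))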
ax2.scatter(data=df,x="Usage_kWh",y="Leading_Reactive_Power_kVarh", color=colors[8])
ax2.set_title("Usage(kWh) vs Leading Reactive Power(kVarh)",pad=20)
ax2.set_xlabel("Usage(kWh)")
ax2.set_ylabel("Leading Reactive Power (kVarh)")
ax4.scatter(data=df,x="Usage_kWh",y="Leading_Power_Factor", color=colors[9])
ax4.set_title("Usage(kWh) vs Leading Power Factor",pad=20)
ax4.set_xlabel("Usage(kWh)")
ax4.set_ylabel("Leading Power Factor")
ax5.scatter(data=df,x="Lagging_Reactive_Power_kVarh",y="Leading_Reactive_Power_kVarh", color=colors[2])
ax5.set_title("Lagging Reactive Power (kVarh) vs Leading Reactive Power(kVarh)",pad=20,fontsize=15)
ax5.set_xlabel("Lagging Reactive Power (kVarh)")
ax5.set_ylabel("Leading Reactive Power(kVarh)")
ax6.scatter(data=df,x="Lagging_Power_Factor",y="Leading_Power_Factor", color=colors[4])
ax6.set_title("Lagging Power Factor vs Leading Power Factor",pad=20,fontsize=15)
ax6.set_xlabel("Lagging Power Factor")
ax6.set_ylabel("Leading Power Factor")
ax7.scatter(data=df,x="Lagging_Reactive_Power_kVarh",y="Lagging_Power_Factor", color=colors[5])
ax7.set_title("Lagging Reactive Power (kVarh) vs Lagging Power Factor",pad=20,fontsize=15)
ax7.set_xlabel("Lagging Reactive Power (kVarh)")
ax7.set_ylabel("Lagging Power Factor")
ax8.scatter(data=df,x="Lagging_Reactive_Power_kVarh",y="Leading_Power_Factor", color=colors[4])
ax8.set_title("Lagging Reactive Power (kVarh) vs Leading Power Factor",pad=20,fontsize=15)
ax8.set_xlabel("Lagging Reactive Power (kVarh)")
ax8.set_ylabel("Leading Power Factor")
plt.show()
var = df['Lagging_Reactive_Power_kVarh']
color = colors[4]
fig = plt.figure(figsize = (18, 12))
plt.show();
plt.figure(figsize=(18,7))
sns.scatterplot(data=df, x="Usage_kWh", y="Lagging_Reactive_Power_kVarh", hue="Load_Type",palette="tab10");
plt.figure(figsize=(18,10))
plt.subplot(2,2,1)
sns.barplot(x = 'Load_Type', y = 'Usage_kWh', palette= "tab10",data=df)
plt.title("Load Type", color = "#bfd3e6")
plt.xlabel("Load_Type")
plt.ylabel("Usage_kWh")
plt.subplot(2,2,2)
df["WeekStatus"].value_counts().plot.pie(autopct='%1.2f%%', explode=[0.1, 0.1], colors=['blue','#dbba78'])
p = plt.gcf()
plt.title("Weekday/Weekend")
plt.legend()
plt.subplot(2,2,(3,4))
col = ['Usage_kWh']
fig, ax = plt.subplots(2, 1, sharex=True, figsize=(17,8),gridspec_kw={"height_ratios": (.2, .8)})
ax[0].set_title('Usage_kWh distribution',fontsize=18,pad=20)
sns.boxplot(x='Usage_kWh', data=df, ax=ax[0])
ax[0].set(yticks=[])
sns.histplot(x='Usage_kWh', data=df, ax=ax[1])
ax[1].set_xlabel(col[0], fontsize=16)
plt.axvline(df['Usage_kWh'].mean(), color='darkgreen', linewidth=2.2, label='mean=' + str(np.round(df['Usage_kWh'].mean(), 2)))
plt.axvline(df['Usage_kWh'].median(), color='red', linewidth=2.2, label='median=' + str(np.round(df['Usage_kWh'].median(), 2)))
plt.axvline(df['Usage_kWh'].mode()[0], color='purple', linewidth=2.2, label='mode=' + str(df['Usage_kWh'].mode()[0]))
plt.legend(bbox_to_anchor=(1, 1.03), ncol=1, fontsize=17, fancybox=True, shadow=True, frameon=True)
plt.tight_layout()
plt.show()
col_names = ['Lagging_Reactive_Power_kVarh', 'Leading_Reactive_Power_kVarh', 'Lagging_Power_Factor', 'Leading_Power_Factor']
fig, axs = plt.subplots(nrows=2,ncols=3,figsize=(20,10))
for i in range(0, len(col_names)):
    rows = i // 3
    cols = i % 3
    ax = axs[rows, cols]
    plot = sns.regplot(x=col_names[i], y='Usage_kWh', data=df, ax=ax)
old_skew = df.skew().sort_values(ascending=False)
old_skew
from sklearn.preprocessing import FunctionTransformer

logTr = FunctionTransformer(np.log1p)  # assumed log transformer; its definition is not shown in the original

def logTrans(feature):  # apply the transformer and compare the distribution before/after with histogram and KDE
    plt.figure(figsize=(15, 6))
    plt.subplot(1, 2, 1)
    plt.title("Distribution before Transformation", fontsize=20, color='red')
    sns.histplot(df[feature], kde=True, color="red")
    plt.xlabel(feature, color='Red')
    plt.subplot(1, 2, 2)
    df_log = pd.DataFrame(logTr.fit_transform(df[[feature]]), columns=[feature])  # transform the selected feature only
    plt.title("Distribution after Transformation", fontsize=20, color='Blue')
    sns.histplot(df_log[feature], bins=20, kde=True, legend=False)
    plt.xlabel(feature, color='Blue')
    plt.show()
logTrans(feature="Lagging_Reactive_Power_kVarh")
plt.figure(figsize=(18,4))
sns.kdeplot(data=df,x="Usage_kWh",hue='Load_Type',multiple="stack");
# Encode Categorical Columns
from sklearn.preprocessing import LabelEncoder
categ = df.select_dtypes(include = "object").columns
le = LabelEncoder()
df[categ] = df[categ].apply(le.fit_transform)
df.head()
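To keep the encoded target readable in the results that follow, the Load_Type mapping can be recovered from the un-encoded copy df1; this check is not in the original notebook:
# Illustrative check: LabelEncoder sorts classes alphabetically, so the expected mapping is
# {'Light_Load': 0, 'Maximum_Load': 1, 'Medium_Load': 2}
le_load = LabelEncoder().fit(df1['Load_Type'])
print(dict(zip(le_load.classes_, le_load.transform(le_load.classes_))))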
# Split the dataset and prepare some lists to store the models
from sklearn.model_selection import train_test_split
X = df.drop(['Load_Type'], axis=1)
y = df.Load_Type
# Assumed split (not shown in the original); the reported accuracies are consistent with a 25% test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
models = []
names = [
"LGBMClassifier",
"RidgeClassifierCV",
"XGBClassifier",
"QuadraticDiscriminantAnalysis",
"CalibratedClassifierCV",
"BernoulliNB",
"BaggingClassifier",
"LogisticRegression",
"NearestCentroid",
"SVC",
"LinearSVC",
"KNeighborsClassifier",
"GaussianNB",
"Perceptron",
"SGDClassifier",
"DecisionTreeClassifier",
"RandomForestClassifier",
"MLPClassifier",
"ExtraTreesClassifier",
"AdaBoostClassifier",
"NuSVC"
]
scores = []
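The import cell for these classifiers is not shown; the imports would be along the following lines (lightgbm and xgboost are assumed to be available on the Kaggle image):
# Imports assumed for the model list below (the original import cell is not shown)
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.linear_model import RidgeClassifierCV, LogisticRegression, Perceptron, SGDClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier, AdaBoostClassifier)
from sklearn.neighbors import NearestCentroid, KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier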
clf =[
LGBMClassifier(),
RidgeClassifierCV(),
XGBClassifier(),
QuadraticDiscriminantAnalysis(),
CalibratedClassifierCV(),
BernoulliNB(),
BaggingClassifier(),
LogisticRegression(),
NearestCentroid(),
SVC(),
LinearSVC(),
KNeighborsClassifier(),
GaussianNB(),
Perceptron(),
SGDClassifier(),
DecisionTreeClassifier(),
RandomForestClassifier(),
MLPClassifier(),
ExtraTreesClassifier(),
AdaBoostClassifier(),
NuSVC()
]
%%time
for model in clf:
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    scores.append(score)
final_scores = pd.DataFrame({'Classifier': names, 'Accuracy': scores})  # assemble the results table (construction not shown in the original)
final_scores.sort_values(by='Accuracy', ascending=False).style.background_gradient(cmap="copper").set_properties(**{
    'font-family': 'Comic Sans MS',
    'color': 'Brown',
    'font-size': '15px'
})
Classifier Accuracy
2 XGBClassifier 0.965068
0 LGBMClassifier 0.956963
6 BaggingClassifier 0.952740
15 DecisionTreeClassifier 0.936872
16 RandomForestClassifier 0.935274
18 ExtraTreesClassifier 0.921918
19 AdaBoostClassifier 0.855594
11 KNeighborsClassifier 0.796119
1 RidgeClassifierCV 0.744064
9 SVC 0.736986
20 NuSVC 0.729566
3 QuadraticDiscriminantAnalysis 0.713128
5 BernoulliNB 0.704452
12 GaussianNB 0.691210
7 LogisticRegression 0.671461
8 NearestCentroid 0.665525
10 LinearSVC 0.632534
17 MLPClassifier 0.613356
4 CalibratedClassifierCV 0.606050
14 SGDClassifier 0.605137
13 Perceptron 0.524087
plt.figure(figsize=(18, 20))
sns.set_context('paper', font_scale=1.8)
final_scores = final_scores.sort_values(by='Accuracy', ascending=False)[:20]
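The notebook presumably closes by charting these sorted accuracies; a minimal sketch of such a plot (the actual plotting cell is not shown):
# Sketch of the final comparison chart (assumed; not in the original)
sns.barplot(x='Accuracy', y='Classifier', data=final_scores, palette='copper')
plt.title('Model accuracy comparison')
plt.xlim(0, 1)
plt.show()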