Final Project Report - Kunal - Sir
Project Report
Submitted to
BACHELOR OF TECHNOLOGY
IN
ELECTRICAL ENGINEERING
Submitted by
CANDIDATE’S DECLARATION
We hereby certify that the project entitled “FORECASTING OF FUTURE
HOUSEHOLD ELECTRICAL LOAD USING MACHINE LEARNING” which is
being submitted in the partial fulfillment of the requirement for the award of Bachelor of
Technology in the Department of Electrical Engineering, is a record of our own work
carried out under the supervision and guidance of Dr. Amit Kumar Choudhary, Assistant
Professor, Department of Electrical Engineering, BIT Sindri.
All information in this document has been obtained and presented as per academic
rules and ethical conduct.
To the best of our knowledge, the contents presented in this project report have
not been submitted elsewhere for the award of any other degree/diploma.
Date: Signature:
Place: Kunal Ranjan Singh (18030450044)
Signature:
Piyush Kumar (18030450058)
Signature:
Roopam Kr. Raj (18030450072)
Signature:
Sanskar Ram (18030450081)
Signature:
Soham Vivek (18030455015)
BIT SINDRI, DHANBAD-828123
Department of Higher & Technical Education, Government of Jharkhand
This is to certify that the work presented in the project report entitled
“FORECASTING OF FUTURE HOUSEHOLD ELECTRICAL LOAD USING
MACHINE LEARNING” in partial fulfillment of the requirement for the award of
Degree of Bachelor of Technology in the Department of Electrical Engineering at BIT
Sindri is an authentic work carried out under my supervision and guidance.
To the best of my knowledge, the content of this report does not form a basis for
the award of any previous Degree to anyone else.
Signature of HOD
Name: Dr. Md Abul Kalam
Head, Department of Electrical Engineering, BIT Sindri
ACKNOWLEDGMENT
We would like to express our deep gratitude to our project guide, Dr. Amit
Kumar Choudhary, Assistant Professor, Department of Electrical Engineering, who has
always been a source of motivation and firm support in carrying out the project. We also
thank him for his invaluable suggestions and constant encouragement throughout
the project work.
We would also like to convey our sincerest gratitude and indebtedness to the Head
of the Department and all other faculty members and staff of the Department of Electrical
Engineering, BIT Sindri, who offered their effort and guidance at appropriate
times; without it, our work would have been very difficult.
A work of this nature could not have been attempted without reference to and
inspiration from the works of others, whose details are mentioned in the references section.
We acknowledge our indebtedness to all of them. Further, we would like to express our
gratitude to our parents and God, who directly or indirectly encouraged and motivated
us during this project.
Date: Signature:
Place: Kunal Ranjan Singh (18030450044)
Signature:
Piyush Kumar (18030450058)
Signature:
Roopam Kr. Raj (18030450072)
Signature:
Sanskar Ram (18030450081)
Signature:
Soham Vivek (18030455015)
INDEX
Declaration 1
Certificate of Declaration from the Supervisor 2
Certificate of Approval from the Examiners 3
Acknowledgement 4
Index 5
Abstract 7
List of Figures 8
List of Tables 8
1. INTRODUCTION 9
1.1 Overview of Machine Learning 9
2. OBJECTIVE 10
2.1 Objective of the work 10
3. METHODOLOGY OF THE WORK 11
3.1 Flowchart of the work 11
3.2 Gathering of data 11
3.3 Data analysis and Pre-processing 13
3.3.1 Data Cleaning 14
3.3.2 Data Integration 14
3.3.3 Data Reduction 15
3.3.4 Data Transformation 15
3.4 Data Visualization 15
3.4.1 Line Charts 16
3.4.2 Bar Graphs 17
3.4.3 Histograms 17
3.4.4 Scatter Plots 18
3.4.5 Violin Plots 18
3.4.6 Box Plots 19
3.5 Training, Testing and Evaluating Machine Learning Models 19
3.5.1 Splitting the dataset 19
3.5.2 Model training and testing 21
3.5.3 Hyper parameter tuning 24
3.5.4 Model Evaluation 25
4. RESULTS AND DISCUSSION 27
5. CONCLUSIONS AND FUTURE SCOPES 29
5.1 Conclusions 29
5.2 Future Scopes 29
REFERENCES 30
LIST OF PUBLICATION 31
ABSTRACT
The objective of the project is to predict future large-scale electrical energy
consumption so that power consumption can be made more flexible, reducing energy costs
and the impact on the environment while making use of periods of high environmental
(renewable) power generation and minimum load on the grid.
Machine learning (ML) methods have recently contributed substantially to the
advancement of the prediction models used for energy consumption. Such models
improve the accuracy, robustness, precision and generalization ability of
conventional time series forecasting tools. This project reviews the state of the art of
machine learning models used in the general application of energy consumption. Through
a novel search and taxonomy, the most relevant literature in the field is classified
according to the ML modelling technique, energy type, prediction type, and application
area. A comprehensive review of the literature identifies the major ML methods and their
applications, and discusses the evaluation of their effectiveness in energy
consumption prediction. This work further draws conclusions on the trend and
effectiveness of the ML models. As a result, this work reports a notable rise in
the accuracy and performance of prediction technologies using
novel hybrid and ensemble prediction models.
LIST OF FIGURES
Fig No. Title Page No.
3.1 Flowchart of the work 11
3.2 Code Snippet showing the location of dataset in the drive. 12
3.3 Code Snippet for loading of the dataset in the dataframe. 12
3.4 Data Pre-processing of raw data. 13
3.5 Line Chart 16
3.6 Bar Graph 17
3.7 Histogram 17
3.8 Scatter Plot 18
3.9 Violin Plot 18
3.10 Box Plot 19
3.11 Distribution of dataset into different segment 19
3.12 Code snippet for splitting of the dataset 20
3.13 Training dataset after splitting of the dataset 20
3.14 Testing dataset after splitting of the dataset 21
3.15 Code Snippet for training & testing of the linear regression model 22
3.16 Code Snippet for training & testing of the svm regression (linear kernel) model 22
3.17 Code Snippet for training & testing of the svm regression (poly kernel) model 22
3.18 Code Snippet for training & testing of the decision tree regression model 23
3.19 Code Snippet for training & testing of the random forest regression model 23
3.20 Code Snippet for training & testing of the xgboost regression model 24
3.21 Hyperparameter tuning 24
3.22 Hyperparameter tuning technique 24
3.23 Models and their respective mse and r2_score values 25
3.24 Plot of r2_score of models 26
3.25 Plot of mse of models 26
4.1 Plot of r2_score of models 28
LIST OF TABLES
Table No. Title Page No.
4.1 Comparison Between Mean Squared Error & R2_Score 27
CHAPTER 1
INTRODUCTION
Machine learning (ML) methods have recently contributed substantially to the
advancement of the prediction models used for energy consumption. Such models
improve the accuracy, robustness, precision and generalization ability of
conventional time series forecasting tools [1].
Types of Machine Learning Algorithms:
• Supervised Machine Learning: In supervised learning, both the input and the desired
output are provided and the machine must learn how to map the former to the
latter. To accomplish this, the machine is trained on a statistically representative
set of example inputs and corresponding outputs. Supervised learning deals with
two main tasks: regression and classification.
• Unsupervised Machine Learning: In unsupervised learning, the machine is not
provided with labelled examples or previous patterns on which to base the analysis of
the data inputs. The machine must uncover patterns and draw inferences by itself,
without having the correct answers. It groups or clusters data by discovering
the similarity of features on its own. Unsupervised learning deals with clustering
and association rule mining problems.
• Reinforcement Learning: Reinforcement learning continuously improves its model
based on feedback from experience. It learns through trial and error, from the
consequences of its actions and from new choices. As an action is taken, the success
of the outcome is graded and receives either a positive or a negative score.
Reinforcement learning deals with exploitation versus exploration, Markov decision
processes, policy learning, deep learning and value learning [2].
CHAPTER 2
OBJECTIVE
2.1 Objective of the Work
This project reviews the state of the art of machine learning models used in the
general application of energy consumption. Through a novel search, the most relevant
literature in the field is classified according to the ML modelling technique, energy type,
prediction type, and application area.
A comprehensive review of the literature identifies the major ML methods and their
applications, and discusses the evaluation of their effectiveness in energy
consumption prediction. This report also covers the training of the different machine
learning models and their performance scores on standard comparison metrics.
This project further draws conclusions on the trend and effectiveness of
data analysis and pre-processing, data visualization, outlier removal and data splitting
carried out before building the ML models. Different state-of-the-art machine learning
models are trained on the available dataset, with hyperparameter tuning to improve their
accuracy, and the most appropriate model is selected from among the trained models. As a
result, this project reports a notable rise in the accuracy and performance of
prediction technologies using novel hybrid and ensemble
prediction models [3] [4].
With the use of machine learning, the following improvements can be achieved in the
power system:
1. Decreased maintenance cost of the power system.
2. Sufficient supply of electrical energy to households as per their requirement.
3. Prediction of the upcoming future load.
4. Improved energy efficiency, reliability, and security.
5. Reduced power generation costs [5] [6] [7].
CHAPTER 3
METHODOLOGY OF THE WORK
3.1 Flowchart of the work
The flowchart in Fig. 3.1 shows the various steps involved in the project.
[Fig 3.1 Flowchart of the work. Steps recoverable from the figure: Data Analysis,
Data Pre-Processing, Data Visualization.]
sources but the collected data cannot be used directly for performing the analysis process
as there might be a lot of missing data, extremely large values, unorganized text data or
noisy data. Data pre-processing is therefore one of the most important steps in machine
learning [1] [3] [4], since it helps in building more accurate machine learning models.
A commonly quoted rule of thumb in machine learning is the 80/20 rule: roughly 80% of the
effort goes into preparing the data and 20% into the analysis itself.
In this project, the data for training the machine learning models is collected from
the UC Irvine Machine Learning Repository, an open repository of datasets for
machine learning. The dataset is stored in a hierarchical directory and is loaded into
a pandas data frame using the pandas Python library, so that various kinds of exploratory
data analysis and data visualization tasks can be performed [3]. Fig 3.2 and Fig 3.3 depict
the location of the dataset in the drive and the loading of the dataset into the data
frame, respectively.
Fig 3.2 Code Snippet showing the location of dataset in the drive.
Fig 3.3 Code Snippet for loading of the dataset in the data frame.
3.3 Data Analysis & Pre-processing
3.3.3 Data Reduction
Data reduction shrinks the volume of the data, which makes the analysis easier while
producing the same or almost the same result. The reduction also saves storage space.
Common data reduction techniques are dimensionality reduction, numerosity reduction
and data compression [1] [4].
• Dimensionality reduction: This process is necessary for real-world applications because
the data size is large. The number of random variables or attributes is reduced so
that the dimensionality of the data set goes down. Attributes are combined and
merged without losing the original characteristics of the data. This also reduces
storage space and computation time. When the data is highly dimensional, the problem
known as the “Curse of Dimensionality” occurs.
• Numerosity reduction: In this method, the data is replaced by a smaller
representation, which reduces its volume. The aim is to do so without losing the
information carried by the data.
• Data compression: The data is stored in a compressed form. This compression can be
lossless or lossy. When no information is lost during compression, it is called
lossless compression. Lossy compression discards some information, ideally only
the information that is unnecessary.
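As an illustration of reducing data volume, the minimal sketch below replaces each block of consecutive readings with its mean, for example turning fine-grained readings into coarser averages. The function name `downsample_mean`, the block size and the sample values are assumptions made for this illustration and are not taken from the project code. Averaging of this kind discards the variation inside each block, so it is a lossy form of reduction.

```python
from statistics import mean


def downsample_mean(readings, block_size):
    """Replace each block of `block_size` consecutive readings with its mean."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [
        mean(readings[i:i + block_size])
        for i in range(0, len(readings), block_size)
    ]


# Hypothetical load readings in kW (illustrative values only).
load_kw = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

# Six readings are reduced to two block averages.
reduced = downsample_mean(load_kw, block_size=3)
print(reduced)
```

A trailing block that is shorter than `block_size` is averaged over the readings it does contain.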
3.3.4 Data Transformation
A change made to the format or the structure of the data is called data
transformation. This step can be simple or complex depending on the requirements.
Common data transformation methods are [4]:
• Smoothing: Algorithms are used to remove noise from the dataset, which helps in
identifying its important features. Smoothing can make even small changes in the
data easier to detect.
• Aggregation: In this method, the data is stored and presented in the form of a
summary. Data sets from multiple sources are integrated into a single summary
description for analysis.
• Discretization: The continuous data is split into intervals. Discretization
reduces the data size.
• Normalization: This is the method of scaling the data so that it can be represented in a
smaller range, for example from -1.0 to 1.0.
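The normalization and discretization steps described above can be sketched as follows. This is a minimal, dependency-free illustration: the helper names `min_max_scale` and `discretize`, the interval edges and the sample values are assumptions made for this example, not the project's own code.

```python
from bisect import bisect_right


def min_max_scale(values, new_min=-1.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to the midpoint.
        return [(new_min + new_max) / 2 for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def discretize(values, edges, labels):
    """Assign each value the label of the interval it falls into.

    `edges` must be sorted; a value equal to an edge goes to the upper interval.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical load readings in kW (illustrative values only).
load_kw = [0.5, 1.0, 2.0, 4.5]

scaled = min_max_scale(load_kw)  # values now lie in [-1.0, 1.0]
levels = discretize(load_kw, edges=[1.0, 3.0], labels=["low", "medium", "high"])
print(scaled)
print(levels)
```

Scaling parameters such as the minimum and maximum would normally be computed on the training data only and then reused for the testing data, so that no information from the test set leaks into training.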
3.4.3 Histogram
A histogram is a bar representation of data that varies over a range. The range of
values is divided into intervals along the x-axis, and the height of each bar along the
y-axis shows how many data points fall into that interval. A histogram is shown in
Fig. 3.7.
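The counting behind a histogram can be sketched in a few lines. The function `histogram_counts`, the bin edges and the sample values below are assumptions made for this illustration; plotting libraries perform the same binning internally before drawing the bars.

```python
def histogram_counts(values, bin_edges):
    """Count how many values fall into each bin defined by consecutive edges.

    Bins are half-open [left, right), except the last bin, which also includes
    its right edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            left, right = bin_edges[i], bin_edges[i + 1]
            if left <= v < right or (i == last and v == right):
                counts[i] += 1
                break
    return counts


# Hypothetical load readings in kW (illustrative values only).
load_kw = [0.2, 0.8, 1.1, 2.0, 2.4, 3.0]
edges = [0.0, 1.0, 2.0, 3.0]

counts = histogram_counts(load_kw, edges)

# Text rendering: one row per bin, one '#' per reading in that bin.
for i, count in enumerate(counts):
    print(f"{edges[i]:.1f}-{edges[i + 1]:.1f} kW | {'#' * count}")
```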
3.4.6 Box-Plot
A box plot shows the distribution of quantitative data in a way that facilitates
comparisons between variables or across levels of a categorical variable. The box shows
the quartiles of the dataset while the whiskers extend to show the rest of the distribution,
except for points that are determined to be “outliers” using a method that is a function of
the inter-quartile range. The box plot is represented in fig 3.10.
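A minimal sketch of an outlier rule based on the inter-quartile range (IQR) is given below. It assumes the widely used convention that a point is an outlier when it lies more than 1.5 times the IQR below the first quartile or above the third quartile; the factor 1.5, the function name `iqr_outliers` and the sample values are assumptions for this illustration, and the exact rule used for Fig 3.10 is not stated in this report.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartile box."""
    # quantiles(..., n=4) returns the three cut points Q1, median and Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical load readings in kW with one unusually large value.
load_kw = [1.0, 1.2, 1.3, 1.4, 1.5, 1.6, 9.0]

outliers = iqr_outliers(load_kw)
print(outliers)
```

Points flagged in this way are the ones a box plot draws individually beyond the whiskers; whether they are removed or kept is a separate modelling decision.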
Fig 3.14 Testing dataset after splitting of the dataset.
3.5.2 Model training and testing
Model training for machine learning includes splitting the dataset, training
different machine learning models, tuning hyperparameters and performing batch
normalization. In this work, the following machine learning models were trained on the
available dataset [1] [2]:
• Linear Regression Model (Fig 3.15)
• Support Vector Machine Regression Model (linear kernel) (Fig 3.16)
• Support Vector Machine Regression Model (poly kernel) (Fig 3.17)
• Decision Tree Regression Model (Fig 3.18)
• Random Forest Regression Model (Fig 3.19)
• XGBoost Regression Model (Fig 3.20)
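The split, train and test workflow can be illustrated with a minimal, dependency-free sketch for the simplest model in the list, linear regression with a single input feature. The code snippets of Fig 3.15 to Fig 3.20 are not reproduced here; the helper names `split_rows`, `fit_line` and `predict`, the 80/20 split ratio, the fixed seed and the synthetic data are all assumptions made for this illustration. In practice, scikit-learn provides `train_test_split` and `LinearRegression` for the same purpose.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and return (train_rows, test_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic (feature, load) pairs that follow load = 2 * feature + 1 exactly.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

train, test = split_rows(data)
model = fit_line(train)  # the model sees only the training rows
test_predictions = [predict(model, x) for x, _ in test]
print(model)
print(test_predictions)
```

Keeping the test rows out of `fit_line` is the point of the split: the predictions on `test` then indicate how the model behaves on data it has not seen.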
Fig 3.15 Code Snippet for training & testing of the linear regression model.
Fig 3.16 Code Snippet for training & testing of the svm regression (linear kernel) model
Fig 3.17 Code Snippet for training & testing of the svm regression (poly kernel) model
Fig 3.18 Code Snippet for training & testing of the decision tree regression model
Fig 3.19 Code Snippet for training & testing of the random forest regression model
Fig 3.20 Code Snippet for training & testing of the xgboost regression model
Fig 3.23 Models and their respective mse and r2_score values.
Different evaluation metrics are used for different kinds of problems. In this work,
the r2_score and mean squared error metrics of the scikit-learn library are used to
compare the performance of the different models, as shown in Fig 3.23, Fig 3.24 and Fig 3.25.
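For reference, the two metrics can be written out directly from their standard definitions: the mean squared error is the average of the squared prediction errors, and the R2 score is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below is a plain Python illustration of those definitions; the function names `mse` and `r2` and the sample values are assumptions for this example and are not the results of this work.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative true and predicted values (not measurements from this project).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mse(y_true, y_pred))
print(r2(y_true, y_pred))
```

A lower mean squared error and an R2 score closer to 1 both indicate a better fit; a perfect prediction gives a mean squared error of 0 and an R2 score of 1.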
Fig 3.24 Plot of r2_score of models.
Fig 3.25 Plot of mse of models.
The next chapter discusses the results obtained in this work.
CHAPTER 4
RESULTS AND DISCUSSION
The various machine learning models were trained, along with hyperparameter
tuning, on the training dataset obtained after splitting the dataset. After training,
their performance was checked on the testing dataset using the r2_score and mean
squared error metrics of scikit-learn.
The models trained were the linear regression model, the support vector
machine model with linear kernel, the support vector machine model with poly kernel,
the decision tree model, the random forest model and the XGBoost model, as tabulated
in Table 4.1.
From the table, it can be clearly observed that the XGBoost model has the highest
r2_score. The obtained score is acceptable for the final inference purpose.
Fig 4.1 and Fig 4.2, obtained from the training described in the previous chapter, show
the r2_score and mean squared error of all the models and confirm the acceptable result
of the XGBoost model.
Fig 4.1 Plot of r2_score of models.
Fig 4.2 Plot of mse of models.
CHAPTER 5
CONCLUSIONS AND FUTURE SCOPES
5.1 Conclusions
This system predicts the future household electrical load from the parameters of
a power system. It can make the traditional power system more intelligent and can
form the basis for the implementation of a smart grid in future. It can also be more
adaptable than the current system, allowing the energy demand of an area to be
predicted quickly.
REFERENCES
[8] M. Hayati and Y. Shirvany, “Artificial neural network approach for short term load
forecasting for Illam region,” World Academy of Science, Engineering and Technology,
vol. 28, pp. 280–284, 2007.
[9] N. Kandil, R. Wamkeue, M. Saad, and S. Georges, “An efficient approach for short
term load forecasting using artificial neural networks,” International Journal of Electrical
Power & Energy Systems, vol. 28, no. 8, pp. 525–530, 2006.
[11] G. Zhang, B. E. Patuwo, and M. Y. Hu, “Forecasting with artificial neural networks:
The state of the art,” International Journal of Forecasting, vol. 14, no. 1, pp. 35–62, 1998.
[12] H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for short-term load
forecasting: A review and evaluation,” IEEE Transactions on Power Systems, vol. 16, no.
1, pp. 44–55, 2001.
LIST OF PUBLICATION
01. Sanskar R., Kunal R. Singh, Piyush K., Roopam Kr. Raj, Soham V., Amit K.
Choudhary, “Forecasting of future household electrical load using machine
learning algorithm,” IEEE Transactions on Power Systems. (Communicated)