Burned Calories Prediction using Supervised Machine Learning: Regression Algorithm


Marte Nipas, Aimee G. Acoba, Jennalyn N. Mindoro, Mon Arjay F. Malbog, Julie Ann B. Susa, Joshua S. Gulmatico
Computer Engineering Department
Technological Institute of the Philippines
Manila, Philippines

2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T) | 978-1-6654-5858-0/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICPC2T53885.2022.9776710

Abstract— Regular physical activity is essential to staying healthy and fit. Calories burned by individuals are commonly estimated using formulas and MET charts. This study aims to predict calories burned using machine learning regression models in order to obtain more accurate results. Data preparation, cleaning, and analysis are the primary steps before the data can be fed to the regression models. The models were trained and tested using K-fold cross-validation to determine the best model for the study. The performance and prediction accuracy of the regression models were evaluated based on the results of model testing over ten (10) iterations. The average accuracy was computed, and the results show that Random Forest regression is the best model for the study, with an accuracy of 95.77%. It is very important to visualize and study the relationships among the variables in the data, because they may affect how well the algorithm predicts the value of the target variable. The Random Forest regression model was able to predict the calories burned with a high accuracy rate.

Keywords— calories, regression model, prediction, machine learning

I. INTRODUCTION
A calorie is a unit of energy used to measure both the energy a person obtains from foods and drinks and the energy expended in physical activities [4]. The number of calories in food varies depending on the energy the food can provide. Monitoring calorie intake is therefore important, because excessive consumption may lead to obesity, diabetes, and other related health problems [8].

In modern living, regular physical activity is important for becoming fit, staying in good shape, maintaining a healthy body, or losing weight. Physical activities such as running, walking, bicycling, swimming, exercising, or doing regular daily tasks burn calories [5]. The number of calories burned depends on many factors, such as weight, gender, age, height, metabolism, and the type of activity or exercise done.

Measuring the exact number of calories burned can be difficult. Activity tracker apps can estimate calories burned, but their accuracy varies from product to product. Heart rate is one of the best indicators of calorie burn, but it differs significantly between individuals based on fitness, age, and genetics [3]. The Metabolic Equivalent of Task (MET), developed by researchers and used by the medical community, is another way to estimate calories burned. MET is the ratio of active metabolic rate to resting metabolic rate [12], where metabolic rate is the rate of energy consumed per unit of time and depends on the intensity of the activity or exercise.

This study proposes a solution for predicting calories burned using machine learning algorithms. The algorithms considered are Linear Regression, Ridge Regression, and Random Forest Regression. The goal of this study is to evaluate which algorithm best predicts the calories burned by an individual based on weight, gender, age, height, duration of the activity, heart rate, and body temperature. The resulting model can be integrated with existing technologies to give a better estimate of the calories burned by individuals after physical activity.

II. LITERATURE REVIEW
A. Calorie Expenditures
An individual's heart rate reflects the condition of the body and can be measured through the pulse. In [3], the authors developed a prototype that calculates calories burned from heart rate. The prototype uses optical technology and photosensors to measure the heart rate; a controller then detects and counts the pulses, and the measured value is the basis for the calories burned after exercising.

The mobile application created in [11] aims to increase the physical activity of smartphone users and to encourage them to exercise together. To measure calories burned, it uses the accelerometer [6] built into the phone to detect the movement of the players and a wearable polar sensor to record the heart rate. Similar approaches were taken in [1], [2], and [9], but they use the smartphone's Global Positioning System (GPS) to track the movement of the players. The user must move so that the character in the virtual world also moves.

The prototype in [13] uses an accelerometer and a microcontroller to measure calories: it calculates the distance traveled by an individual while jogging, walking, or running and applies a mathematical formula to compute the total calories burned. On the other hand, [14] used a gyroscope and a Wemos module and sent the acquired data to a mobile application for processing to determine the number of footsteps and calories burned in a single activity.
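The introduction mentions MET as a conventional way to estimate calories burned. For comparison with the regression approach, the sketch below applies a widely used MET approximation, kcal/min = MET × 3.5 × body mass (kg) / 200. This formula is not given in the paper and is included only as an assumption for illustration; `met_calories` is a hypothetical helper name.

```python
def met_calories(met: float, weight_kg: float, minutes: float) -> float:
    """Estimate kilocalories burned from a MET value.

    Assumed formula (not from the paper): kcal/min = MET * 3.5 * weight_kg / 200.
    1 MET is taken as 3.5 mL O2 per kg per minute, and about 5 kcal are
    released per litre of O2, which gives the factor 3.5 * 5 / 1000 = 3.5 / 200.
    """
    if met <= 0 or weight_kg <= 0 or minutes < 0:
        raise ValueError("met and weight_kg must be positive; minutes must be non-negative")
    kcal_per_minute = met * 3.5 * weight_kg / 200.0
    return kcal_per_minute * minutes


if __name__ == "__main__":
    # Hypothetical example: an activity rated 8.0 MET, a 70 kg person, 30 minutes.
    print(round(met_calories(8.0, 70.0, 30.0), 1))  # 294.0
```

Such a chart-based estimate uses only the MET value, body mass, and duration, whereas the regression models in this study also draw on heart rate, body temperature, age, height, and gender.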


Authorized licensed use limited to: Vignana Bharathi Institute Of Technology. Downloaded on July 09,2024 at 17:34:23 UTC from IEEE Xplore. Restrictions apply.
The algorithm proposed in [12] calculates calorie expenditure based on a predicted heat strain model (PHSM), which estimates sweat rate and body temperature in response to a particular environment. In that study, a heat balance equation was used to estimate calorie expenditure.

B. Regression Models
Machine learning uses different algorithms to learn the relationships in the data and make predictions based on patterns identified in the dataset. Regression is a machine learning technique in which the model predicts an output value from the given input variables.

Linear regression is used to identify the linear relationship between the input variables and the output variable [7]. Random Forest regression applies ensemble learning to regression. Ensemble learning combines the predictions of multiple models to achieve more accurate predictions than a single model. Random Forest uses decision trees instead of a single fitted equation to perform regression: it constructs several decision trees during training and outputs the mean of the individual trees' predictions [10]. Ridge regression, which adds an L2 penalty on the model coefficients, is applied to data with multicollinearity.

Fig. 1 Random Forest Regression Model

III. METHODOLOGY
A. Data Collection and Preparation
The data used in this study was taken from the Kaggle website. Table I shows the dataset specification: the dataset contains 15000 observations and nine variables, of which eight (8) are numeric and one (1) is categorical. The data has no duplicate rows and no missing cells.

TABLE I. DATASET SPECIFICATION
Parameter                         Value
Numeric variables                 8
Categorical variables             1
Number of observations            15000
Missing cells                     0
Missing cells (%)                 0.0%
Duplicate rows                    0
Duplicate rows (%)                0.0%
Total size in memory              1.1 MB
Average record size in memory     80.0 B

B. Data Analysis
The variables in the dataset must first be analyzed to determine their relationship to the target variable, calories burned. Heart rate, exercise duration, and body temperature are highly correlated with the dependent variable (calories), followed by height and weight.

Fig. 2 Pearson's Correlation between Variables

Fig. 3 shows the distribution of the calories used in this study. From the distribution plot, it can be inferred that the data does not follow a normal distribution.

Fig. 3 Calorie Distribution

C. Feature Extraction
The data used in this study contains eight (8) features: ID, Age, Height, Weight, Duration, Heart Rate, Body Temperature, and Gender. Based on the data analysis, the ID feature was excluded because it has no impact on predicting the calories burned by a person.

D. Model Training and Prediction
The study considered three regression models: linear regression, ridge regression, and random forest regression. The three models were trained on the dataset using a program written in Python. To test the models, K-fold cross-validation with ten iterations was used to reduce the risk of overfitting to a single train/test split and to obtain a more reliable estimate of prediction accuracy.
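The preparation, training, and testing steps above, together with the error measures named in the model assessment subsection, can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' program: the paper only says Python was used, so the use of pandas and scikit-learn, the synthetic stand-in data produced by the hypothetical `make_synthetic_data` helper, its column names and invented target formula, the 80/20 hold-out split used for MSE, RMSE, and MAE, and the random seeds are all assumptions. `cross_val_score` without a `scoring` argument returns R² for scikit-learn regressors; whether the paper's "accuracy" scores are R² is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split


def make_synthetic_data(n: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the Kaggle data: similar columns, fake values."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "ID": np.arange(n),
        "Gender": rng.choice(["male", "female"], size=n),
        "Age": rng.integers(20, 80, size=n),
        "Height": rng.normal(175.0, 10.0, size=n),
        "Weight": rng.normal(75.0, 12.0, size=n),
        "Duration": rng.uniform(1.0, 30.0, size=n),
        "Heart_Rate": rng.normal(95.0, 9.0, size=n),
        "Body_Temp": rng.normal(40.0, 0.8, size=n),
    })
    # Invented target with a Duration x Heart_Rate interaction, for illustration only.
    df["Calories"] = (
        0.08 * df["Duration"] * (df["Heart_Rate"] - 60.0)
        + 0.05 * df["Weight"]
        + rng.normal(0.0, 3.0, size=n)
    )
    return df


def prepare(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Drop the ID column (no predictive value) and encode Gender as 0/1."""
    data = df.drop(columns=["ID"])
    data["Gender"] = (data["Gender"] == "male").astype(int)
    return data.drop(columns=["Calories"]), data["Calories"]


def evaluate(df: pd.DataFrame) -> dict:
    X, y = prepare(df)
    # Pearson correlation of each feature with the target (data analysis step).
    corr = pd.concat([X, y], axis=1).corr()["Calories"].drop("Calories")
    models = {
        "Linear": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    kfold = KFold(n_splits=10, shuffle=True, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    results = {"corr": corr}
    for name, model in models.items():
        # One score per fold; the default scorer for regressors is R^2.
        scores = cross_val_score(model, X, y, cv=kfold)
        # Separate hold-out fit for the error measures (MSE, RMSE, MAE).
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        results[name] = {
            "fold_scores": scores,
            "mean_score": float(scores.mean()),
            "MSE": float(mse),
            "RMSE": float(np.sqrt(mse)),
            "MAE": float(mean_absolute_error(y_te, pred)),
        }
    return results


if __name__ == "__main__":
    res = evaluate(make_synthetic_data())
    print(res["corr"].sort_values(ascending=False).round(3))
    for name in ("Linear", "Ridge", "Random Forest"):
        r = res[name]
        print(f"{name}: mean CV score={r['mean_score']:.4f} "
              f"MSE={r['MSE']:.2f} RMSE={r['RMSE']:.2f} MAE={r['MAE']:.2f}")
```

Apart from `n_estimators=100` and a fixed seed, the Random Forest is left at the library defaults. The `max_features` value "auto" listed in the conclusion has been removed from recent scikit-learn releases; for regressors it meant using all features, which is what the current default does. Because the data here is synthetic, the printed numbers will not reproduce Tables II and III.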

E. Model Assessment and Selection
Based on the testing results, the models are evaluated using the scores each model obtained over the 10 iterations of K-fold validation. The model with the highest average score in predicting the value of the target variable is the basis for selecting the best model for the study. The prediction performance of each model is also considered to support this selection, using the following measures: Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The model with the lowest prediction errors can be considered the best model.

IV. RESULT AND DISCUSSION
Table II shows the prediction performance of the three models. Random Forest has the lowest prediction errors and performs better than the other two models.

TABLE II. MODEL PERFORMANCE OF REGRESSION MODELS
Metrics   Linear    Ridge     Random Forest
MSE       123.97    125.42    8.13
RMSE      11.13     11.20     2.85
MAE       8.23      8.30      1.81

Table III shows the K-fold validation results. To determine the best model, the average score of each model across the ten iterations was computed.

TABLE III. K-FOLD VALIDATION RESULTS
Iteration   Linear     Ridge      Random Forest
1           0.92704    0.926758   0.957204
2           0.929702   0.929207   0.957918
3           0.928701   0.927954   0.957368
4           0.928419   0.927704   0.957537
5           0.92942    0.928509   0.958222
6           0.927602   0.926676   0.957631
7           0.929818   0.929614   0.957539
8           0.929463   0.92904    0.957892
9           0.92779    0.927085   0.95767
10          0.930341   0.929758   0.957645

Taking the average, Linear regression has an accuracy of 92.88%, Ridge regression 92.82%, and Random Forest regression 95.77%. This shows that Random Forest regression is the best model for this study because of its capability to capture complex behavior in the data under study. Table IV shows the results of sample data prediction using Random Forest regression. The model made 14 correct predictions out of 15, an accuracy of 93.33%.

TABLE IV. RANDOM FOREST PREDICTION RESULTS
Test No.   Predicted Calorie   Actual Calorie   Interpretation
1          35                  35               Correct
2          58                  58               Correct
3          136                 136              Correct
4          135                 135              Correct
5          158                 158              Correct
6          90                  90               Correct
7          94                  94               Correct
8          118                 118              Correct
9          72                  72               Correct
10         134                 136              Incorrect
11         75                  75               Correct
12         79                  79               Correct
13         111                 111              Correct
14         94                  94               Correct
15         29                  29               Correct

V. CONCLUSION
Regression models are among the machine learning algorithms used to make predictions. In this study, they were used to predict the calories burned by an individual from the given independent variables. After the problem is identified, the data must be cleaned and prepared before it can be fed to the algorithm. The data must then be analyzed to determine the relationships between the variables. It is important to visualize and understand these relationships to look for problems such as multicollinearity, because such problems affect the algorithm that will be used. Plotting the samples in a 2-D plane can give useful insights; in this study, the researchers saw that the target variable is visibly correlated with heart rate, exercise duration, and body temperature. Lastly, model training and selection are important and must be carefully conducted, since choosing the best-fitting model makes the prediction more accurate and reliable. The researchers achieved 95.77% accuracy using the random forest regression algorithm with the following hyperparameters: the number of estimators is set to 100, maximum depth is None, minimum samples split is 2, minimum samples leaf is 1, and maximum features is auto. The researchers did not modify any parameters in the regression models. Researchers who reference this study are encouraged to alter some of these parameters to see whether the score changes significantly. We also recommend trying other existing regression techniques that might fit better than those used in this study.

ACKNOWLEDGEMENT
The authors would like to show appreciation to the Technological Institute of the Philippines High-Performance Computing Laboratory for letting the researchers use its computing facilities.

REFERENCES
[1] A. Koivisto, S. Merilampi, and K. Kiili, "Mobile exergames for preventing diseases related to childhood obesity," in Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, Barcelona, Spain, October 2011.
[2] C. G. Wylie, "Mobile persuasive exergaming," in Proceedings of the 2009 International IEEE Consumer Electronics Society's Games Innovations Conference, Lancaster, UK, August 2009.
[3] G. K. Reddy and K. L. Achari, "A non invasive method for calculating calories burned during exercise using heartbeat," 2015 IEEE 9th International Conference on Intelligent Systems and Control (ISCO), 2015, pp. 1-5, doi: 10.1109/ISCO.2015.7282249.
[4] S. Kasim and F. A. Zakaria, "Daily Calorie Manager for basic daily use," Third International Conference on Innovative Computing Technology (INTECH 2013), 2013, pp. 437-442, doi: 10.1109/INTECH.2013.6653675.
[5] H. Rahaman and V. Dyo, "Counting calories without wearables: Device-free Human Energy Expenditure Estimation," 2020 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2020, pp. 1-6, doi: 10.1109/WiMob50308.2020.9253424.
[6] J. H. Choi, J. Lee, H. T. Hwang, J. P. Kim, J. C. Park and K. Shin, "Estimation of Activity Energy Expenditure: Accelerometer Approach," 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, 2005, pp. 3830-3833, doi: 10.1109/IEMBS.2005.1615295.
[7] M. S. Acharya, A. Armaan and A. S. Antony, "A Comparison of Regression Models for Prediction of Graduate Admissions," 2019

International Conference on Computational Intelligence in Data Science (ICCIDS), 2019, pp. 1-5, doi: 10.1109/ICCIDS.2019.8862140.
[8] N. Komiya et al., "Novel Application of 3D Range Image Sensor to Caloric Expenditure Estimation based on Human Body Measurement," 2018 12th International Conference on Sensing Technology (ICST), 2018, pp. 371-374, doi: 10.1109/ICSensT.2018.8603651.
[9] P. Buddharaju and N. S. C. P. Pamidi, "Mobile Exergames - Burn Calories While Playing Games on a Smartphone," 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 50-51, doi: 10.1109/CVPRW.2013.13.
[10] P. Dong, H. Peng, X. Cheng, Y. Xing, X. Zhou and D. Huang, "A Random Forest Regression Model for Predicting Residual Stresses and Cutting Forces Introduced by Turning IN718 Alloy," 2019 IEEE International Conference on Computation, Communication and Engineering (ICCCE), 2019, pp. 5-8, doi: 10.1109/ICCCE48422.2019.9010767.
[11] S. Reiaz et al., "CalorieKiller: Burning Calories using Mobile Exergame with Wearables," 2019 IEEE 7th International Conference on Serious Games and Applications for Health (SeGAH), 2019, pp. 1-5, doi: 10.1109/SeGAH.2019.8882453.
[12] S. Yang, Y. Yeh, J. J. Ladasky and D. J. Schmidt, "Algorithm to calculate human calorie expenditure based on a predicted heat strain model," 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2016, pp. 545-548, doi: 10.1109/BHI.2016.7455955.
[13] R. P. Sai, S. Bapanapalle, K. Praveen and M. P. Sunil, "Pedometer and calorie calculator for fitness tracking using MEMS digital accelerometer," 2016 International Conference on Inventive Computation Technologies (ICICT), 2016, pp. 1-6, doi: 10.1109/INVENTIVE.2016.7823237.
[14] W. A. Kusuma, Z. Sari, H. Wibowo, S. Norhabibah, S. N. Ubay and D. A. Fitriani, "Monitoring Walking Devices For Calorie Balance In Patients With Medical Rehabilitation Needs," 2018 5th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2018, pp. 460-463, doi: 10.1109/EECSI.2018.8752761.
