Prediction of Obesity Level Based On Lifestyle and Eating Habits Data
Table of Contents
1. Title Page
2. Table of Contents
3. Introduction
4. Rationale
5. Objectives
6. Literature Review
7. Feasibility Study
8. Methodology/Planning of Work
9. Facilities Required
10. Expected Outcomes
11. References
Introduction
Obesity is a growing global health epidemic that raises the risk of chronic disease and
increases healthcare burdens. The World Health Organization reports that in 2022 roughly one
in eight people worldwide was living with obesity, with 43% of adults classified as overweight and
16% as obese. By standard WHO definitions, adults with a body mass index (BMI) ≥25 kg/m² are
overweight and those with BMI ≥30 kg/m² are obese. Accurate prediction of an individual’s obesity
level from modifiable factors can enable early intervention. Recent research has demonstrated that
machine learning (ML) can effectively predict obesity risk from lifestyle and dietary data. For
example, using a public UCI dataset of people’s physical and eating-habit attributes, a Gradient
Boosting classifier achieved 98.11% accuracy in classifying obesity level. This project (a
Machine Learning specialization project) will use Python and libraries such as scikit-learn and
pandas to build classification models that predict obesity level (non-obese vs overweight vs obese)
based on features such as diet, activity, and lifestyle. We will employ supervised learning
algorithms (e.g. decision trees, random forests, SVM, XGBoost) and evaluate performance on a
suitable dataset.
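The WHO thresholds quoted above can be expressed as a small labelling helper. This is only a sketch: the function names and the "non-obese" label for everything below 25 kg/m² (which also covers underweight adults) are assumptions chosen to match the three classes named in this proposal, not part of any dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Map a BMI value to the coarse classes used in this proposal.

    Thresholds follow the WHO adult definitions cited in the text:
    BMI >= 25 is overweight, BMI >= 30 is obese. The "non-obese"
    label for BMI < 25 is an assumption for illustration.
    """
    if bmi_value >= 30.0:
        return "obese"
    if bmi_value >= 25.0:
        return "overweight"
    return "non-obese"


if __name__ == "__main__":
    value = bmi(85.0, 1.75)
    print(round(value, 1), who_category(value))
```

A helper like this would only be used to derive or sanity-check target labels; the models themselves are meant to predict the class from lifestyle features.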
Rationale
Overweight and obesity have become a serious public health issue with rising prevalence.
Traditional broad prevention strategies (general diet and exercise advice) have had limited
impact, suggesting the need for personalized approaches. Recent work highlights AI and ML as
powerful tools to capture complex, nonlinear relationships among risk factors. By analyzing
individual lifestyle, dietary, and demographic factors, a predictive model can identify high-risk
individuals before chronic conditions develop. Such a system could aid doctors and patients in
making targeted lifestyle modifications. Given the increasing availability of health data and the
success of ML-based risk models, this project meets an important need for data-driven obesity
management.
Objectives
• Collect or obtain a dataset containing obesity-related features (lifestyle, diet, physical
activity, demographics).
• Preprocess the data (cleaning, normalization, feature selection) for machine learning.
• Develop and train multiple ML classification models (e.g. Decision Tree, Random Forest,
SVM, XGBoost).
• Evaluate and compare model performance (accuracy, precision, recall) to select
the best predictor.
• Analyze feature importance (e.g. using SHAP or tree feature importances) to identify key
factors influencing obesity risk.
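The model-comparison objective could be approached along the following lines. This is a minimal sketch under stated assumptions: the data is a synthetic three-class stand-in from make_classification rather than the real obesity dataset, scikit-learn's GradientBoostingClassifier is swapped in for XGBoost to keep the example dependency-free, and the dictionary keys are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data (assumption: stands in for the obesity dataset).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVMs are scale-sensitive, so the scaler is fitted inside the pipeline.
    "svm": make_pipeline(StandardScaler(), SVC()),
    # Stand-in for XGBoost (assumption for this sketch).
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # Macro averaging weights each obesity class equally.
        "precision": precision_score(y_test, pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, pred, average="macro", zero_division=0),
    }

best = max(results, key=lambda name: results[name]["accuracy"])
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("best by accuracy:", best)
```

Selecting on accuracy alone is a simplification; with imbalanced obesity classes, macro recall may be the more informative criterion.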
Literature Review
Several recent studies demonstrate the use of ML for obesity prediction. Kaur et al. (2022)
used a UCI dataset of personal and eating-habit attributes and applied various ML algorithms
(Gradient Boosting, Random Forest, SVM, etc.) to predict obesity. They reported that Gradient
Boosting achieved the highest accuracy (98.11%), underscoring the value of diet and lifestyle
features in prediction. Carabantes-Alarcón et al. (2024) designed an ensemble cascade model
combining Gradient Boosting, Random Forest, and Logistic Regression. Their hybrid model
significantly outperformed individual algorithms, reaching about 79% accuracy in
overweight/obesity risk classification. Helforoush and Sayyad (2024) proposed a novel ANN-PSO
hybrid neural model and achieved 92% accuracy in predicting obesity risk. They also used
SHAP analysis to interpret feature contributions. Du et al. (2024) built a visualized risk prediction
system using XGBoost on a health checkup dataset (including lifestyle and lab factors) and
demonstrated high predictive performance with interpretability, aiding personalized
management. Another study by Sun et al. (2024) used decision trees, random forest, and
gradient boosting on large survey data (CHNS/NHANES) to predict weight status from lifestyle
factors. They applied interpretable ML (SHAP) and identified physical activity, diet, tobacco and
alcohol use as important predictors. These works collectively show that ML models,
especially tree-based and ensemble methods, can accurately classify obesity levels from lifestyle
and dietary data, justifying this project’s approach.
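The ensemble idea that recurs in these studies can be illustrated with scikit-learn's StackingClassifier. This is a generic stacking sketch, not the cascade architecture of any cited paper: the synthetic data, the choice of 50 estimators, and all variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic three-class data (assumption: placeholder for lifestyle features).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base learners produce out-of-fold predictions (cv=5) that a logistic
# regression meta-learner then combines.
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
print(f"stacked test accuracy: {stack_accuracy:.3f}")
```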
Feasibility Study
The project is highly feasible with available resources. Relevant data is publicly accessible: for
instance, the UCI Machine Learning Repository hosts an obesity dataset with demographic,
activity, and eating-habit features. We will implement the solution in Python using standard
packages (Jupyter Notebook, pandas, NumPy, scikit-learn, XGBoost, etc.), which are freely
available. No specialized hardware is needed beyond a typical personal computer. The
significance of the project is clear given the obesity epidemic: a predictive tool could guide
timely lifestyle interventions. The cost and effort are moderate (mostly student effort), while
potential benefits (improved health outcomes, preventive care) are high. Thus, the proposed
study is both practical and valuable.
Methodology/Planning of Work
We will follow a CRISP-DM style methodology. First, we will perform data collection by sourcing
an appropriate obesity-related dataset (features like age, diet, exercise, habits) and
understanding its structure. Next, data preprocessing will include handling missing values,
encoding categorical factors, and normalizing numeric attributes. In the modeling phase, we will
split the data into training and test sets (e.g. 80:20) and train multiple supervised classifiers
(Decision Tree, Random Forest, SVM, XGBoost, etc.). We will tune hyperparameters (e.g. via grid
search with cross-validation) to optimize performance. Following [14], we will also explore
ensemble techniques such as a stacked or cascade classifier (e.g. combining boosting, random
forest, and logistic regression). Model evaluation will use metrics like accuracy, precision,
recall, and ROC-AUC on the test set. Finally, we will analyze feature importance (using built-in
importance or SHAP) to determine which lifestyle factors most strongly influence obesity
predictions. The workflow steps are Data Acquisition → Preprocessing → Model Training & Validation
→ Evaluation & Interpretation → Documentation of results.
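The preprocessing, splitting, tuning, and interpretation steps described above might be wired together as in the sketch below. Everything about the data is an assumption: the column names (age, activity_hours, snacking), the rule that generates the labels, and the injected missing values are synthetic placeholders, and permutation importance is used here as a simple stand-in for SHAP.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic frame with hypothetical columns (not the real dataset schema).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 65, n).astype(float),
    "activity_hours": rng.uniform(0, 10, n),
    "snacking": rng.choice(["never", "sometimes", "often"], n),
})
df.loc[rng.choice(n, 15, replace=False), "age"] = np.nan  # simulate missing values
score = df["activity_hours"] - 3 * (df["snacking"] == "often")
df["level"] = pd.cut(
    score, bins=[-np.inf, 2, 6, np.inf], labels=["obese", "overweight", "non-obese"]
).astype(str)

numeric, categorical = ["age", "activity_hours"], ["snacking"]
preprocess = ColumnTransformer([
    # Impute then scale numeric columns; one-hot encode categorical ones.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipe = Pipeline([("prep", preprocess),
                 ("clf", RandomForestClassifier(random_state=0))])

# 80:20 stratified split, as proposed in the text.
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["level"],
    test_size=0.2, stratify=df["level"], random_state=0,
)

# Grid search with 3-fold cross-validation on the training set only.
search = GridSearchCV(
    pipe,
    {"clf__n_estimators": [50, 100], "clf__max_depth": [3, None]},
    cv=3, scoring="accuracy",
)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)

# Permutation importance per original input column.
imp = permutation_importance(
    search.best_estimator_, X_test, y_test, n_repeats=5, random_state=0
)
ranked = sorted(zip(X_test.columns, imp.importances_mean),
                key=lambda pair: pair[1], reverse=True)
print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
print("importance ranking:", ranked)
```

Keeping imputation, scaling, and encoding inside the pipeline means they are refitted within each cross-validation fold, which avoids leaking test-fold statistics into training.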
Facilities Required
This project requires a standard software development environment. We will use Python 3.x with
Jupyter Notebook or a similar IDE. Key libraries include pandas (data handling), NumPy, scikit-learn
(ML models), XGBoost or LightGBM, and Matplotlib/Seaborn (visualization). For interpretability, we
may use the SHAP library. The dataset can be downloaded from the UCI repository or Kaggle.
Hardware requirements are minimal: a personal computer with at least 8 GB RAM will suffice. No
specialized equipment is needed.
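A quick way to confirm that the environment is ready is to check which of the listed libraries can be imported. The sketch below uses only the standard library; the REQUIRED mapping and the function name missing_packages are hypothetical helpers, and the pip names are the usual distribution names for these modules.

```python
from importlib.util import find_spec

# Import name -> pip distribution name (assumed list based on this section).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "sklearn": "scikit-learn",
    "xgboost": "xgboost",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "shap": "shap",
}


def missing_packages(modules=None):
    """Return pip names of modules that cannot be found in this environment."""
    modules = REQUIRED if modules is None else modules
    return [pip_name for module, pip_name in modules.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All listed libraries are available.")
```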
Expected Outcomes
The expected outcome is a validated predictive model and insights from it. We anticipate
developing one or more classification models that can accurately estimate an individual’s obesity
level (e.g. normal, overweight, obese) from lifestyle and eating data. We will produce
performance reports (accuracy, confusion matrices) demonstrating model effectiveness.
Additionally, the project will highlight key factors (such as diet type, exercise frequency, etc.) that
contribute most to obesity risk. As a result, the work can contribute to health awareness by
providing a tool or recommendation system for early obesity risk assessment. Ideally, a
prototype interface (such as a web form) could be developed to let users input their lifestyle
data and receive a risk prediction, fostering personalized preventive strategies.
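The performance reports mentioned above could be produced along these lines. The label lists are hand-written toy values for illustration, not project results, and the class names follow the example classes given in this section.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

LABELS = ["normal", "overweight", "obese"]

# Toy ground truth and predictions (assumption: illustrative values only).
y_true = ["normal", "normal", "overweight", "overweight",
          "obese", "obese", "obese", "normal"]
y_pred = ["normal", "overweight", "overweight", "overweight",
          "obese", "overweight", "obese", "normal"]

# Rows are true classes, columns are predicted classes, in LABELS order.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(f"accuracy: {acc:.2f}")
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))
```

In this toy case both errors are predictions of "overweight" for a neighbouring class, which is the kind of pattern a confusion matrix exposes and a single accuracy figure hides.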
References
[1] J. Du et al., “Visualization obesity risk prediction system based on machine learning,” Sci. Rep.,
vol. 14, art. 22424, 2024.
[3] Z. Helforoush and H. Sayyad, “Prediction and classification of obesity risk based on a hybrid
metaheuristic machine learning approach,” Front. Big Data, vol. 7, art. 1469981, 2024.
[4] Z. Sun et al., “Using interpretable machine learning methods to identify the relative
importance of lifestyle factors for overweight and obesity in adults: pooled evidence from CHNS
and NHANES,” BMC Public Health, vol. 24, art. 3034, 2024.
[5] R. Kaur, R. Kumar, and M. Gupta, “Predicting risk of obesity and meal planning to reduce the
obese in adulthood using artificial intelligence,” Endocrine, vol. 78, no. 3, pp. 458–469, 2022.
[6] A. C. Genc and E. Arıcan, “Obesity classification: a comparative study of machine learning
models excluding weight and height data,” Rev. Assoc. Med. Bras., vol. 71, no. 1, e20241282, 2025.
Combination of Machine Learning Techniques to Predict Overweight/Obesity in Adults
https://www.mdpi.com/2075-4426/14/8/816
Predicting risk of obesity and meal planning to reduce the obese in adulthood using
artificial intelligence - PMC
https://pmc.ncbi.nlm.nih.gov/articles/PMC9555702/