Enhancing Missing Values Imputation Through Transformer-Based Predictive Modeling

How to cite this article: Ayub H, Jamil H. Enhancing Missing Values Imputation through Transformer-Based Predictive Modeling. IgMin Res. Jan 23, 2024; 2(1): 025-031. IgMin ID: igmin140; DOI: 10.61927/igmin140; Available at: www.igminresearch.com/articles/pdf/igmin140.pdf
Abstract
This paper tackles the vital issue of missing value imputation in data preprocessing, where traditional techniques like zero, mean, and KNN imputation fall short
in capturing intricate data relationships. This often results in suboptimal outcomes, and discarding records with missing values leads to significant information loss.
Our innovative approach leverages advanced transformer models renowned for handling sequential data. The proposed predictive framework trains a transformer
model to predict missing values, yielding a marked improvement in imputation accuracy. Comparative analysis against traditional methods—zero, mean, and KNN
imputation—consistently favors our transformer model. Importantly, LSTM validation further underscores the superior performance of our approach. In hourly data,
our model achieves a remarkable R2 score of 0.96, surpassing KNN imputation by 0.195. For daily data, the R2 score of 0.806 outperforms KNN imputation by 0.015
and exhibits a notable superiority of 0.25 over mean imputation. Additionally, in monthly data, the proposed model’s R2 score of 0.796 excels, showcasing a significant
improvement of 0.1 over mean imputation. These compelling results highlight the proposed model’s ability to capture underlying patterns, offering valuable insights
for enhancing missing values imputation in data analyses.
ISSN 2995-8067 DOI: 10.61927/igmin140
revealing that models like LightGBM and XGBoost, coupled with careful feature engineering, excel in imputation performance, emphasizing the importance of balanced model complexity. In scRNA-seq, vital for studying single-cell transcription, addressing high-dimensionality and dropout values is crucial. This study [11] evaluates advanced imputation methods, providing insights for selecting appropriate approaches in diverse data contexts and aiding downstream functional analysis. Both the Self-Organizing Map (SOM) [12,13] and the MLP [14] represent additional ML techniques applied for the imputation of missing values.

Furthermore, studies employing the regression approach [15] implemented a novel method involving weighted quantile regression to estimate missing values within health data. In another article [16], the author introduced a comprehensive case regression approach for handling missing values, employing functional principal components. Iterative regression is used for effective imputation in multivariate data [17]. Another method, hot-deck imputation, matches missing values with complete values on key variables [18]. Research has been conducted on expectation maximization for handling missing data, using a dataset analyzing the effects of feeding behaviors among drug-treated and untreated animals [19]. Recognizing the insufficiency of merely deleting or discarding missing data [20], researchers often turn to multiple imputation, which leverages the observed data distribution to estimate numerous values, reflecting the uncertainty surrounding the true value. This approach has predominantly been utilized to address the constraints associated with single imputation [21].

Moreover, another study [22] evaluates imputation methods for incomplete water network data, focusing on small to medium-sized utilities. Among the tested methods, IMPSEQ outperforms others in imputing missing values in cast iron water mains data from the City of Calgary, offering insights for cost-effective water mains renewal planning. The one-hot encoding method proposed by [23] excels in addressing missing data for credit risk classification, demonstrating superior accuracy and computational efficiency, especially in high missing-rate scenarios, when integrated with the CART model. Another work [24] proposes a novel imputation method for symbolic regression using Genetic Programming (GP) and weighted K-Nearest Neighbors (KNN); it outperforms state-of-the-art methods in accuracy, symbolic regression, and imputation time on real-world datasets. Conventional techniques for multiple imputation exhibit suboptimal performance when confronted with high-dimensional data, prompting researchers to enhance these algorithms [25,26]. Likewise, indications exist that exercising caution is advisable when applying continuous-based approaches to impute categorical data, as it may introduce bias into the results [27].

Motivated by the need for a comprehensive evaluation, we conduct extensive experiments to compare the performance of our transformer-based imputation against established methods. This comparison extends beyond conventional imputation techniques, encompassing zero [28], mean [29], and KNN imputation [30]. In the context of missing value imputation, it is noteworthy that addressing missing values is a common concern among researchers and data scientists. Recent research [31] thoroughly compares seven data imputation methods for numeric datasets, revealing kNN imputation's consistent outperformance. This contribution adds valuable insights to the ongoing discourse on selecting optimal methods for handling missing data in data mining tasks. Furthermore, we introduce an additional validation layer by subjecting the imputed data to scrutiny through Long Short-Term Memory (LSTM) networks [32]. This not only assesses the accuracy of imputation but also gauges the temporal coherence of the imputed values.

By undertaking this exploration, we aim to contribute valuable insights into the realm of missing values imputation, offering a nuanced understanding of the capabilities of transformer-based models. The observed improvements in imputation accuracy, particularly validated through LSTM analysis, underscore the potential of our proposed approach to address the persistent challenges associated with missing data. Through this work, we aspire to provide a robust foundation for future advancements in data preprocessing and analysis methodologies. These are the key contributions of the article:

• Introduced a novel missing values imputation approach using transformer models, deviating from traditional methods.

• Leveraging self-attention mechanisms, the transformer-based model provides a data-driven and adaptive solution for capturing intricate data relationships.

• Through a comprehensive comparative analysis, the transformer model consistently outperforms traditional imputation techniques like zero, mean, and KNN.

• The inclusion of LSTM validation adds a layer of scrutiny, evaluating not only imputation accuracy but also the temporal coherence of imputed values.

• The proposed model showcases robust performance across diverse datasets, demonstrating its efficacy in preserving data relationships and capturing variability.

Methodology

Handling missing values in datasets is a crucial challenge, particularly when predicting these values based on available data. Figure 1 outlines a comprehensive process for predicting missing values using a transformer model. In the initial step, we showcase an example dataset with missing values, highlighting the intricacies of the task. Moving to step two, we prepare the data for missing values imputation by segregating complete data for model training and reserving a test set for predicting missing data. Before arranging the data, each data sequence is assigned a unique identifier, ensuring traceability. Complete data features (f0, f3, f6, and f9) are repositioned on the right side in the third step. Subsequently, in step four, all complete rows are relocated to the top of the dataset.

Figure 1: A detailed process of preparing data for the Transformer for missing values prediction.

Step five reveals the division of the dataset into X-data and Y-data, forming the basis for training the model. In step six, we select the complete X-Data and the target feature f1 from Y-Data, which contains missing data. Utilizing the train-test split on X-Data and Y-Data (f1), we generate X-Train, Y-Train, X-Test, and Y-Test. In step seven, the train data is prepared for our proposed prediction model, providing a complete set for training the Transformer model.

Advancing further, at step eight, the transformer undergoes comprehensive training using the entirety of the available data. In the subsequent step nine, the trained model takes on the task of predicting missing values within the X-Data. The imputed f1 feature is integrated back into the X-Data, initiating a cascading effect as subsequent missing values are predicted. This iterative refinement persists until the entirety of missing values is filled.

In the culminating step, the dataset is reorganized to preserve its inherent structure by adhering to the initially assigned IDs. This methodical approach not only ensures the seamless integration of imputed values but also maintains the overall integrity and coherence of the dataset. In essence, our methodology provides a structured and systematic solution, navigating the intricacies of missing value imputation using a transformer model.

Proposed model validation

After the missing data imputation process was finished using our proposed transformer-based prediction model, a thorough validation was carried out. During the validation stage, we aimed to assess how well our model performed in comparison to other widely used imputation techniques, such as zero, mean, mode, and KNN imputation. We used these various imputation methods to produce five sets of imputed data. We validated each imputation model using Long Short-Term Memory (LSTM) networks to evaluate its effectiveness thoroughly. The LSTM network was fed the imputed data from all five models.

Table 1: A detailed comparative analysis of the imputation techniques.

Data          Measure   Zero        Mean        Mode        KNN         Proposed FEP
                        Imputation  Imputation  Imputation  Imputation  Model Imputation
Hourly Data   R2 score  0.233       0.647       0.437       0.765       0.96
              MAE       0.058       0.113       0.075       0.037       0.036
              MSE       0.006       0.02        0.008       0.003       0.003
              RMSE      0.077       0.141       0.089       0.055       0.055
              MAPE      1.2         0.92        1.01        0.83        0.423
Daily Data    R2 score  0.391       0.556       0.471       0.791       0.806
              MAE       0.066       0.051       0.059       0.048       0.028
              MSE       0.009       0.008       0.0073      0.004       0.003
              RMSE      0.077       0.095       0.055       0.045       0.045
              MAPE      0.93        0.85        0.89        0.47        0.32
Monthly Data  R2 score  0.251       0.696       0.698       0.419       0.796
              MAE       0.023       0.051       0.029       0.038       0.025
              MSE       0.001       0.003       0.002       0.004       0.001
              RMSE      0.032       0.055       0.045       0.063       0.032
              MAPE      1.13        0.89        0.891       1.01        0.523
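The iterative, feature-by-feature prediction loop described in the Methodology section (train on complete rows, predict the missing target feature, merge it back, repeat) can be sketched in code. The following is a minimal, illustrative Python sketch, not the authors' implementation: the transformer is abstracted behind a fit-and-predict interface, a plain least-squares regressor stands in for it so the sketch stays self-contained, and all function and variable names are our own.

```python
import numpy as np

def lstsq_model(X_train, y_train):
    """Stand-in for the transformer predictor: fits an affine
    least-squares model and returns a predict(X) closure."""
    # Append a bias column so the fit is affine, not purely linear.
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return lambda X: np.hstack([X, np.ones((len(X), 1))]) @ w

def impute_iteratively(data, make_model=lstsq_model):
    """Fill NaNs one feature at a time: train on rows where the target
    feature is present, predict it where it is missing, merge the
    predictions back, and repeat until no NaNs remain."""
    data = data.copy()
    ids = np.arange(len(data))           # step 2: unique row identifiers
    complete_cols = [j for j in range(data.shape[1])
                     if not np.isnan(data[:, j]).any()]
    for target in range(data.shape[1]):  # iterate over incomplete features
        mask = np.isnan(data[:, target])
        if not mask.any():
            continue
        X = data[:, complete_cols]       # steps 3-6: X-Data / Y-Data split
        predict = make_model(X[~mask], data[~mask, target])  # steps 7-8
        data[mask, target] = predict(X[mask])                # step 9
        complete_cols.append(target)     # imputed feature becomes usable
    return data[np.argsort(ids)]         # step 10: restore original order

# Tiny demonstration: f1 depends linearly on f0, with two gaps.
rng = np.random.default_rng(0)
f0 = rng.normal(size=20)
f1 = 2.0 * f0 + 1.0
f1[[3, 7]] = np.nan
filled = impute_iteratively(np.column_stack([f0, f1]))
assert not np.isnan(filled).any()
```

In the actual pipeline, `make_model` would wrap training and inference of the transformer, and the identifier bookkeeping would undo the row and column rearrangements illustrated in Figure 1.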
Table 1 examines the imputation performance across the hourly, daily, and monthly datasets. Overall, these results consistently highlight the proposed model's effectiveness in preserving data relationships and capturing variability across diverse datasets, positioning it as a robust choice for imputing missing values when accurate modeling of underlying data patterns is crucial. A visual analysis of the R2 score for the selected imputation methods is illustrated in Figure 3.

Beyond R2 scores, an in-depth analysis of other error metrics further solidifies the superiority of the proposed imputation model, as shown in Figure 4. In hourly consumption data, the model's Mean Absolute Error (MAE) of 0.036 is notably lower than that of other methods, reflecting its ability to accurately predict missing values with minimal deviation. This trend continues in daily and monthly consumption data, where the proposed model consistently achieves the lowest MAE values, indicating superior imputation accuracy. Similarly, examining Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) across all datasets, the proposed model consistently outperforms alternative methods. The observed reductions in MAE, MSE, and RMSE collectively underscore the robustness of the proposed model in minimizing imputation errors. These comprehensive findings suggest that, beyond R2 scores, the proposed imputation model consistently excels across various error metrics, affirming its efficacy in accurately filling missing data and offering a comprehensive solution for handling diverse datasets with absent values.

Critical discussion

In this study, we have demonstrated the superior efficacy of transformer models over traditional methods like zero, mean, and KNN imputation, particularly in handling accuracy and context in missing data. However, the performance of these models varies with different data types and sizes, highlighting potential limitations in scalability and applicability to diverse datasets. Comparative analysis suggests that while transformers excel in interpreting sequential data, they may not be the most suitable choice for simpler or smaller datasets. The practical applications of
our model are promising, yet they are accompanied by challenges in computational demands and ethical considerations, especially in sensitive sectors like healthcare and finance. The generalizability of our model across various types of missing data and its application across different fields remains an area ripe for further research and validation.

Future studies should focus on integrating advanced machine learning techniques to enhance the robustness and applicability of our model. Additionally, while the use of LSTM networks for validation is beneficial, alternative methods might provide a more comprehensive evaluation. It is important to acknowledge that the quality of imputation has a significant impact on the predictive accuracy of models, particularly in fields where data integrity is crucial. Our findings highlight the importance of continuous development in imputation methods, keeping pace with evolving data complexities and advancements in AI. This research contributes to the broader understanding of missing data imputation, setting a foundational stage for future innovations in predictive modeling.

Conclusion

This paper introduces a novel transformer-based prediction model to handle the critical problem of missing value imputation in datasets. By methodically explaining the process, we demonstrated a comprehensive strategy that outperformed conventional imputation strategies, such as zero imputation, mean imputation, and KNN imputation. The proposed model demonstrated exceptional predictive power by capturing complex patterns in sequential data. Our model significantly outperformed alternative imputation techniques after extensive validation using LSTM networks, highlighting its effectiveness and resilience. The present study contributes significantly to advancing missing values imputation approaches by providing a detailed comparative analysis of transformer-based and conventional methods. In light of the difficulties associated with missing data, the proposed approach closes a large gap in the literature and offers a viable path toward more trustworthy data analysis.

References

1. Du J, Hu M, Zhang W. Missing data problem in the monitoring system: A review. IEEE Sensors Journal. 2020; 20(23):13984-13998.

2. Alruhaymi AZ, Kim CJ. Study on the Missing Data Mechanisms and Imputation Methods. Open Journal of Statistics. 2021; 11(4):477-492.

3. Liu J, Pasumarthi S, Duffy B, Gong E, Datta K, Zaharchuk G. One Model to Synthesize Them All: Multi-Contrast Multi-Scale Transformer for Missing Data Imputation. IEEE Trans Med Imaging. 2023 Sep;42(9):2577-2591. doi: 10.1109/TMI.2023.3261707. Epub 2023 Aug 31. PMID: 37030684; PMCID: PMC10543020.

4. Edelman BL, Goel S, Kakade S, Zhang C. Inductive biases and variable creation in self-attention mechanisms. In International Conference on Machine Learning. PMLR. 2022; 5793-5831.

5. Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology (Basel). 2023 Jul 22;12(7):1033. doi: 10.3390/biology12071033. PMID: 37508462; PMCID: PMC10376273.

6. Schafer JL. Analysis of incomplete multivariate data. CRC Press. 1997.

7. Menard S. Applied logistic regression analysis. Sage. 2002; 106.

8. Little RJ, Rubin DB. Statistical analysis with missing data. John Wiley & Sons. 2019; 793.

9. Hadeed SJ, O'Rourke MK, Burgess JL, Harris RB, Canales RA. Imputation methods for addressing missing data in short-term monitoring of air pollutants. Sci Total Environ. 2020 Aug 15;730:139140. doi: 10.1016/j.scitotenv.2020.139140. Epub 2020 May 3. PMID: 32402974; PMCID: PMC7745257.

10. Luo Y. Evaluating the state of the art in missing data imputation for clinical data. Brief Bioinform. 2022 Jan 17;23(1):bbab489. doi: 10.1093/bib/bbab489. PMID: 34882223; PMCID: PMC8769894.

11. Wang M, Gan J, Han C, Guo Y, Chen K, Shi YZ, Zhang BG. Imputation methods for scRNA sequencing data. Applied Sciences. 2022; 12(20):10684.

12. Samad T, Harp SA. Self-organization with partial data. Network: Computation in Neural Systems. 1992; 3(2):205-212.

13. Fessant F, Midenet S. Self-organising map for data imputation and correction in surveys. Neural Computing & Applications. 2002; 10:300-310.

14. Westin LK. Missing data and the preprocessing perceptron. Univ. 2004.

15. Sherwood B, Wang L, Zhou XH. Weighted quantile regression for analyzing health care cost data with missing covariates. Stat Med. 2013 Dec 10;32(28):4967-79. doi: 10.1002/sim.5883. Epub 2013 Jul 9. PMID: 23836597.

16. Crambes C, Henchiri Y. Regression imputation in the functional linear model with missing values in the response. Journal of Statistical Planning and Inference. 2019; 201:103-119.

17. Siswantining T, Soemartojo SM, Sarwinda D. Application of sequential regression multivariate imputation method on multivariate normal missing data. In 2019 3rd International Conference on Informatics and Computational Sciences (ICICoS). IEEE. 2019; 1-6.

18. Andridge RR, Little RJ. A Review of Hot Deck Imputation for Survey Non-response. Int Stat Rev. 2010 Apr;78(1):40-64. doi: 10.1111/j.1751-5823.2010.00103.x. PMID: 21743766; PMCID: PMC3130338.

19. Rubin LH, Witkiewitz K, Andre JS, Reilly S. Methods for Handling Missing Data in the Behavioral Neurosciences: Don't Throw the Baby Rat out with the Bath Water. J Undergrad Neurosci Educ. 2007 Spring;5(2):A71-7. Epub 2007 Jun 15. PMID: 23493038; PMCID: PMC3592650.

20. Rubin DB. Inference and missing data. Biometrika. 1976; 63(3):581-592.

21. Uusitalo L, Lehikoinen A, Helle I, Myrberg K. An overview of methods to evaluate uncertainty of deterministic models in decision support. Environmental Modelling & Software. 2015; 63:24-31.

22. Kabir G, Tesfamariam S, Hemsing J, Sadiq R. Handling incomplete and missing data in water network database using imputation methods. Sustainable and Resilient Infrastructure. 2020; 5(6):365-377.

23. Yu L, Zhou R, Chen R, Lai KK. Missing data preprocessing in credit classification: One-hot encoding or imputation? Emerging Markets Finance and Trade. 2022; 58(2):472-482.

24. Al-Helali B, Chen Q, Xue B, Zhang M. A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data. Soft Computing. 2021; 25:5993-6012.

25. Zhao Y, Long Q. Multiple imputation in the presence of high-dimensional data. Stat Methods Med Res. 2016 Oct;25(5):2021-2035. doi: 10.1177/0962280213511027. Epub 2013 Nov 25. PMID: 24275026.

26. Huque MH, Carlin JB, Simpson JA, Lee KJ. A comparison of multiple imputation methods for missing data in longitudinal studies. BMC Med Res Methodol. 2018 Dec 12;18(1):168. doi: 10.1186/s12874-018-0615-6. PMID: 30541455; PMCID: PMC6292063.

27. Horton NJ, Lipsitz SR, Parzen M. A potential for bias when rounding in multiple imputation. The American Statistician. 2003; 57(4):229-232.

28. Yi J, Lee J, Kim KJ, Hwang SJ, Yang E. Why not to use zero imputation? Correcting sparsity bias in training neural networks. arXiv preprint arXiv:1906.00150. 2019.

29. Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. J Big Data. 2021;8(1):140. doi: 10.1186/s40537-021-00516-9. Epub 2021 Oct 27. PMID: 34722113; PMCID: PMC8549433.

30. Mohammed MB, Zulkafli HS, Adam MB, Ali N, Baba IA. Comparison of five imputation methods in handling missing data in a continuous frequency table. In AIP Conference Proceedings. AIP Publishing. 2021; 2355:1.

31. Jadhav A, Pramod D, Ramanathan K. Comparison of performance of data imputation methods for numeric dataset. Applied Artificial Intelligence. 2019; 33(10):913-933.

32. Staudemeyer RC, Morris ER. Understanding LSTM - a tutorial into long short-term memory recurrent neural networks. arXiv preprint arXiv:1909.09586. 2019.