
Volume 9, Issue 7, July – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24JUL1459

Integrated ECOD-KNN Algorithm for Missing Values Imputation in Datasets: Outlier Removal

Tsitsi Jester Mugejo1; Weston Govere2
1,2Department of Cloud Computing, School of Information Science and Technology,
Harare Institute of Technology, Harare, Zimbabwe

Abstract:- Missing data cause incompleteness of data sets and can lead to poor model performance, which in turn can result in poor decisions, even when the best handling methods are used. When outliers are present in the data, using the KNN algorithm for missing value imputation produces less accurate results. Outliers are anomalous observations, and removing them is one of the most important pre-processing steps in any data analysis model. KNN algorithms can be adapted to missing value imputation, but they are sensitive to outliers, which may degrade the quality of the imputation results. KNN is widely used among machine learning algorithms because it is simple to implement and has relatively high accuracy. In the literature, various studies have explored the application of KNN in different domains, yet they fail to address how sensitive it is to outliers. In the proposed model, outliers are identified using a combination of the Empirical-Cumulative-distribution-based Outlier Detection (ECOD), Local Outlier Factor (LOF) and Isolation Forest (IForest). The outliers are substituted with the median of the non-outlier data, and the missing values are then imputed using the k-nearest neighbors algorithm. The model was evaluated using several metrics: Root Mean Square Error (RMSE), Mean Squared Error (MSE), R-squared (R2) and Mean Absolute Error (MAE). The results clearly indicate that dealing with outliers before imputing missing values produces better imputation results than the traditional KNN technique alone, which is sensitive to outliers.

Keywords:- Imputation; Outlier; Missing Values; Incomplete; Algorithm.

I. INTRODUCTION

Missing data cause incompleteness of data sets and can lead to poor model performance, which in turn can result in poor decisions, even when the best handling methods are used. Analysing datasets that contain missing values can perpetuate decisions derived from a biased model. In this paper, we show how solving missing data with the KNN algorithm may produce less accurate results, especially when outliers are present in the data. Additionally, we demonstrate how outliers can be identified using the Empirical-Cumulative-distribution-based Outlier Detection (ECOD), Local Outlier Factor (LOF) and Isolation Forest (IForest), how the outliers were substituted with the median of the non-outlier data, and how the missing values are imputed with the KNN algorithm in a single model.

Outliers are anomalous observations, and removing them is one of the important pre-processing steps in any data analysis model (1). It is important to first identify the outliers, in this paper using outlier detection, in order to remove or substitute them. Data is bound to contain some noisy data, or outliers, which affect the KNN missing value imputation process and the performance of the trained models (2). It is therefore essential to filter noisy data out of any training dataset, and this step should come before imputing missing values; the imputation result will not be good enough if imputation is performed before outlier handling.

Missing values matter particularly when dealing with big data (3), that is, very large amounts of data or large datasets requiring analysis and storage. Missing values generally pose a weakness to models (4), as they affect the quality of results, especially in prediction systems. In the pre-processing stage of datasets with numeric values, one of the main challenges is the processing of missing values, so it is important to deal with missing values in our datasets during pre-processing (5).

Challenges may also arise from choosing the wrong handling method for missing values (6), which also affects the effectiveness of any model. Previous studies have covered imputation using the KNN algorithm and various extensions of it, but have failed to consider outlier detection and normalization before the missing value imputation process. The performance of the KNN imputation method can be greatly improved by resolving outliers and normalizing the data (7). It has been shown that using normalization and mean imputation together is more accurate than the original mean and median methods (8).

This study takes note of the outliers by first detecting them using ECOD and substituting them with the median of the non-outlier data, and then proceeding to impute the missing values in the datasets, for improved accuracy of the imputation result. To our knowledge, this combination has not been used in previous studies of imputation with KNN or other imputation methods, although it has been proved by

IJISRT24JUL1459 www.ijisrt.com 2307



this paper to improve the accuracy of the missing values imputation process.

II. LITERATURE REVIEW

Many studies have addressed the issue of missing values in datasets. Incompleteness of data is handled depending mainly on its type and requirements. The two main families of imputation methods are statistical and machine learning methods. These methods generate values or approximations from the observable variables in order to replace the missing values (9). KNN is widely used among machine learning algorithms because it is simple to implement and has relatively high accuracy, and various studies have explored its application in different domains.

In the field of proteomics, (10) highlights the complexity of identifying the subcellular locations of proteins, especially when proteins can exist in multiple locations simultaneously (11). To address missing values in proteomic data, the Cluster-based KNN (CKNN) imputation method was introduced (12), which incorporates local data clustering for improved quality and efficiency (13).

In the context of movie recommender systems, a comparative study was conducted (14) on pre-processing algorithms for Singular Value Decomposition (SVD) to help data managers choose the most suitable algorithm for their business needs (15). This study underscores the importance of selecting the right imputation method to enhance the accuracy and reliability of data analysis. Furthermore, in the medical field, the study in (16) focused on missing value estimation methods for arrhythmia classification, emphasizing the significance of handling missing values in datasets to ensure accurate classification results (18).

Furthermore, a novel KNN variant (KNNV) algorithm was introduced (17) for accurate classification of COVID-19 based on incomplete heterogeneous data, showcasing improved results through experimental work (18). The KNNV algorithm addresses incompleteness by imputation and heterogeneity by converting categorical data into numerical values. Moreover, a hybrid missing data imputation method called KI was proposed (19), which combines the k-nearest neighbors and iterative imputation algorithms to address missing values effectively (20). This approach leverages similarity learning techniques to impute missing data accurately.

This highlights the adaptability of KNN algorithms in addressing missing values and improving classification accuracy in diverse applications. In summary, the literature showcases the significance of the KNN algorithm for imputing missing values in datasets across various domains, including proteomics, recommendation systems and medical diagnostics.

Fig 1 shows the experimental design of the systems used in most current studies (21). These studies introduce missing values as the first step, if a dataset with no missing values is being used. An imputation algorithm is then picked for the imputation process, and the imputed result is evaluated using various metrics.

Fig 1 Block Diagram of the Experimental Design Used in Current Studies.
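The evaluation protocol described above, where missing values are first introduced into a complete dataset so imputed values can later be scored against the known originals, can be sketched as follows. This is a minimal NumPy illustration; the function name and the MCAR (mask-completely-at-random) scheme are our assumptions, not details taken from the surveyed studies:

```python
import numpy as np

def introduce_missing_values(X, fraction=0.1, seed=0):
    """Randomly mask a fraction of entries (MCAR) so that imputation
    results can be scored against the known original values."""
    rng = np.random.default_rng(seed)
    X_missing = X.astype(float).copy()
    mask = rng.random(X.shape) < fraction   # True where a value is removed
    X_missing[mask] = np.nan
    return X_missing, mask

# Mask a quarter of the entries of a small complete matrix.
X = np.arange(20, dtype=float).reshape(5, 4)
X_missing, mask = introduce_missing_values(X, fraction=0.25)
```

The returned mask records exactly which entries were hidden, which is what allows RMSE-style scoring of the imputer afterwards.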

Overall, the literature highlights the significance of the KNN algorithm for imputing missing values in datasets. Researchers have developed novel approaches, such as the MKDF-WKNN classifier and the KNNV algorithm, to enhance the accuracy of classification models dealing with incomplete data (22). Additionally, hybrid methods like KI have been proposed to improve missing data imputation by combining k-nearest neighbors and iterative algorithms. The studies discussed emphasize the importance of selecting appropriate imputation methods, all of which involve KNN, to enhance data quality, analysis accuracy and classification performance. However, none of the KNN algorithms used by these researchers address the fact that KNN is sensitive to outliers, which can affect the result of the missing values imputation process.


III. METHODOLOGY

The first step was data exploration. Pandas was used for the data frames, which makes it easy to work with structured data. NumPy was used to support the large datasets and compute arithmetic averages over the values, and Matplotlib was used to plot graphs for visual comprehension. The outliers are identified using a combination of the Empirical-Cumulative-distribution-based Outlier Detection (ECOD), Local Outlier Factor (LOF) and Isolation Forest (IForest). The outliers are substituted with the median of the non-outlier data, and the missing values are imputed using the k-nearest neighbors algorithm. KNN identifies the k nearest data points to the missing value based on a distance metric; for numerical data, the mean of these neighbors replaces the missing value, and for categorical data, the most frequent category (mode) among the neighbors is used. This approach leverages the similarity between data points to provide a more accurate imputation than simple mean or mode imputation. Fig 2 illustrates the experimental design of the proposed system.

Fig 2 Block Diagram of the Proposed System Experimental Design.
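As a rough illustration of this pipeline, the sketch below implements a simplified rank-based ECOD-style score, median substitution of the flagged rows, and a basic numerical KNN imputer in plain NumPy. The actual study combines the ECOD, LOF and IForest detectors (such as those provided by the PyOD library); the helper names and the single-detector simplification here are our assumptions, not the authors' code:

```python
import numpy as np

def ecod_scores(X):
    # Simplified ECOD-style score: for each feature, take the smaller of the
    # left and right empirical tail probabilities of each value, and sum the
    # negative logs across features. Rows in the tails of many features score high.
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):
        col = X[:, j]
        left = np.array([(col <= v).mean() for v in col])    # empirical CDF
        right = np.array([(col >= v).mean() for v in col])   # empirical survival
        scores += -np.log(np.minimum(left, right))
    return scores

def replace_outliers_with_median(X, contamination=0.1):
    # Flag the highest-scoring rows as outliers and overwrite them,
    # feature by feature, with the median of the non-outlier rows.
    scores = ecod_scores(X)
    k = max(1, int(contamination * len(X)))
    outliers = np.argsort(scores)[-k:]
    inliers = np.ones(len(X), dtype=bool)
    inliers[outliers] = False
    X_clean = X.copy()
    X_clean[outliers] = np.median(X[inliers], axis=0)
    return X_clean, outliers

def knn_impute(X, k=3):
    # Fill each NaN with the mean of that feature over the k nearest
    # fully observed rows, using Euclidean distance on the shared features.
    X_imp = X.copy()
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        shared = ~miss
        dists = sorted(
            (np.linalg.norm(X[i, shared] - X[j, shared]), j)
            for j in range(len(X))
            if j != i and not np.isnan(X[j]).any()
        )
        neighbours = [j for _, j in dists[:k]]
        X_imp[i, miss] = X[neighbours][:, miss].mean(axis=0)
    return X_imp

# Toy data with one planted extreme row, then a small imputation example.
X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 3.0],
              [4.0, 2.0], [5.0, 1.0], [50.0, 50.0]])
X_clean, outliers = replace_outliers_with_median(X, contamination=0.2)

Y = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [np.nan, 2.0]])
Y_imp = knn_impute(Y, k=2)
```

On the toy matrix, the planted extreme row is flagged and overwritten with the column-wise median of the remaining rows, after which `knn_impute` fills the NaN cell from the two closest fully observed rows.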

 Dataset Preparation

This experiment was implemented using five datasets from the Kaggle website, tabulated in Table 1. The datasets were loaded for feature extraction and standardization of the features. Preprocessing was done to check whether outliers and missing values were present. This leads to the next step, Outlier Analysis, which is the emphasis of the experiment: detecting outliers and substituting them with the results from the analysis.

 Evaluation Criteria

For the evaluation of the model, different metrics were used: Root Mean Square Error (RMSE), Mean Squared Error (MSE), R-squared (R2) and Mean Absolute Error (MAE).

 The RMSE metric computes the difference between the observed value and the imputed value.
 MSE measures the average of the squared differences between the predicted values and the actual target values. The lower the MSE, the closer the model's results are to the true values.

Table 1 Details of the Datasets Used

Dataset No.  Dataset Name                        Rows       Attributes
1            Dissolved O2 River Water            3500       37
2            Crop Recommendation                 1470       20
3            Online Course Engagement            4650       12
4            Health Care Diabetes                1460       6
5            Amazon Cell Phone and Accessories   10448570   12
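The pre-processing check described under Dataset Preparation, standardizing the features and verifying whether missing values and outliers are present, might look like the sketch below. The IQR flagging rule and the function name are our choices for illustration; the paper does not specify how this check was implemented:

```python
import numpy as np

def preprocess_report(X):
    """Standardize features and report missing-value and IQR-outlier
    counts per column, as a quick pre-processing check."""
    n_missing = np.isnan(X).sum(axis=0)
    q1, q3 = np.nanpercentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = ((X < low) | (X > high)).sum(axis=0)  # NaN comparisons are False
    mu, sigma = np.nanmean(X, axis=0), np.nanstd(X, axis=0)
    X_std = (X - mu) / np.where(sigma == 0, 1.0, sigma)  # guard constant columns
    return X_std, n_missing, n_outliers

# One extreme value in column 0, one missing value in column 1.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 12.0], [100.0, 11.0]])
X_std, n_missing, n_outliers = preprocess_report(X)
```

NaN entries survive standardization unchanged, so the subsequent outlier-analysis and imputation steps still see exactly which cells are missing.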


 R2, the Coefficient of Determination, is another metric used to evaluate the model's goodness of fit. As the R2 score moves towards one, the regression line moves towards perfection. It was used in this study because of its ability to measure variability.
 MAE, the Mean Absolute Error, matches the error value units to the predicted target value units. MAE changes intuitively with the error and, unlike MSE, does not inflate large errors by squaring them.

IV. RESULTS AND DISCUSSION

The proposed model consists of outlier removal and imputation, whereas the other KNN imputation techniques do not take outlier removal into consideration. Tables 2-6 below show the evaluation results of both the proposed model and basic KNN for comparison.

Across the metrics used, the proposed model shows better results than KNN, as seen in the tables for the various datasets. RMSE had the worst results for both models on all the datasets, but was still better for the proposed model based on the simulation results.
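The four evaluation metrics can be computed directly from the held-out true values and the corresponding imputed values. A small sketch follows; the function name is ours, and we assume scoring is done on entries whose original values are known:

```python
import numpy as np

def imputation_scores(y_true, y_pred):
    """RMSE, MSE, R2 and MAE between held-out true values and imputed
    values, the four metrics used to compare the two models."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variance around the mean
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MSE": mse, "R2": r2, "MAE": mae}

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 2.0, 3.0, 4.0])  # one value imputed incorrectly
scores = imputation_scores(y_true, y_pred)
```

Lower RMSE, MSE and MAE and an R2 closer to one indicate an imputation closer to the true values, which is how Tables 2-6 should be read.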

Table 2 Dissolved O2 River Water Results


Dissolved O2 River Water Results
Metric ECOD-KNN(Proposed System) KNN
RMSE 1.775 1.958
MSE 3.246 3.834
R2 0.542 0.548
MAE 1.320 1.425

Table 3 Online Course Engagement Dataset


Online Course Engagement Dataset
Metric ECOD-KNN(Proposed System) KNN
RMSE 1.523 2.518
MSE 0.223 1.331
R2 0.742 0.948
MAE 1.202 1.282

Table 4 Crop Recommendation Dataset


Crop Recommendation Dataset
Metric ECOD-KNN(Proposed System) KNN
RMSE 0.923 1.210
MSE 0.389 1.765
R2 0.427 0.812
MAE 1.897 3.423

Table 5 Health Care Diabetes Dataset


Health Care Diabetes Dataset
Metric ECOD-KNN(Proposed System) KNN
RMSE 1.302 1.838
MSE 0.482 0.935
R2 0.923 1.275
MAE 1.193 1.585

Table 6 Amazon Cell Phone and Accessories Product Ratings Dataset


Amazon Cell Phone and Accessories Product Ratings Dataset
Metric ECOD-KNN(Proposed System) KNN
MSE 0.0153 0.0529
R2 0.9645 0.9742
MAE 0.046 0.245

V. CONCLUSION

Important information goes missing when a dataset has missing values, so missing values have to be imputed to avoid such scenarios. Imputing missing values ensures that the dataset is complete, which helps the various models produce the accurate results on which decision making is based. KNN is widely used to impute missing values among other techniques. However, one of its disadvantages is that it is sensitive to outliers, which was the focus of this study. The study detected outliers using a combination of Local Outlier Factor (LOF), Isolation Forest (IForest) and ECOD. After averaging the detectors' outlier results, the outliers are replaced in the dataset with the median of the non-outlier data. The k-nearest




neighbors algorithm is then used to impute the missing values. After testing the model with the five datasets, the evaluation using RMSE, MSE, R2 and MAE clearly indicated that dealing with outliers before imputing missing values produces better imputation results than the traditional KNN technique alone, which is sensitive to outliers. Despite the good performance of the proposed ECOD-KNN model, there may be other missing value imputation techniques that perform better. Also, KNN operates by memorizing the entire dataset, which can be a disadvantage.

REFERENCES

[1]. H. Nugroho, N.P. Utama, and K. Surendro, "Normalization and outlier removal in class center-based firefly algorithm for missing value imputation," Journal of Big Data, (2021) 8:129.
[2]. D. Chehal, P. Gupta, P. Gulati, and T. Gupta, "Comparative Study of Missing Value Imputation Techniques on E-Commerce Product Ratings," Informatica 47 (2023) 373-382.
[3]. A.F. Sallaby and Azlan, "Analysis of Missing Value Imputation Application with K-Nearest Neighbor (K-NN) Algorithm in Dataset," International Journal of Informatics and Computer Science, Vol. 5 No. 2, July 2021, pp. 141-144.
[4]. P. Mishra, K.D. Mani, P. Johri, and D. Arya, "FCMI: Feature Correlation based Missing Data Imputation."
[5]. I.S. Jacobs and C.P. Bean, "Fine particles, thin films and exchange anisotropy," in Magnetism, vol. III, G.T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271-350.
[6]. F.E. Harrell, Jr., "Regression Modeling Strategies," Nashville, TN, USA, July 2015, ISSN 2197-568X.
[7]. C.K. Enders, "Applied Missing Data Analysis," Second Edition, 2022, pp. 1-43.
[8]. M. Tannous, M. Miraglia, F. Inglese, L. Giorgini, F. Ricciardi, R. Pelliccia, M. Milazzo, and C. Stefanini, "Haptic-based Touch Detection for Collaborative Robots in Welding Applications," Robotics and Computer-Integrated Manufacturing, 2020.
[9]. L.Y. Wang, D. Wang, and Y.H. Chen, "Prediction of Protein Subcellular Multisite Localization Using a New Feature Extraction Method," Genetics and Molecular Research: GMR, 2016.
[10]. F. Pirotti, R. Ravanelli, F. Fissore, and A. Masiero, "Implementation and Assessment of Two Density-based Outlier Detection Methods Over Large Spatial Point Clouds," Open Geospatial Data, Software and Standards, 2018.
[11]. P. Keerin, W. Kurutach, and T. Boongoen, "Cluster-based KNN Missing Value Imputation for DNA Microarray Data," 2012 IEEE International Conference on Systems, Man, and Cybernetics, 2012.
[12]. K.M. Fouad, M.M. Ismail, A.T. Azar, and M.M. Arafa, "Advanced Methods for Missing Values Imputation Based on Similarity Learning," PeerJ Computer Science, 2021.
[13]. S. Patra and B. Ganguly, "Improvising Singular Value Decomposition by KNN for Use in Movie Recommender Systems," Journal of Operations and Strategic Planning, 2019.
[14]. N. Rabiei, A.R. Soltanian, M. Farhadian, and F. Bahreini, "The Performance Evaluation of the Random Forest Algorithm for a Gene Selection in Identifying Genes Associated with Resectable Pancreatic Cancer in Microarray Dataset: A Retrospective Study," Cell Journal, 2023.
[15]. F. Yang, J. Du, J. Lang, W. Lu, L. Liu, C. Jin, and Q. Kang, "Missing Value Estimation Methods Research for Arrhythmia Classification Using the Modified Kernel Difference-Weighted KNN Algorithms," BioMed Research International, 2020.
[16]. Z. Zhang, "Introduction to Machine Learning: K-nearest Neighbors," Annals of Translational Medicine, 2016.
[17]. A. Hamed, A. Sobhy, and H. Nassar, "Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data Using a KNN Variant Algorithm," Arabian Journal for Science and Engineering, 2021.
[18]. N. Rabiei, A.R. Soltanian, M. Farhadian, and F. Bahreini, "The Performance Evaluation of the Random Forest Algorithm for a Gene Selection in Identifying Genes Associated with Resectable Pancreatic Cancer in Microarray Dataset: A Retrospective Study," Cell Journal, 2023.
[19]. M. Zaki, Shao-jie Chen, Jicheng Zhang, Fan Feng, Liu Qi, M.A. Mahdy, and Linlin Jin, "Optimized Weighted Ensemble Approach for Enhancing Gold Mineralization Prediction," Applied Sciences, 2023.
[20]. S. Sheikhi, M.T. Kheirabadi, and A. Bazzazi, "A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection," 2020.
[21]. M. Zhang and W. Xu, "Study on an Improved Lie Group Machine Learning-based Classification Algorithm," 2020 IEEE 3rd International Conference of Safe Production ..., 2020.
[22]. E.Y. Boateng, J. Otoo, and D.A. Abaye, "Basic Tenets of Classification Algorithms K-Nearest-Neighbor, Support Vector Machine, Random Forest and Neural Network: A Review," 2020.
