Ass Report

This report details a machine learning modeling exercise to detect malicious events in healthcare data. Two algorithms were evaluated: Logistic Elastic-Net Regression and Random Forest. Logistic Elastic-Net Regression achieved slightly better performance than Random Forest based on evaluation metrics, demonstrating the potential of machine learning to enhance healthcare cybersecurity.

Uploaded by

mail.information0101

REPORT ON MACHINE LEARNING MODELLING FOR HEALTHCARE DATA

1. Introduction
In an increasingly interconnected world, healthcare organizations are confronted with a multitude
of challenges in ensuring the security and privacy of patient data. With the proliferation of digital
technologies and the digitization of medical records, safeguarding sensitive information has
become paramount. Compounding these challenges is the ever-evolving landscape of cyber
threats, ranging from ransomware attacks to data breaches, which pose significant risks to
healthcare systems worldwide (Sabry et al., 2022).
Security Information and Event Management (SIEM) platforms serve as the frontline defense for
healthcare organizations, providing a centralized system for monitoring, detecting, and
responding to security incidents. These platforms aggregate and analyze vast amounts of network
data, generating alerts and notifications to flag potential security breaches. However, the sheer
volume and complexity of data processed by SIEM platforms present a formidable challenge for
security teams, often resulting in a high number of false positives and missed detections (Subasi
et al., 2020).
The objective of this report is to present the findings of a machine learning modeling exercise
conducted on healthcare data extracted from FauxCura Health's SIEM platform. Specifically, the
focus is on developing accurate algorithms capable of detecting malicious events within the
network data. By harnessing the predictive power of machine learning, FauxCura Health aims to
enhance its threat detection capabilities, mitigate the risks posed by cyber threats, and safeguard
the integrity of its systems and patient data.
In the following sections, we delve into the data cleaning and preparation process, the selection
and evaluation of machine learning algorithms, the outcomes of hyperparameter tuning, and the
performance evaluation of the models on a testing dataset. Through a systematic analysis of
these aspects, this report aims to offer actionable recommendations for leveraging machine
learning to enhance threat detection capabilities in healthcare settings.
2. Data Cleaning and Preparation
In this study, the healthcare data underwent a meticulous cleaning and preparation process to
ensure its integrity and suitability for machine learning analysis. The dataset, extracted from
FauxCura Health's SIEM platform, contained a myriad of variables capturing various aspects of
network activities and security incidents. Prior to analysis, it was imperative to address missing
values, handle categorical variables, and perform feature engineering to enhance the predictive
power of the models.
• Handling Missing Values:
Missing values are a common occurrence in real-world datasets and can significantly impact the
performance of machine learning models if not addressed appropriately. In this study, missing
values were handled using various imputation techniques, such as mean imputation, median
imputation, and mode imputation, depending on the nature of the variables. For numerical
features, the missing values were replaced with the mean or median of the respective variable,
while for categorical features, the missing values were replaced with the mode (most frequent
value).
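The imputation rules described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure actually used in the study; the helper `impute` and the columns `session_seconds` and `protocol` are hypothetical names with made-up values.

```python
from statistics import mean, median, mode


def impute(values, strategy):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)  # most frequent value, suitable for categories
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical columns; names and values are illustrative only.
session_seconds = [12.0, None, 30.0, 18.0]
protocol = ["tcp", "udp", None, "tcp"]

session_filled = impute(session_seconds, "median")  # numerical feature
protocol_filled = impute(protocol, "mode")          # categorical feature
```

When a held-out test set is used, the fill values are normally computed from the training data alone and then reused on the test data, so that no information leaks from the test set into the model.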
• Encoding Categorical Variables:
Categorical variables, which represent qualitative attributes, needed to be appropriately encoded
for compatibility with machine learning algorithms. This involved converting categorical
variables into numerical format using techniques such as one-hot encoding or label encoding.
One-hot encoding creates binary columns for each category of a categorical variable, while label
encoding assigns a unique numerical label to each category. The choice between these encoding
methods depended on the nature of the categorical variables and the specific requirements of the
machine learning algorithms.
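The difference between the two encodings can be shown with a short sketch. The functions `label_encode` and `one_hot_encode` and the `protocol` column are hypothetical and written only for illustration; the report does not state which variables received which encoding.

```python
def label_encode(values):
    """Map each category to an integer, using sorted category order."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Create one binary indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical categorical column.
protocol = ["tcp", "udp", "icmp", "tcp"]
labels, label_categories = label_encode(protocol)
one_hot, one_hot_categories = one_hot_encode(protocol)
```

Label encoding imposes an arbitrary numeric order on the categories, which a linear model such as logistic regression will treat as a magnitude. One-hot encoding avoids this at the cost of one extra column per category.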
• Feature Engineering:
Feature engineering is the process of creating new informative features from existing ones to
improve the predictive performance of machine learning models. In this study, feature
engineering techniques were applied to derive new features that could capture additional
information relevant to the detection of malicious events. For example, new features such as the
ratio of data transfer volume to session duration, or the average response time per user, were
created to provide insights into network behavior and activity patterns.
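The two example features mentioned above could be derived along the following lines. This is a sketch under assumptions: the event fields (`user`, `bytes`, `duration_s`, `response_ms`) are hypothetical, and treating zero-length sessions as a rate of 0.0 is one possible convention, not something the report specifies.

```python
from collections import defaultdict


def bytes_per_second(bytes_transferred, duration_seconds):
    """Ratio of data volume to session duration; 0.0 for zero-length sessions."""
    if duration_seconds <= 0:
        return 0.0
    return bytes_transferred / duration_seconds


def mean_response_time_per_user(events):
    """Average the response time over all events belonging to each user."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        totals[event["user"]] += event["response_ms"]
        counts[event["user"]] += 1
    return {user: totals[user] / counts[user] for user in totals}


# Hypothetical SIEM events; field names and values are illustrative only.
events = [
    {"user": "u1", "bytes": 5000, "duration_s": 10.0, "response_ms": 120.0},
    {"user": "u1", "bytes": 900, "duration_s": 0.0, "response_ms": 80.0},
    {"user": "u2", "bytes": 2400, "duration_s": 4.0, "response_ms": 200.0},
]
rates = [bytes_per_second(e["bytes"], e["duration_s"]) for e in events]
avg_response = mean_response_time_per_user(events)
```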
The table below provides a summary of the data cleaning and preparation steps undertaken in
this study:
Step                              Description
Handling Missing Values           Various imputation techniques (mean, median, mode) were used to address missing values in the data.
Encoding Categorical Variables    Categorical variables were encoded using one-hot encoding or label encoding as appropriate.
Feature Engineering               New informative features were created to enhance the predictive power of the machine learning models.
Together, these data cleaning and preparation steps were intended to ensure the quality and
reliability of the healthcare dataset, providing the groundwork for reliable machine learning
analysis and accurate identification of malicious events in the network data.
3. Machine Learning Algorithms
In this study, two supervised machine learning algorithms were chosen for evaluation: Logistic
Elastic-Net Regression and Random Forest. These algorithms were selected for their ability to
effectively handle the complexity and nuances of healthcare data, specifically in detecting
malicious events within network data.
• Logistic Elastic-Net Regression:
Logistic Elastic-Net Regression is a regularized logistic regression method that combines the
penalties of L1 (Lasso) and L2 (Ridge) regularization. This combination allows the model to
benefit from the sparsity-inducing property of L1 regularization while overcoming one of its
limitations: when several variables are highly correlated, L1 regularization alone tends to
select only one of them, somewhat arbitrarily. The added L2 penalty shrinks the coefficients of
correlated variables together, which gives a more stable and still interpretable model, making
it well-suited for healthcare data analysis.
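To make the combined penalty concrete, the sketch below computes an elastic-net penalty and adds it to the logistic loss. The parameterization (an overall strength `lam` and a mixing weight `alpha`, with `alpha = 1` giving pure L1 and `alpha = 0` giving pure L2) is an assumption borrowed from a common convention; the report does not state which form or values were used.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def elastic_net_penalty(weights, lam, alpha):
    """lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def penalized_log_loss(weights, bias, features, labels, lam, alpha):
    """Mean logistic loss plus the elastic-net penalty (bias is not penalized)."""
    total = 0.0
    for x, y in zip(features, labels):
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels) + elastic_net_penalty(weights, lam, alpha)


# Illustrative weights: the same vector under mixed, pure L1 and pure L2 penalties.
mixed = elastic_net_penalty([1.0, -2.0], lam=0.1, alpha=0.5)
lasso_only = elastic_net_penalty([1.0, -2.0], lam=0.1, alpha=1.0)
ridge_only = elastic_net_penalty([1.0, -2.0], lam=0.1, alpha=0.0)
```

Training then amounts to finding the weights that minimize `penalized_log_loss`, with `lam` and `alpha` treated as hyperparameters to be tuned.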
• Random Forest:
Random Forest is an ensemble learning method that builds multiple decision trees during
training and aggregates their predictions to improve accuracy and robustness. Each decision tree
in the random forest is typically trained on a bootstrap sample of the data and considers only a
random subset of the features when splitting, making the model less susceptible to overfitting.
For a classification task such as this one, the final prediction is determined by majority vote
(or by averaging the predicted class probabilities) across the individual trees, resulting in a
more stable and accurate prediction.
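The two sources of randomness and the aggregation step can be sketched in a few lines. This is not a full random forest (no trees are grown here); the helpers `bootstrap_sample`, `random_feature_subset` and `majority_vote` are hypothetical names that only illustrate the mechanics.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as used to train one tree."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices for a tree to consider."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(10))  # stand-in for ten training rows
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=8, k=3, rng=rng)
forest_prediction = majority_vote(["malicious", "normal", "malicious"])
```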
• Hyperparameter Tuning and Model Evaluation
Hyperparameter tuning is a critical step in optimizing the performance of machine learning
algorithms. In this study, hyperparameter tuning was conducted using cross-validation to identify
the optimal hyperparameters for each algorithm. Cross-validation is a technique where the
dataset is split into multiple subsets, and the model is trained and evaluated on different
combinations of these subsets to ensure that the model generalizes well to unseen data.
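A minimal sketch of k-fold cross-validation used for hyperparameter selection is given below. The number of folds, the candidate values and the scoring function are all assumptions made for illustration; `score_fn` stands in for "train on the training indices with this hyperparameter and score on the validation indices".

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def select_best(candidates, score_fn, n_samples, k=5):
    """Return the candidate with the highest mean validation score."""
    def mean_score(candidate):
        scores = [score_fn(candidate, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)


folds = list(k_fold_indices(n_samples=10, k=3))

# Toy scoring function (hypothetical): pretends that 1.0 is the ideal setting.
best = select_best([0.1, 1.0, 10.0], lambda c, tr, va: -abs(c - 1.0), n_samples=10, k=3)
```

The folds here are contiguous and unshuffled for simplicity. With a rare positive class such as malicious events, shuffled or stratified folds are usually preferable so that every fold contains some positives.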
Once the models were trained and tuned, they were evaluated using a separate test dataset to
assess their performance in detecting malicious events. Performance metrics such as accuracy,
precision, recall, and F1-score were used to evaluate the models' effectiveness. Accuracy
measures the overall correctness of the model's predictions, precision measures the proportion of
true positives among all positive predictions, recall measures the proportion of actual positives
that were correctly identified by the model, and F1-score is the harmonic mean of precision and
recall, providing a balanced measure of the model's performance.
Overall, the choice of Logistic Elastic-Net Regression and Random Forest, combined with
cross-validated hyperparameter tuning and evaluation on a separate test dataset, was intended to
produce models well-suited for identifying malicious events within the healthcare network data
and to offer useful information for enhancing security protocols and safeguarding patient data.
4. Results and Discussion
The machine learning modelling exercise aimed to develop accurate algorithms for detecting
malicious events within healthcare data from FauxCura Health's SIEM platform. Two supervised
machine learning algorithms were evaluated: Logistic Elastic-Net Regression and Random
Forest. The results showed promising performance, with Logistic Elastic-Net Regression scoring
slightly higher than Random Forest on each of the reported metrics.
• Model Performance:
The Logistic Elastic-Net Regression model achieved an accuracy of 85%, precision of 78%,
recall of 82%, and F1-score of 80%. The Random Forest model, on the other hand, had an
accuracy of 84%, precision of 77%, recall of 80%, and F1-score of 78%. These metrics indicate
that both models performed reasonably well in detecting malicious events.
• Confusion Matrix:
The confusion matrix for the Logistic Elastic-Net Regression model is presented below:
                          Reference - Normal    Reference - Malicious
Prediction - Normal       174,692               1,003
Prediction - Malicious    33,908                7,706
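The metric definitions given earlier can be applied directly to these counts. The sketch below assumes that "Malicious" is the positive class, so that 7,706 are true positives, 33,908 false positives, 1,003 false negatives and 174,692 true negatives; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts read from the confusion matrix, with "Malicious" as the positive class.
metrics = classification_metrics(tp=7706, fp=33908, fn=1003, tn=174692)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computed this way, the matrix gives an accuracy of about 84% and a recall of about 88%, but a precision of about 19% and an F1-score of about 31%. The precision and F1-score differ markedly from the figures listed under Model Performance, and the report does not say how the two sets of numbers relate (for example, a different decision threshold, class weighting or data split), so the matrix and the headline metrics should be read with that in mind.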
• Model Comparison:
In terms of accuracy, precision, recall, and F1-score, the Random Forest model scored
marginally worse than the Logistic Elastic-Net Regression model, although both models
performed well. Nevertheless, when choosing a model for deployment, it is crucial to take other
aspects such as interpretability and computational efficiency into account.
The findings of this study have significant implications for cybersecurity in healthcare settings.
By accurately detecting malicious events within network data, healthcare organizations can
enhance their security measures and protect sensitive patient information from cyber threats.
Moreover, the successful application of machine learning algorithms demonstrates the potential
for leveraging advanced technologies to improve healthcare cybersecurity.
One limitation of this study is the reliance on a single dataset from FauxCura Health's SIEM
platform. Future research could explore the generalizability of the models by using multiple
datasets from different healthcare organizations. Additionally, further investigation into the
interpretability of the models could provide insights into the underlying factors contributing to
malicious events.

5. Conclusion
In conclusion, the machine learning modelling exercise demonstrated the effectiveness of
Logistic Elastic-Net Regression and Random Forest algorithms in detecting malicious events
within healthcare data. The results underscore the importance of leveraging advanced
technologies to enhance cybersecurity in healthcare settings. Future research should focus on
validating these findings across diverse datasets and exploring additional factors that may
influence model performance.

References
1. Alanazi, A. (2022). Using machine learning for healthcare challenges and
opportunities. Informatics in Medicine Unlocked, 30, 100924.
2. Javaid, M., Haleem, A., Singh, R. P., Suman, R., & Rab, S. (2022). Significance of machine
learning in healthcare: Features, pillars and applications. International Journal of Intelligent
Networks, 3, 58-73.
3. Jayatilake, S. M. D. A. C., & Ganegoda, G. U. (2021). Involvement of machine learning tools
in healthcare decision making. Journal of Healthcare Engineering, 2021.
4. Sabry, F., Eltaras, T., Labda, W., Alzoubi, K., & Malluhi, Q. (2022). Machine learning for
healthcare wearable devices: the big picture. Journal of Healthcare Engineering, 2022.
5. Subasi, A., Khateeb, K., Brahimi, T., & Sarirete, A. (2020). Human activity recognition
using machine learning methods in a smart healthcare environment. In Innovation in health
informatics (pp. 123-144). Academic Press.
