
Capstone Assignment

This document summarizes eight research studies that apply machine learning and data mining methods to healthcare and related problems. The studies use machine learning to analyze diabetes and breast cancer data, predict COVID-19 case trends, detect financial fraud, and more. Methodologies include supervised learning algorithms such as logistic regression, decision trees, and neural networks applied to different datasets. Results show that machine learning can effectively classify diabetes data, detect cancer, forecast pandemic trends, and detect fraudulent transactions, with over 90% accuracy in some cases. Recommendations emphasize the need for machine learning in healthcare to draw insights from large datasets and to help address challenges such as optimizing treatment and mitigating risks.


Research 1: “Machine Learning and Data Mining Methods in Diabetes Research”

1. Problem Statement
The remarkable advances in biotechnology and the health sciences have led to a significant production of data, notably electronic health records (EHRs).
2. Objectives
To this end, the application of machine learning and data mining methods in the biosciences is now more vital and indispensable than ever in efforts to intelligently transform all available information into valuable knowledge.
3. Framework/Methodology
The collection reviewed consists of research conducted over the last five years. Relevant studies were retrieved using specific keywords such as "machine learning" and "data mining".
4. Results
The support vector machine was the most successful algorithm on both biological and clinical datasets in Diabetes Mellitus (DM) research. The great majority of articles (85%) used supervised learning approaches, i.e., classification and regression tasks. In the remaining 15%, association rules were employed, mainly to study associations between biomarkers.
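The supervised classification workflow the review found most successful can be illustrated with a minimal sketch: a support vector machine fitted to labelled data and scored on a held-out set. The data here is synthetic, generated for illustration only, and stands in for the biomarker features and diabetes labels of the studies reviewed.

```python
# Minimal sketch of SVM-based supervised classification, assuming
# scikit-learn; the dataset is synthetic, not the studies' actual data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for biomarker features with a binary diabetes label.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf")  # RBF-kernel support vector machine
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {acc:.2f}")
```

The same pattern applies to regression tasks by substituting `SVR` for `SVC` and a continuous target for the binary label.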
5. Recommendation
Applying machine learning and data mining methods in DM research is a key approach to utilizing large
volumes of available diabetes-related data for extracting knowledge.

Research 2 “A Comparative Analysis of Breast Cancer Detection and Diagnosis Using Data Visualization and
Machine Learning Applications"
1. Problem Statement
Cancer is one of the major problems facing humankind. Even though there are many ways to prevent some cancers before they develop, certain cancer types still have no treatment.
Breast cancer is one of the most common cancer types, and early diagnosis is the most important factor in its treatment.
2. Objectives
The objectives of this study were to analyze the Wisconsin breast cancer dataset through visualization and to classify tumor types as benign or malignant using machine learning algorithms.
3. Framework/Methodology
The study used data visualization and machine learning applications for breast cancer detection and diagnosis. The diagnostic performances of the applications were comparable for detecting breast cancers, showing that data visualization and machine learning techniques can provide significant benefits and impact cancer detection.
4. Results
Accuracy results are reported for six machine learning algorithms: logistic regression, decision tree, random forest, rotation forest, k-nearest neighbors, and support vector machine. The rotation forest algorithm achieved the best accuracies, at 97.4%, 95.89%, and 92.99% for the three datasets.
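A comparison in the spirit of the study can be sketched with scikit-learn, which bundles the Wisconsin breast cancer dataset. Five of the six algorithms are shown; rotation forest has no scikit-learn implementation, so it is omitted here, and the accuracies will not match the study's figures exactly.

```python
# Sketch of a multi-algorithm comparison on the bundled Wisconsin breast
# cancer data, assuming scikit-learn; rotation forest is omitted.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # benign vs. malignant labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(random_state=42),
    "k-nearest neighbors": KNeighborsClassifier(),
    "support vector machine": SVC(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```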
5. Recommendation
To enhance breast cancer detection and diagnosis, it is recommended to integrate advanced data visualization
techniques with machine learning applications.
This combination can provide a comprehensive and insightful comparative analysis, offering a more accurate
and efficient approach to identifying patterns and anomalies in medical data.

Research 3: “Big Data Analysis Using Modern Statistical and Machine Learning Methods in Medicine”
1. Problem Statement
In the field of medicine and the biomedical sciences, there exists a gap between the perspectives of bioinformaticians, biostatisticians, epidemiologists, and physicians/biological scientists regarding the interpretation of gene interactions and their influence on traits.
2. Objectives

Recognize the complexity of modeling clinical data, gene-gene, and gene-environment causal interactions
within a statistical framework and emphasize the need for comprehensive models.

3. Framework/Methodology
This article employs a comprehensive methodology to introduce modern statistical machine learning and
bioinformatics approaches for learning statistical relationships from big data in medicine and behavioral
science.
4. Results
The federal government has invested billions of dollars in enhancing clinical data analysis by implementing electronic patient records.
The adoption of electronic health records is expected to play a pivotal role in mitigating errors and optimizing
patient care.
5. Recommendation
The review of bioinformatics and statistical methods for studying clinical data, gene-gene, and gene-
environment interactions has highlighted the limitations of traditional statistical approaches, particularly in
handling complex interactions.
Research 4 “Data analysis of Covid-19 pandemic and short-term cumulative case forecasting using machine
learning time series methods”
1. Problem Statement
The challenge lies in predicting the trend of the pandemic, given the rapidly increasing number of infections
worldwide. Despite various studies and analyses, the unpredictability and high infectious power of the virus
continue to impede effective control measures.
2. Objectives
Analyze and predict the trajectory of the Covid-19 pandemic using data collected.
Determine the most effective method for forecasting the epidemic tendency.
3. Framework/Methodology
This study on predicting the trend of the Covid-19 pandemic spans multiple phases, beginning with data
collection and distribution analysis.
4. Results
The dataset covered 35 weeks, with 18 weeks used for training and 17 for testing. Comparative evaluation based on APE, MAPE, and RMSE highlighted the consistent superiority of the SVM method across the global, Germany, and USA datasets: it produced the lowest error values, indicating robust and accurate forecasts of cumulative infections, and predicted a peak of approximately 80 million cumulative Covid-19 cases globally by the end of January 2021.
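The forecasting setup described can be sketched as follows: a support vector regressor fitted to the first 18 weeks of a cumulative-case series and scored on the remaining 17 weeks with MAPE and RMSE. The series below is synthetic, and the SVR hyperparameters are illustrative assumptions, not the study's configuration.

```python
# Hedged sketch of SVM-based time-series forecasting with MAPE/RMSE
# evaluation, assuming scikit-learn; the case curve is synthetic.
import numpy as np
from sklearn.svm import SVR

weeks = np.arange(35).reshape(-1, 1)           # 35 weeks, as in the study
cases = 1000.0 * np.exp(0.15 * weeks.ravel())  # synthetic cumulative curve

train_X, test_X = weeks[:18], weeks[18:]       # 18 weeks train / 17 test
train_y, test_y = cases[:18], cases[18:]

model = SVR(kernel="rbf", C=1e4, gamma=0.01)   # illustrative settings
model.fit(train_X, train_y)
pred = model.predict(test_X)

mape = np.mean(np.abs((test_y - pred) / test_y)) * 100  # mean absolute % error
rmse = np.sqrt(np.mean((test_y - pred) ** 2))           # root mean squared error
print(f"MAPE: {mape:.1f}%  RMSE: {rmse:.0f}")
```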

5. Recommendation

The comprehensive review of various trend analysis studies on COVID-19 underscores the importance of
diverse methodologies in understanding and predicting the dynamics of the pandemic.

Research 5: “Computational prediction of inter-species relationships through omics data analysis and machine
learning”
1. Problem Statement
The overconsumption of antibiotics threatens medical progress and has led to the development of resistance in bacteria.
Discovering new antibiotics is very time-consuming and costly.
2. Objectives
Understand the mechanism of phage-bacteria interaction.
Investigate the bacterial defense mechanisms against phages, including mutations.
Investigate the potential of phage therapy as an alternative to antibiotics for treating bacterial infections.
3. Framework/Methodology
Putative non-interacting pairs were generated to complete the training dataset, since public databases lack clear annotations for the absence of interaction.
4. Results
Based on the analysis of public data from GenBank and phagesDB.org, more than a thousand positive phage-bacterium interactions with complete genomes were collected. On this basis, predictive models were built, exhibiting predictive performance of around 90% in terms of F1-score, sensitivity, specificity, and accuracy, obtained on the test set with 10-fold cross-validation.
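The evaluation protocol reported (10-fold cross-validation scored with F1, sensitivity, and specificity) can be sketched as below. The classifier choice and the synthetic features are assumptions standing in for the study's phage-bacterium interaction data; sensitivity is recall of the positive class and specificity is recall of the negative class.

```python
# Sketch of 10-fold cross-validation with F1, sensitivity, and
# specificity, assuming scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, recall_score

X, y = make_classification(n_samples=400, n_features=20, random_state=1)

scoring = {
    "f1": "f1",
    "sensitivity": "recall",  # recall of the positive (interacting) class
    # specificity = recall of the negative (non-interacting) class
    "specificity": make_scorer(recall_score, pos_label=0),
}
scores = cross_validate(RandomForestClassifier(random_state=1),
                        X, y, cv=10, scoring=scoring)
for metric in scoring:
    print(f"{metric}: {scores['test_' + metric].mean():.2f}")
```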
5. Recommendation

As a recommendation, future research efforts could delve deeper into the application of machine learning in
this domain, exploring additional features and refining algorithms to enhance predictive accuracy and
applicability.

Research 6
1. Problem Statement
2. Objectives
3. Framework/Methodology
4. Results
5. Recommendation

Research 7 Fraudulent Financial Transactions Detection Using Machine Learning


1. Problem Statement
Fraudulent transactions are happening more frequently than ever before, particularly in today's Internet era, and they are the cause of major financial losses.
2. Objectives
Develop a system that can predict the risk of transactions in a financial company, to improve customer experience and minimize financial loss.
3. Framework/Methodology
The researchers are attempting to develop fraud detection technologies that use machine learning and deep learning techniques to determine whether online transactions are genuine or fraudulent, based on transaction databases.
4. Results
The fraud-transaction dataset, obtained from Kaggle, contained about 6 million rows and 10 columns. Transaction amounts averaged around $145k, with the largest at $1.99 million. Random Forest achieved 99.97% accuracy on the unbalanced set, while the Bagging Classifier performed best on the balanced set with 99.96% accuracy.
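The classifier comparison on imbalanced fraud data can be sketched as below. The data is synthetic, with roughly 1% of transactions labelled fraudulent to mimic the imbalance; the hyperparameters are illustrative assumptions, not the study's settings, and on such skewed data raw accuracy should be read alongside per-class precision and recall.

```python
# Sketch of fraud classification on an imbalanced synthetic dataset,
# assuming scikit-learn; not the study's actual Kaggle data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Heavily imbalanced synthetic transactions: ~1% fraudulent.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.99], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7)

models = {
    "random forest": RandomForestClassifier(class_weight="balanced",
                                            random_state=7),
    "bagging classifier": BaggingClassifier(random_state=7),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} accuracy: {model.score(X_test, y_test):.4f}")
    # Per-class metrics matter more than accuracy on imbalanced data:
    print(classification_report(y_test, model.predict(X_test), digits=3))
```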
5. Recommendation
Due to the increasing frequency of fraudulent transactions, especially in the digital age, where financial losses
are significant, it becomes essential for banks and financial service providers to implement an automatic fraud
detection tool. The use of machine learning, specifically deep learning techniques, can significantly increase
the effectiveness of fraud detection systems and offer a proactive approach to mitigating financial risks and
protecting against potential losses.
Research 8: Coronavirus disease (COVID-19) cases analysis using machine-learning applications
1. Problem Statement
The global outbreak of COVID-19 has led to an increasing number of cases and deaths, posing a serious
challenge to healthcare systems globally. The ongoing COVID-19 pandemic has not only created significant
health challenges but has also highlighted the need for innovative and efficient strategies to mitigate its
impact.
2. Objectives
The objective of this study is to assess and enhance the application of artificial intelligence and machine
learning in the diagnosis and management of COVID-19.
3. Framework/Methodology
This study employs a comprehensive framework for assessing the role of machine-learning applications in
diagnosing and managing COVID-19. The analysis focuses on studies implementing supervised learning
techniques with varied algorithms, such as Logistic Regression, Multinomial Naïve Bayes, Convolutional Neural
Network (CNN), K-Nearest Neighbors Classifier (K-NN), and Neural Network Algorithm, across different
countries.

4. Results

The study found that supervised learning, particularly Logistic regression (used in 5 articles), demonstrated high
accuracy (above 92%), outperforming unsupervised learning, which showed a mere 7.1% accuracy, emphasizing
the promising potential of machine learning applications in achieving accurate results in COVID-19 healthcare
settings.
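The supervised-versus-unsupervised comparison reported can be sketched as below: a logistic regression trained on labelled examples against a k-means clustering baseline whose clusters are mapped to labels by majority vote. The data is a synthetic stand-in for patient features, not real COVID-19 records, and the resulting accuracies will not reproduce the study's figures.

```python
# Sketch contrasting supervised logistic regression with an unsupervised
# k-means baseline, assuming scikit-learn; the data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=3)

# Supervised: logistic regression trained on labelled examples.
sup_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Unsupervised: k-means clusters, mapped to labels by majority vote.
km = KMeans(n_clusters=2, n_init=10, random_state=3).fit(X_tr)
labels = km.predict(X_te)
mapping = {c: np.bincount(y_te[labels == c]).argmax() for c in set(labels)}
unsup_acc = (np.array([mapping[c] for c in labels]) == y_te).mean()

print(f"supervised: {sup_acc:.2f}  unsupervised: {unsup_acc:.2f}")
```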

5. Recommendation
Given the significant impact of the COVID-19 pandemic and the potential demonstrated by machine-learning
applications, it is recommended that further research and development be conducted in the integration of
supervised learning algorithms, particularly recurrent supervised learning, into healthcare programs.
Research 9: Deep Learning and Blockchain-Empowered Security Framework for Intelligent 5G-Enabled IoT
1. Problem Statement

The rapid evolution of fifth-generation (5G) technology has led to the emergence of diverse Internet of Things
(IoT) applications, such as smart transportation and healthcare, aiming to enhance Quality of Service (QoS)
and user experience. The large number of devices supported by 5G-enabled IoT poses challenges in
processing massive data, leading to inefficiencies in data caching, classification, and prediction.
2. Objectives

Objective of this research is to design and develop an efficient security and data analytic solution for intelligent
5G-enabled IoT. The research aims to contribute to the identification of design principles for emerging
networks and services, proposing a comprehensive DL and blockchain-empowered security framework.

3. Framework/Methodology
The proposed framework for intelligent and secure data analytics in 5G-enabled IoT combines Deep Learning
(DL) and blockchain technologies across a hierarchical architecture encompassing cloud, fog, edge plane, and
device layers. High-performance computing servers at the cloud layer leverage advanced operations such as
DL and big data mining, facilitating proactive networking and computing tasks. The fog layer, with multiple fog
nodes dynamically configured by an SDN controller, enhances real-time application performance through
massive parallelism, while blockchain ensures secure, decentralized data transactions among fog and edge
nodes.
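The blockchain property the framework relies on for secure data transactions can be shown with a toy sketch: each block stores the hash of its predecessor, so tampering with any recorded transaction invalidates every later link. This is an illustration of the general hash-chain idea only, not the paper's actual implementation, and the node names are made up.

```python
# Toy hash-chain sketch of the tamper-evidence property blockchains give
# to fog/edge data transactions; not the paper's implementation.
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 digest of a block's contents."""
    return hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(data: str, prev_hash: str) -> dict:
    return {"data": data, "prev_hash": prev_hash}

# Hypothetical fog/edge nodes appending data transactions to the chain.
chain = [make_block("genesis", "0" * 64)]
for tx in ["edge-node sensor reading", "fog-node aggregate"]:
    chain.append(make_block(tx, block_hash(chain[-1])))

def chain_valid(chain: list) -> bool:
    """Every block must reference the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(chain_valid(chain))        # True: the chain verifies
chain[1]["data"] = "tampered"
print(chain_valid(chain))        # False: tampering breaks the links
```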
4. Results
The global 5G-enabled IoT market is experiencing remarkable growth, with MarketsandMarkets projecting a
55.4% Compound Annual Growth Rate (CAGR), expecting it to rise from $0.7 billion to $6.3 billion by 2025. The
deployment of 5G-enabled IoT across various industries, including manufacturing and autonomous systems, is
contributing to a substantial increase in data generation, reaching an estimated 79.4 zettabytes (ZB) by the
end of 2025. This surge in data, as predicted by the International Data Corporation (IDC), underscores the
critical need for intelligent services and applications to effectively handle the massive data influx in 5G-
enabled IoT.
5. Recommendation

It is recommended to prioritize the adoption of the proposed DL and blockchain-empowered security framework.
To enhance the effectiveness of the framework, further research and development should focus on real-world implementation and testing, considering diverse use cases and scenarios.

Research 10
1. Problem Statement
2. Objectives
3. Framework/Methodology
4. Results
5. Recommendation
