
DS in RME Using RM-Adi W

This document summarizes a presentation about implementing health data science in electronic medical records using data mining techniques without coding. It discusses data mining processes like KDD and CRISP-DM and common data mining tasks like estimation, prediction, classification, clustering and association. Specific data mining methods and knowledge patterns are described along with evaluation metrics. The presentation outlines how data mining can be applied in electronic medical records to extract useful health-related insights and knowledge from patient data.


Lectures and Talks in

Scholarly Communication

Implementation of Health Data Science
in Electronic Medical Records
- Data Mining without Coding Series -

Dr. Adi Wijaya, MKom*

Presented at the webinar "Rekam Medis Elektronik di Era Digital" (Electronic Medical Records in the Digital Era)


Politeknik Karya Husada, Jakarta
6 September 2022

*) Lecturer/Researcher and IT Professional

[email protected] Lectures and Talks in Scholarly Communication


Short Bio

Dr. Adi Wijaya, MKom

I am a lecturer and research fellow at the Universitas Indonesia Maju (UIMA) Jakarta. Apart from that, I am also involved in IT projects pertaining to data science, data governance, enterprise architecture, and software development. I received a Doctorate in Electrical Engineering (2021) at the Dept. of Electrical and Information Engineering, Universitas Gadjah Mada. My research topics are: information processing (including bibliometric analysis), data mining and machine learning, and health-related informatics including brain-computer interface.

Contact:
• Telegram: @adiwjj
• Email: [email protected]
• HP/WA: 0838-789-19-456

Expertise:
• App. & Use Case Pro & Master
• Machine Learning Pro & Master
• Data Engineering Pro
• R Programming
• Data Mining
• Data Scientist
• Project Management
• Software Testing
• SDLC
• Data Governance Foundation
• Certified ATLAS.ti Prof. Trainer Junior
• Project Management Essentials Certified
• Scrum Fundamentals Certified
• ISO/IEC 27001 Information Security Associate



Outline
►Introduction to Data Science/Data Mining
►KDD and CRISP-DM
►Data Mining Task
►Data Mining in Electronic Medical Record



Interesting related books

... and many more



Introduction to Data Science/Data Mining



Source of Big Data

Health Data Science

Source:
Zieliński JS. (2017) New Informatics Tools in Data Management, The Xth SIGSAND/PLAYS EuroSymposium 2017



Data Mining Position in Big Data and Data Science

Big Data (Volume, Velocity, Variety) → Big Data Analytics → Value

Data Science:
• Data Engineering: Data Architecture, Data Acquisition, Data Cleaning, Cloud Technology
• Data Exploration: Statistics, Plotting
• Data Mining: Estimation, Prediction, Classification, Clustering, Association
• Data Visualization: Highlighting; Plot, Graph, Map
Data Mining in Big Data Analytics

[Diagram labels: BIG DATA; Pre-processed Sample Data; Data Mining / Predictive Modeling; Knowledge; Big Data Analytics' Value]



KDD and CRISP-DM

KDD – Knowledge Discovery in Databases


CRISP-DM – Cross Industry Standard Process for Data Mining



Data mining in the process of knowledge discovery

Source:
Siguenza-Guzman, L., Saquicela, V., Avila-Ordóñez, E., Vandewalle, J., & Cattrysse, D. (2015).
Literature Review of Data Mining Applications in Academic Libraries. The Journal of Academic Librarianship, 41(4), 499–510.
Data Mining Process

1. Dataset (data understanding and data processing)
   DATA PRE-PROCESSING: Data Cleaning, Data Integration, Data Reduction, Data Transformation
2. Data Mining Method (choose a method that suits the character of the data)
   Estimation, Prediction, Classification, Clustering, Association
3. Knowledge (Pattern/Model/Formula/Tree/Rule/Cluster)
4. Evaluation (Accuracy, AUC, RMSE, Lift Ratio, …)
CRISP-DM



Data Mining Task



Data Mining Task *)
►Estimation
►Prediction
►Classification
►Clustering
► Association

*) On the slide, a larger font indicates greater attention/usage/implementation



Data Mining (DM) Methods
►Estimation:
Linear Regression, Neural Network, Support Vector Machine, etc.
►Prediction/Forecasting:
Linear Regression, Neural Network, Support Vector Machine, etc.
►Classification:
Naive Bayes, K-Nearest Neighbor, C4.5, ID3, CART, Linear Discriminant Analysis, Logistic Regression, etc.
►Clustering:
K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means, etc.
►Association:
FP-Growth, Apriori, Coefficient of Correlation, Chi-Square, etc.

Knowledge (Pattern/Model)
►Formula/Function (regression formula or function)
WAKTU TEMPUH = 0.48 + 0.6 JARAK + 0.34 LAMPU + 0.2 PESANAN
(WAKTU TEMPUH: travel time; JARAK: distance; LAMPU: traffic lights; PESANAN: orders)

►Decision Tree

►Correlation Level

►Rule
IF ips3=2.8 THEN lulustepatwaktu
(ips3: third-semester GPA; lulustepatwaktu: graduates on time)

►Cluster
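
To make the "knowledge" outputs concrete, the regression formula and the rule shown on this slide can be written as plain Python functions. The coefficients and the rule come from the slide; the function names and the input values in the example call are hypothetical.

```python
def travel_time(jarak, lampu, pesanan):
    """Regression formula from the slide:
    WAKTU TEMPUH = 0.48 + 0.6 JARAK + 0.34 LAMPU + 0.2 PESANAN
    """
    return 0.48 + 0.6 * jarak + 0.34 * lampu + 0.2 * pesanan

def rule(ips3):
    """Rule from the slide: IF ips3=2.8 THEN lulustepatwaktu.
    Returns None when the rule does not fire.
    """
    return "lulustepatwaktu" if ips3 == 2.8 else None

# Hypothetical inputs, chosen only to show how the formula is applied.
estimate = travel_time(jarak=2, lampu=3, pesanan=1)
label = rule(2.8)
```

A formula gives a numeric estimate for any input, while a rule only fires when its condition matches; that is the practical difference between the two pattern types.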

Evaluation (Accuracy, Error, etc.)
► Estimation:
Error: Root Mean Square Error (RMSE), MSE, MAPE, etc.
► Prediction/Forecasting:
Error: Root Mean Square Error (RMSE), MSE, MAPE, etc.
► Classification:
Confusion Matrix: Accuracy
ROC Curve: Area Under Curve (AUC)
► Clustering:
Internal Evaluation: Davies–Bouldin index, Dunn index, etc.
External Evaluation: Rand measure, F-measure, Jaccard index, Fowlkes–Mallows index, Confusion matrix
► Association:
Lift Charts: Lift Ratio
Precision and Recall (F-measure)
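
As a rough sketch of how some of these measures are computed, the snippet below implements MSE, RMSE, MAPE, and confusion-matrix accuracy using their standard definitions. The sample values are made up for illustration and are not results from the presentation.

```python
import math

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def confusion_matrix(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) counts for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical numeric targets (estimation/prediction tasks).
y_true = [10.0, 20.0, 40.0]
y_pred = [12.0, 18.0, 40.0]
errors = (mse(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))

# Hypothetical class labels (classification task).
labels_true = [1, 1, 0, 0, 1, 0]
labels_pred = [1, 0, 0, 0, 1, 1]
counts = confusion_matrix(labels_true, labels_pred)
acc = accuracy(*counts)
```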

Data Mining in Electronic Medical Record



Data Mining Tool

Tools – no coding needed

Scripting/Programming



Automated Machine Learning Tool/Package

Auto ML Tool/Package/Library:
► MLBox [python]
► Auto-Sklearn [python]
► Cloud Auto ML [Google cloud]
► TPOT
► Auto-Keras
► DataRobot
► BigML
► H2O AutoML
► Rapidminer AutoML
Image source:
https://medium.datadriveninvestor.com/everything-you-want-to-know-about-automated-machine-learning-pipeline-df9e44612ff



Why Rapidminer
► Free
 The free license supports up to 10,000 records
 An Educational License (free) is available → no record limit
► Knowledge support → Rapidminer Academy
► User-friendly and intuitive interface
► A huge number of algorithms available
► Seamless integration with Python and R programming



Why Rapidminer?

Rapidminer:
Visionaries quadrant
• Completeness of vision: high
• Ability to execute: middle
Together with other popular free tools
such as KNIME and H2O.ai



Common Workflow

Training:
Read Training Data → pre-processing → modeling

Note:
The resulting model can be saved, so testing or using the model no longer requires the training step.

Testing:
Read Testing Data → pre-processing → apply model → results

Note:
We train again (re-train) when we want to improve the model, for example by adding more training data so that the model performs better.
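
The train once, save, then apply workflow can be sketched in Python as follows. Everything here is an assumption made for illustration: the one-feature threshold "model", the toy data, and storing the model as a JSON file stand in for whatever the no-code tool does internally.

```python
import json
import os
import tempfile

def train(xs, ys):
    """Training: pick the cut point t that best separates class 1 (x < t) from class 0."""
    best_correct, best_t = -1, None
    for t in sorted(set(xs)):
        correct = sum((x < t) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_correct, best_t = correct, t
    return {"threshold": best_t}

def apply_model(model, xs):
    """Testing/usage: apply a stored model without any re-training."""
    return [int(x < model["threshold"]) for x in xs]

# Hypothetical training data: one numeric feature and a binary label.
train_x = [20, 25, 30, 45, 50, 60]
train_y = [1, 1, 1, 0, 0, 0]

# Training phase: build the model once and save it.
model = train(train_x, train_y)
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    json.dump(model, f)

# Testing phase: load the saved model and apply it to new data.
with open(path) as f:
    restored = json.load(f)
predictions = apply_model(restored, [22, 55])
```

Re-training corresponds to calling `train` again with a larger training set and overwriting the saved file.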



Data Mining in EMR Use Case

Dataset URL:
https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records



Dataset Description



Rapidminer Modeling Process

Results

Sensitivity = 68.42%

Specificity = 92.68%
Comparison with previous study

Decision Tree – Rapidminer


• F1 Score = 74.39% = 0.744 → higher (> 0.554)
• Accuracy = 85% = 0.850 → higher (> 0.737) → higher than the best (Random forests = 0.740)
• TP rate (Sensitivity) = 68.42% = 0.684 → higher (> 0.532)
• TN rate (Specificity) = 92.68% = 0.927 → higher (> 0.831) → 3rd position from the best
• ROC AUC = 0.850 → higher (> 0.681) → higher than the best (Random forests = 0.800)



Model



Using the model

Patient with:
• Age = 62
• Creatinine_ph = 231
• Ejection_fr = 25
• Serum_cr = 0.9
• Serum_sod = 140
• Sex = 1
• Time = 10

Path through the tree (start here…):
Time = 10 → Age = 62 → Ejection_fr = 25 → Prediction → 1

Decision = 1
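
Applying a decision tree to one patient amounts to following a chain of if/else tests. The sketch below walks the attributes in the order shown on the slide (Time, then Age, then Ejection_fr). The split thresholds are HYPOTHETICAL placeholders: the actual tree exists only as an image in the presentation, so these numbers are not the author's model and carry no clinical meaning.

```python
# Patient attributes as listed on the slide.
patient = {
    "age": 62,
    "creatinine_ph": 231,
    "ejection_fr": 25,
    "serum_cr": 0.9,
    "serum_sod": 140,
    "sex": 1,
    "time": 10,
}

# HYPOTHETICAL split points, invented only so the example can run.
TIME_CUT = 70
AGE_CUT = 60
EJECTION_CUT = 30

def predict(p):
    """Follow the tree from the root and return (prediction, path taken)."""
    path = ["time"]
    if p["time"] > TIME_CUT:
        return 0, path
    path.append("age")
    if p["age"] <= AGE_CUT:
        return 0, path
    path.append("ejection_fr")
    return (1 if p["ejection_fr"] <= EJECTION_CUT else 0), path

prediction, path = predict(patient)
```

Returning the path alongside the prediction makes it easy to explain which attributes drove the decision for a given patient.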

Thank you…
