Business Understanding, Data Understanding, Data Preprocessing, Learning Methods
Seedscientific (2020) & Statista (2019)
Data must be processed into knowledge before it becomes useful to people.
• Academic Information System → Student Graduation Prediction System
• Election Recording System → Election Result Prediction System
• Officials' Wealth Reporting System → Corruptor Prediction System
• Credit Recording System → Credit Eligibility Determination System
https://www.cs.purdue.edu/people/faculty/wsc.html
= DATA SCIENCE
(Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Communications of the ACM, 45(11): 50-54, Nov. 2002)
Data Science
Information Discovery and Modelling Data Scientist
Data Exploration
Statistical Summary, Metadata, and Description
[Figure: DATA SCIENCE at the center, surrounded by Statistics, Pattern Recognition, Computing Algorithms, Machine Learning, and Database Technology]
1 Estimation
2 Forecasting
3 Classification
4 Clustering
5 Association
5. Cluster (Klaster)
▪ Supervised Learning
▪ Semi-Supervised Learning
▪ Unsupervised Learning
▪ Association-based Learning
▪ Business Understanding
▪ Data Understanding
▪ Data Preparation
▪ Modeling
▪ Evaluation
▪ Deployment
6. Captured data: GPS data, sensors, IoT, usually unstructured and can be
internal or external
7. User-generated data: data that individuals and companies generate consciously, or at
least knowingly; usually unstructured and can be internal or external.
COFFEE BREAK
https://rapidminer.com/get-started/
• Information can be extracted from existing data (for example, age can be computed from the date of birth)
• Sometimes information contained in the data must be presented explicitly to improve
model performance
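As a small illustration of the age example above, the sketch below derives an explicit age attribute from a stored date of birth. The records, the field names, the reference day, and the helper age_in_years are hypothetical and only serve the illustration; they are not taken from the slides' dataset.

```python
from datetime import date


def age_in_years(birth_date: date, as_of: date) -> int:
    """Return the number of completed years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical records: the raw data stores only the date of birth.
records = [
    {"name": "A", "birth_date": date(2000, 5, 17)},
    {"name": "B", "birth_date": date(1998, 12, 31)},
]

# Hypothetical reference day on which the age is evaluated.
reference_day = date(2020, 6, 1)

# Make the implicit information explicit by adding an "age" attribute.
for record in records:
    record["age"] = age_in_years(record["birth_date"], reference_day)
```

A model can then use age directly instead of having to infer it from a raw date.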
Other Data Problems
• Duplicate records
• Incomplete data
• Inconsistent data
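The three problems listed above can be detected with very little code. The following is a minimal sketch using plain Python; the sample rows, the column names, and the helper functions are all hypothetical, and a real project would more likely rely on a data preparation tool or library.

```python
# Hypothetical rows showing all three problems at once.
rows = [
    {"id": "1", "gender": "M", "city": "Jakarta"},
    {"id": "2", "gender": "Female", "city": "Bandung"},
    {"id": "2", "gender": "Female", "city": "Bandung"},  # exact duplicate
    {"id": "3", "gender": "F", "city": ""},              # missing city
]


def find_duplicates(rows):
    """Return indexes of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def find_incomplete(rows):
    """Return indexes of rows with at least one empty or missing value."""
    return [
        index
        for index, row in enumerate(rows)
        if any(value is None or str(value).strip() == "" for value in row.values())
    ]


def distinct_values(rows, column):
    """List the distinct values of a column to expose inconsistent coding."""
    return sorted({row[column] for row in rows})


duplicate_rows = find_duplicates(rows)           # positions of repeated rows
incomplete_rows = find_incomplete(rows)          # positions of rows with gaps
gender_codes = distinct_values(rows, "gender")   # "F" and "Female" both appear
```

Listing the distinct values of a column is often enough to spot inconsistent data, here the same category written as both "F" and "Female".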
▪ Data Cleaning
▪ Features Extraction
▪ Data Reduction
▪ Features Transformation
Dataset: Missingdataset.csv
Join table
• Click Run
This happens because the missing attribute values are scattered across several places
(this approach cannot be used because the remaining data cannot be mined)
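To see why scattered missing values are a problem, the sketch below assumes that the approach referred to is dropping every row that contains a missing value; this is an assumption, since the slides do not name the operator. The rows and column names are hypothetical and are not the contents of Missingdataset.csv. Mean imputation is shown only as one common alternative, not as the method the slides use.

```python
from statistics import mean

# Hypothetical rows: each one is missing a value in a different column.
rows = [
    {"age": 25, "income": None, "score": 7.0},
    {"age": None, "income": 4000, "score": 6.5},
    {"age": 31, "income": 5200, "score": None},
    {"age": 40, "income": None, "score": 8.0},
]


def drop_incomplete(rows):
    """Keep only rows in which every attribute has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def impute_mean(rows):
    """Replace each missing value with the mean of its column's known values."""
    columns = list(rows[0].keys())
    means = {
        column: mean(row[column] for row in rows if row[column] is not None)
        for column in columns
    }
    return [
        {column: (row[column] if row[column] is not None else means[column])
         for column in columns}
        for row in rows
    ]


remaining = drop_incomplete(rows)  # every row has a gap, so nothing is left
filled = impute_mean(rows)         # all four rows survive with gaps filled
```

Because each hypothetical row has a gap somewhere, deleting incomplete rows leaves an empty dataset, while filling the gaps keeps all rows available for mining.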
• Data after