Supervised Machine Learning

The document outlines the post-midterm meeting schedule for an Artificial Intelligence course. It lists the topic, teaching method, and time allocation for each of the eight meetings in weeks 9-16. The topics progress from classification and clustering, through fuzzy logic, expert systems, decision trees, and neural networks, to machine learning algorithms for classification such as logistic regression, random forest, and naive Bayes. The schedule allocates a total of 18 hours over the 8 weeks, including the final exam.

Uploaded by

farhan yutub

Intelligent Systems (Sistem Cerdas, TIF 150702)

M. Angga Gumilang
Meeting Plan After the Midterm Exam (UTS)

Week | Topic | Method | Duration
9 | Classification and Clustering | Practitioner Teaching | 4 hours
10 | Fuzzy Logic & Expert Systems | Practitioner Teaching | 4 hours
11 | Decision Tree & Artificial Neural Networks | Practitioner Teaching | 4 hours
12 | Machine Learning for Classification | Lecture | 1 hour
13 | Logistic Regression & Decision Tree | Discussion | 1 hour
14 | Random Forest & Support Vector Machine (SVM) | Discussion | 1 hour
15 | K Nearest Neighbour (KNN) and Naïve Bayes | Discussion | 1 hour
16 | Final Exam (UAS) | Subjective (essay) questions | 2 hours
Total learning time | | | 18 hours
Machine Learning and Its Application to Classification
Outline

● Machine Learning Concepts
● Machine Learning Algorithms
● Example Classification Case Study
● Logistic Regression & Decision Tree
● Random Forest & Support Vector Machine (SVM)
● K Nearest Neighbour (KNN) and Naïve Bayes
Machine Learning Concepts
Machine Learning = Algorithm + Math (Statistics)
Mind Map Machine Learning
Machine Learning in Classification
How Does Machine Learning Work?
Supervised vs. Unsupervised Learning
● The easiest way to distinguish supervised learning from unsupervised learning is to check whether the data is labelled.
Unsupervised Learning
Confused by Machine Learning?
Reinforcement Learning
Machine Learning Algorithms
Machine Learning
Machine Learning ?
1. Logistic Regression
● Logistic regression uses the sigmoid function to return the probability of a label. It is widely used when the classification problem is binary: true or false, win or lose, positive or negative.
● The sigmoid function generates a probability output. The object is assigned to a label by comparing that probability with a predefined threshold.
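The two bullets above can be sketched in a few lines of plain Python. The weights, bias, and the 0.5 threshold below are hypothetical values chosen for illustration, not parameters learned from any dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1).

    Fine for moderate scores; extremely negative z would overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one sample of a binary problem."""
    # Linear score: weighted sum of the features plus a bias term.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    # Compare the probability with the threshold to assign the label.
    return probability, int(probability >= threshold)


# Hypothetical model parameters (assumed, not fitted).
weights = [0.8, -0.4]
bias = -0.2

prob, label = predict_label([2.0, 1.0], weights, bias)
print(f"probability={prob:.3f}, label={label}")
```

Raising the threshold makes the model more reluctant to predict the positive class, which is one way to trade false positives against false negatives.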
Logistic Regression Illustration
Logistic Regression Code Snippets

● Code and a more detailed explanation: https://towardsdatascience.com/tuning-the-hyperparameters-of-your-machine-learning-model-using-gridsearchcv-7fc2bb76ff27

● Common logistic regression hyperparameters: penalty, max_iter, C, solver
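As a rough sketch of how such hyperparameters might be tuned with scikit-learn's GridSearchCV: the synthetic dataset and the candidate values are assumptions made for illustration. Only C and solver are searched; penalty is left at its default because its valid values depend on the solver and on the library version.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary dataset, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Candidate values are arbitrary examples, not recommendations.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],       # inverse regularization strength
    "solver": ["lbfgs", "liblinear"],  # optimization algorithm
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),  # max_iter raised to help convergence
    param_grid,
    cv=5,                # 5-fold cross-validation for every combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```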


2. Decision Tree
● A decision tree builds its branches hierarchically, and each branch can be read as an if-else statement. The branches grow by partitioning the dataset into subsets based on the most important features. The final classification happens at the leaves of the tree.
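To make the if-else reading concrete, the sketch below fits a shallow tree with scikit-learn and prints its learned rules as text. The Iris dataset and max_depth=2 are assumptions chosen to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short (depth chosen for illustration).
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented line is one branch condition; "class: ..." lines are leaves.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Reading the output from top to bottom follows the same path a sample takes from the root to a leaf.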
Decision Tree Illustration
Decision Tree Code Snippets

● Further explanation: https://towardsdatascience.com/how-to-tune-a-decision-tree-f03721801680

● Common decision tree hyperparameters: criterion, max_depth, min_samples_split, min_samples_leaf, max_features
3. Random Forest
● A random forest is a collection of decision trees. It is a common type of ensemble method, which aggregates results from multiple predictors. Random forest additionally uses the bagging technique: each tree is trained on a random sample of the original dataset, and the forest takes the majority vote of its trees.
● Compared with a single decision tree, it generalizes better but is less interpretable, because it adds more layers of complexity to the model.
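The bagging and majority-vote idea can be sketched by hand with individual decision trees. This is an illustrative approximation rather than scikit-learn's RandomForestClassifier itself (which averages class probabilities instead of counting hard votes); the dataset, the number of trees, and the seeds are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # arbitrary choice for this sketch
trees = []
for i in range(n_trees):
    # Bagging: draw a bootstrap sample (same size, with replacement).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split consider a random feature subset.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote: every tree predicts, and the most frequent class wins.
votes = np.stack([t.predict(X_test) for t in trees])  # (n_trees, n_samples)
forest_pred = np.array([np.bincount(col).argmax() for col in votes.T])

accuracy = float((forest_pred == y_test).mean())
print("Bagged-trees accuracy:", round(accuracy, 3))
```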
Random Forest Illustration
Random Forest Code Snippets

● Common hyperparameters, shared with decision trees: criterion, max_depth, min_samples_split, min_samples_leaf, max_features
● https://towardsdatascience.com/how-to-tune-a-decision-tree-f03721801680
4. Support Vector Machine (SVM)
● A support vector machine classifies data based on its position relative to a border between the positive class and the negative class. This border is known as the hyperplane, and it is chosen to maximize the distance between data points from different classes.
● Like decision trees and random forests, support vector machines can be used for both classification and regression; SVC (support vector classifier) is the variant for classification problems.
SVM Illustration
SVM Code Snippets

● Common support vector machine hyperparameters: C, kernel, gamma
● https://www.vebuso.com/2020/03/svm-hyperparameter-tuning-using-gridsearchcv/
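A minimal sketch of a support vector classifier with these three hyperparameters set explicitly. The breast cancer dataset bundled with scikit-learn stands in as a binary problem, and the hyperparameter values are simply the library defaults written out, not tuned choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in binary dataset (an assumption, not the one used in the slides).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# SVMs are distance-based, so features are standardized before fitting.
model = make_pipeline(
    StandardScaler(),
    SVC(C=1.0, kernel="rbf", gamma="scale"),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("SVC test accuracy:", round(accuracy, 3))
```

A larger C penalizes misclassified training points more heavily, while kernel and gamma control the shape of the decision boundary.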
5. K-Nearest Neighbour (KNN)
● You can think of the k-nearest neighbour algorithm as representing each data point in an n-dimensional space defined by n features. It calculates the distance from one point to another, then assigns a label to an unobserved data point based on the labels of the nearest observed data points.

● KNN can also be used to build recommendation systems.
KNN Illustrations

KNN has three basic steps:
1. Calculate the distance.
2. Find the k nearest neighbours.
3. Vote for classes.
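The three steps map directly onto a short plain-Python function. The toy points, their labels, and k=3 are made up for illustration, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: calculate the (Euclidean) distance to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: find the k nearest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: vote for classes; the most common label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters (made up for this sketch).
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```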


KNN Code Snippets

● Common KNN hyperparameters: n_neighbors, weights, leaf_size, p

● More detail: https://towardsdatascience.com/knn-visualization-in-just-13-lines-of-code-32820d72c6b6
6. Naïve Bayes

● Naive Bayes is based on Bayes' theorem, an approach to calculating conditional probability from prior knowledge, combined with the naive assumption that each feature is independent of the others given the class.
● The biggest advantage of Naive Bayes is that, while most machine learning algorithms rely on a large amount of training data, it performs relatively well even when the training set is small. Gaussian Naive Bayes is a type of Naive Bayes classifier that assumes each feature follows a normal distribution.
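A from-scratch sketch of Gaussian Naive Bayes may help show how the prior and the per-feature normal distributions combine. The six training rows are made up, and the small constant added to each variance is an assumption in the spirit of a smoothing term; a library implementation may differ in its details.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for every class."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny constant keeps variances strictly positive (assumed smoothing).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # log P(class) + sum of log N(x_i | mean_i, var_i): the naive
        # independence assumption turns the joint likelihood into a product.
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: two features, two classes.
X = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.1), (3.2, 3.9), (2.9, 4.0)]
y = [0, 0, 0, 1, 1, 1]

nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, (1.1, 2.0)))
print(predict_gaussian_nb(nb_model, (3.1, 4.0)))
```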
Naïve Bayes illustration

https://ranasinghiitkgp.medium.com/mathematic-behind-naive-bayes-and-its-application-9ec8cc4f0a91
Naïve Bayes Code Snippets

● Common Gaussian Naive Bayes hyperparameters: priors, var_smoothing
● https://www.analyticsvidhya.com/blog/2021/01/gaussian-naive-bayes-with-hyperpameter-tuning/
Case Study (an Example)
1. Loading Dataset and Data Overview

● I chose the popular Heart Disease UCI dataset on Kaggle to predict the presence of heart disease from several health-related factors.
● https://www.kaggle.com/ronitf/heart-disease-uci
● Use df.info() to get a summarized view of the dataset, including data types, missing data, and the number of records.
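A small sketch of this overview step with pandas. Because the Kaggle file is not available here, a tiny made-up frame stands in for it; the column values are invented and the file name in the comment is an assumption.

```python
import pandas as pd

# With the real data this would be something like pd.read_csv("heart.csv");
# the file name is an assumption. A tiny made-up frame stands in here.
df = pd.DataFrame(
    {
        "age": [63, 37, 41, 56],
        "sex": [1, 1, 0, 1],
        "chol": [233.0, 250.0, None, 236.0],  # one missing value on purpose
        "target": [1, 1, 1, 0],
    }
)

df.info()                  # data types, non-null counts, memory usage
missing = df.isna().sum()  # number of missing values per column
print(missing)
```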
2. Exploratory Data Analysis (EDA)
● Histograms, grouped bar charts, and box plots are suitable EDA techniques for classification problems.
● Univariate Analysis
Categorical Features vs. Target: Grouped Bar Chart
Numerical Features vs. Target: Box Plot
3. Split Dataset into Training and Testing Set
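One common way to perform this step is scikit-learn's train_test_split. The toy data, the 80/20 ratio, and the random seed below are assumptions for illustration; stratify=y keeps the class proportions the same in both sets.

```python
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, perfectly balanced between two classes.
X = [[i] for i in range(100)]
y = [0] * 50 + [1] * 50

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # hold out 20% of the rows for testing
    stratify=y,       # preserve the class balance in both splits
    random_state=42,  # fixed seed so the split is reproducible
)

print(len(X_train), "training rows,", len(X_test), "testing rows")
print(sum(y_test), "positive labels in the test set")
```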
4. Machine Learning Model Pipeline
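A pipeline for this step might loop over the six algorithms covered earlier, fit each on the training set, and record its test accuracy. The sketch below is one possible arrangement, not the slides' original code: the bundled breast cancer dataset stands in for the heart disease data, and every model uses default hyperparameters apart from max_iter and the random seeds.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in binary dataset (an assumption; swap in your own features/target).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, clf in models.items():
    # Scale features, then fit the classifier, as one reusable pipeline.
    pipeline = make_pipeline(StandardScaler(), clf)
    pipeline.fit(X_train, y_train)
    results[name] = pipeline.score(X_test, y_test)  # accuracy on the test set

for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {acc:.3f}")
```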
5. Model Evaluation

Below is a high-level explanation of commonly used evaluation methods for classification models: accuracy, ROC & AUC, and the confusion matrix.
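These three metrics can be written out in plain Python to show what each one measures. The labels and scores below are made up, and the 0.5 threshold is an assumption; AUC is computed here through its pairwise interpretation (the chance that a random positive is scored above a random negative).

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0 and 1."""
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up true labels and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.9, 0.3]
y_pred = [int(s >= 0.5) for s in y_score]  # assumed threshold of 0.5

print("Confusion (tn, fp, fn, tp):", confusion_counts(y_true, y_pred))
print("Accuracy:", accuracy(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, y_score))
```

Accuracy and the confusion matrix depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds.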
Accuracy Results
Confusion Matrix
Accuracy and Confusion Matrix
Some Useful References

● https://destingong.medium.com/list/practical-guides-to-machine-learning-a877c2a39884
● https://www.kaggle.com/
● https://towardsdatascience.com/top-machine-learning-algorithms-for-classification-2197870ff501
● https://repository.unimal.ac.id/6707/1/Machine%20Learning.pdf (Ebook)
● https://wiragotama.github.io/resources/ebook/intro-to-ml-secured.pdf (Ebook)
● https://scikit-learn.org/stable/ (Sklearn Documentation)
Mini Project (Group Assignment)
Mini Project Instructions (Group Assignment)

● Divide the members of each class section into 6 (six) groups.
● Assign one of the following topics to each group:
1. Logistic Regression
2. Decision Tree
3. Random Forest
4. Support Vector Machine (SVM)
5. K Nearest Neighbour (KNN)
6. Naïve Bayes
Mini Project Instructions (Group Assignment), Part 2

1. Find a dataset on Kaggle or another website that is suitable to be solved with your chosen topic.
2. Build a classification model according to your chosen topic (any programming language and IDE; recommended: Python and SKLearn).
3. Write up the case study solution, the modelling, and the analysis on Medium or LinkedIn, then submit the assignment through the e-learning system.
Example Article Structure for Medium

1. Introduction: why did you choose this case study?
2. Dataset Overview: what does a sample of the dataset look like?
3. Exploratory Data Analysis (EDA)
4. Splitting Dataset for Modelling Classification
5. Machine Learning Implementation
6. Model Evaluation
7. Conclusion: did you succeed in solving the problem?
8. References: list every article, website, dataset, and other source you used.
