Data Science

This document covers methodologies for information system development and data science, including a comparison of methodologies such as the traditional SDLC and engineering approaches, types of methodology and system development, and various problem-solving approaches such as algorithms, domain-specific languages, machine learning, and the use of ready-made components.

Uploaded by Safa Aulia

COLORING THE GLOBAL FUTURE http://www.gunadarma.ac.id

Introduction to Data Science

Study Program: Information Systems

1. The Need for a Data Science Methodology
2. Comparison of Data Science Methodologies
3. CRISP-DM and Project Management
4. Illustration of Organization and Human Resources in DS

1. Introduction to Data Science
2. Introduction to Various DS Methods
3. Introduction to Tools for Data
4. Introduction to Data Science Tools
5. Data Understanding
6. Data Preparation (1)
7. Data Preparation (2)
8. Midterm Exam
9. Building Models (1)
10. Building Models (2)
11. Building Models (3)
12. Building Models (4)
13. Using and Testing Models
14. MLOps and Pipelining
15. Getting to Know Deployment Technologies
16. Final Exam
Session 2. Introduction to Various DS Methods (Introduction to Data Science)

The Need for a Data Science Methodology



From “Craft” to “Engineering”

Craft
● Trial-and-error tinkering
● No method
● No design
● No documentation

Engineering
● Directed
● Uses a defined method
● Design before implementation
● Well documented


Types of Methodology
● Methodology for technical activities
● Methodology for business (and technical) activities


System Development Methods


Software Development Life Cycle (SDLC)

● The SDLC (Software Development Life Cycle) method is the process of building and modifying systems, together with the models and methodologies used to develop software-engineered systems
● The logical process used by a systems analyst to develop an information system, involving requirements, validation, training, and the system owner (Prof. Dr. Sri Mulyani, AK., CA. 2017)
● A process that produces software of the highest possible quality at the lowest possible cost (Stackify)


Various SDLC Methodologies

Each methodology suits particular problems and constraints

Each methodology requires different personnel (human-resource planning) and tools

Each methodology requires different scheduling (time planning)


Types of Development
● Each type of system requires a different methodology
● The architect's first task is to understand which management information system (SIM) model is needed


Various Approaches to System Development


Problem-Solving Approaches

Data Structures + Algorithms
● Define the problem
● Choose the data structure to use
● Choose the algorithm to apply

Domain Specific Language + Scripts
● Define the domain
● Choose a “Domain Specific Language”
● Write in the DSL
● Example: SQL

Data Set + Machine Learning
● Choose the machine learning approach and implementation (statistics, conventional ANN, DeepBeliefLearning)
● Choose the learning strategy
● Choose the “dataset” for learning

Strategy P1
● The program is written from scratch
● Starts from algorithms and data structures
● The algorithms and data structures are implemented in the chosen programming language
● Programming uses a plain editor or a simple IDE

Strategy P2
● Uses ready-made components developed earlier (by others or by oneself)
● Ready-made components:
– Subroutines or functions
– Libraries
– Interpreters (embedded DSL)

Strategy P3
● Uses small ready-made units or programs that can be composed into one (glue)
● Widely applied in Unix environments (one small program with one function)
● Makes use of “pipe” and “redirect”
● Example: cat Fileku.dat | sort | uniq

Strategy P4
● Uses services available through an Application Program Interface (API): Google API, Twitter API, Facebook API, etc.
● There is no need to understand the internals; what matters is the semantics of calling the services (REST, non-REST)
● Know the data structure of the service results (JSON, BSON, XML, others)
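As a rough Python counterpart of the Strategy P3 pipeline `cat Fileku.dat | sort | uniq`, the sketch below chains small single-purpose functions in the same glue style. The function names are illustrative assumptions, not part of the course material, and plain string ordering stands in for the locale-aware ordering of `sort`.

```python
from typing import Iterable, Iterator, List


def sort_lines(lines: Iterable[str]) -> List[str]:
    """Stage 1: order the lines, like `sort`."""
    return sorted(lines)


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Stage 2: drop adjacent duplicates, like `uniq`."""
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def sort_uniq(lines: Iterable[str]) -> List[str]:
    """Glue the two stages together, like `sort | uniq`."""
    return list(uniq(sort_lines(lines)))


if __name__ == "__main__":
    sample = ["pear", "apple", "pear", "banana", "apple"]
    print(sort_uniq(sample))
```

Each stage does one job and knows nothing about the other, which is the property that makes pipe-style composition work.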

The Need for a Development Methodology

Developing Data-Driven AI Systems

Data + Machine Learning (ML) Algorithms

Development Methodology
An iterative method for solving problems using data and data science through a defined sequence of steps


Why Have a Standard Process?

The data mining process must be reliable and repeatable by people with little data mining background.

A framework for recording experience → allows projects to be repeated

An aid for project planning and management

Makes things easier for new practitioners

Demonstrates the maturity of data mining work

Minimizes dependence on key personnel


Comparison of Data Science Methodologies



SEMMA Methodology

Sample: Take a sample of the data. This stage is optional.

Explore: Explore the data for unexpected patterns and anomalies in order to gain understanding and ideas.

Modify: Modify the data by creating, selecting, and transforming variables to focus the model selection process.

Model: Model the data by letting software automatically search for a combination of data that reliably predicts the desired outcome.

Assess: Assess the data by evaluating the usefulness and reliability of the findings from the data mining process and estimating how well it performs.

https://documentation.sas.com/?docsetId=emref&docsetTarget=n061bzurmej4j3n1jnj8bbjjm1a2.htm&docsetVersion=14.3&locale=en

Knowledge Discovery and Data Mining Methodology

Selection: Create a target data set, focusing on the subset of variables or data samples on which discovery will be performed.

Preprocessing: Clean the target data in order to obtain consistent data.

Transformation: Transform the data using dimensionality reduction or other transformation methods.

Data Mining: Search for interesting patterns in a particular form, depending on the goal of the data mining (usually prediction).

Interpretation/Evaluation: Interpret and evaluate the mined patterns.

https://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf


IBM Data Science Methodology



Problem to Approach
– The Business Understanding stage is crucial because it clarifies the customer's goal. In this stage, we ask the customer many questions about every aspect of the problem so that we study the relevant data; by the end of the stage, we have a list of business requirements.
– The next step is the Analytic Approach: once the business problem has been clearly stated, the data scientist can define the analytic approach to solve it. This step entails expressing the problem in the context of statistical and machine-learning techniques, and it is essential because it helps identify what type of patterns will be needed to address the question most effectively. If the question is to determine the probability of something, a predictive model might be used; if the question is to show relationships, a descriptive approach may be required; and if the problem requires counts, statistical analysis is the appropriate way to solve it. Different algorithms can be used for each type of approach.

Requirements to Collection
– Data Requirements is the stage where we identify the necessary data content, formats, and sources for initial data collection; this data feeds the algorithm of the approach we chose.
– In the Data Collection stage, data scientists identify the available data resources relevant to the problem domain. To retrieve data, we can scrape a related website or use a repository of premade, ready-to-use datasets. Premade datasets are usually CSV or Excel files. To collect data from a website or repository, we can use pandas, a library for loading, converting, and modifying datasets.
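A minimal sketch of the data collection stage. The slide mentions pandas; to stay dependency-free, this example swaps in Python's standard `csv` module instead. The inline CSV text, its column names, and the function name are made-up stand-ins for a real downloaded file.

```python
import csv
import io
from typing import Dict, List

# Hypothetical premade dataset; in practice this text would come from
# a downloaded CSV file or a dataset repository.
RAW_CSV = """city,product,units
Depok,WaTeR,12
Bekasi,water,7
Depok,juice,3
"""


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


if __name__ == "__main__":
    rows = load_rows(RAW_CSV)
    print(len(rows), "rows with columns", list(rows[0].keys()))
```

Every value arrives as a string, so type conversion and cleaning are left to the later data understanding and data preparation stages.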

Understanding to Preparation
– In the Data Understanding stage, data scientists try to understand more about the data collected earlier. We check the type of each data item and learn more about the attributes and their names.
– In the Data Preparation stage, data scientists prepare data for modeling. This is one of the most crucial steps because the data has to be clean and free of errors. In this stage, we make sure that the data is in the correct format for the machine learning algorithm chosen in the analytic approach stage. The dataframe needs appropriate column names and unified boolean values (yes/no or 1/0). We also have to pay attention to the values themselves, because the same thing is sometimes written with different capitalization (for example, WaTeR and water); we can fix this by making all values in a column lowercase. Another improvement is to delete exceptional records from the dataframe when they are irrelevant.
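The preparation steps named above (lowercasing values, unifying boolean values, dropping exceptional records) can be sketched in plain Python as follows. The column names `product` and `in_stock`, the accepted boolean spellings, and the rule for what counts as an exception are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Assumed spellings of boolean answers; extend these for a real dataset.
TRUE_VALUES = {"yes", "y", "true", "1"}
FALSE_VALUES = {"no", "n", "false", "0"}


def unify_boolean(value: str) -> Optional[int]:
    """Map yes/no style answers to 1/0; return None for anything else."""
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return 1
    if text in FALSE_VALUES:
        return 0
    return None


def prepare(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Lowercase the product name, unify the flag, drop unusable rows."""
    cleaned = []
    for row in rows:
        flag = unify_boolean(row["in_stock"])
        if flag is None:  # treat an unparseable flag as an exception and skip it
            continue
        cleaned.append({"product": row["product"].strip().lower(),
                        "in_stock": flag})
    return cleaned


if __name__ == "__main__":
    raw = [{"product": "WaTeR", "in_stock": "Yes"},
           {"product": "water", "in_stock": "0"},
           {"product": "juice", "in_stock": "maybe"}]
    print(prepare(raw))
```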

Modeling to Evaluation
– In the Modeling stage, the data scientist finds out whether the work is ready to go or needs review. Modeling focuses on developing models that are either descriptive or predictive, based on the analytic approach that was taken, whether statistical or machine learning. Descriptive modeling is a mathematical process that describes real-world events and the relationships between the factors responsible for them; for example, a descriptive model might examine things like: if a person did this, then they’re likely to prefer that. Predictive modeling is a process that uses data mining and probability to forecast outcomes; for example, a predictive model might be used to determine whether an email is spam or not. For predictive modeling, data scientists use a training set, which is a set of historical data in which the outcomes are already known. This step can be repeated until the model answers the question adequately.
– In the Model Evaluation stage, data scientists can evaluate the model in two ways: Hold-Out and Cross-Validation. In the Hold-Out method, the dataset is divided into three subsets: a training set, as described in the modeling stage; a validation set, used to assess the performance of the model built in the training phase; and a test set, used to estimate the likely future performance of the model.
https://www.slideshare.net/JohnBRollinsPhD/foundational-methodology-for-data-science
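The Hold-Out method described above can be sketched as a three-way split. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for illustration; the slides do not prescribe particular values.

```python
import random
from typing import List, Sequence, Tuple


def hold_out_split(items: Sequence, train: float = 0.6, validation: float = 0.2,
                   seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle items and cut them into training, validation, and test sets."""
    if not 0 < train < 1 or not 0 < validation < 1 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


if __name__ == "__main__":
    train_set, validation_set, test_set = hold_out_split(range(10))
    print(len(train_set), len(validation_set), len(test_set))
```

The model is fitted on the training set, tuned against the validation set, and scored once on the test set, which it has never seen.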

Deployment to Feedback
– The Deployment stage depends on the purpose of the model, which may be rolled out to a limited group of users or in a test environment. For example, a model destined for a healthcare system can be deployed first for low-risk patients and later for high-risk patients as well.
– In the Feedback stage, most input comes from the customer. After deployment, customers can say whether the model works for their purposes. Data scientists take this feedback and decide whether to improve the model, because the process from modeling to feedback is highly iterative.


Microsoft’s Team Data Science Process


TDSP Cycle:
– Business understanding
– Data acquisition and understanding
– Modeling
– Deployment
– Customer acceptance

Roles in a Project:
– Solution architect
– Project manager
– Data engineer
– Data scientist
– Application developer
– Project lead

https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview


Microsoft’s Team Data Science Roles


Domino DataLab Methodology



“Expect and embrace iteration” but “prevent iterations from meaningfully delaying projects, or distracting them
from the goal at hand”

“Enable compounding collaboration” by creating components that are reusable in other projects

“Anticipate auditability needs” and “preserve all relevant artifacts associated with the development and
deployment of a model”
https://www.dominodatalab.com/


Domino DataLab Methodology



I: Ideation. The initial phase puts the “problem first, not data first” by defining the underlying business problem and
conducting business analysis activities such as current state process mapping, project ROI analysis, and upfront
documentation. It also incorporates common agile practices including developing a stakeholder-driven backlog and creating
deliverable mockups. IT and engineering are looped in early and models might be baselined with synthetic data. The phase
ends with a project kick-off. Ideation mirrors the business understanding phase from CRISP-DM.

II: Data Acquisition and Exploration. Data science teams should identify data sources with help from stakeholders who can
provide leads based on their intuition. Decisions are made to capture data or buy data from vendors. Exploratory data
analysis is conducted, and the data is prepared for both the current project modeling and as re-usable components for future
projects. This phase incorporates many elements from the data understanding and data preparation phases of CRISP-DM.

III: Research and Development. Similar to the core modeling phase of CRISP-DM or any other data science process, this
phase iterates through hypothesis generations, experimentation, and insight delivery. True to agile principles, Domino
recommends starting with simple models, setting a cadence for insight deliveries, tracking business KPIs, and establishing
standard hardware and software configurations.

IV: Validation. This phase focuses on both business and technical validations and loosely mirrors the evaluation phase from
CRISP-DM. True to its principle to “enable compounding collaboration”, Domino stresses the importance of ensuring
reproducibility of results, automated validation checks, and documentation. The main goal of this phase is “ultimately
receiving sign-off from stakeholders”.

V: Delivery. This is when models become products. Deployment, A/B testing, test infrastructure, and user acceptance testing,
similar to those of any software project, are in this phase. Domino recommends additional considerations such as preserving
links between deliverable artifacts, flagging dependencies, and developing a monitoring and training plan. The deployment
phase of CRISP-DM is split between this phase and the last one.

VI: Monitoring. Given models’ non-deterministic nature, Domino recommends monitoring techniques that extend beyond
standard software monitoring practices. For example, consider using control groups in production models so that you can
continually monitor model performance and value creation to the organization. Moreover, automatic monitoring of
acceptable output ranges can help identify model issues before they become too pervasive.
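The automatic monitoring of acceptable output ranges mentioned in the Monitoring phase could look roughly like the sketch below. The range bounds, the 5% alert threshold, and the function names are assumptions for illustration, not part of Domino's published guidance.

```python
from typing import Iterable, List, Tuple


def out_of_range(predictions: Iterable[float], low: float,
                 high: float) -> List[Tuple[int, float]]:
    """Return (index, value) pairs for predictions outside [low, high]."""
    return [(i, p) for i, p in enumerate(predictions) if not low <= p <= high]


def needs_alert(predictions: List[float], low: float, high: float,
                max_share: float = 0.05) -> bool:
    """Flag a batch when too large a share of outputs leaves the range."""
    if not predictions:
        return False
    share = len(out_of_range(predictions, low, high)) / len(predictions)
    return share > max_share


if __name__ == "__main__":
    batch = [0.2, 1.4, 0.9, -0.1]  # e.g. scores expected to lie in [0, 1]
    print(out_of_range(batch, 0.0, 1.0), needs_alert(batch, 0.0, 1.0))
```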

AI Project Cycle


Problem Scoping – Understand the problem by understanding the factors that influence it and the goal of the project. This activity tries to define:
– Who – The “Who” part helps us identify and categorize everyone affected directly or indirectly by the problem; they are called the stakeholders.
– What – The “What” part helps us understand and identify the nature of the problem; in this block, you also gather evidence that the problem you have selected exists.
– Where – “Where” does the problem arise: the situation and the location.
– Why – “Why” is the given problem worth solving?

Data Acquisition – The process of collecting accurate and reliable data so that it can be processed. The data can be text, video, images, audio, or other forms, collected from various sources such as the internet, newspapers, and social media.

Data Exploration – Organize the data so that it can be processed well. The data can be arranged as tables, plots, or databases.

Modelling – Build a model from the data by trying various models based on the visualized data, weighing the advantages and disadvantages of each model.

Evaluation – Evaluate the project by looking at the output the system produces after data is fed to the model and comparing it with the actual output.
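The Evaluation step, comparing the model's outputs with the actual outputs, can be illustrated with a simple accuracy measure. Accuracy is only one possible metric and is chosen here as an assumption; the labels in the example are made up.

```python
from typing import Sequence


def accuracy(predicted: Sequence, actual: Sequence) -> float:
    """Share of predictions that match the actual outputs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot evaluate an empty set of outputs")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


if __name__ == "__main__":
    model_output = ["spam", "ham", "spam", "ham"]
    true_output = ["spam", "ham", "ham", "ham"]
    print(accuracy(model_output, true_output))
```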


What About Indonesia?
National Work Competency Standard (Standar Kompetensi Kerja Nasional):
Minister of Manpower Decree (KepMen Ketenagakerjaan) No. 299 of 2020


CRISP-DM and Project Management



Life Cycle and Project Management


Cross Industry Standard Process for Data Mining (CRISP-DM)

The initiative was started in 1996 by three data mining veterans: Daimler Chrysler (then Daimler-Benz), SPSS (then ISL), and NCR

Developed through a series of workshops, with contributions from more than 300 organizations

The first version of CRISP-DM was released in 1999

More than 200 CRISP-DM members worldwide, including:
– DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries, Syllogic, etc.
– System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte & Touche, etc.
– End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, etc.

CRISP-DM is technology neutral and vendor neutral

A complete six-phase cycle, from business aspects to technology aspects

Widely adopted by companies

Serves as a reference for internal methodologies

Provides an easy-to-follow framework and templates for analysis


The CRISP-DM Method and Documentation


Data Science Project Plan

● Every project starts with business understanding.
● A data science project is a business project, so it must always be oriented toward achieving business-focused results.
● A data science project must have a global vision that is aligned with the business strategy.
● The business sponsor needs an analytic solution.
● Stages of a data science project:

Problem Definition → Project Objectives → Solution from a Business Perspective → Success Measurement Instruments


Example. A Web-Mining Scenario Using CRISP-DM

As more companies make the transition to selling over the Web, an established computer/electronics e-retailer is facing increasing competition from newer sites. Faced with the reality that Web stores are cropping up as fast (or faster!) than customers are migrating to the Web, the company must find ways to remain profitable despite the rising costs of customer acquisition.

One proposed solution is to cultivate existing customer relationships in order to maximize the value of each of the company’s current customers.

Thus, a study is commissioned with the following objectives:
● Improve cross-sales by making better recommendations.
● Increase customer loyalty with a more personalized service.

Tentatively, the study will be judged a success if:
● Cross-sales increase by 10%.
● Customers spend more time and see more pages on the site per visit.
● The study finishes on time and under budget.


Business Understanding

Your first task is to try to gain as much insight as possible into the business goals for data mining. This may not be as easy as it seems, but you can minimize later risk by clarifying problems and goals.

Task List
● Start gathering background information about the current business situation.
● Document specific business objectives decided upon by key decision makers.
● Agree upon criteria used to determine data mining success from a business perspective.


Business Understanding

Identifying your business goals
● Business Background
● Business Goals
● Business Success Criteria

Assessing your situation
● Inventory of Resources
● Requirements, Assumptions, Constraints
● Risk and Contingency
● Terminology
● Cost and Benefit

Defining your data mining goals
● Data Mining Goals
● Data Mining Success Criteria

Producing your project plan
● Project Plan
● Initial Assessment of Tools and Techniques


Business Understanding Task



Task 1: Identifying your business goals. The first thing you must do in any project is to
find out exactly what you’re trying to accomplish! That’s less obvious than it sounds.
Many data miners have invested time on data analysis, only to find that their
management wasn’t particularly interested in the issue they were investigating.

Task 2: Assessing your situation. This is where you get into more detail on the issues
associated with your business goals. Now you will go deeper into fact-finding, building out
a much fleshier explanation of the issues outlined in the business goals task.

Task 3: Defining your data-mining goals. Reaching the business goal often requires
action from many people, not just the data miner. So now, you must define your little part
within the bigger picture. If the business goal is to reduce customer attrition, for example,
your data-mining goals might be to identify attrition rates for several customer segments,
and develop models to predict which customers are at greatest risk.

Task 4: Producing your project plan. Now you specify every step that you, the data miner,
intend to take until the project is completed and the results are presented and reviewed.
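As a small illustration of the data-mining goal in Task 3, identifying attrition rates for several customer segments, the sketch below computes the share of customers lost per segment. The segment names, the record layout, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attrition_rates(customers: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of customers who left, per segment. Each item is (segment, left)."""
    totals = defaultdict(int)
    lost = defaultdict(int)
    for segment, left in customers:
        totals[segment] += 1
        if left:
            lost[segment] += 1
    return {segment: lost[segment] / totals[segment] for segment in totals}


if __name__ == "__main__":
    records = [("retail", True), ("retail", False), ("retail", False),
               ("retail", False), ("corporate", True), ("corporate", True)]
    print(attrition_rates(records))
```

Rates like these describe what has already happened; predicting which customers are at greatest risk would need a model trained on historical outcomes.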


Task 1: Identifying your business goals

You must start with a clear understanding of
● A problem that your management wants to address
● The business goals
● Constraints (limitations on what you may do, the kinds of solutions that can be used, when the work must be completed, and so on)
● Impact (how the problem and possible solutions fit in with the business)

Deliverables
● Background: Explain the business situation that drives the project. This item, like many that follow, amounts only to a few paragraphs.
● Business goals: Define what your organization intends to accomplish with the project. This is usually a broader goal than you, as a data miner, can accomplish independently. For example, the business goal might be to increase sales from a holiday ad campaign by 10 percent year over year.
● Business success criteria: Define how the results will be measured. Try to get clearly defined quantitative success criteria. If you must use subjective criteria (hint: terms like gain insight or get a handle on imply subjective criteria), at least get agreement on exactly who will judge whether or not those criteria have been fulfilled.
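A quantitative success criterion such as "increase sales by 10 percent year over year" can be checked mechanically. The sketch below is one way to express that check; the sales figures and function names are invented for illustration.

```python
def percent_change(previous: float, current: float) -> float:
    """Year-over-year change of current relative to previous, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) * 100.0 / previous


def criterion_met(previous: float, current: float,
                  target_percent: float = 10.0) -> bool:
    """True when the observed change reaches the agreed target."""
    return percent_change(previous, current) >= target_percent


if __name__ == "__main__":
    last_year, this_year = 200_000, 230_000  # hypothetical campaign sales
    print(percent_change(last_year, this_year), criterion_met(last_year, this_year))
```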


Business Background

Understanding your organization’s business situation helps you know what you’re working with in terms of:
● Available resources (personnel and material)
● Problems
● Goals

Task 1: Determine Organizational Structure
● Develop organizational charts to illustrate corporate divisions, departments, and project groups. Be sure to include managers’ names and responsibilities.
● Identify key individuals in the organization.
● Identify an internal sponsor who will provide financial support and/or domain expertise.
● Determine whether there is a steering committee and procure a list of members.
● Identify business units that will be affected by the data mining project.

Task 2: Describe Problem Area
● Identify the problem area, such as marketing, customer care, or business development.
● Describe the problem in general terms.
● Clarify the prerequisites of the project. What are the motivations behind the project? Does the business already use data mining?
● Check on the status of the data mining project within the business group. Has the effort been approved, or does data mining need to be “advertised” as a key technology for the business group?
● If necessary, prepare informational presentations on data mining to your organization.

Task 3: Describe Current Solution
● Describe any solutions currently used to address the business problem.
● Describe the advantages and disadvantages of the current solution. Also, address the level of acceptance this solution has had within the organization.

Business Objective

This is where things get specific. As a result of your research and meetings, you should construct a concrete primary objective agreed upon by the project sponsors and other business units affected by the results. This goal will eventually be translated from something as nebulous as “reducing customer churn” to specific data mining objectives that will guide your analytics.

Task
● Describe the problem you want to solve using data mining.
● Specify all business questions as precisely as possible.
● Determine any other business requirements (such as not losing any existing customers while increasing cross-sell opportunities).
● Specify expected benefits in business terms (such as reducing churn among high-value customers by 10%).


Business Success Criteria

Business success criteria fall into two categories:
● Objective. These criteria can be as simple as a specific increase in the accuracy of audits or an agreed-upon reduction in churn.
● Subjective. Subjective criteria such as “discover clusters of effective treatments” are more difficult to pin down, but you can agree upon who makes the final decision.

Task List
● As precisely as possible, document the success criteria for this project.
● Make sure each business objective has a correlative criterion for success.
● Align the arbiters of the subjective measurements of success. If possible, take notes on their expectations.


Task 2: Assessing your situation


Inventory of resources: A list of all resources available for the project. These may include
people (not just data miners, but also those with expert knowledge of the business problem,
data managers, technical support, and others), data, hardware, and software.

Requirements, assumptions, and constraints: Requirements will include a schedule for
completion, legal and security obligations, and requirements for acceptable finished work. This
is the point to verify that you’ll have access to appropriate data!

Risks and contingencies: Identify causes that could delay completion of the project, and
prepare a contingency plan for each of them. For example, if an Internet outage in your office
could pose a problem, perhaps your contingency could be to work at another office until the
outage has ended.

Terminology: Create a list of business terms and data-mining terms that are relevant to your
project and write them down in a glossary with definitions (and perhaps examples), so that
everyone involved in the project can have a common understanding of those terms.

Costs and benefits: Prepare a cost-benefit analysis for the project. Try to state all costs and
benefits in dollar (euro, pound, yen, and so on) terms. If the benefits don’t significantly exceed
the costs, stop and reconsider this analysis and your project.


Assessing the Situation

• What sort of data are available for analysis?
• Do you have the personnel needed to complete the project?
• What are the biggest risk factors involved?
• Do you have a contingency plan for each risk?

Example: A Web-Mining Scenario Using CRISP-DM

This is the electronics e-retailer’s first attempt at Web mining, and the company has decided to consult a data mining specialist to help in getting started. One of the first tasks the consultant faces is to assess the company’s resources for data mining.

Personnel. It’s clear that there is in-house expertise with managing server logs and product and purchase databases, but little experience in data warehousing and data cleaning for analysis. Thus, a database specialist may also be consulted. Since the company hopes the results of the study will become part of a continuing Web-mining process, management must also consider whether any positions created during the current effort will be permanent ones.

Data. Since this is an established company, there is plenty of Web log and purchase data to draw from. In fact, for this initial study, the company will restrict the analysis to customers who have “registered” on the site. If successful, the program can be expanded.

Risks. Aside from the monetary outlays for the consultants and the time spent by employees on the study, there is not a great deal of immediate risk in this venture. However, time is always important, so this initial project is scheduled for a single financial quarter. Also, there is not a lot of extra cash flow at the moment, so it is imperative that the study come in under budget. If either of these goals should be in danger, the business managers have suggested that the project’s scope should be reduced.


Resource inventory


Task 1: Research Hardware Resources
• What hardware do you need to support?


Task 2: Identify Data Sources and Knowledge Stores
• Which data sources are available for data mining? Take note of data types and formats.
• How are the data stored? Do you have live access to data warehouses or operational databases?
• Do you plan to purchase external data, such as demographic information?
• Are there any security issues preventing access to required data?


Task 3: Identify Personnel Resources
• Do you have access to business and data experts?
• Have you identified database administrators and other support staff that may be needed?
• Once you have asked these questions, include a list of contacts and resources for the phase report.


Requirements, Assumptions, and Constraints


Task 1: Determine Requirements. The fundamental requirement is the business goal discussed earlier, but consider the following:
• Are there security and legal restrictions on the data or project results?
• Is everyone aligned on the project scheduling requirements?
• Are there requirements on results deployment (for example, publishing to the Web or reading scores into a database)?


Task 2: Clarify Assumptions
• Are there economic factors that might affect the project (for example, consulting fees or competitive products)?
• Are there data quality assumptions?
• How does the project sponsor/management team expect to view the results? In other words, do they want to understand the model itself or simply view the results?


Task 3: Verify Constraints
• Do you have all passwords required for data access?
• Have you verified all legal constraints on data usage?
• Are all financial constraints covered in the project budget?


Risks and Contingencies


Types of risks include:
• Scheduling (What if the project takes longer than anticipated?)
• Financial (What if the project sponsor encounters budgetary problems?)
• Data (What if the data are of poor quality or coverage?)
• Results (What if the initial results are less dramatic than expected?)


Task List
• Document each possible risk.
• Document a contingency plan for each risk.


Terminology


To ensure that business and data mining teams are “speaking the same
language,” you should consider compiling a glossary of technical terms and
buzzwords that need clarification. For example, if “churn” for your business has
a particular and unique meaning, it is worth explicitly stating that for the benefit
of the whole team. Likewise, the team may benefit from clarification of the
usage of a gains chart.


Task List
• Keep a list of terms or jargon confusing to team members. Include both business and data mining terminology.
• Consider publishing the list on the intranet or in other project documentation.


Cost/Benefit Analysis


This step answers the question, What is your bottom line? As part of the final
assessment, it’s critical to compare the costs of the project with the potential
benefits of success.


Task List
• Include in your analysis estimated costs for:
  – Data collection and any external data used
  – Results deployment
  – Operating costs
• Then, take into account the benefits of:
  – The primary objective being met
  – Additional insights generated from data exploration
  – Possible benefits from better data understanding
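
The comparison of costs and benefits described above can be sketched in a few lines of Python. This is only an illustration: the function name `cost_benefit`, the category keys, and every figure are hypothetical placeholders, not values from the course material.

```python
def cost_benefit(costs, benefits):
    """Return (total_cost, total_benefit, net_benefit, benefit_cost_ratio)."""
    total_cost = sum(costs.values())
    total_benefit = sum(benefits.values())
    net_benefit = total_benefit - total_cost
    # Guard against division by zero when no costs are listed.
    ratio = total_benefit / total_cost if total_cost else float("inf")
    return total_cost, total_benefit, net_benefit, ratio


# Hypothetical estimates, all expressed in the same currency unit.
costs = {
    "data_collection": 20_000,
    "results_deployment": 15_000,
    "operating": 10_000,
}
benefits = {
    "primary_objective": 60_000,
    "additional_insights": 10_000,
    "better_data_understanding": 5_000,
}

total_cost, total_benefit, net_benefit, ratio = cost_benefit(costs, benefits)
print(f"cost={total_cost} benefit={total_benefit} net={net_benefit} ratio={ratio:.2f}")
```

If the net benefit is small or negative, that is the signal to reconsider the analysis and the project.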


Task 3: Defining your data-mining goals


Data-mining goals: Define data-mining deliverables, such as models, reports,
presentations, and processed datasets.

Data-mining success criteria: Define the data-mining technical criteria
necessary to support the business success criteria. Try to define these in
quantitative terms (such as model accuracy or predictive improvement
compared to an existing method). If the criteria must be qualitative, identify the
person who makes the assessment.


Types of Analytic Tasks

• Regression / Estimation
• Classification
• Clustering
• Association
• Anomaly Detection
• Sequence Mining
• Recommendation Systems

Business Goals → Data Mining Goals



Now that the business goal is clear, it’s time to translate it into a data mining reality. For example, the business objective to “reduce churn” can be translated into a data mining goal that includes:
• Identifying high-value customers based on recent purchase data
• Building a model using available customer data to predict the likelihood of churn for each customer
• Assigning each customer a rank based on both churn propensity and customer value


These data mining goals, if met, can then be used by the business to reduce churn among the most valuable customers. As you can see, business and technology must work hand-in-hand for effective data mining. Read on for specific tips on how to determine data mining goals.

Task List – Data mining goals
• Describe the type of data mining problem, such as clustering, prediction, or classification.
• Document technical goals using specific units of time, such as predictions with a three-month validity.
• If possible, provide actual numbers for desired outcomes, such as producing churn scores for 80% of existing customers.
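
The third churn goal, ranking customers by both churn propensity and customer value, might be sketched as follows. The combination rule used here (probability multiplied by value) is an assumption chosen for illustration; the source does not prescribe one. The `Customer` fields and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # assumed output of a churn model, in [0, 1]
    customer_value: float     # assumed value estimate, e.g. recent purchase total


def rank_customers(customers):
    """Sort customers by expected value at risk (probability x value), highest first."""
    return sorted(
        customers,
        key=lambda c: c.churn_probability * c.customer_value,
        reverse=True,
    )


# Hypothetical customers for illustration only.
customers = [
    Customer("A", 0.9, 100.0),
    Customer("B", 0.2, 1000.0),
    Customer("C", 0.6, 500.0),
]

ranked = rank_customers(customers)
for rank, c in enumerate(ranked, start=1):
    print(rank, c.customer_id, round(c.churn_probability * c.customer_value, 2))
```

A customer who is very likely to churn but has little value (A) ends up below a moderately risky, high-value customer (C), which is the point of combining the two signals.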


Example. Data Mining Goal


Example. A Web-Mining Scenario Using CRISP-DM


With the help of its data mining consultant, the e-retailer has been able to translate the company’s business objectives into data mining terms. The goals for the initial study to be completed this quarter are:
• Use historical information about previous purchases to generate a model that links “related” items. When users look at an item description, provide links to other items in the related group (market basket analysis).
• Use Web logs to determine what different customers are trying to find, and then redesign the site to highlight these items. Each different customer “type” will see a different main page for the site (profiling).
• Use Web logs to try to predict where a person is going next, given where he or she came from and has been on your site (sequence analysis).
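
As a rough illustration of the first goal (market basket analysis), the sketch below counts how often pairs of items appear in the same purchase. It is a simplified stand-in for a full association-rule algorithm; the function name `related_items`, the `min_support` threshold, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def related_items(baskets, min_support=2):
    """Count item pairs that co-occur in a basket at least `min_support` times."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair is counted once per basket, in a fixed order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical purchase histories for illustration only.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["phone", "case"],
    ["laptop", "bag"],
]

pairs = related_items(baskets)
print(pairs)
```

Pairs that pass the threshold are candidates for the "related items" links on a product page.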


Data Mining Success Criteria


Success must also be defined in technical terms to keep your data mining efforts on track. Use the data mining goal determined earlier to formulate benchmarks for success.


Task List
• Describe the methods for model assessment (for example, accuracy, performance, etc.).
• Define benchmarks for evaluating success. Provide specific numbers.
• Define subjective measurements as best you can and determine the arbiter of success.
• Consider whether the successful deployment of model results is part of data mining success.
• Start planning now for deployment.


Task 4: Producing your project plan


Project plan: Outline your step-by-step action plan for the project. Expand the
outline with a schedule for completion of each step, required resources, inputs
(such as data or a meeting with a subject matter expert), and outputs (such as
cleaned data, a model, or a report) for each step, and dependencies (steps that
can’t begin until this step is completed). Explicitly state that certain steps must
be repeated (for example, modeling and evaluation usually call for several back-
and-forth repetitions).

Initial assessment of tools and techniques: Identify the required capabilities for
meeting your data-mining goals and assess the tools and resources that you
have. If something is missing, you have to address that concern very early in the
process.


Project Plan


The project plan is the master document for all of your data mining work. If done well, it can inform everyone associated with the project of the goals, resources, risks, and schedule for all phases of data mining. You may want to publish the plan, as well as documentation gathered throughout this phase, to your company’s intranet.


Task List. When creating the plan, be sure you’ve answered the following questions:
• Have you discussed the project tasks and proposed plan with everyone involved?
• Are time estimates included for all phases or tasks?
• Have you included the effort and resources needed to deploy the results or business solution?
• Are decision points and review requests highlighted in the plan?
• Have you marked phases where multiple iterations typically occur, such as modeling?


Checklists for Business Understanding



From a business perspective:
• What does your business hope to gain from this project?
• How will you define the successful completion of your efforts?
• Do you have the budget and resources needed to reach your goals?
• Do you have access to all the data needed for this project?
• Have you and your team discussed the risks and contingencies associated with this project?
• Do the results of your cost/benefit analysis make this project worthwhile?


From a data mining perspective:
• How specifically can data mining help you meet your business goals?
• Do you have an idea about which data mining techniques might produce the best results?
• How will you know when your results are accurate or effective enough? (Have you set a measurement of data mining success?)
• How will the modeling results be deployed? Have you considered deployment in your project plan?
• Does the project plan include all phases of CRISP-DM?
• Are risks and dependencies called out in the plan?


Business Understanding Documents



Business goals
• Background
• Business goals
• Business success criteria


Current situation
• Inventory of resources
• Requirements, assumptions, and constraints
• Risks and contingencies
• Terminology
• Costs and benefits


Data-mining goals
• Data-mining goals
• Data-mining success criteria


Project plan
• Project plan
• Initial assessment of tools and techniques


Data Science Project Documentation


EXAMPLE APPLICATION
Case: Credit Default

● Problem: How to reduce a bank’s non-performing loans (NPL)
● Question: How to improve the credit score calculation
● Measurable outcome: % reduction in defaulted loans
● Analytic task: Classification
● Performance metric: F1-Score


Determining the Analytic Task


Which analytic task needs to be solved to answer the business problem?

A. Regression/Estimation: Predicting a continuous value for a case
• Predicting a house price from certain characteristics
• Predicting tomorrow’s stock price

B. Classification: Predicting the class/category of a case
• Predicting the collectibility of a loan
• Predicting whether a company will go bankrupt next year

C. Clustering: Grouping cases by similarity
• Segmenting bank customers
• Grouping patients with similar cases

D. Association: Predicting sets of items/events that commonly occur together
• Finding products that are commonly bought together
• Building a stock portfolio

E. Anomaly Detection: Finding abnormal/unusual cases
• Detecting illegal credit card transactions
• Detecting network intrusions

F. Sequence Mining: Predicting what will happen next from the current state
• Predicting whether a customer will cancel a subscription
• Determining the flow of e-commerce transactions

G. Recommendation: Recommending items to a user based on how their preferences associate with those of other users who have the same ‘taste’
• Recommending films to watch
• Recommending stocks to buy


Performance Measurement by Task



Performance measurement depends on the type of analytic task being carried out

Performance metrics are the measures of success of a data science project, for example:
– Root Mean Squared Error (RMSE)
– R-Squared
– Jaccard Index
– Log-loss
– Precision
– Recall
– F1-Score
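
A few of the metrics listed above can be computed directly from their standard definitions. The sketch below uses only the Python standard library; the function names and sample values are illustrative, and a real project would more likely rely on an established library.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error for a regression task."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def jaccard_index(items_a, items_b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(items_a), set(items_b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

RMSE suits regression tasks, precision, recall and F1-score suit classification, and the Jaccard index compares two sets such as predicted and actual labels.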


Data Understanding

What data are needed?


Where can they be obtained?

Data structure: How are the required data (attributes) described?

Data volume: How many data records are needed?

Data sources: Where can the data be obtained? Are they already available?


- Internal: information systems/ERP, Excel, documents
- External: Web API, web scraping
- Datasets via public data
- Datasets via open data


Business Understanding

What is the plan for carrying out the project?

Cost-Benefit Analysis: Is the project worth doing?

Situation Assessment: Analysis of the organization’s current state

Project Plan: Scope (WBS), Time, Schedule, Development Team


Data Understanding
Getting to know the data you have

01 Collecting data: collect the required data
• Data volume (rows and columns)
• Data description

02 Examining data: analyze the data exploratively
• Attribute/feature characteristics
• Relationships among data

03 Validating data: assess whether data quality fits the problem to be solved
• Data quality


Data Understanding
Why you need to get to know the data you have
• The United States armed forces faced a dilemma during the war: returning bomber planes were riddled with bullet holes, and they needed better ways to protect them.
• “Where should they put the extra protection?”
• When they plotted out the damage these planes were incurring, it was spread out, but largely concentrated around the tail, body and wings.
• Should they upgrade these sections?
• Not necessarily: the plot only covers planes that returned, so hits in the other areas may be the ones that brought planes down (survivorship bias).


Data Understanding

Collecting the required data

Data volume: How much data can be obtained?

Data description: Explanation of what each attribute/feature means


Data Understanding

Examining the data exploratively (EDA)

Attribute characteristics: Description of the data (attributes) obtained

Relationships among data: Statistical analysis such as correlation, ANOVA, chi-squared, …
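
As one concrete example of checking relationships among data, the Pearson correlation coefficient between two numeric attributes can be computed as below. This is a minimal sketch from the standard formula; the function name and sample data are illustrative, and it assumes both inputs have equal length and non-zero variance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes for illustration only.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship, although a non-linear one may still exist.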


Data Understanding

Validating data: Assessing whether data quality fits the problem to be solved

Data quality report:


- Data size (attributes/features and number of records)
- Statistical description of attributes
- Relationships among attributes (and the label)
- Data visualization
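
The first two items of such a report (data size and a statistical description of each attribute) could be assembled roughly as follows. This sketch assumes the data are a list of dictionaries with `None` marking a missing value; the function name `quality_report` and the sample records are hypothetical.

```python
import statistics


def quality_report(records):
    """Summarize size, missing values and basic statistics per attribute."""
    columns = sorted({key for row in records for key in row})
    report = {"n_records": len(records), "n_attributes": len(columns), "attributes": {}}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None]
        summary = {"missing": len(values) - len(present)}
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            # Numeric attribute: report range and mean.
            summary.update(min=min(numeric), max=max(numeric), mean=statistics.mean(numeric))
        else:
            # Categorical (or empty) attribute: report the number of distinct values.
            summary["distinct"] = len(set(present))
        report["attributes"][col] = summary
    return report


# Hypothetical records for illustration only.
records = [
    {"age": 30, "income": 5.0, "label": "good"},
    {"age": 45, "income": None, "label": "bad"},
    {"age": 25, "income": 3.0, "label": "good"},
]

report = quality_report(records)
for name, summary in report["attributes"].items():
    print(name, summary)
```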


Data Preparation
Improving data quality for modeling

01 Selecting data: choose the data to be used
• Records used
• Attributes used

02 Cleaning data: minimize noise (incomplete or incorrect values)
• Complete data
• Corrected data
• Outliers

03 Constructing data: add features and transform the data
• Additional features (feature engineering)
• Data transformation (standardization, transformation)

04 Integrating data: combine data
• Combined data
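
Three of the cleaning and construction steps above (filling in missing values, flagging outliers, and standardization) can be sketched with the standard library. The specific choices here (median imputation, the 1.5 x IQR rule, z-score standardization) are common conventions picked for illustration, not methods prescribed by the slides; function names and data are hypothetical.

```python
import statistics


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical, otherwise the deviation is zero.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]


# Hypothetical attribute with one missing value and one extreme value.
incomes = [4.0, 5.0, None, 6.0, 5.0, 40.0]
imputed = impute_missing(incomes)
print(imputed)
print(iqr_outliers(imputed))
print(standardize([1.0, 2.0, 3.0]))
```

Whether a flagged outlier is removed, corrected, or kept is a judgment that depends on the business problem.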


Modelling
Developing the model (knowledge)

01 Building the modeling scenario: devise a strategy for finding the best model
• Selecting the machine learning (ML) algorithm
• Splitting the data
• Defining the experiment steps

02 Building the model: develop the model with ML techniques
• Running the algorithm
• Tuning parameters
• Measuring performance metrics


Building the Modeling Scenario - 1

Devising a strategy for finding the best model

A. Choosing the algorithm: matched to the chosen analytic task


1. k-Nearest Neighbor (k-NN)
2. Naïve Bayes
3. Regression Techniques
4. Support Vector Machines (SVMs)
5. Decision Trees
6. Random Forests
7. Deep Learning Algorithms
8. ...


Building the Modeling Scenario - 2

Devising a strategy for finding the best model

B. Splitting the data: according to data availability


1. Training data: for developing the model
2. Test data: for measuring model performance

Data → Split → Training data / Test data
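
A minimal sketch of the split into training and test data follows. The 80/20 default, the fixed random seed, and the function name `train_test_split` are illustrative choices; the appropriate proportion depends on how much data is available.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into (train, test) lists."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of ten records.
data = list(range(10))
train, test = train_test_split(data, test_fraction=0.3)
print(len(train), len(test))
```

Keeping the test data untouched during training is what makes the later performance measurement meaningful.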


Building the Modeling Scenario - 3

Devising a strategy for finding the best model

C. Defining the experiment steps: to obtain the best model efficiently and effectively

• Best Guess
• One Factor at a Time
• Grid Search
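
Of the three strategies, grid search is the simplest to express in code: every combination of candidate parameter values is tried and the best-scoring one is kept. In the sketch below, `toy_score` is a hypothetical stand-in for "train a model with these parameters and measure it", and the parameter names `k` and `weight` are placeholders.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(k, weight):
    """Hypothetical score that peaks at k=5, weight=0.5 (higher is better)."""
    return -(k - 5) ** 2 - (weight - 0.5) ** 2


best_params, best_score = grid_search(
    {"k": [1, 3, 5, 7], "weight": [0.1, 0.5, 0.9]},
    toy_score,
)
print(best_params, best_score)
```

Grid search is exhaustive, so its cost grows with the product of the number of candidate values; one-factor-at-a-time runs fewer experiments but can miss interactions between parameters.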


Building the Model - 1

Developing the model with ML techniques

A. Training process: to obtain the model

Training data → ML technique → Model

1. k-Nearest Neighbor (k-NN)


2. Naïve Bayes
3. Regression Techniques
4. Support Vector Machines (SVMs)
5. Decision Trees
6. Random Forests
7. Deep Learning Algorithms
8. ...
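
To make the training step concrete, here is a small from-scratch sketch of the first technique in the list, k-Nearest Neighbor. For k-NN, "training" amounts to storing the training data, and prediction is a majority vote among the k closest stored points. The class name, the Euclidean distance choice, and the sample points are illustrative assumptions.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-NN classifier using Euclidean distance and majority voting."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # k-NN is a lazy learner: fitting only stores the training data.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        neighbours = sorted(
            zip(self.X, self.y),
            key=lambda pair: math.dist(pair[0], x),
        )[: self.k]
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Hypothetical training data: two well-separated groups.
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["low", "low", "low", "high", "high", "high"]

model = KNNClassifier(k=3).fit(X_train, y_train)
predictions = model.predict([(1.1, 1.0), (5.1, 5.0)])
print(predictions)
```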


Building the Model - 2

Developing the model with ML techniques

B. Testing process: to measure performance

Test data → Model → Decision


Evaluating the Model

Evaluating the performance of the resulting model

01 Evaluating the model: measure model performance
• Achieved performance vs. target
• Selecting the best model

02 Evaluating the process: assess whether the process has been as thorough as it could be
• Reviewing the process to find the model’s limitations or shortcomings
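
The first evaluation step, comparing achieved performance with the target and selecting the best model, might look like the sketch below. The model names, the F1 scores, and the target of 0.75 are hypothetical; in practice the target comes from the data mining success criteria agreed during business understanding.

```python
def evaluate_against_target(results, target, metric="f1"):
    """Pick the best model on `metric` and report whether it meets the target."""
    best_name = max(results, key=lambda name: results[name][metric])
    best_score = results[best_name][metric]
    return {
        "best_model": best_name,
        "score": best_score,
        "meets_target": best_score >= target,
    }


# Hypothetical test-set scores for three candidate models.
results = {
    "decision_tree": {"f1": 0.71},
    "random_forest": {"f1": 0.78},
    "knn": {"f1": 0.69},
}

decision = evaluate_against_target(results, target=0.75)
print(decision)
```

If no model meets the target, the process review in step 02 is where the team looks for the cause, for example in data quality or in the experiment design.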


Example Illustration of Organization and Human Resources



Data Scientist


Requires an understanding of theory and programming

Also soft skills such as communication and entrepreneurship

So it is not only a matter of technical skill


Data Scientists vs. Software Engineers



A data scientist is not really a software engineer. Data scientists specialize in building and evaluating models; they are not experts in building application programs.

Even so, in practice the boundary is often thin, and data scientists today are frequently expected to take on a variety of roles.

A data scientist is therefore expected to understand some other technical matters as well.

This becomes even more complex when staff change: a data scientist often has to maintain a model they did not build. That is why documentation during model development is important.


AI – Data Science Project Team


01 AI/ML Engineer
Develops AI and ML methods to be applied to solving problems

02 Data Scientist
Develops the best model from data to answer the business problem

03 Data Engineer
Prepares (big) data for processing/modeling

04 Data Analyst
Analyzes data and looks for insights (and presents them in dashboards)

05 Project/Product Manager
Manages data-driven projects/products

06 Domain Expert
Provides guidance on the problem domain

07 IT People
Prepares the IT infrastructure (especially for deployment)

Data scientists, data engineers, and DevOps engineers work together to deliver a solution.

Data scientists prepare their work so that its outputs can be used by others, for example data engineers and DevOps.

Differences in Competency Focus


A Data Engineer is more than just a Database Administrator

A Data Analyst understands visualization and how to draw conclusions from it

They are supported by an IT team that understands the technical issues (programming, deployment)

Data Scientist and ML Engineer


Choosing an Occupation Model

National occupation map (figure), with columns Data Science and AI/ML:
• Level 7: Data Engineer, Data Scientist, AI/ML Applied Research
• Level 6: Associate Data Engineer, Associate Data Scientist, Associate AI/ML Engineer

Certifications shown alongside:
• EITC/AI/AIF Artificial Intelligence Fundamentals [v1r2]
• IABAC (Int. Assoc. Business Analytics Certification)


Certification

No  Unit code (UK)  Title  Associate Data Scientist  Data Scientist
1 J.62DMI00.001.1 Determining Business Objectives OK
2 J.62DMI00.002.1 Determining Data Science Technical Goals OK
3 J.62DMI00.003.1 Creating a Data Science Project Plan
4 J.62DMI00.004.1 Collecting Data OK
5 J.62DMI00.005.1 Examining Data OK OK
6 J.62DMI00.006.1 Validating Data OK OK
7 J.62DMI00.007.1 Determining Data Objects OK OK
8 J.62DMI00.008.1 Cleaning Data OK OK
9 J.62DMI00.009.1 Constructing Data OK OK
10 J.62DMI00.010.1 Determining Data Labels OK
11 J.62DMI00.011.1 Integrating Data
12 J.62DMI00.012.1 Building the Modeling Scenario OK
13 J.62DMI00.013.1 Building the Model OK OK
14 J.62DMI00.014.1 Evaluating Modeling Results OK OK
15 J.62DMI00.015.1 Reviewing the Modeling Process OK
16 J.62DMI00.016.1 Creating a Model Deployment Plan
17 J.62DMI00.017.1 Deploying the Model
18 J.62DMI00.018.1 Creating a Model Maintenance Plan
19 J.62DMI00.019.1 Maintaining the Model
20 J.62DMI00.020.1 Reviewing the Data Science Project
21 J.62DMI00.021.1 Writing the Final Data Science Project Report


Colab Lab Super Computer DGX-1/A100

THANK YOU

EdgeAI
