Introduction To Data Science
PT Mitra Bakti UT
(a subsidiary of Yayasan Karya Bakti United Tractors)
1. Group Leader Mechanical Electrical
2. Account Manager
Data Science is a multidisciplinary subject
Steps in doing analytics
Data Science Career
Data Science Around Us
Learning Path of Data Science in General
• Introduction
• Python (Coding)
• Python Packages
• Statistics
• SQL
• Data Visualization
• ML: Regression
• ML: Classification
• ML: Clustering
• Final Project
Cross-Industry Standard Process for Data Mining (CRISP-DM)
The Process
• From Problem to Approach
• From Requirements to Collection
• From Understanding to Preparation
Example:
Business problem:
• Information overload, facts vs hoaxes.
Example:
Analysis Method:
• NLP (Natural Language Processing)
• Use supervised learning, specifically a classification model, to classify whether a news article is fake or not
Algorithms and Tools
• NLTK (Natural Language Toolkit)
• Decision Tree
• Random Forest
• etc.
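A minimal sketch of the classification step, assuming scikit-learn is available. Here `TfidfVectorizer` is swapped in for NLTK-based text preprocessing, and the six headlines and their labels are invented for illustration, not real data. A decision tree could replace the random forest in the same pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy headlines; label 1 = fake, 0 = real.
texts = [
    "Government announces new budget for public schools",
    "Scientists confirm miracle pill cures every disease overnight",
    "Central bank keeps interest rate unchanged this quarter",
    "Celebrity secretly replaced by clone, insiders reveal",
    "City council approves new bus routes",
    "Drinking seawater makes you immune to all viruses",
]
labels = [0, 1, 0, 1, 0, 1]

# TF-IDF turns each text into a weighted bag-of-words vector;
# the random forest then learns which word patterns point to fake news.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(texts, labels)

# Classify an unseen headline (the result depends on the toy data above).
prediction = model.predict(["Miracle pill cures disease overnight, scientists say"])
print(prediction)
```

With real data, the labelled news would be split into training and test sets before judging the model.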
Data Collection
• Where is the data acquired from? Do we need a query language?
• What is the query to collect the data?
Example:
Data acquired from:
- Real news feeds from the web.
- The data could be stored in SQL form (e.g. a relational database).
- Other datasets from online sources, used for learning purposes.
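A sketch of a collection query using Python's built-in `sqlite3` module. The `news` table, its columns and its rows are hypothetical; a real feed would live in whatever database the project actually uses.

```python
import sqlite3

# In-memory database standing in for a real news store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, source TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO news (title, source, label) VALUES (?, ?, ?)",
    [
        ("Budget approved for schools", "portal_a", "real"),
        ("Miracle pill cures everything", "blog_x", "fake"),
        ("New bus routes announced", "portal_a", "real"),
    ],
)

# The collection query: pull only the rows and columns the analysis needs.
# A "?" placeholder keeps the filter value out of the SQL string itself.
query = "SELECT title, label FROM news WHERE source = ? ORDER BY id"
rows = conn.execute(query, ("portal_a",)).fetchall()
conn.close()

for title, label in rows:
    print(label, title)
```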
Data Understanding
• What cleaning process will be needed later?
• What information can the data tell us?
• What can we do with the data?
Example
Key Questions for General Understanding
• Are there missing values and/or outliers?
• Are there any incorrect data types?
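One way to answer these key questions with pandas, on a small hypothetical DataFrame. The 1.5 * IQR outlier rule is one common convention, assumed here rather than taken from the slides.

```python
import pandas as pd

# Hypothetical sample with a missing value, an outlier and a mistyped column.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 300],  # None -> missing, 300 -> outlier
    "income": ["5000", "6200", "5800", "7100", "6000", "6400"],  # numbers stored as text
})

# 1. Missing values: count per column.
missing_counts = df.isna().sum()

# 2. Outliers: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

# 3. Data types: "income" is text, not a numeric type, so it needs converting.
print(missing_counts)
print(outliers)
print(df.dtypes)
```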
Example
1. Handling missing values (possible approaches):
• Replace with the mean (if the variable is normally distributed)
• Replace with the mode (if the variable is discrete)
• Impute based on other, related features
2. Defining questions
Identify relationships between variables that are particularly interesting or unexpected.
3. Visualizations
Use effective visualizations to communicate the results.
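The missing-value approaches listed above (mean, mode, other features) could be sketched in pandas as follows. The columns and values are hypothetical, and imputing "based on other features" is illustrated as filling with the group mean of a related feature, which is only one of several options.

```python
import pandas as pd

# Hypothetical data: "height" is roughly normal, "shoe_size" is discrete,
# and "weight" is assumed to be related to "gender".
df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M", "F"],
    "height": [170.0, 160.0, None, 158.0, 175.0, 162.0],
    "shoe_size": [42, 38, 42, None, 43, 40],
    "weight": [70.0, 55.0, 72.0, None, 68.0, 57.0],
})

# Mean for an (approximately) normally distributed numeric variable.
df["height"] = df["height"].fillna(df["height"].mean())

# Mode for a discrete variable; mode() may return several values, so take the first.
df["shoe_size"] = df["shoe_size"].fillna(df["shoe_size"].mode()[0])

# Another feature: fill with the mean weight of rows with the same gender.
df["weight"] = df["weight"].fillna(df.groupby("gender")["weight"].transform("mean"))

print(df)
```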
About Cleaning and Preprocessing the Dataset
Cleaning your data should be the first step in your Data Science (DS) or Machine Learning (ML) workflow.
1. Duplicate Dataset
2. Missing data
3. Outliers
4. Data Type
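A hedged pandas sketch for the first and fourth problems (duplicate data and wrong data types); the order records are hypothetical. Coercing unparseable entries to NaN hands them over to the missing-data step.

```python
import pandas as pd

# Hypothetical raw records: one exact duplicate row and a price column stored as text.
raw = pd.DataFrame({
    "order_id": [101, 102, 102, 103],
    "price": ["15000", "22000", "22000", "n/a"],
})

# 1. Duplicate data: keep the first occurrence of each identical row.
clean = raw.drop_duplicates().reset_index(drop=True)

# 4. Data type: convert text to numbers; entries that cannot be parsed become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

print(clean)
print(clean.dtypes)
```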
Common Problems in Data Cleaning
Handling missing data:
1. Identify missing values.
2. Analyze the number or proportion of missing values.
3. Appropriately delete or impute missing values.
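The three steps for missing data might look like this in pandas. The data is hypothetical, and both the 50% drop threshold and median imputation are assumptions for illustration; the right choice depends on the dataset.

```python
import pandas as pd

# Hypothetical dataset with missing values.
df = pd.DataFrame({
    "a": [1.0, None, 3.0, 4.0, 5.0],
    "b": [None, None, None, 4.0, None],
    "c": [10.0, 20.0, None, 40.0, 50.0],
})

# Step 1: identify missing values.
missing_mask = df.isna()

# Step 2: analyze the number and proportion (%) of missing values per column.
summary = pd.DataFrame({
    "n_missing": missing_mask.sum(),
    "pct_missing": missing_mask.mean() * 100,
})

# Step 3: delete columns that are mostly empty (assumed 50% cut-off),
# then impute the remaining gaps with each column's median.
THRESHOLD = 50.0
to_drop = summary[summary["pct_missing"] > THRESHOLD].index
df = df.drop(columns=to_drop)
df = df.fillna(df.median(numeric_only=True))

print(summary)
print(df)
```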
Example
Fitting and Improving the Model:
• Use Decision Tree, Random Forest, etc.
• Use Random Search CV (randomized search with cross-validation) for hyperparameter tuning
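A sketch of Random Search CV using scikit-learn's `RandomizedSearchCV` on a random forest. Synthetic data from `make_classification` stands in for the project dataset, and the parameter ranges are illustrative only, not recommendations.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for the real project dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Distributions or lists to sample hyperparameters from (illustrative ranges).
param_distributions = {
    "n_estimators": randint(20, 100),  # integers in [20, 100)
    "max_depth": [None, 3, 5, 10],
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of random combinations to try
    cv=3,              # 3-fold cross-validation for each combination
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Unlike a full grid search, `n_iter` caps how many combinations are evaluated, which keeps the tuning cost predictable.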
Model Deployment:
• Use Flask as the web framework
• Create web-based product for management to product
• Credit Fraud Detection and Prevention
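A minimal Flask sketch of the deployment idea. The `/predict` route, the JSON schema and the model trained on synthetic data at start-up are all hypothetical; a real service would load a fraud model that was trained and saved beforehand.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

N_FEATURES = 4  # hypothetical number of input features

# Stand-in model trained on synthetic data; label 1 is treated as "fraud" here.
X, y = make_classification(n_samples=200, n_features=N_FEATURES, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [0.1, -1.2, 0.5, 2.0]} (hypothetical schema).
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if (
        not isinstance(features, list)
        or len(features) != N_FEATURES
        or not all(isinstance(v, (int, float)) for v in features)
    ):
        return jsonify(error=f"expected 'features' as a list of {N_FEATURES} numbers"), 400
    label = int(model.predict([features])[0])
    return jsonify(fraud=bool(label))
```

Saved as `app.py` (a hypothetical file name), it can be started with `flask --app app run` and queried with a POST request carrying a JSON body.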
Thank You!