Algoritma Data Science School Syllabus
Tokopedia
The lecturers are very experts in this field, able to master the materials and liven up the classroom atmosphere. So, my key takeaway from the course is the utilization of the data visualization with R, library in R, step-by-step preprocessing, and then we get to know how to use a lot of methods in data science and the data criteria suitable for our case.

Here, we learn that our data can be utilized optimally to provide deeper insights before making decisions. The instructor team of Algoritma are absolutely capable, they not just master the theory, but they also have practical experiences so that we are taught with materials that are more relevant to the field that we are in.
STUDENT TESTIMONIALS
Ezra Albers, Cohort Jupiter
Uzila Dwiyanda, Cohort Midas
The lifetime learning benefit provided is really beneficial and can keep us up-to-date with the current data science trends. The material provided is not just practical, but also accentuating the business side so that it became relevant to the real-life business case. Fortunately, because I joined Algoritma Academy and with the provision of the community and teaching assistant, so whenever I need them, I just need to send an email or message.

Algoritma encourages students to continue learning independently by providing a project called “learning by doing.” Algoritma is active for career support from making the final project, selecting topics, helping to build our foundation, and presenting materials. After taking Algoritma courses, I became more insightful on how data science is used and how we can leverage the data to make a business decision. Not only for predicting, but we can also create historical data to analyze the trends we haven’t seen before.
REAL BUSINESS CASES
Digital transformation has accelerated the amount of data generated by
businesses and organizations around the world. Companies that can transform
their data into actionable insights will thrive and gain a competitive advantage.
Algoritma Data Science Academy is a 4-month program that will help you
develop your programming and statistics skills as you build data science projects
modeled after real-life business cases, one at a time.
LEARNING PROJECTS
3 CAPSTONE PROJECT
50 BUSINESS CASES
9 LEARN BY BUILDING
PERSONALIZED LEARNING
Every student has different learning objectives, abilities, and
preferences. We will personalize your learning experience with:
Every Student is Unique.
Interactive Learning
In-Class Mentoring
following along.
Group Mentoring
homework or projects? You can join the weekly
mentoring session to help you get unstuck.
MODULES & LEARNING OUTCOMES
(72 HOURS)
DATA ANALYTICS SPECIALIZATION (51 HOURS)
Python for Data Analysts (P4DA), 12 Hours

Python environments and IDE.
Objective Knowledge: Understanding the practice of using package and environment managers in Python projects. Proficiency in writing Python code in Jupyter Notebook.
Description: Packages are an essential part of analyzing, exploring, and visualizing data. Learning about package and environment managers will help students build and maintain projects easily. Jupyter Notebook is a platform that allows students not only to write scripts but also to create documentation. Understanding Jupyter Notebook will help students present project documentation as a product report.
Key Competencies: Knowing what a virtual environment is and why it is used. Differentiating between a Python script and a Python notebook. Understanding the Python notebook document.

Python Basic Programming.
Objective Knowledge: Understanding Python fundamentals.
Description: Python is one of the programming languages used for data analysis. Therefore, learning Python fundamentals is essential before data processing.
Key Competencies: Knowing basic Python syntax. Differentiating between referencing and copying in Python.

Introduction to Pandas library.
Objective Knowledge: Understanding the use of the Pandas library as a data analysis tool in Python. Performing exploratory data analysis in Pandas. Indexing and subsetting with Pandas.
Description: Pandas is one of the powerful Python libraries that support the data analysis process. Understanding the Pandas library will make it easier for students to wrangle and analyze data. Exploratory data analysis (EDA) is an essential part of analyzing patterns in the data. After understanding the Pandas library, students will be able to recognize data patterns.
Key Competencies: Knowing data structures in Pandas. Knowing data types in Pandas. Reading and exploring data. Understanding basic descriptive statistics with the .describe() method. Able to perform indexing and subsetting operations with Pandas.
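
The "referencing and copying" competency listed for Python Basic Programming can be illustrated with a minimal sketch. The list `sales` and its values are hypothetical and are not taken from the course material; only the standard library is used.

```python
import copy

# Hypothetical data: a list that contains one nested list.
sales = [100, 200, [300, 400]]

alias = sales                  # referencing: both names point to the same list object
shallow = sales.copy()         # shallow copy: new outer list, nested list is still shared
deep = copy.deepcopy(sales)    # deep copy: fully independent of the original

sales[0] = -1      # changes the outer list: seen through the alias only
sales[2][0] = -1   # changes the nested list: seen through the alias and the shallow copy

print(alias)
print(shallow)
print(deep)
```

The alias follows every change, the shallow copy follows only changes to the shared nested list, and the deep copy keeps the original values.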
Course Name Objective Knowledge Description Key Competencies
Exploratory Data Analysis

Working with date, time &
Objective Knowledge: Familiarity with the DateTime format in the data analysis process.
Description: Understanding the DateTime format allows students to manipulate DateTime data.
Key Competencies: Knowing what distinguishes the DateTime object from other data types.

Why and what: exploratory data analysis.
Objective Knowledge: Understanding the use of contingency tables and how to create them in Pandas.
Description: Knowing how to produce contingency tables allows students to present information about frequency or data aggregation.
Key Competencies: Knowing what, why, and how to create a frequency table. Knowing what, why, and how to create an aggregation table.

Treating missing values and duplicates in data.
Objective Knowledge: Understanding missing values in data and how to handle them. Understanding duplicated data and how to remove it.
Description: Handling missing and duplicated values is necessary to produce a better analysis for answering business problems.
Key Competencies: Knowing how Pandas denotes missing data. Able to observe and impute missing values in data. Able to monitor and handle duplicates.
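
As a rough illustration of the missing-value and duplicate handling described above, the sketch below uses Pandas on a small made-up table. The column names and values are assumptions, and mean imputation is only one of several possible strategies, not necessarily the one taught in class.

```python
import numpy as np
import pandas as pd

# Hypothetical order data: one missing amount and one fully duplicated row.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 3],
    "amount": [50.0, np.nan, 70.0, 70.0],
})

# Pandas denotes missing numeric data as NaN; isna() flags it per cell.
missing_per_column = orders.isna().sum()

# Remove repeated rows, keeping the first occurrence of each.
deduplicated = orders.drop_duplicates()

# Impute the missing amount with the column mean of the remaining rows.
imputed = deduplicated.fillna({"amount": deduplicated["amount"].mean()})

print(missing_per_column)
print(imputed)
```

Removing duplicates before imputing keeps the repeated row from being counted twice in the mean.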
Data Wrangling & Visualization (DWV), 12 Hours

Python reproducible environment.
Objective Knowledge: Understanding how to handle dependencies in a Python project.
Description: Learning how to create a reproducible environment is mandatory, since it helps standardize requirements between team members.
Key Competencies: Able to import and export Python dependencies.

Understanding multi-index dataframe.
Objective Knowledge: Able to identify and differentiate a multi-index dataframe.
Description: A multi-index dataframe is a high-dimensional dataframe with two or more index levels in its rows or columns. Understanding the structure of a multi-index dataframe is essential, since it is treated differently from a single-index dataframe.
Key Competencies: Able to fetch data from various data sources. Able to work with a multi-index dataframe.

Data wrangling and reshaping.
Objective Knowledge: Able to perform data reshaping with Pandas. Able to perform group by aggregation.
Description: Data reshaping serves data preparation and visualization needs. Group by aggregation is essential to present hidden information within the data.
Key Competencies: Knowing why and how to perform data reshaping with Pandas. Able to aggregate data using the group by operation.

Data Visualization.
Objective Knowledge: Able to create data visualizations using Pandas for exploratory data analysis.
Description: Showing data patterns from the exploratory results through visualization helps students understand insights from the data easily.
Key Competencies: Knowing how to interpret and create a plot in Pandas. Using a boxplot to visualize the statistical distribution of data.
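
The group by aggregation, multi-index, and reshaping topics above can be sketched on a tiny made-up table. The `region`, `year`, and `revenue` columns are hypothetical, and the calls shown are one possible way to do each step in Pandas.

```python
import pandas as pd

# Hypothetical revenue data, one row per region and year.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "year": [2022, 2023, 2022, 2023],
    "revenue": [10, 12, 7, 9],
})

# Group by aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Using two columns as the index produces a multi-index dataframe.
indexed = sales.set_index(["region", "year"])

# Reshaping: move the inner index level (year) into the columns.
wide = indexed["revenue"].unstack("year")

print(revenue_by_region)
print(wide)
```

The wide table has one row per region and one column per year, which is often the more convenient shape for plotting.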
Capstone Project - Data Analytics, 3 Hours

End-to-end data project.
Objective Knowledge: Understanding end-to-end project analysis.
Description: Students will perform data analysis using the Python programming language by building mini projects based on their learning.
Key Competencies: Knowing several alternatives to publish analysis reports.
Introduction to Machine Learning I (IML1), 12 Hours

Basic Principles of Machine Learning.
Objective Knowledge: Understanding the basic principles of machine learning and its applications. Able to decide on predictor and target variables and preprocess data before building a machine learning model.
Description: This module covers fundamental concepts of machine learning using the scikit-learn library. Students will learn data representation (predictor and target variables), loading, and preprocessing, as well as classification and regression cases in machine learning.
Key Competencies: Representation of data in scikit-learn. Loading and preprocessing data. Predictive analysis with ML: classification and regression.

Machine Learning Workflow: Support Vector Machine (SVM).
Objective Knowledge: Understanding the concept of the Support Vector Machine and its hyperparameters. Knowing the general machine learning workflow.
Description: Support Vector Machine (SVM) is a family of powerful models that can be used in classification and regression problems. Students will learn about SVM concepts, including the maximum margin classifier, and the concept of its hyperparameters.
Key Competencies: Introduction to Support Vector Machine (SVM). SVM hyperparameters.

Implementation of SVM in Classification and Regression.
Objective Knowledge: Able to create a classification ("Support Vector Classifier") model. Able to create a regression ("Support Vector Regressor") model.
Description: SVM can solve classification ("Support Vector Classifier") and regression ("Support Vector Regressor") problems and can be extended to model non-linearity in the data. Even though the workflow for classification and regression is almost the same, the approach from model building to evaluation is different.
Key Competencies: SVM in classification. SVM in regression.
Model Improvement Technique.
Objective Knowledge: Proficiency in evaluating models with the right evaluation metrics: accuracy for classification models and mean absolute error (MAE) for regression models.
Description: Students will learn how to measure model performance and how to improve it by tuning hyperparameters. Model evaluation is necessary because it shows whether the model is good or not (whether it predicts correctly or not).
Key Competencies: Model evaluation. Tuning hyperparameters.

Introduction to Machine Learning II (IML2), 9 Hours

Ensemble Machine Learning using XGBoost.
Objective Knowledge: Understanding ensemble machine learning using XGBoost fundamentals.
Description: Ensemble machine learning is essential to know because it has more potential than a single machine learning model to produce a model that generalizes well to unseen data (low bias and low variance). One of the methods of ensemble learning is boosting. In this course, students will look at a specific algorithm that uses the boosting method: the XGBoost algorithm.
Key Competencies: Introduction to ensemble learning. Types of ensemble learning: boosting. A short introduction to boosting. Concept of XGBoost and its hyperparameters.

Implementation of XGBoost in Classification and Regression.
Objective Knowledge: Able to create a classification ("XGBClassifier") model. Able to create a regression ("XGBRegressor") model.
Description: Covers how to use XGBoost for classification and regression tasks, including hands-on activities and examples.
Key Competencies: XGBoost in classification. XGBoost in regression.

Tuning Hyperparameter Optimizer.
Objective Knowledge: Proficiency in tuning models more efficiently and effectively to get the best model performance using GridSearchCV and RandomizedSearchCV.
Description: GridSearchCV and RandomizedSearchCV are techniques for hyperparameter optimization. Students will learn how to use them to improve machine learning model performance.
Key Competencies: GridSearchCV. RandomizedSearchCV.
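
The idea behind grid search and the MAE metric mentioned above can be sketched without any library. This is a simplified illustration, not the scikit-learn implementation: it scores each hyperparameter combination on a single hold-out set instead of cross-validation, and the toy 1-D nearest-neighbour regressor, the data points, and the grid are all assumptions made for the example.

```python
from itertools import product


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train, x, k, weighted):
    """Predict y for x from the k nearest training points (toy 1-D regressor)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    if weighted:
        # Closer neighbours get larger weights; the small constant avoids division by zero.
        weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * py for w, (_, py) in zip(weights, nearest)) / sum(weights)


# Hypothetical (x, y) pairs for training and validation.
train = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)]
valid = [(1.5, 3.0), (3.5, 7.0), (4.5, 9.0)]

# Hyperparameter grid: every combination is evaluated.
grid = {"k": [1, 2, 3], "weighted": [False, True]}

results = []
for k, weighted in product(grid["k"], grid["weighted"]):
    preds = [knn_predict(train, x, k, weighted) for x, _ in valid]
    mae = mean_absolute_error([y for _, y in valid], preds)
    results.append((mae, k, weighted))

# The combination with the lowest validation MAE wins.
best_mae, best_k, best_weighted = min(results)
print(best_k, best_weighted, round(best_mae, 4))
```

A randomized search follows the same loop but evaluates only a random sample of the combinations, which is cheaper when the grid is large.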
MODULES & LEARNING OUTCOMES
DATA VISUALIZATION SPECIALIZATION (42 HOURS)
Course Name Objective Knowledge Description Key Competencies
Programming for Data Science (P4DS), 9 Hours

Programming Language and Tools in R.
Objective Knowledge: Building reproducible data analysis reports using RStudio. Proficiency in using the R programming language.
Description: RStudio is the main graphical user interface (GUI) for the R programming language. Therefore, learning the RStudio environment will help students use R and RStudio effectively as primary tools for a data science project.
Key Competencies: RStudio environment. Differentiating between R Markdown and R Script. R syntax. Data types in R. Data structures in R.

Statistical Computing.
Objective Knowledge: Able to use R as a data science tool.
Description: Data scientists are expected to create a product, be it an analytics report, machine learning models, or a script for automating a process. Therefore, learning how to present a report from R Markdown or build a custom function in an R script is an essential part of the data science workflow.
Key Competencies: Knowing the project workflow with R. Making reproducible data science with R.

Practical Statistics (PS), 6 Hours

Descriptive Statistics.
Objective Knowledge: Summarise data and interpret correlation. Able to analyze data distribution and identify outliers.
Description: Descriptive statistics is a quick way to summarise and gain insights from data. Learning descriptive statistics is also essential to build accurate and effective data visualization. Data distribution will give more insight into the overall data than a single value such as mean and
Key Competencies: Measurement of central tendency. Measurement of spread. Measurement of data relation. Benchmark for outliers in data.
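
The measures of central tendency, spread, and outlier benchmarks listed above can be illustrated with a short sketch. The course itself teaches these in R; Python's standard `statistics` module is used here only for illustration. The sample values are made up, and the 1.5 × IQR rule is assumed as the outlier benchmark, which may differ from the one used in class.

```python
import statistics

# Hypothetical daily sales; the last value looks unusually large.
values = [12, 15, 14, 10, 13, 14, 16, 11, 48]

mean = statistics.mean(values)      # central tendency, sensitive to extreme values
median = statistics.median(values)  # central tendency, robust to extreme values
spread = statistics.stdev(values)   # sample standard deviation

# Quartiles and the interquartile range (IQR).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed benchmark: values beyond 1.5 * IQR from the quartiles are flagged.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print(mean, median, outliers)
```

The gap between the mean and the median is itself a hint that the distribution is skewed by an extreme value.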
Statistical Inference.
Objective Knowledge: Able to calculate probability. Able to perform hypothesis testing analysis.
Description: Probability distribution theory is essential and is one of the main principles behind various statistical hypothesis tests. Hypothesis testing is essential in many business processes, especially where we want to run experiments and analyze the results.
Key Competencies: Probability testing. Confident of a sample. Statistical testing. Hypothesis testing.

Data Visualization in R (DVR), 12 Hours

Plotting Essentials.
Objective Knowledge: Able to create a statistical plot in R.
Description: A data visualization, first and foremost, has to convey the data accurately. Therefore, students will learn the basics of data visualization, including graphic elements, which types of charts are appropriate for numerical data, which are suitable for categorical data, etc.
Key Competencies: Understanding built-in plotting functionalities in R. Understanding how to customize a plot with built-in functions in R.
Plotting With ggplot2.
Objective Knowledge: Able to create a plot with ggplot2. Able to create a publication-ready plot.
Description: ggplot2 is one of the most widely used visualization packages in R and has been recognized by major companies, including Airbnb, BBC, and Google. By learning ggplot2, students will be able to build practical and aesthetically pleasing visualizations. Good visual presentation tends to enhance the message of the visualization. It helps convince clients and other audiences of the argument and insight that the data scientist or analyst finds.
Key Competencies: Understand the grammar of graphics system. Understand geometries. Understand how to create a multi-dimensional faceting plot.
Introduction to Leaflet.
Objective Knowledge: Able to visualize geospatial data.
Description: Building an effective visualization with maps and geospatial data will help students present more insights.
Key Competencies: Understand the basic concept of leaflet.

Interactive Plotting and Dynamic Dashboards (IPDD), 12 Hours

Data Wrangling with dplyr, and Plotting with Plotly.
Objective Knowledge: Able to process data with dplyr. Able to create interactive plots using plotly.
Description: Data wrangling or data transformation is an essential step in the data science workflow before doing any analysis or visualization. The dplyr package is a great tool that will help students do data wrangling. An interactive plot can give more information than a static visualization. People can hover over or click the plot to access more information about each data point.
Key Competencies: Understanding how dplyr works. Common functions for data preparation with dplyr.

Building Interactive Document.
Objective Knowledge: Able to create an interactive document with flexdashboard.
Description: A dashboard is essential to present many insights and information on a single page. Flexdashboard is an R package that allows students to build interactive plots and design a single-page dashboard.
Key Competencies: Understanding how to create a dashboard with flexdashboard.

Introduction to Shiny.
Objective Knowledge: Able to create a dashboard using Shiny. Able to publish a Shiny application online.
Description: A web dashboard is a great tool to present real-time information for management or other departments. Sometimes the information will be divided into multiple tabs or pages. Shiny is an R package for building web dashboards that offers features comparable to Tableau or PowerBI.
Key Competencies: Understand how to build an interactive dashboard. Understand Shiny dashboard components. Understand hosting and deployment.
Capstone Project - Data Visualization, 3 Hours

End-to-End Data Visualization Project.
Objective Knowledge: Able to analyze data and build an interactive data visualization dashboard.
Description: Data scientists should find insight and recommendations from data and communicate the result on a practical and interactive web dashboard.
Key Competencies: Prepare data for visualization. Build a data visualization dashboard. Communicate insights from the dashboard.
MODULES & LEARNING OUTCOMES
MACHINE LEARNING SPECIALIZATION (90 HOURS)
Course Name Objective Knowledge Description Key Competencies
Regression Models (RM), 12 Hours

Linear Regression Models and the Key Terminologies.
Objective Knowledge: Data preparation. Create and interpret a linear regression model. Evaluate the linear regression model.
Description: As data scientists, we must be able to create machine learning models and explain them. Model interpretation is needed to know each predictor’s effect on the predicted results. After building a regression model, we want the best model performance for predicting new data. To determine the best model, we must evaluate its performance.
Key Competencies: Linear regression concept. Feature selection. Evaluating linear regression. Linear regression assumptions.

Classification in Machine Learning I (CML 1), 12 Hours

Build Logistic Regression and K-Nearest Neighbors Algorithm.
Objective Knowledge: Able to choose a suitable classification method to solve business problems.
Description: There are various machine learning models ready to be deployed. However, understanding the most basic and simple algorithms is essential. Logistic regression is sufficient for many classification problems and also comes with the benefit of interpretability for research. K-NN is a simple algorithm that can deal with multiple numeric predictors.
Key Competencies: Logistic regression concept. K-Nearest Neighbors concept.

Solving Classification Problem.
Objective Knowledge: Create classification models. Evaluating classifiers.
Description: As data scientists, we must create models to predict classes (categorical data), explain the model, and evaluate the model’s performance.
Key Competencies: Logistic regression model. K-Nearest Neighbors model. Model evaluation.
Classification in Machine Learning II

Understand the Practical and Theoretical
Objective Knowledge: Able to choose a suitable classification method.
Description: After students understand the basic machine learning algorithms, it will be essential to learn about more advanced classification models. Understanding
Key Competencies: Naive Bayes concept. Decision Tree concept.

Text Mining in R.
Objective Knowledge: Able to do data preprocessing from raw text
Description: Much data does not come in a structured (tabular) format. Text is an example of widely used unstructured data. It is essential for a data scientist to be able to process text data to gain insight and create a machine learning model.
Key Competencies: Text cleansing. Implementation of Naive Bayes for text classification.
Unsupervised Machine Learning (UL), 12 Hours

Practical and Theoretical Aspect of PCA and K-Means.
Objective Knowledge: Determine appropriate PCs for dimensionality reduction and reconstruct the data using the corresponding PCs. Determine a reasonable k and evaluate the clustering result.
Description: High-dimensional data contains more information but may also include noise and needs higher computational power to process. Using appropriate PCs allows us to reduce the data dimension while retaining as much information as possible, resulting in a decent model with less effort. Understanding K-Means and clustering will allow students to form meaningful groups (clusters) of data and interpret each cluster to obtain business insight that helps in business decision making.
Key Competencies: The basic concept of PCA. The basic concept of K-Means. Evaluating the clustering result.

Apply Biplot for Visualization and Draw Insight.
Objective Knowledge: Create a biplot using the desired PCs. Understanding of the biplot.
Description: Visualization of multi-dimensional data is highly complex. A biplot helps us visualize and understand multi-dimensional data, for example by detecting outliers in the overall data, correlation between features, data distribution, etc.
Key Competencies: Qualitative and quantitative variables. Biplot visualization. Biplot’s features.
Time Series & Forecasting (TSF), 12 Hours

Understand the Concept of Time Series and Components of the Decomposition.
Objective Knowledge: Able to create a time series object. Able to extract the trend, seasonal, and error components from a time series object.
Description: A time series object is an object that contains the values themselves and a time factor. It is needed to perform a time series analysis. Trend, seasonal, and error are components that exist in many time series objects. These components help us determine the right time series model to train.
Key Competencies: The basic concept of time series. Time series decomposition.
Application of Time Series Model
Objective Knowledge: Implementation of time series models for forecasting. Evaluate the time series model.
Description: Many time series models have requirements on the time series components, so we can’t pick a model at random. Understanding the requirements of each time series model will help us determine a suitable model. Model evaluation is used to assess the model created to forecast future data.
Key Competencies: Forecasting models. The linear relationship between lagged values of a time series. Model evaluation. Assumptions of time series models.
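
The "linear relationship between lagged values of a time series" listed above is commonly measured with the sample autocorrelation. The sketch below is a plain-Python illustration of that standard estimator. The course itself works in R, and the `demand` series with a period of 4 is made up for the example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return numer / denom


# Hypothetical demand series whose pattern repeats every 4 observations.
demand = [10, 20, 30, 20] * 6

# Autocorrelation for lags 1 to 8; a peak at lag 4 reflects the seasonal period.
acf = {lag: autocorrelation(demand, lag) for lag in range(1, 9)}
strongest_lag = max(acf, key=acf.get)

print(strongest_lag)
```

A strong positive value at a given lag suggests a seasonal pattern of that length, which is one of the component requirements to check before choosing a forecasting model.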
Neural Network & Deep Learning (NN&DL), 12 Hours

Application of Neural Network.
Objective Knowledge: Knowing the correct activation function and cost function when building a neural network architecture. Neural network application. Able to use the Keras framework to build a neural network model.
Description: The activation function and the cost function are essential parts of a neural network architecture. They influence your model’s prediction accuracy. Keras is one of the deep learning frameworks. It makes building a neural network model easier.
Key Competencies: Neural network concept. Data preparation. Build a neural network model. Evaluating a neural network model.
Capstone Machine Learning, 3 Hours

End-to-end Machine Learning Workflow.
Objective Knowledge: Able to build an end-to-end machine learning workflow.
Description: Students will be given a series of cases to solve with a machine learning model, such as predicting the number of customers in the next week (time series), building an image recognition model (classification), or predicting the strength of a concrete mixture based on its material composition (regression).
Key Competencies: Knowing how to do data preprocessing. Knowing how to build a supervised machine learning model.
CERTIFICATES
Upon successful completion of the academy program, participants will be awarded a digital
certificate of completion and grade transcript by Algoritma Data Science School.
ALGORITMA DATA SCIENCE SCHOOL
Our Vision:
“To accelerate data science adoption across the region and provide employment opportunities.”
We strive to solve the issue of an under-equipped modern workforce in facing the collective problem of the contemporary digital era. We help equip every professional with data science skills, such as data visualization, regression, data modeling, machine learning, and statistical programming, that will prepare them for today and tomorrow's employment.
THE 1ST TO BE CERTIFIED IN ASIA
RStudio’s Instructor training and certification program is a rigorous
program that applies modern evidence-based teaching practices to
teach data science using R and RStudio. Our instructors at Algoritma are
official RStudio Education partners and the first in Asia to be certified.
BY THE NUMBERS
2,000+
Academy Students
500+
400+
Enterprise Clients
100+
In-House Training
2
National-Scale Data Hackathon
Algoritma 0813-8550-5579 @TeamAlgoritma @TeamAlgoritma Algoritma Data Science School Algoritma Data Science School