1666777204580-1666708806962-Introduction To Data Science REV

Uploaded by

Nurin Salwa
Copyright
© © All Rights Reserved
Introduction to Data Science
Mentor: Pararawendy Indarjo
Hey, I'm Pararawendy Indarjo

I am a:
● CURRENTLY | Lead DS at AlloFresh
● 20 - Jul 22 | Senior DS at Bukalapak
● 19 - 20 | Data Analyst at Eureka.ai

BSc Mathematics, MSc Mathematics
LinkedIn: https://www.linkedin.com/in/pararawendy-indarjo/
Blog: medium.com/@pararawendy19
Outline
● Introduction to data science
● Data science methodology
● Data science tools
● Googling tips
● Advice for aspiring data scientists
What is Data Science?
Data science is the field of study that combines
domain expertise, programming skills, and
knowledge of mathematics and statistics to
extract meaningful insights from data (Source:
DataRobot)
Two main topics in data science

Analytics
● Process of examining datasets to draw conclusions about the information they contain.
● Data analytic techniques enable you to take raw data and uncover patterns to extract valuable insights from it (Source: Lotame)

Machine Learning
● Family of statistical models with ability to modify itself (learn) when exposed to more data (Source: SkyMind)
Analytics
Two subsets

1 Exploratory data analysis (EDA)
Literally means exploring the data to find meaningful patterns/insights. E.g.
● What does the profile of our user base look like?
● What is causing the decline in our metrics?

2 Hypothesis Testing
Statistical tools to validate our hypotheses. E.g.
● Does the new homescreen design perform better than the old one?
● How effective is a certain promo strategy?
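As a rough illustration of the homescreen question, the sketch below runs a two-sided two-proportion z-test on conversion counts. The slides do not say which test to use, so the choice of test, the function name `two_proportion_z_test`, and all the numbers are assumptions made for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and visitors for the old design (hypothetical data)
    conv_b / n_b: conversions and visitors for the new design (hypothetical data)
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both designs convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up experiment: 1000 visitors per design.
z_stat, p_val = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z_stat:.2f}, p-value = {p_val:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference between the two designs is unlikely to be due to chance alone.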
Machine Learning (ML)

Family of statistical models with ability to modify itself (learn) when exposed
to more data (ref: SkyMind)

● What is the machine?
○ A statistical model, such as a linear regression model
● What to modify?
○ The model’s parameters, e.g. the coefficient of each advertising channel in a regression model
First: What is a model?
[Diagram: Train Data + Model (Raw) → Training algorithm → Model (Trained); Test Data → Model (Trained) → Prediction]
Concepts in Machine Learning
Consider a model to predict sales revenue (GMV) using different advertising channels

GMV = 0.7 + 2.1 socmed + 3.3 onmedia + 1.4 DOOH

● Target: GMV
● Features: socmed, onmedia, DOOH
● Parameters: 0.7, 2.1, 3.3, 1.4
● To find the model’s parameters, we train the model on an empirical data set
○ Essentially many pairs of values (features, target)
■ socmed = 2, onmedia = 3, DOOH = 1, GMV = 10,
■ socmed = 1, onmedia = 2, DOOH = 1, GMV = 7, etc
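To make the three terms concrete, here is a minimal sketch that plugs a set of feature values into the example equation. The parameter values come from the slide; the function name `predict_gmv` and the input row are hypothetical.

```python
# Parameters copied from the slide's example equation.
INTERCEPT = 0.7
COEFFICIENTS = {"socmed": 2.1, "onmedia": 3.3, "DOOH": 1.4}


def predict_gmv(features):
    """Return the target (GMV) for one row of feature values."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


# Hypothetical new data point: spend per advertising channel.
new_row = {"socmed": 1, "onmedia": 1, "DOOH": 1}
print(round(predict_gmv(new_row), 2))
```

Training is the reverse direction: given many (features, target) pairs, an algorithm searches for the parameter values that fit those pairs best.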
Machine Learning Logic
1. Use past data to make our model learn the feature-target relationship
2. Use the learned model to predict the target variable of new data points

[Table illustration: past data rows contain both features and target (and many more rows…); new data rows contain features only, and the target (?) is what we predict]
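The two steps can be sketched with the simplest possible case: one feature and a straight line fitted by ordinary least squares. The data values and the name `fit_line` are made up for illustration; real projects would normally use a library such as scikit-learn rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Step 1: learn intercept and slope from past (feature, target) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Past data (hypothetical): ad spend and the GMV that was observed.
past_spend = [1, 2, 3, 4, 5]
past_gmv = [3, 5, 7, 9, 11]

intercept, slope = fit_line(past_spend, past_gmv)

# Step 2: use the learned parameters on a new data point whose target is unknown.
new_spend = 6
predicted_gmv = intercept + slope * new_spend
print(intercept, slope, predicted_gmv)
```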
Sample of Data Science Implementations

1 Churn Modelling
● Churn: users leaving the company’s products
● We can build a model that predicts whether or not a user will churn
○ So that we can take preventive actions

2 Demand Forecasting
● We train a time series model that forecasts future demands
● Benefit: maximize potential profit, while minimizing production/maintenance cost
● E.g. UHT milk demand forecasting

3 Recommendation Systems
● We train a large matrix that crosses users’ tastes and available products
● E.g.
○ YouTube video recommendation
○ Spotify song recommendation
○ Etc.
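A toy sketch of the recommendation idea: each user and each product is described by a short vector, and the score of a user-product pair is their dot product. In a real system these vectors are learned from interaction data; here they are hard-coded, and every name and number is hypothetical.

```python
# Hypothetical taste vectors: [interest in music, interest in cooking].
user_taste = {
    "alice": [0.9, 0.1],
    "bob": [0.2, 0.8],
}

# Hypothetical product vectors on the same two dimensions.
item_traits = {
    "guitar lesson": [1.0, 0.0],
    "pasta recipe": [0.0, 1.0],
    "kitchen concert": [0.6, 0.6],
}


def recommend(user, top_n=1):
    """Rank items by the dot product between user taste and item traits."""
    taste = user_taste[user]
    scores = {
        item: sum(t * v for t, v in zip(taste, traits))
        for item, traits in item_traits.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice"))
print(recommend("bob"))
```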
Data Science Methodology

Yes, it’s a long, long journey...

Data Science Methodology
● The underlying idea of data science: creating business value
● Therefore, it starts and ends with the business context
○ Starts: business requirements
○ Ends: business evaluation (are the requirements fulfilled?)

● Our first task as data scientists is to understand the business
○ How does the business flow?
○ Is there an explicit requirement already, or is it implicit?

● Afterward, we craft an analytic/data science approach to fulfill the requirement
○ See the next slide
Data Science Methodology
● Suppose the requirement is, “How can we automatically prevent users from churning?”

● The output is a high-level data science strategy
○ In this case, a user churn model
Data Science Methodology
● Data requirements step: we list all the data/metrics that we want to include in the model

● Data collection step: we locate and gather the required data
○ This usually needs strong collaboration with Data Engineers

● Data understanding step: oftentimes, the data we need/expect cannot be used directly (it is raw and messy)
○ Our job is to understand this raw data
Data Science Methodology
● After we understand the data, we transform/prepare it so that it has an appropriate format for the modeling step
○ This is the most time-consuming part of a DS’s job!
○ Tasks include: feature encoding, handling non-normal data, feature engineering, etc.

● Finally, the modeling step
○ We train multiple models and look for the best one (model selection)
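As one example of a data preparation task, the sketch below shows one-hot encoding, a common form of feature encoding that turns a categorical column into 0/1 columns a model can consume. The helper `one_hot_encode` and the sample rows are hypothetical; in practice a library routine (for instance in pandas or scikit-learn) would usually do this.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical raw user data with a text-valued feature.
users = [
    {"age": 25, "city": "Jakarta"},
    {"age": 31, "city": "Bandung"},
    {"age": 42, "city": "Jakarta"},
]

encoded_users = one_hot_encode(users, "city")
for row in encoded_users:
    print(row)
```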
Data Science Methodology
● After choosing the best model, we perform model evaluation
○ How does the model perform on new data?
○ Does the model make sense business-wise?

● If OK, deploy the model in production.
○ We use our user churn model as an automatic decision engine that determines whether or not a user needs to be sent vouchers (to prevent churning)

● The final step is continuous feedback
○ Monitor the model’s behavior & performance in production
○ E.g. performance degrades → retrain the model!
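A minimal sketch of the monitoring idea: compare recent production accuracy against the accuracy measured at deployment and flag the model for retraining once the drop exceeds a tolerance. The function name `should_retrain`, the 0.05 tolerance, and the accuracy figures are all assumptions for illustration; real monitoring would track whichever metrics matter to the business.

```python
def should_retrain(baseline_accuracy, recent_accuracies, max_drop=0.05):
    """Return True when average recent accuracy falls too far below baseline."""
    recent_average = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent_average) > max_drop


# Hypothetical weekly accuracy of a churn model that scored 0.90 at deployment.
print(should_retrain(0.90, [0.89, 0.91, 0.90]))  # stable performance
print(should_retrain(0.90, [0.82, 0.80, 0.81]))  # degraded performance
```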
To wrap everything up
Data Science Methodology

Emphasis on the feedback loops

Data Science Tools
The standard stack

● Big data retrieval tool, i.e. SQL
○ In practice, all data is stored in a database (cloud/on-prem)
○ We need to retrieve it and make it available on our laptop

● Programming tool: Python or R
○ After the data is ready on our machine, we’re ready to analyze/build models from the data
○ See next slide for R vs Python comparison

● Dashboarding tool: Tableau, Power BI, or Looker
○ Data-driven companies track/monitor MANY measures/metrics
○ They’re indicators of product performance, growth, etc.
○ We create dashboards using these tools
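To show how SQL retrieval and a programming tool fit together, the sketch below queries a small in-memory SQLite database from Python. The table `users`, its columns, and its rows are invented for the example; a company database would be a remote warehouse rather than SQLite, but the pattern of sending a SQL query and receiving rows is the same.

```python
import sqlite3

# Build a throwaway in-memory database with hypothetical user data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, age INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, 34, "female"), (2, 27, "male"), (3, 22, "female"), (4, 120, "female")],
)

# Retrieve only the rows we need, already filtered and sorted by the database.
query = """
    SELECT user_id, age
    FROM users
    WHERE sex = 'female' AND age < 100
    ORDER BY age
    LIMIT 5
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```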
Python vs R
The two programming tools for Data Scientists
Python vs R
Sample of the different syntaxes

● We have a data frame named df
● Filter data: female users whose age is under 100
● Sort based on column age
● Show top 5 rows of the processed data

Python (pandas):
# Each condition needs its own parentheses because & binds tighter than < and ==
(df[(df['age'] < 100) & (df['sex'] == 'female')]
 .sort_values('age')
 .head(5)
)

R (dplyr):
df %>%
  filter(age < 100,
         sex == 'female') %>%
  arrange(age) %>%
  head(5)
Python vs R
Testimony from an Industry Practitioner

● If your work is around deep-dive data analysis, insight creation and visualization
○ Then both Python and R are equally capable

● If your work is to develop ML models to be served in a production environment
○ Then Python is the best fit
○ It has strong deployment support (functionality, integration, etc.)

● In my experience/observation, most companies use Python as the main tool for their DSs
○ Bukalapak (BL), Tokopedia (Tokped), Gojek, Ajaib, Eureka, etc.

● Conclusion for aspiring Data Scientists
○ It’s a better bet to learn Python first!
Googling Stuff Effectively
If you think being a Data Scientist means that you have to…
- Memorize all the code needed to implement each machine learning model
- Memorize every machine learning formula
- Know every Python library you might need
Learn to Google!

After this bootcamp ends…
- How will you keep on learning?
- If you encounter an error, who will you ask?
- If you want to know the most up-to-date technology for a problem, who will you reach out to?

What if you want to study Data Science further? What if you want to move into Deep Learning?
Learn to Google!

Tips:
- If you face an error, copy the last line of the error message and search for it on Google. Usually, a website called Stack Overflow has a thread on that error.
- Learn to read the official documentation of a package.
- If you want to search for Python code for a particular problem, try adding “python” at the end of your Google search.
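The first and third tips can be combined in a few lines: the last line of a Python error message names the exception and its description, which is usually the most searchable part. The helper `last_error_line` is a hypothetical name; `traceback.format_exception_only` is part of the standard library.

```python
import traceback


def last_error_line(exc):
    """Return the final line of the error message, e.g. 'NameError: ...'."""
    return traceback.format_exception_only(type(exc), exc)[-1].strip()


try:
    1 / 0  # deliberately trigger an error
except ZeroDivisionError as exc:
    # Append "python" so the search engine returns Python-specific answers.
    search_query = last_error_line(exc) + " python"

print(search_query)
```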
Learn to Google!

Tips:
- To learn modeling case studies and techniques, try searching for your problem and reading articles from these websites:
- Towards Data Science
- Machine Learning Mastery
- Analytics Vidhya
Learn to Google!

Example:
- Seaborn tutorial page: https://seaborn.pydata.org/tutorial.html
- Article about Decision Tree mixed with Linear Regression: https://towardsdatascience.com/linear-tree-the-perfect-mix-of-linear-model-and-decision-tree-2eaed21936b7
Advice for Aspiring Data Scientists
Business First

You’re not hired only to do modeling, but to create solutions!

Solutions can be:
● ML-powered features
○ E.g. recommender system
● Actionable insights
○ Most of the time!
Don’t Fall for the Hype
Deep Learning? Neural Networks? You probably will not need them!

Use familiar tools/techniques; it helps

Sometimes a complex method just confuses you even further and adds no value at all
Comprehend the methods
“I chose XGBoost because people say it’s a good model”

“People say negative correlation means weak correlation”

“What” is not enough, you need to know “Why”
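The second quote is a misconception that a quick calculation exposes: the sign of a correlation gives its direction, while its absolute value gives its strength. The sketch below computes the Pearson correlation for made-up data where y falls perfectly as x rises; the function name `pearson_correlation` and the data are assumptions for this example.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: every extra unit of x lowers y by exactly 2.
x_values = [1, 2, 3, 4, 5]
y_values = [-2, -4, -6, -8, -10]

r = pearson_correlation(x_values, y_values)
print(r)
```

Here the coefficient is at the negative extreme of its range, which is the strongest possible linear relationship, just in the downward direction.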
Constant Learning

Data science is a truly vast field!

Constant learning will future-proof your career as a data scientist. Learn beyond our syllabus. And don’t forget to learn the fundamentals
Domain Expert

Data science is not enough; you need to learn the domain(s) where you want to apply it

Working as a DS in HR? Learn about HR management

Working as a DS in the healthcare industry? Learn about healthcare
Build Your Portfolio

A portfolio not only helps you understand what you learn, it also helps you get noticed by HR!

Small projects are cool; build your portfolio on them.

*will be discussed in the next meeting!
Thank you
