Intro To Data Science
WELCOME TO GA
GENERAL ASSEMBLY
Travis Huang (He/Him)
Data Science Part-Time Lead Instructor
https://www.linkedin.com/in/huangtravis/
2 | © 2018 General Assembly
What is General Assembly?
General Assembly is a pioneer in education and
career transformation, specializing in today's most
in-demand skills. We foster a flourishing community
of professionals pursuing careers they love.
What We Teach
Supervised Learning
Decision Trees
Identify the Data Science Workflow and explain the value it adds to solving
a business challenge.
What Is Data Science, Anyway?
Math Methods: Statistics techniques, quantitative and qualitative methods.
Domain Knowledge: Industry knowledge, workflows, data operations, analytics.
Overcoming Challenges With Data Science
1. Develop hypothesis-driven questions for your analysis.
2. Select, import, and clean relevant data.
3. Structure, visualize, and complete your analysis.
4. Make business decisions based on data.
5. Present data-driven insights to your audience.
A Closer Look at the Data Science Workflow
“Asking the right questions is what separates data scientists that know ‘why’ from folks that only know ‘what’ (tools and technologies).”
Kayode Ayankoya, MBA, PhD | clinical data scientist
Ensures that data is defined and structured.
Here’s an example:

Variable   Description                      Type
survival   Fate of passenger                Binary
pclass     Ticket class                     Discrete
age        Age in years of passenger        Continuous
fare       Price of ticket (1912 dollars)   Continuous

Binary data is discrete data that can only be in one of two categories: either yes or no, 1 or 0, off or on, etc. It can be thought of as ordinal, nominal, count, or interval data.
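A data dictionary can also be kept in code so that incoming records are checked against it. The sketch below is a minimal illustration in Python; the `DATA_DICTIONARY` layout, the `matches_type` and `check_record` helpers, and the sample passenger record are assumptions made for this example, not part of the course material.

```python
# Data dictionary from the slide: variable -> (description, variable type).
DATA_DICTIONARY = {
    "survival": ("Fate of passenger", "binary"),
    "pclass": ("Ticket class", "discrete"),
    "age": ("Age in years of passenger", "continuous"),
    "fare": ("Price of ticket (1912 dollars)", "continuous"),
}


def matches_type(value, variable_type):
    """Return True if value is plausible for the given variable type."""
    if variable_type == "binary":
        return value in (0, 1)
    if variable_type == "discrete":
        return isinstance(value, int) and not isinstance(value, bool)
    if variable_type == "continuous":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"Unknown variable type: {variable_type}")


def check_record(record):
    """Return the names of variables that are missing or have the wrong type."""
    problems = []
    for name, (_description, variable_type) in DATA_DICTIONARY.items():
        if name not in record or not matches_type(record[name], variable_type):
            problems.append(name)
    return problems


# Hypothetical passenger record, invented for illustration.
passenger = {"survival": 1, "pclass": 3, "age": 22.0, "fare": 7.25}
print(check_record(passenger))  # an empty list means the record fits the dictionary
```

Keeping the description next to the type means the same structure documents the data and validates it.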
survival 38.38%
fare $32.20
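Summary figures such as a survival rate or a typical fare can be computed as the mean of a column. The sketch below assumes that reading and uses a small made-up sample rather than the real passenger list, so the numbers it prints will not match the figures on the slide. The `summarize` helper is a hypothetical name.

```python
from statistics import mean

# Hypothetical sample rows, invented for illustration only.
passengers = [
    {"survival": 0, "fare": 7.25},
    {"survival": 1, "fare": 71.25},
    {"survival": 1, "fare": 8.00},
    {"survival": 0, "fare": 13.50},
]


def summarize(rows):
    """Return (survival rate, mean fare) for a list of passenger rows."""
    # The mean of a 0/1 column is the share of rows equal to 1.
    survival_rate = mean(row["survival"] for row in rows)
    mean_fare = mean(row["fare"] for row in rows)
    return survival_rate, mean_fare


survival_rate, mean_fare = summarize(passengers)
print(f"survival {survival_rate:.2%}")
print(f"fare ${mean_fare:.2f}")
```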
Set the scene for your listeners, relating the problem to your audience's interests.
Focus on your hypothesis/solution. Help your audience see what you’re proposing.
Highlight your methodology. How did you come to your conclusion? Be concise —
present steps at a high level.
Feature contributions made and results. Highlight how your results made an impact.
Data science presentations can also be far more complex and exciting, like some of the
research presented by Nate Silver's FiveThirtyEight blog.
Decision Trees
Imagine a flow chart where each level is a question
with a yes or no answer, eventually leading to a
solution to the original question.
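The flow-chart picture maps onto a nested data structure: each question holds a "yes" branch and a "no" branch, and following the answers downward ends at a final answer. The sketch below is one way to express that in Python. The questions, the animal labels, and the `classify` helper are made up for illustration and are deliberately simplistic.

```python
# Each internal node asks a yes/no question; each leaf is a plain string.
# Questions and labels are hypothetical examples.
TREE = {
    "question": "has_feathers",
    "yes": "bird",
    "no": {
        "question": "lives_in_water",
        "yes": "fish",
        "no": "mammal",
    },
}


def classify(node, answers):
    """Walk the tree using a dict of yes/no answers until a leaf is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


print(classify(TREE, {"has_feathers": False, "lives_in_water": False}))
```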
© 2021 General Assembly
Back to the Workflow...
When are decision trees used?
1. Develop hypothesis-driven questions for your analysis.
2. Select, import, and clean relevant data.
3. Structure, visualize, and complete your analysis.
4. Make business decisions based on data.
5. Present data-driven insights to your audience.
Where you’d use decision trees.
accuracy.
Let’s say we’re using a data set consisting of animals with lots of different
Does the animal breathe air? (Yes / No)
Bird, Mammal
❗ Adding too many splits makes decision trees overly complex and not adaptable
to new data.
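The warning about too many splits can be illustrated with two hand-written trees. This is a toy sketch, not a trained model: the data, the age thresholds, and both tree functions are invented for illustration. The training rows follow the rule "label is 1 when age >= 30", except for two deliberately noisy rows.

```python
# Toy data, invented for illustration: (age, label).
TRAIN = [(5, 0), (12, 0), (20, 1), (25, 0), (35, 1), (42, 0), (50, 1), (60, 1)]
NEW = [(8, 0), (18, 0), (22, 0), (28, 0), (33, 1), (40, 1), (44, 1), (55, 1)]


def simple_tree(age):
    """One split: captures the general pattern and ignores the noise."""
    return 1 if age >= 30 else 0


def complex_tree(age):
    """Five splits: extra branches carved out to fit the two noisy rows."""
    if age < 16:
        return 0
    if age < 22.5:
        return 1  # fits the noisy training row at age 20
    if age < 30:
        return 0
    if age < 38.5:
        return 1
    if age < 46:
        return 0  # fits the noisy training row at age 42
    return 1


def accuracy(tree, rows):
    """Fraction of rows where the tree's prediction equals the label."""
    return sum(tree(age) == label for age, label in rows) / len(rows)


for name, tree in [("simple", simple_tree), ("complex", complex_tree)]:
    print(name, "train:", accuracy(tree, TRAIN), "new:", accuracy(tree, NEW))
```

On this toy data the many-split tree matches every training row but does worse than the one-split tree on the new rows, because its extra branches follow noise rather than the underlying pattern.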
Decision Trees (Cont.)
ROOT
The starting point of a decision tree
is referred to as the root.
BRANCH
LEAVES
In this case, this would be either a category such as male or female or a range
of numbers (greater than or equal to age 10).
For variables that have more than one category — cabin class, for example —
you would make another branch off of a condition.*
*Within those that are NOT Class 3 and also NOT Class 2.
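A variable with more than two categories can be handled by chaining yes/no conditions, as the footnote describes. The sketch below assumes ticket class takes the values 1, 2, or 3; the `class_branch` function and its return labels are hypothetical names chosen for this example.

```python
def class_branch(pclass):
    """Route a ticket class (1, 2, or 3) through two yes/no conditions."""
    if pclass not in (1, 2, 3):
        raise ValueError(f"Unexpected ticket class: {pclass}")
    if pclass == 3:
        return "class 3 branch"
    # Within those that are NOT class 3:
    if pclass == 2:
        return "class 2 branch"
    # Within those that are NOT class 3 and also NOT class 2:
    return "class 1 branch"


for pclass in (1, 2, 3):
    print(pclass, "->", class_branch(pclass))
```

Two binary conditions are enough to separate three categories, because the last category is whatever remains after the first two are ruled out.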
1. Given that the root node is sex (male = 1), why would you think that this is the best way to predict if someone died when the Titanic sank?
2. What is the probability of death, given you are a male in second or third class?
3. What is the survival rate of a female in first or second class who paid more than $32?
4. If you were a 7-year-old boy in third class, would you be more likely to survive than a 7-year-old boy in first class? What's the difference in your chances of survival?
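Questions like these ask for a rate within the group of passengers that reaches a given leaf. The sketch below shows the arithmetic in Python on a handful of made-up rows, so its output is not an answer to the questions above. The `death_rate` helper and the field names are assumptions made for this example.

```python
# Hypothetical passenger rows, invented for illustration; not the real data.
PASSENGERS = [
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 2, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 1},
    {"sex": "male", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
]


def death_rate(rows, condition):
    """Share of rows matching `condition` whose passenger did not survive."""
    matching = [row for row in rows if condition(row)]
    if not matching:
        return None  # no passengers reach this leaf
    return sum(row["survived"] == 0 for row in matching) / len(matching)


# Probability of death, given male in second or third class (toy data only).
rate = death_rate(
    PASSENGERS,
    lambda row: row["sex"] == "male" and row["pclass"] in (2, 3),
)
print(rate)
```

The survival rate for the same group is one minus this value, since every matching passenger either survived or did not.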
10–12-week Immersive courses developed to help you make a career pivot.
8–10-week part-time or 1-week accelerated courses developed to help you advance your career.
Learn a skill in as little as two hours, or tackle something in more depth for 1–2 days.