DS4A Resources 2
DATA SCIENCE
FOR ALL RESOURCES
Provided here is a curated list of resources that can help you
understand why data science matters and jumpstart your
development as a data-literate individual.
CONTENTS
WHY IS AI IMPORTANT?
STATISTICS RESOURCES
PYTHON RESOURCES
Starting Point
Where to Code
Advancing Your Python
HOW TO USE THESE RESOURCES
Working through every resource below would take hundreds of hours, which is not realistic for most
people. So how should you use these resources?
First, consider what you want to get out of these resources. Do you want foundational data literacy to stand out in your
current field? Do you want to be able to code with a focus on data? Do you want a career switch to data science?
Second, consider how much time you’re willing to put into this. If you want meaningful results, you need to make an honest
effort and put in time consistently over a sustained period. That said, you do not need to sink hundreds of hours into this
to see noticeable development, because we have curated the resources that deliver the most value per hour of effort.
The table below outlines multiple routes you can take through these resources based on your answers to the above
questions (of course, feel free to define your own path). Regardless of what route is best for you, please go through the
“Why is AI Important” and “How are Data Science, Data Literacy, and AI Different” sections to develop a better understanding
of why data science and data literacy are important.
20 hours
- Foundational data literacy: Read chapter 1 of Data Science: A Gentle Introduction, find your favorite code-free data
science resource, and become an expert with that tool.
- Coding with a focus on data: Work through the learnpython.org tutorials, read chapters 1 & 2 of Data Science: A Gentle
Introduction, and complete all the Pandas intermediate level Python tutorials.
- Career switch to data science: Work through the learnpython.org tutorials, read chapters 1, 2, & 7 of Data Science: A
Gentle Introduction, and complete all the Pandas intermediate level Python tutorials.
50 hours
- Foundational data literacy: Read all of Data Science: A Gentle Introduction, find your favorite code-free data science
resource, and become an expert with that tool.
- Coding with a focus on data: Work through all of Python for Everybody, read chapters 1 & 2 of Data Science: A Gentle
Introduction, and complete all the Pandas intermediate level Python tutorials.
- Career switch to data science: Work through chapters 1-11, 15, & 16 of Python for Everybody, work through chapters 1, 2,
& 9 of Openstax’s Intro Stats, and complete all the Pandas intermediate level Python tutorials.
100+ hours
- Foundational data literacy: Work through the learnpython.org tutorials, read all of Data Science: A Gentle Introduction,
complete all the Pandas intermediate level Python tutorials, and learn 2 code-free data science tools (that have
different use cases).
- Coding with a focus on data: Work through all of Python for Everybody, read all of Data Science: A Gentle Introduction,
complete all the Pandas intermediate level Python tutorials, and try out several advanced tutorials from realpython.
- Career switch to data science: Work through chapters 1-11, 15, & 16 of Python for Everybody, work through all of Think
Stats, and try out several advanced data science/ML tutorials from realpython.
If you’re coming in with prior stats or Python experience, the best route for you is probably not one of the above. If so,
we encourage you to pick and choose from the resources listed here to fill in the skills you are missing.
WHY IS AI IMPORTANT?
If you work in finance, wouldn’t you want to know which
loan applicants are likely to default? Or, if you work in
sales, which people are most likely to buy your product?
HOW ARE DATA SCIENCE, DATA LITERACY,
AND AI DIFFERENT?
AI is just one component of data science. It is an advanced form of data modelling where the
“author” of the model does not know exactly how the model works (just as we do not know
exactly how our brains function), yet the model still gives useful results.
Typical data science models are explicit: the author knows exactly what the program is doing
and why. These models are extremely valuable and take a similar amount of expertise to create
as AI. The difference is that AI can tackle some more complex problems (like computer vision),
whereas explicit data science models efficiently tackle more tractable problems (like predicting
future sales while accounting for seasonality).
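To make "explicit" concrete, here is a minimal sketch of the kind of transparent sales model mentioned above: it forecasts next year's monthly sales by repeating last year's pattern, scaled by the year-over-year growth in total sales. The function name `seasonal_forecast` and the sales figures are hypothetical, made up for illustration; real seasonal models (for example, classical time-series decomposition) are usually more involved.

```python
def seasonal_forecast(sales, season_length=12):
    """Forecast the next season from a history of per-period sales.

    Explicit rule: repeat the most recent season, scaled by the growth
    between the totals of the last two seasons. Assumes `sales` contains
    at least two full seasons, oldest first.
    """
    if len(sales) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = sales[-season_length:]
    previous = sales[-2 * season_length:-season_length]
    growth = sum(last) / sum(previous)
    return [round(value * growth, 2) for value in last]


# Hypothetical monthly sales (in thousands) for two years, oldest first,
# with a December spike each year.
year_1 = [8] * 11 + [12]   # total 100
year_2 = [9] * 11 + [11]   # total 110, i.e. 10% growth
forecast = seasonal_forecast(year_1 + year_2)
print(forecast)
```

Because every step is a visible rule, the author can explain exactly why the December forecast is higher than the other months, which is what separates this kind of model from an AI model whose inner workings are opaque.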
In contrast to AI and data science, data literacy has us step back from the weeds of manipulating
data and focus instead on communicating with data: reading and interpreting visualizations and
summary statistics to draw accurate conclusions. Data literacy also lets us communicate those
conclusions effectively to anyone, regardless of their background.
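As a small illustration of reading summary statistics accurately, the sketch below uses Python's standard `statistics` module on a hypothetical set of salaries. One very large value pulls the mean far above the median, so reporting only the mean would overstate what a typical person earns. The salary values and the 1.5x threshold are assumptions chosen for this example, not a statistical rule.

```python
import statistics

# Hypothetical annual salaries (in thousands); the last value is an outlier.
salaries = [42, 45, 47, 50, 52, 55, 58, 400]

mean = statistics.mean(salaries)
median = statistics.median(salaries)

print(f"mean:   {mean:.1f}")
print(f"median: {median:.1f}")

# A data-literate reading: the median better describes a "typical" salary here,
# because a single outlier drags the mean upward.
# The 1.5x cutoff is an arbitrary threshold picked for this illustration.
if mean > 1.5 * median:
    print("The mean is heavily skewed by outliers; report the median as well.")
```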
The graphic below provides an overview of the relationship between data literacy, data analysis,
data science, and AI.
[Figure: layered overview. Data literacy: communicating results, dashboards, data visualization, summary statistics. Data analysis: EDA, hypothesis testing, statistical inference. Data science: cohort analysis, modeling. AI.]
DATA SCIENCE: KEY TOPICS
Hopefully at this point you see the value of AI and the data
science that underpins it. Since you cannot create useful AI
without a firm foundation in data science, what are the key
topics and skills you should know to build this foundation?
STATISTICS RESOURCES
Below is a table of excellent resources for learning statistics. Each resource comes with a time
estimate for reading it thoroughly plus time for dedicated practice to solidify your learning. The
reading times assume slow, high-absorption reading (your mileage may vary). If you are looking to
learn, do not just skim these books.
Think Stats
- Key considerations: includes example code and homework on GitHub; well written.
- Why use this resource? If you already know Python, it will teach you both statistics and
Python’s stats libraries.
- Estimated time commitment: 30h reading + 20h of practice.
Openstax’s Introductory Statistics
- Key considerations: many detailed exercises throughout the chapters; solutions provided for the
“try it” exercises.
- Why use this resource? You will likely find the text easier to follow than OpenIntro. It will
give you a very solid statistics foundation.
- Estimated time commitment: 90h reading + 0h (practice included throughout).
OpenIntro Statistics
- Key considerations: complementary videos for most sections.
- Why use this resource? Ideal if you want both lectures and a textbook. It will give you a solid
statistics foundation.
- Estimated time commitment: 45h reading + 15h of extra practice.
Statistics for beginners video course
- Key considerations: full video table of contents for easy navigation; slow and detailed
explanations.
- Why use this resource? The video is concise and presents a manageable amount of content to learn.
- Estimated time commitment: 7.5h of video (watch at 1.25x speed) + 25h of practice.
PYTHON RESOURCES
In data science, two programming languages dominate: Python and R. R has a stronger presence in
academia; however, we recommend Python because it is more popular overall and has a very active
community.
STARTING POINT
If you have no familiarity with Python, we suggest you start with the Hello World! tutorial from
learnpython.org. This site provides guides that will walk you through how to code in Python right
from the beginning up to some basic data science applications. You can expect to learn Python’s
syntax from these tutorials, but do not expect to develop a deep understanding of programming.
We suggest you complete all of the tutorials up to and including Pandas Basics. Working through
them will likely take you 7 to 10 hours.
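For a sense of what "learning Python's syntax" covers, here is a short, hypothetical first program of the sort beginner tutorials build toward: printing, variables, a list, a function, a loop, and a conditional. The names and step counts are made up for illustration; it should run in any Python 3 interpreter.

```python
print("Hello, World!")

# Variables and a list of numbers (hypothetical daily step counts).
name = "Ada"
steps = [4200, 8100, 10500, 6700, 12000]


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Loop over the list and apply a conditional to each value.
active_days = 0
for count in steps:
    if count >= 10000:
        active_days += 1

print(f"{name} averaged {average(steps):.0f} steps per day")
print(f"Days with at least 10,000 steps: {active_days}")
```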
If you want a deeper understanding of programming, or if you prefer a book to walk you through
Python, we suggest Python for Everybody (which includes recorded lectures and graded assignments
to assist your learning). Be sure to complete the exercises as you go through the book; otherwise,
you will retain little of what you read. In total, reading this book and practicing will likely
take you around 40 to 50 hours.
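Exercises in introductory Python books lean heavily on core data structures such as dictionaries. As one hedged example of the kind of practice worth doing (written for this guide, not taken from Python for Everybody), the sketch below counts word frequencies with a dictionary and reports the most common word. The function name `word_counts` and the sample sentence are hypothetical.

```python
def word_counts(text):
    """Count how often each word appears, ignoring case and basic punctuation."""
    counts = {}
    for raw_word in text.lower().split():
        word = raw_word.strip(".,!?;:\"'()")
        if word:  # skip tokens that were only punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts


sample = "The data tells a story. Read the data, then tell the story!"
counts = word_counts(sample)

# max() with key=counts.get returns the key whose count is largest.
most_common = max(counts, key=counts.get)
print(counts)
print(f"Most common word: {most_common!r} ({counts[most_common]} times)")
```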
WHERE TO CODE
When you start playing around with your own basic Python programs, we suggest you use
CodeSkulptor. This is an in-browser version of Python that lets you test and run your code
without needing to install anything.
Once you move to more advanced programs (such as reading from and writing to files), you will
have to install Python on your own computer. We suggest you do so by installing Anaconda.
Anaconda is a pain-free way to install Python quickly. It also installs an application called Spyder
(an integrated development environment, or IDE), which you can write and run your code in. For your
convenience, here are the Anaconda download links for Windows, macOS, and Linux.
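Once Python is installed locally (for example, through Anaconda), a quick way to confirm everything works is to check which interpreter is running and try the file reading and writing mentioned above. The sketch below uses only the standard library; the file name `scores.csv` and its contents are hypothetical, and it writes into a temporary folder so it leaves nothing behind.

```python
import csv
import sys
import tempfile
from pathlib import Path

# Confirm which Python interpreter is running (e.g., the one Anaconda installed).
print("Python", sys.version.split()[0], "at", sys.executable)

rows = [("name", "score"), ("Ada", 91), ("Grace", 87), ("Alan", 78)]

# Work in a throwaway folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "scores.csv"  # hypothetical file name

    # Write the rows to a CSV file.
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Read the file back; csv returns every field as a string, so convert scores.
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        scores = [int(row["score"]) for row in reader]

print("Scores read back:", scores)
print("Average score:", sum(scores) / len(scores))
```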
CODE-FREE DATA SCIENCE RESOURCES
Although programming experience lets you build more advanced models, you can also use services
that handle the coding for you. These services can save experienced programmers time and bring
the power of data science to people without a coding background.
Here are some tools you can use to analyze data with the power of data science:
Google Data Studio: here are some good resources to learn the tool:
The ultimate guide to Google Data Studio
These short video tutorials on how to use the tool
This article lists 8 good code-free data science options with varying use cases
The key to getting value from these code-free options is knowing the right questions to ask (so
that you generate impactful insights). This article explains the importance of this step and how to
best approach it.
To provide your feedback and suggestions, or if you would like to talk with us, please contact:
[email protected]
THANK YOU