Unit 1 - AP For Data Science
Prepared By: Bhavana Hotchandani
Facebook asks you to list your hometown and your current
location, ostensibly to make it easier for your friends to find
and connect with you. But it also analyzes these locations to
identify global migration patterns and where the fanbases of
different football teams live.
As a large retailer, Target tracks your purchases and
interactions, both online and in-store. And it uses the data to
predictively model which of its customers are pregnant, to
better market baby-related purchases to them.
Obama Case Study
Some data scientists also occasionally use their skills for good
— using data to make government more effective, to help the
homeless, and to improve public health.
Whitespace Formatting
Many languages use curly braces to delimit blocks of code.
Python uses indentation:
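As a minimal sketch of what this means in practice, the nested loops below are delimited only by their indentation. The variable names and strings are illustrative and not taken from the original slide.

```python
# Indentation alone marks where each block begins and ends.
lines = []
for i in [1, 2]:
    lines.append(f"outer {i}")           # inside the outer loop
    for j in [1, 2]:
        lines.append(f"  inner {j}")     # inside the inner loop
        lines.append(f"  sum {i + j}")   # still inside the inner loop
    lines.append(f"end of outer {i}")    # back in the outer loop only
lines.append("done looping")             # outside both loops

print("\n".join(lines))
```

Changing the indentation of any of these lines moves it into a different block, which is why inconsistent whitespace is an error in Python rather than a matter of style.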
A fundamental part of the data scientist’s toolkit is data
visualization. Although it is very easy to create
visualizations, it’s much harder to produce good ones.
There are two primary uses for data visualization:
To explore data
To communicate data
A wide variety of tools exists for visualizing data. We will be
using the matplotlib library, which is widely used (although
sort of showing its age).
If you are interested in producing elaborate interactive
visualizations for the Web, it is likely not the right choice, but
for simple bar charts, line charts, and scatterplots, it works
pretty well.
In particular, we will be using the matplotlib.pyplot module.
In its simplest use, pyplot maintains an internal state in which
you build up a visualization step by step. Once you’re done, you
can save it (with savefig()) or display it (with show()).
There are many ways you can customize your charts with (for
example) axis labels, line styles, and point markers.
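The step-by-step style described above might look like the following sketch. The data (squares of small integers) and the output filename squares.png are assumptions chosen purely for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data: small integers and their squares
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

# Each call below modifies pyplot's current figure
plt.plot(xs, ys, color="green", marker="o", linestyle="solid")
plt.title("Squares of Small Integers")
plt.xlabel("x")
plt.ylabel("x squared")

# Save the finished chart; use plt.show() instead to display it
plt.savefig("squares.png")
```

The color, marker, and linestyle arguments are examples of the customizations mentioned above; each accepts several other values.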
A bar chart is a good choice when you want to show how
some quantity varies among some discrete set of items.
For instance, Figure 3-2 shows how many Academy Awards
were won by each of a variety of movies:
A bar chart can also be a good choice for plotting
histograms of bucketed numeric values, in order to
visually explore how the values are distributed.
import matplotlib.pyplot as plt

movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
num_oscars = [5, 11, 3, 8, 10]
# bars are centered on their x-coordinates by default (width 0.8),
# so the integer positions 0, 1, 2, ... need no offset
xs = range(len(movies))
# plot bars centered at x-coordinates [xs], heights [num_oscars]
plt.bar(xs, num_oscars)
plt.ylabel("# of Academy Awards")
plt.title("My Favorite Movies")
# label x-axis with movie names at bar centers
plt.xticks(xs, movies)
plt.show()
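The histogram use of bar charts mentioned earlier can be sketched in the same way. The grades below are made-up values for illustration only, and the bucketing into deciles (with 100 folded into the 90s bucket) is one possible choice, not a prescribed method.

```python
from collections import Counter
import matplotlib.pyplot as plt

# Made-up exam grades, for illustration only
grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

# Bucket each grade by decile; a grade of 100 goes in with the 90s
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)

plt.figure()                                   # start a fresh figure
plt.bar([bucket + 5 for bucket in histogram],  # center each bar in its bucket
        list(histogram.values()),              # bar height = count in bucket
        width=10,                              # each bar spans one decile
        edgecolor="black")                     # outline bars so they stay distinct
plt.axis([-5, 105, 0, 5])                      # x from -5 to 105, y from 0 to 5
plt.xticks([10 * i for i in range(11)])        # tick labels at 0, 10, ..., 100
plt.xlabel("Decile")
plt.ylabel("# of Students")
plt.title("Distribution of Exam Grades")
plt.savefig("grades_histogram.png")            # or plt.show() to display it
```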
We can make line charts using plt.plot(). These are a good
choice for showing trends:
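A sketch of a multi-series line chart follows. The three series are synthetic (one rising, one falling, and their sum) and the filename is hypothetical; they stand in for whatever trends are being compared.

```python
import matplotlib.pyplot as plt

# Synthetic series, for illustration only
xs = list(range(9))
rising = [2 ** x for x in xs]                  # 1, 2, 4, ..., 256
falling = list(reversed(rising))               # 256, ..., 2, 1
total = [r + f for r, f in zip(rising, falling)]

plt.figure()                                   # start a fresh figure
# Several calls to plt.plot() draw several series on the same chart
plt.plot(xs, rising, "g-", label="rising")     # green solid line
plt.plot(xs, falling, "r-.", label="falling")  # red dot-dashed line
plt.plot(xs, total, "b:", label="total")       # blue dotted line

# Because each series has a label, the legend is built automatically
plt.legend(loc="upper center")
plt.xlabel("x")
plt.title("Three Trends on One Chart")
plt.savefig("line_chart.png")                  # or plt.show() to display it
```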
A scatterplot is the right choice for visualizing the
relationship between two paired sets of data.
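For example, the sketch below plots one point per person from two paired lists. The friend counts, minutes, and labels are made-up values used only to show the calls involved.

```python
import matplotlib.pyplot as plt

# Made-up paired data: one entry per (hypothetical) user
friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
labels = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]

plt.figure()                                # start a fresh figure
plt.scatter(friends, minutes)               # one dot per (friends, minutes) pair

# Label each point, offsetting the text slightly from the dot
for label, friend_count, minute_count in zip(labels, friends, minutes):
    plt.annotate(label,
                 xy=(friend_count, minute_count),
                 xytext=(5, -5),
                 textcoords="offset points")

plt.title("Daily Minutes vs. Number of Friends")
plt.xlabel("# of friends")
plt.ylabel("daily minutes spent on the site")
plt.savefig("scatterplot.png")              # or plt.show() to display it
```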
Linear algebra is the branch of mathematics that deals
with vector spaces.
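As a rough illustration, vectors can be modeled as plain Python lists of numbers. The helper names add, scalar_multiply, and dot below are assumptions for this sketch, not functions from any library.

```python
from typing import List

Vector = List[float]

def add(v: Vector, w: Vector) -> Vector:
    """Add two vectors componentwise."""
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]

def scalar_multiply(c: float, v: Vector) -> Vector:
    """Multiply every component of v by the scalar c."""
    return [c * v_i for v_i in v]

def dot(v: Vector, w: Vector) -> float:
    """Sum of the componentwise products of v and w."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

print(add([1, 2, 3], [4, 5, 6]))        # [5, 7, 9]
print(scalar_multiply(2, [1, 2, 3]))    # [2, 4, 6]
print(dot([1, 2, 3], [4, 5, 6]))        # 32
```

Addition and scalar multiplication are the two operations that define a vector space; the dot product is included as a common operation built on top of them.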