Unit 1 - AP For Data Science

This document discusses how data is collected from various sources and how data scientists extract insights from messy data. It also covers data visualization techniques, such as bar charts, line charts, and scatterplots, that data scientists use to explore and communicate data with Python's matplotlib library.

Uploaded by bhavana16686
Copyright © All Rights Reserved

Introduction

• We live in a world that’s drowning in data.
• Websites track every user’s every click. Your smartphone is building up a record of your location and speed every second of every day.
• “Quantified selfers” wear pedometers-on-steroids that are always recording their heart rates, movement habits, diet, and sleep patterns.
• Smart cars collect driving habits, smart homes collect living habits, and smart marketers collect purchasing habits.
• The Internet itself represents a huge graph of knowledge that contains (among other things) an enormous cross-referenced encyclopedia; domain-specific databases about movies, music, sports results, pinball machines, memes, and cocktails; and too many government statistics (some of them nearly true!) from too many governments to wrap your head around.
• Buried in these data are answers to countless questions that no one’s ever thought to ask.
Prepared By: Bhavana Hotchandani
• There’s a joke that says a data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician.
• In fact, some data scientists are, for all practical purposes, statisticians, while others are pretty much indistinguishable from software engineers. Some are machine-learning experts, and some are PhDs with impressive publication records.
• We define a data scientist as “someone who extracts insights from messy data.”
• Today’s world is full of people trying to turn data into insight.

• Facebook asks you to list your hometown and your current location, ostensibly to make it easier for your friends to find and connect with you. But it also analyzes these locations to identify global migration patterns and where the fanbases of different football teams live.
• As a large retailer, Target tracks your purchases and interactions, both online and in-store, and uses the data to predict which of its customers are pregnant, so that it can better market baby-related purchases to them.
• Obama Case Study
• Some data scientists also occasionally use their skills for good: using data to make government more effective, to help the homeless, and to improve public health.
Whitespace Formatting
• Many languages use curly braces to delimit blocks of code. Python uses indentation instead.
  • This makes Python code very readable, but it also means that you have to be very careful with your formatting.
  • Whitespace is ignored inside parentheses and brackets, which can be helpful for long-winded computations.
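A short illustrative sketch of both points (the variable names are made up for this example): indentation alone decides which block each line belongs to, while inside parentheses and brackets a line break and its indentation carry no meaning.

```python
# Indentation, not braces, marks where each block starts and ends.
pairs = []
for i in [1, 2, 3]:
    for j in [1, 2, 3]:
        pairs.append((i, j))   # inside the "for j" block
    pairs.append((i, None))    # back in the "for i" block, after the inner loop
print("done looping")          # outside both loops

# Inside parentheses the line break is ignored,
# so a long expression can be split across lines.
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 +
                           11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

# The same holds inside brackets, which can make nested lists easier to read.
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]
```

Both list variables hold exactly the same value; only the layout differs.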

• A fundamental part of the data scientist’s toolkit is data visualization. Although it is very easy to create visualizations, it’s much harder to produce good ones.
• There are two primary uses for data visualization:
  • To explore data
  • To communicate data

• A wide variety of tools exists for visualizing data. We will be using the matplotlib library, which is widely used (although it is showing its age).
• If you are interested in producing elaborate interactive visualizations for the Web, it is likely not the right choice, but for simple bar charts, line charts, and scatterplots, it works pretty well.
• In particular, we will be using the matplotlib.pyplot module.
• In its simplest use, pyplot maintains an internal state in which you build up a visualization step by step. Once you’re done, you can save it (with savefig()) or display it (with show()).
• You can customize your charts in many ways, for example with axis labels, line styles, and point markers.
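A minimal sketch of this step-by-step style, assuming matplotlib is installed. The data points and the output filename `step_by_step.png` are made up for illustration. The `matplotlib.use("Agg")` line is an assumption that selects a non-interactive backend so the script can save a file without a display; remove it and call `plt.show()` instead to open a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen (no display needed)
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 3, 5, 7, 11]

# Each pyplot call modifies the current figure held in pyplot's internal state.
plt.plot(xs, ys, color="green", marker="o", linestyle="solid")  # line style and point markers
plt.xlabel("x")                                                 # axis labels
plt.ylabel("y")
plt.title("Building a chart step by step")

# When the chart is finished, save it to a file (or call plt.show() to display it).
plt.savefig("step_by_step.png")
```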
• A bar chart is a good choice when you want to show how some quantity varies among some discrete set of items.
• For instance, Figure 3-2 shows how many Academy Awards were won by each of a variety of movies.
• A bar chart can also be a good choice for plotting histograms of bucketed numeric values, in order to visually explore how the values are distributed.
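As a sketch of the histogram idea, assuming matplotlib is installed: the grades below are made up, and each one is bucketed by decile with `collections.Counter` before the buckets are drawn as bars 10 units wide. Putting a grade of 100 into the 90s bucket is a design choice for this example, as is the filename `grades_histogram.png`.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen (no display needed)
import matplotlib.pyplot as plt

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]  # hypothetical grades

# Bucket each grade by decile (0, 10, ..., 90); 100 is folded into the 90s bucket.
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)

plt.bar([x + 5 for x in histogram.keys()],  # shift each bar to the middle of its bucket
        list(histogram.values()),           # bar height = number of grades in the bucket
        width=10,                           # each bar spans its whole 10-point bucket
        edgecolor=(0, 0, 0))                # black outline so adjacent bars stay distinct

plt.axis([-5, 105, 0, 5])                   # x-axis from -5 to 105, y-axis from 0 to 5
plt.xticks([10 * i for i in range(11)])     # ticks at 0, 10, ..., 100
plt.xlabel("Decile")
plt.ylabel("# of Students")
plt.title("Distribution of Grades")
plt.savefig("grades_histogram.png")
```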

import matplotlib.pyplot as plt

movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
num_oscars = [5, 11, 3, 8, 10]

# in matplotlib 2.0 and later, each bar is centered on its x-coordinate by default,
# so the bars can sit at x = 0, 1, 2, ... without any offset
xs = range(len(movies))

# plot bars at x-coordinates [xs] with heights [num_oscars]
plt.bar(xs, num_oscars)
plt.ylabel("# of Academy Awards")
plt.title("My Favorite Movies")

# label x-axis with movie names at bar centers
plt.xticks(xs, movies)
plt.show()
• As we saw already, we can make line charts using plt.plot(). These are a good choice for showing trends.
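A small sketch of a trend chart, assuming matplotlib is installed. The yearly counts are hypothetical, as is the output filename `trends.png`. Calling `plt.plot()` once per series draws several lines on the same axes, and the `label` arguments feed `plt.legend()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen (no display needed)
import matplotlib.pyplot as plt

# hypothetical yearly counts, made up for illustration
years = [2016, 2017, 2018, 2019, 2020]
signups = [120, 180, 260, 310, 400]
cancellations = [30, 45, 50, 80, 90]

# One plt.plot() call per series; each label is shown in the legend.
plt.plot(years, signups, "g-", marker="o", label="signups")   # solid green line with dots
plt.plot(years, cancellations, "r--", label="cancellations")  # dashed red line

plt.xticks(years)  # one tick per year
plt.xlabel("Year")
plt.ylabel("Count")
plt.title("Signups vs. Cancellations")
plt.legend(loc="upper left")
plt.savefig("trends.png")
```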

• A scatterplot is the right choice for visualizing the relationship between two paired sets of data.
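A sketch of a labeled scatterplot, assuming matplotlib is installed. The numbers of friends and daily minutes for nine users are made up, as are the letter labels and the filename `scatterplot.png`. Each (friends, minutes) pair becomes one point, and `plt.annotate` places a label with a small offset (`xytext=(5, -5)` in `"offset points"`) so it does not cover the point.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen (no display needed)
import matplotlib.pyplot as plt

# hypothetical paired data: each user's number of friends and daily minutes on a site
friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
labels = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]

plt.scatter(friends, minutes)  # one point per (friends, minutes) pair

# label each point, nudged 5 points right and 5 points down from the point itself
for label, friend_count, minute_count in zip(labels, friends, minutes):
    plt.annotate(label,
                 xy=(friend_count, minute_count),
                 xytext=(5, -5),
                 textcoords="offset points")

plt.title("Daily Minutes vs. Number of Friends")
plt.xlabel("# of friends")
plt.ylabel("daily minutes spent on the site")
plt.savefig("scatterplot.png")
```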

• Linear algebra is the branch of mathematics that deals with vector spaces.
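As an illustrative sketch, not taken from the original slides: one simple way to work with vectors in plain Python is to represent each vector as a list of numbers and write the basic operations as functions. The names `Vector`, `add`, `scalar_multiply`, `dot`, and `magnitude` are chosen here for illustration.

```python
import math
from typing import List

Vector = List[float]  # a vector represented as a plain list of numbers


def add(v: Vector, w: Vector) -> Vector:
    """Add corresponding elements of two vectors of the same length."""
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]


def scalar_multiply(c: float, v: Vector) -> Vector:
    """Multiply every element of v by the number c."""
    return [c * v_i for v_i in v]


def dot(v: Vector, w: Vector) -> float:
    """Compute v_1 * w_1 + ... + v_n * w_n."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def magnitude(v: Vector) -> float:
    """Length of v: the square root of its dot product with itself."""
    return math.sqrt(dot(v, v))


print(add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(magnitude([3, 4]))          # 5.0
```

Lists keep the idea transparent but are slow for large data; a library such as NumPy is the usual choice in practice.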
