Data Manipulation With Pandas

The document introduces pandas and DataFrames for data manipulation. It discusses how pandas allows working with tabular, rectangular data through DataFrames. It provides various examples of exploring, sorting, filtering and selecting subsets of data from a DataFrame of dog information. The key methods demonstrated include .head() to view the top rows, .info() for metadata, .describe() for summaries, and slicing/indexing to subset rows and columns.

Uploaded by

Zhuo Yang

Introducing DataFrames
DATA MANIPULATION WITH PANDAS

Richie Cotton
Learning Solutions Architect at DataCamp

What's the point of pandas?
Data Manipulation skill track

Data Visualization skill track


Course outline
Chapter 1: DataFrames
  Sorting and subsetting
  Creating new columns

Chapter 2: Aggregating Data
  Summary statistics
  Counting
  Grouped summary statistics

Chapter 3: Slicing and Indexing Data
  Subsetting using slicing
  Indexes and subsetting using indexes

Chapter 4: Creating and Visualizing Data
  Plotting
  Handling missing data
  Reading data into a DataFrame


pandas is built on NumPy and Matplotlib


pandas is popular

1 https://pypistats.org/packages/pandas


Rectangular data
Name Breed Color Height (cm) Weight (kg) Date of Birth
Bella Labrador Brown 56 25 2013-07-01
Charlie Poodle Black 43 23 2016-09-16
Lucy Chow Chow Brown 46 22 2014-08-25
Cooper Schnauzer Gray 49 17 2011-12-11
Max Labrador Black 59 29 2017-01-20
Stella Chihuahua Tan 18 2 2015-04-20
Bernie St. Bernard White 77 74 2018-02-27


pandas DataFrames
print(dogs)

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
4 Max Labrador Black 59 29 2017-01-20
5 Stella Chihuahua Tan 18 2 2015-04-20
6 Bernie St. Bernard White 77 74 2018-02-27
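
The slides print `dogs` without showing how it is created. A minimal sketch, assuming the values are typed in by hand from the printed table (the course itself may well load them from a file), builds the DataFrame from a dictionary of columns:

```python
import pandas as pd

# Values copied from the printed table; constructing the frame by hand is
# an assumption made for illustration, not the course's stated method.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
    "date_of_birth": ["2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11",
                      "2017-01-20", "2015-04-20", "2018-02-27"],
})

print(dogs)        # rows are labelled 0..6 by the default RangeIndex
print(dogs.shape)  # (number of rows, number of columns)
```

Each dictionary key becomes a column name, and each list becomes that column's values.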


Exploring a DataFrame: .head()
dogs.head()

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
4 Max Labrador Black 59 29 2017-01-20


Exploring a DataFrame: .info()
dogs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 name 7 non-null object
1 breed 7 non-null object
2 color 7 non-null object
3 height_cm 7 non-null int64
4 weight_kg 7 non-null int64
5 date_of_birth 7 non-null object
dtypes: int64(2), object(4)
memory usage: 464.0+ bytes


Exploring a DataFrame: .shape
dogs.shape

(7, 6)


Exploring a DataFrame: .describe()
dogs.describe()

height_cm weight_kg
count 7.000000 7.000000
mean 49.714286 27.428571
std 17.960274 22.292429
min 18.000000 2.000000
25% 44.500000 19.500000
50% 49.000000 23.000000
75% 57.500000 27.000000
max 77.000000 74.000000


Components of a DataFrame: .values
dogs.values

array([['Bella', 'Labrador', 'Brown', 56, 24, '2013-07-01'],

['Charlie', 'Poodle', 'Black', 43, 24, '2016-09-16'],
['Lucy', 'Chow Chow', 'Brown', 46, 24, '2014-08-25'],
['Cooper', 'Schnauzer', 'Gray', 49, 17, '2011-12-11'],
['Max', 'Labrador', 'Black', 59, 29, '2017-01-20'],
['Stella', 'Chihuahua', 'Tan', 18, 2, '2015-04-20'],
['Bernie', 'St. Bernard', 'White', 77, 74, '2018-02-27']],
dtype=object)


Components of a DataFrame: .columns and .index
dogs.columns

Index(['name', 'breed', 'color', 'height_cm', 'weight_kg', 'date_of_birth'],

dtype='object')

dogs.index

RangeIndex(start=0, stop=7, step=1)


pandas Philosophy
There should be one -- and preferably only one -- obvious way to do it.

- The Zen of Python by Tim Peters, Item 13

1 https://www.python.org/dev/peps/pep-0020/

Let's practice!

Sorting and subsetting

Richie Cotton
Learning Solutions Architect at DataCamp

Sorting
dogs.sort_values("weight_kg")

name breed color height_cm weight_kg date_of_birth

5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27


Sorting in descending order
dogs.sort_values("weight_kg", ascending=False)

name breed color height_cm weight_kg date_of_birth

6 Bernie St. Bernard White 77 74 2018-02-27
4 Max Labrador Black 59 29 2017-01-20
0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
5 Stella Chihuahua Tan 18 2 2015-04-20


Sorting by multiple variables
dogs.sort_values(["weight_kg", "height_cm"])

name breed color height_cm weight_kg date_of_birth

5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27


Sorting by multiple variables
dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])

name breed color height_cm weight_kg date_of_birth

5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
0 Bella Labrador Brown 56 24 2013-07-01
2 Lucy Chow Chow Brown 46 24 2014-08-25
1 Charlie Poodle Black 43 24 2016-09-16
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27
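
The sorting calls above can be tried on a trimmed copy of the data. This is a sketch that assumes only three of the columns are needed; it sorts by weight ascending and breaks ties by height descending, and shows that `sort_values` returns a new DataFrame rather than reordering `dogs`:

```python
import pandas as pd

# Trimmed copy of the dogs table (three columns kept for brevity).
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# Lightest first; among dogs of equal weight, tallest first.
by_weight_then_height = dogs.sort_values(
    ["weight_kg", "height_cm"], ascending=[True, False]
)
print(by_weight_then_height)

# The original frame keeps its row order because the result is a copy.
print(dogs["name"].tolist())
```

The `ascending` list is matched position by position with the list of column names, so both lists need the same length.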


Subsetting columns
dogs["name"]

0 Bella
1 Charlie
2 Lucy
3 Cooper
4 Max
5 Stella
6 Bernie
Name: name, dtype: object


Subsetting multiple columns
dogs[["breed", "height_cm"]]

breed height_cm
0 Labrador 56
1 Poodle 43
2 Chow Chow 46
3 Schnauzer 49
4 Labrador 59
5 Chihuahua 18
6 St. Bernard 77

cols_to_subset = ["breed", "height_cm"]
dogs[cols_to_subset]

breed height_cm
0 Labrador 56
1 Poodle 43
2 Chow Chow 46
3 Schnauzer 49
4 Labrador 59
5 Chihuahua 18
6 St. Bernard 77


Subsetting rows
dogs["height_cm"] > 50

0 True
1 False
2 False
3 False
4 True
5 False
6 True
Name: height_cm, dtype: bool


Subsetting rows
dogs[dogs["height_cm"] > 50]

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27


Subsetting based on text data
dogs[dogs["breed"] == "Labrador"]

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20


Subsetting based on dates
dogs[dogs["date_of_birth"] > "2015-01-01"]

name breed color height_cm weight_kg date_of_birth

1 Charlie Poodle Black 43 24 2016-09-16
4 Max Labrador Black 59 29 2017-01-20
5 Stella Chihuahua Tan 18 2 2015-04-20
6 Bernie St. Bernard White 77 74 2018-02-27


Subsetting based on multiple conditions
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"
dogs[is_lab & is_brown]

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 24 2013-07-01

dogs[ (dogs["breed"] == "Labrador") & (dogs["color"] == "Brown") ]


Subsetting using .isin()
is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
dogs[is_black_or_brown]

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
4 Max Labrador Black 59 29 2017-01-20
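
The filters above all follow one pattern: build a Boolean Series, then pass it inside square brackets. The sketch below repeats that pattern on a trimmed copy of the data and adds two operators the slides do not show, `|` (or) and `~` (not). The parentheses around each comparison matter because `&` and `|` bind more tightly than `==` in Python.

```python
import pandas as pd

# Trimmed copy of the dogs table.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
})

# AND: both conditions must hold.
brown_labs = dogs[(dogs["breed"] == "Labrador") & (dogs["color"] == "Brown")]

# OR: either condition may hold (not shown on the slides).
lab_or_tan = dogs[(dogs["breed"] == "Labrador") | (dogs["color"] == "Tan")]

# isin: membership in a list of allowed values.
black_or_brown = dogs[dogs["color"].isin(["Black", "Brown"])]

# NOT: invert a mask with ~ (not shown on the slides).
other_colors = dogs[~dogs["color"].isin(["Black", "Brown"])]

print(brown_labs, lab_or_tan, black_or_brown, other_colors, sep="\n\n")
```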

Let's practice!

New columns

Richie Cotton
Learning Solutions Architect at DataCamp

Adding a new column
dogs["height_m"] = dogs["height_cm"] / 100
print(dogs)

name breed color height_cm weight_kg date_of_birth height_m

0 Bella Labrador Brown 56 24 2013-07-01 0.56
1 Charlie Poodle Black 43 24 2016-09-16 0.43
2 Lucy Chow Chow Brown 46 24 2014-08-25 0.46
3 Cooper Schnauzer Gray 49 17 2011-12-11 0.49
4 Max Labrador Black 59 29 2017-01-20 0.59
5 Stella Chihuahua Tan 18 2 2015-04-20 0.18
6 Bernie St. Bernard White 77 74 2018-02-27 0.77


Doggy mass index
BMI = weight in kg / (height in m)^2

dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2

print(dogs.head())

name breed color height_cm weight_kg date_of_birth height_m bmi

0 Bella Labrador Brown 56 24 2013-07-01 0.56 76.530612
1 Charlie Poodle Black 43 24 2016-09-16 0.43 129.799892
2 Lucy Chow Chow Brown 46 24 2014-08-25 0.46 113.421550
3 Cooper Schnauzer Gray 49 17 2011-12-11 0.49 70.803832
4 Max Labrador Black 59 29 2017-01-20 0.59 83.309394


Multiple manipulations
bmi_lt_100 = dogs[dogs["bmi"] < 100]
bmi_lt_100_height = bmi_lt_100.sort_values("height_cm", ascending=False)
bmi_lt_100_height[["name", "height_cm", "bmi"]]

name height_cm bmi

4 Max 59 83.309394
0 Bella 56 76.530612
3 Cooper 49 70.803832
5 Stella 18 61.728395
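
The same four steps (add columns, filter, sort, select) can also be written as one chain. This is an alternative sketch rather than the slides' approach; it assumes `DataFrame.assign` and callable indexing with `.loc`, and it leaves the original `dogs` unchanged.

```python
import pandas as pd

# Trimmed copy of the dogs table.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

result = (
    dogs
    # Step 1: derive height in metres, then BMI from it.
    .assign(height_m=lambda d: d["height_cm"] / 100)
    .assign(bmi=lambda d: d["weight_kg"] / d["height_m"] ** 2)
    # Step 2: keep rows with BMI under 100.
    .loc[lambda d: d["bmi"] < 100]
    # Step 3: tallest first.
    .sort_values("height_cm", ascending=False)
    # Step 4: keep only the columns of interest.
    .loc[:, ["name", "height_cm", "bmi"]]
)
print(result)
```

Whether the chained or the step-by-step form reads better is a matter of taste; the intermediate variables on the slide are easier to inspect while debugging.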

Let's practice!

Summary statistics

Maggie Matsui
Senior Content Developer at DataCamp

Summarizing numerical data
dogs["height_cm"].mean()

49.714285714285715

Other summary methods:
.median() , .mode()
.min() , .max()
.var() , .std()
.sum()
.quantile()


Summarizing dates
Oldest dog:

dogs["date_of_birth"].min()

'2011-12-11'

Youngest dog:

dogs["date_of_birth"].max()

'2018-02-27'


The .agg() method
def pct30(column):
    return column.quantile(0.3)

dogs["weight_kg"].agg(pct30)

22.599999999999998


Summaries on multiple columns
dogs[["weight_kg", "height_cm"]].agg(pct30)

weight_kg 22.6
height_cm 45.4
dtype: float64


Multiple summaries
def pct40(column):
    return column.quantile(0.4)

dogs["weight_kg"].agg([pct30, pct40])

pct30 22.6
pct40 24.0
Name: weight_kg, dtype: float64
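
The `.agg()` calls above can be put together in one runnable sketch on a trimmed copy of the data. The last line is an alternative not shown on the slides: `.quantile()` accepts a list of probabilities, which avoids writing one helper function per percentile.

```python
import pandas as pd

# Trimmed copy of the dogs table.
dogs = pd.DataFrame({
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

def pct30(column):
    # 30th percentile of a numeric column.
    return column.quantile(0.3)

def pct40(column):
    # 40th percentile of a numeric column.
    return column.quantile(0.4)

# Several custom summaries of one column: result is labelled by function name.
both = dogs["weight_kg"].agg([pct30, pct40])

# One custom summary applied to several columns at once.
per_column = dogs[["weight_kg", "height_cm"]].agg(pct30)

# Alternative: ask .quantile() for several probabilities directly.
direct = dogs["weight_kg"].quantile([0.3, 0.4])

print(both, per_column, direct, sep="\n\n")
```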


Cumulative sum
dogs["weight_kg"]

0 24
1 24
2 24
3 17
4 29
5 2
6 74
Name: weight_kg, dtype: int64

dogs["weight_kg"].cumsum()

0 24
1 48
2 72
3 89
4 118
5 120
6 194
Name: weight_kg, dtype: int64


Cumulative statistics
.cummax()

.cummin()

.cumprod()
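
The cumulative methods all return a Series of the same length as the input, where each entry summarizes everything up to and including that row. A short sketch using the weights from the cumulative-sum slide:

```python
import pandas as pd

# Weights in the same row order as the dogs table.
weights = pd.Series([24, 24, 24, 17, 29, 2, 74], name="weight_kg")

running_total = weights.cumsum()  # sum of all rows so far
running_max = weights.cummax()    # heaviest dog seen so far
running_min = weights.cummin()    # lightest dog seen so far

summary = pd.DataFrame({
    "weight_kg": weights,
    "cumsum": running_total,
    "cummax": running_max,
    "cummin": running_min,
})
print(summary)
```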


Walmart
sales.head()

store type dept date weekly_sales is_holiday temp_c fuel_price unemp

0 1 A 1 2010-02-05 24924.50 False 5.73 0.679 8.106
1 1 A 2 2010-02-05 50605.27 False 5.73 0.679 8.106
2 1 A 3 2010-02-05 13740.12 False 5.73 0.679 8.106
3 1 A 4 2010-02-05 39954.04 False 5.73 0.679 8.106
4 1 A 5 2010-02-05 32229.38 False 5.73 0.679 8.106

Let's practice!

Counting

Maggie Matsui
Senior Content Developer at DataCamp

Avoiding double counting


Vet visits
print(vet_visits)

date name breed weight_kg

0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Labrador 28.35
2 2018-01-17 Stella Chihuahua 1.51
3 2019-10-19 Lucy Chow Chow 24.07
.. ... ... ... ...
71 2018-01-20 Stella Chihuahua 2.83
72 2019-06-07 Max Chow Chow 24.01
73 2018-08-20 Lucy Chow Chow 24.40
74 2019-04-22 Max Labrador 28.54


Dropping duplicate names
vet_visits.drop_duplicates(subset="name")

date name breed weight_kg

0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Chow Chow 24.01
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98
(6 2019-06-07 Max Labrador 28.35)


Dropping duplicate pairs
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])
print(unique_dogs)

date name breed weight_kg

0 2018-09-02 Bella Labrador 24.87
1 2019-03-13 Max Chow Chow 24.13
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
6 2019-06-07 Max Labrador 28.35
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98


Easy as 1, 2, 3
unique_dogs["breed"].value_counts()

Labrador 2
Schnauzer 1
St. Bernard 1
Chow Chow 2
Poodle 1
Chihuahua 1
Name: breed, dtype: int64

unique_dogs["breed"].value_counts(sort=True)

Labrador 2
Chow Chow 2
Schnauzer 1
St. Bernard 1
Poodle 1
Chihuahua 1
Name: breed, dtype: int64


Proportions
unique_dogs["breed"].value_counts(normalize=True)

Labrador 0.250
Chow Chow 0.250
Schnauzer 0.125
St. Bernard 0.125
Poodle 0.125
Chihuahua 0.125
Name: breed, dtype: float64
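
The counting workflow (deduplicate first, then count) can be sketched on a small, made-up `vet_visits` table. The six rows below are hypothetical and much shorter than the course data; they include two different dogs named Max so that the effect of the `subset` argument is visible.

```python
import pandas as pd

# Hypothetical visits: two dogs share the name "Max" but differ in breed.
vet_visits = pd.DataFrame({
    "date": ["2018-09-02", "2019-06-07", "2019-03-13",
             "2018-01-17", "2019-04-22", "2018-01-20"],
    "name": ["Bella", "Max", "Max", "Stella", "Max", "Stella"],
    "breed": ["Labrador", "Labrador", "Chow Chow",
              "Chihuahua", "Labrador", "Chihuahua"],
    "weight_kg": [24.87, 28.35, 24.13, 1.51, 28.54, 2.83],
})

# Deduplicating on name alone merges the two Maxes into one row.
by_name = vet_visits.drop_duplicates(subset="name")

# Deduplicating on the (name, breed) pair keeps both Maxes.
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])

counts = unique_dogs["breed"].value_counts()
proportions = unique_dogs["breed"].value_counts(normalize=True)
print(counts, proportions, sep="\n\n")
```

`drop_duplicates` keeps the first occurrence of each duplicate by default, so the surviving row for a dog is its earliest row in the table, not necessarily its earliest visit by date.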

Let's practice!

Grouped summary statistics

Maggie Matsui
Senior Content Developer at DataCamp

Summaries by group
dogs[dogs["color"] == "Black"]["weight_kg"].mean()
dogs[dogs["color"] == "Brown"]["weight_kg"].mean()
dogs[dogs["color"] == "White"]["weight_kg"].mean()
dogs[dogs["color"] == "Gray"]["weight_kg"].mean()
dogs[dogs["color"] == "Tan"]["weight_kg"].mean()

26.5
24.0
74.0
17.0
2.0


Grouped summaries
dogs.groupby("color")["weight_kg"].mean()

color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Name: weight_kg, dtype: float64


Multiple grouped summaries
dogs.groupby("color")["weight_kg"].agg([min, max, sum])

min max sum

color
Black 24 29 53
Brown 24 24 48
Gray 17 17 17
Tan 2 2 2
White 74 74 74


Grouping by multiple variables
dogs.groupby(["color", "breed"])["weight_kg"].mean()

color breed
Black Labrador 29.0
Poodle 24.0
Brown Chow Chow 24.0
Labrador 24.0
Gray Schnauzer 17.0
Tan Chihuahua 2.0
White St. Bernard 74.0
Name: weight_kg, dtype: float64


Many groups, many summaries
dogs.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()

weight_kg height_cm
color breed
Black Labrador 29 59
Poodle 24 43
Brown Chow Chow 24 46
Labrador 24 56
Gray Schnauzer 17 49
Tan Chihuahua 2 18
White St. Bernard 74 77
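
The grouped summaries above can be reproduced with the sketch below. One deliberate difference from the slides: the aggregations are named with strings (`"min"`, `"max"`, `"sum"`) instead of passing the Python built-ins, since recent pandas releases warn about the built-in form.

```python
import pandas as pd

# Trimmed copy of the dogs table.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# One summary per group.
mean_by_color = dogs.groupby("color")["weight_kg"].mean()

# Several summaries per group, named by string.
stats_by_color = dogs.groupby("color")["weight_kg"].agg(["min", "max", "sum"])

# Two grouping columns and two summarized columns: the result has a
# two-level (color, breed) index.
by_color_breed = dogs.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()

print(mean_by_color, stats_by_color, by_color_breed, sep="\n\n")
```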

Let's practice!

Pivot tables

Maggie Matsui
Senior Content Developer at DataCamp

Group by to pivot table
dogs.groupby("color")["weight_kg"].mean()

color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Name: weight_kg, dtype: float64

dogs.pivot_table(values="weight_kg", index="color")

weight_kg
color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0


Different statistics
import numpy as np
dogs.pivot_table(values="weight_kg", index="color", aggfunc=np.median)

weight_kg
color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0


Multiple statistics
dogs.pivot_table(values="weight_kg", index="color", aggfunc=[np.mean, np.median])

mean median
weight_kg weight_kg
color
Black 26.5 26.5
Brown 24.0 24.0
Gray 17.0 17.0
Tan 2.0 2.0
White 74.0 74.0


Pivot on two variables
dogs.groupby(["color", "breed"])["weight_kg"].mean()

dogs.pivot_table(values="weight_kg", index="color", columns="breed")

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard

color
Black NaN NaN 29.0 24.0 NaN NaN
Brown NaN 24.0 24.0 NaN NaN NaN
Gray NaN NaN NaN NaN 17.0 NaN
Tan 2.0 NaN NaN NaN NaN NaN
White NaN NaN NaN NaN NaN 74.0


Filling missing values in pivot tables
dogs.pivot_table(values="weight_kg", index="color", columns="breed", fill_value=0)

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard

color
Black 0 0 29 24 0 0
Brown 0 24 24 0 0 0
Gray 0 0 0 0 17 0
Tan 2 0 0 0 0 0
White 0 0 0 0 0 74


Summing with pivot tables
dogs.pivot_table(values="weight_kg", index="color", columns="breed",
fill_value=0, margins=True)

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard All
color
Black 0 0 29 24 0 0 26.500000
Brown 0 24 24 0 0 0 24.000000
Gray 0 0 0 0 17 0 17.000000
Tan 2 0 0 0 0 0 2.000000
White 0 0 0 0 0 74 74.000000
All 2 24 26 24 17 74 27.714286
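
The pivot-table variants above are collected in the sketch below on a trimmed copy of the data. As a hedge against version differences, it names the statistics with strings (`"mean"`, `"median"`) rather than the NumPy functions used on the slides; both forms are accepted, but newer pandas releases warn about the NumPy form.

```python
import pandas as pd

# Trimmed copy of the dogs table.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black", "Tan", "White"],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# Default statistic is the mean, matching groupby(...).mean().
mean_by_color = dogs.pivot_table(values="weight_kg", index="color")

# Several statistics at once produce a two-level column header.
stats_by_color = dogs.pivot_table(
    values="weight_kg", index="color", aggfunc=["mean", "median"]
)

# Two grouping variables: colors as rows, breeds as columns.
# fill_value replaces empty cells; margins adds the "All" row and column.
color_by_breed = dogs.pivot_table(
    values="weight_kg", index="color", columns="breed",
    fill_value=0, margins=True,
)
print(color_by_breed)
```

Note that the `All` margins are computed from the underlying rows, not from the zeros inserted by `fill_value`, which is why the overall mean is unaffected by the filled cells.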

Let's practice!

Explicit indexes

Richie Cotton
Learning Solutions Architect at DataCamp

The dog dataset, revisited
print(dogs)

name breed color height_cm weight_kg

0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Gray 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74


.columns and .index
dogs.columns

Index(['name', 'breed', 'color', 'height_cm', 'weight_kg'], dtype='object')

dogs.index

RangeIndex(start=0, stop=7, step=1)


Setting a column as the index
dogs_ind = dogs.set_index("name")
print(dogs_ind)

breed color height_cm weight_kg

name
Bella Labrador Brown 56 25
Charlie Poodle Black 43 23
Lucy Chow Chow Brown 46 22
Cooper Schnauzer Grey 49 17
Max Labrador Black 59 29
Stella Chihuahua Tan 18 2
Bernie St. Bernard White 77 74


Removing an index
dogs_ind.reset_index()

name breed color height_cm weight_kg

0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Grey 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74


Dropping an index
dogs_ind.reset_index(drop=True)

breed color height_cm weight_kg

0 Labrador Brown 56 25
1 Poodle Black 43 23
2 Chow Chow Brown 46 22
3 Schnauzer Grey 49 17
4 Labrador Black 59 29
5 Chihuahua Tan 18 2
6 St. Bernard White 77 74


Indexes make subsetting simpler
dogs[dogs["name"].isin(["Bella", "Stella"])]

name breed color height_cm weight_kg

0 Bella Labrador Brown 56 25
5 Stella Chihuahua Tan 18 2

dogs_ind.loc[["Bella", "Stella"]]

breed color height_cm weight_kg

name
Bella Labrador Brown 56 25
Stella Chihuahua Tan 18 2


Index values don't need to be unique
dogs_ind2 = dogs.set_index("breed")
print(dogs_ind2)

name color height_cm weight_kg

breed
Labrador Bella Brown 56 25
Poodle Charlie Black 43 23
Chow Chow Lucy Brown 46 22
Schnauzer Cooper Grey 49 17
Labrador Max Black 59 29
Chihuahua Stella Tan 18 2
St. Bernard Bernie White 77 74


Subsetting on duplicated index values
dogs_ind2.loc["Labrador"]

name color height_cm weight_kg

breed
Labrador Bella Brown 56 25
Labrador Max Black 59 29


Multi-level indexes a.k.a. hierarchical indexes
dogs_ind3 = dogs.set_index(["breed", "color"])
print(dogs_ind3)

name height_cm weight_kg

breed color
Labrador Brown Bella 56 25
Poodle Black Charlie 43 23
Chow Chow Brown Lucy 46 22
Schnauzer Grey Cooper 49 17
Labrador Black Max 59 29
Chihuahua Tan Stella 18 2
St. Bernard White Bernie 77 74


Subset the outer level with a list
dogs_ind3.loc[["Labrador", "Chihuahua"]]

name height_cm weight_kg

breed color
Labrador Brown Bella 56 25
Black Max 59 29
Chihuahua Tan Stella 18 2


Subset inner levels with a list of tuples
dogs_ind3.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]

name height_cm weight_kg

breed color
Labrador Brown Bella 56 25
Chihuahua Tan Stella 18 2


Sorting by index values
dogs_ind3.sort_index()

name height_cm weight_kg

breed color
Chihuahua Tan Stella 18 2
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74


Controlling sort_index
dogs_ind3.sort_index(level=["color", "breed"], ascending=[True, False])

name height_cm weight_kg

breed color
Poodle Black Charlie 43 23
Labrador Black Max 59 29
Brown Bella 56 25
Chow Chow Brown Lucy 46 22
Schnauzer Grey Cooper 49 17
Chihuahua Tan Stella 18 2
St. Bernard White Bernie 77 74
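
The index operations from this chapter fit into one short sketch. It assumes the revisited dogs table (with the color spelled "Grey", as printed on these slides) and shows setting an index, selecting by label with `.loc`, selecting inner levels with tuples, and sorting by index levels in mixed directions.

```python
import pandas as pd

# The revisited dogs table, typed in by hand for illustration.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Single-level index: rows are now addressed by name.
dogs_ind = dogs.set_index("name")
bella_stella = dogs_ind.loc[["Bella", "Stella"]]

# reset_index moves the index back into an ordinary column.
round_trip = dogs_ind.reset_index()

# Two-level index: outer level breed, inner level color.
dogs_ind3 = dogs.set_index(["breed", "color"])
pairs = dogs_ind3.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]

# Default sort: outer level first, then inner, both ascending.
sorted_default = dogs_ind3.sort_index()

# Custom sort: color ascending, then breed descending.
sorted_custom = dogs_ind3.sort_index(
    level=["color", "breed"], ascending=[True, False]
)
print(sorted_custom)
```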


Now you have two problems
Index values are just data

Indexes violate "tidy data" principles

You need to learn two syntaxes


Temperature dataset
date city country avg_temp_c
0 2000-01-01 Abidjan Côte D'Ivoire 27.293
1 2000-02-01 Abidjan Côte D'Ivoire 27.685
2 2000-03-01 Abidjan Côte D'Ivoire 29.061
3 2000-04-01 Abidjan Côte D'Ivoire 28.162
4 2000-05-01 Abidjan Côte D'Ivoire 27.547

Let's practice!

Slicing and subsetting with .loc and .iloc

Richie Cotton
Learning Solutions Architect at DataCamp

Slicing lists
breeds = ["Labrador", "Poodle",
          "Chow Chow", "Schnauzer",
          "Labrador", "Chihuahua",
          "St. Bernard"]
breeds

['Labrador',
 'Poodle',
 'Chow Chow',
 'Schnauzer',
 'Labrador',
 'Chihuahua',
 'St. Bernard']

breeds[2:5]

['Chow Chow', 'Schnauzer', 'Labrador']

breeds[:3]

['Labrador', 'Poodle', 'Chow Chow']

breeds[:]

['Labrador','Poodle','Chow Chow','Schnauzer',
 'Labrador','Chihuahua','St. Bernard']


Sort the index before you slice
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()
print(dogs_srt)

name height_cm weight_kg

breed color
Chihuahua Tan Stella 18 2
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74


Slicing the outer index level
dogs_srt.loc["Chow Chow":"Poodle"]

name height_cm weight_kg
breed color
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23

The final value "Poodle" is included


Slicing the inner index levels badly
dogs_srt.loc["Tan":"Grey"]

Empty DataFrame
Columns: [name, height_cm, weight_kg]
Index: []


Slicing the inner index levels correctly
dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey")]

name height_cm weight_kg
breed color
Labrador Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17


Slicing columns
dogs_srt.loc[:, "name":"height_cm"]

name height_cm
breed color
Chihuahua Tan Stella 18
Chow Chow Brown Lucy 46
Labrador Black Max 59
Brown Bella 56
Poodle Black Charlie 43
Schnauzer Grey Cooper 49
St. Bernard White Bernie 77


Slice twice
dogs_srt.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey"),
    "name":"height_cm"]

name height_cm
breed color
Labrador Brown Bella 56
Poodle Black Charlie 43
Schnauzer Grey Cooper 49
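
The slicing rules from the last few slides can be checked together in one sketch. It assumes the revisited dogs table typed in by hand, sorts the two-level index first, and then slices the outer level, the inner level (with tuples), the columns, and finally rows and columns at once.

```python
import pandas as pd

# The revisited dogs table, typed in by hand for illustration.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Label slicing relies on a sorted index, so sort before slicing.
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()

# Outer level: plain labels; the end label "Poodle" is included.
outer = dogs_srt.loc["Chow Chow":"Poodle"]

# Plain labels are compared against the OUTER level only, so slicing
# with colors finds nothing.
wrong = dogs_srt.loc["Tan":"Grey"]

# Inner level: pass (outer, inner) tuples as the slice endpoints.
inner = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey")]

# Columns: slice after the comma; ":" keeps every row.
cols = dogs_srt.loc[:, "name":"height_cm"]

# Rows and columns in one call.
both = dogs_srt.loc[("Labrador", "Brown"):("Schnauzer", "Grey"),
                    "name":"height_cm"]
print(both)
```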


Dog days
dogs = dogs.set_index("date_of_birth").sort_index()
print(dogs)

name breed color height_cm weight_kg

date_of_birth
2011-12-11 Cooper Schnauzer Grey 49 17
2013-07-01 Bella Labrador Brown 56 25
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23
2017-01-20 Max Labrador Black 59 29
2018-02-27 Bernie St. Bernard White 77 74


Slicing by dates
# Get dogs with date_of_birth between 2014-08-25 and 2016-09-16
dogs.loc["2014-08-25":"2016-09-16"]

name breed color height_cm weight_kg

date_of_birth
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23


Slicing by partial dates
# Get dogs with date_of_birth between 2014-01-01 and 2016-12-31
dogs.loc["2014":"2016"]

name breed color height_cm weight_kg

date_of_birth
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23
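
Slicing by partial dates such as `"2014":"2016"` depends on the index holding real datetimes. Earlier, `.info()` reported `date_of_birth` as `object` (plain strings); with a string index the comparison would be alphabetical, so a label like `"2016-09-16"` would most likely fall outside a slice ending at `"2016"`. The sketch below therefore assumes an explicit conversion with `pd.to_datetime`, a step the slides do not show.

```python
import pandas as pd

# Trimmed copy of the dogs table; dates start out as strings.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "date_of_birth": ["2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11",
                      "2017-01-20", "2015-04-20", "2018-02-27"],
})

# Assumed step: convert the strings to datetimes before indexing.
dogs["date_of_birth"] = pd.to_datetime(dogs["date_of_birth"])
dogs_by_dob = dogs.set_index("date_of_birth").sort_index()

# Full dates: both endpoints are included.
exact = dogs_by_dob.loc["2014-08-25":"2016-09-16"]

# Partial dates: "2014" means the start of 2014, "2016" the end of 2016.
partial = dogs_by_dob.loc["2014":"2016"]

print(exact)
print(partial)
```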


Subsetting by row/column number
print(dogs.iloc[2:5, 1:4])

breed color height_cm
2 Chow Chow Brown 46
3 Schnauzer Grey 49
4 Labrador Black 59

Full dataset

name breed color height_cm weight_kg
0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Grey 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74
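
One difference between the two accessors is easy to miss: `.iloc` slices by position and excludes the end, like a Python list, while `.loc` slices by label and includes the end. A small sketch on the revisited dogs table, typed in by hand, makes the contrast concrete:

```python
import pandas as pd

# The revisited dogs table with its default 0..6 integer index.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "color": ["Brown", "Black", "Brown", "Grey", "Black", "Tan", "White"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [25, 23, 22, 17, 29, 2, 74],
})

# Positions 2, 3, 4 and columns 1, 2, 3: the end positions are excluded.
by_position = dogs.iloc[2:5, 1:4]

# Labels 2 through 5: the end label IS included, giving four rows.
by_label = dogs.loc[2:5]

print(by_position)
print(by_label)
```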

Let's practice!

Working with pivot tables

Richie Cotton
Learning Solutions Architect at DataCamp

A bigger dog dataset
print(dog_pack)

breed color height_cm weight_kg

0 Boxer Brown 62.64 30.4
1 Poodle Black 46.41 20.4
2 Beagle Brown 36.39 12.4
3 Chihuahua Tan 19.70 1.6
4 Labrador Tan 54.44 36.1
.. ... ... ... ...
87 Boxer Gray 58.13 29.9
88 St. Bernard White 70.13 69.4
89 Poodle Gray 51.30 20.4
90 Beagle White 38.81 8.8
91 Beagle Black 33.40 13.5


Pivoting the dog pack
dogs_height_by_breed_vs_color = dog_pack.pivot_table(
"height_cm", index="breed", columns="color")
print(dogs_height_by_breed_vs_color)

color Black Brown Gray Tan White

breed
Beagle 34.500000 36.4500 36.313333 35.740000 38.810000
Boxer 57.203333 62.6400 58.280000 62.310000 56.360000
Chihuahua 18.555000 NaN 21.660000 20.096667 17.933333
Chow Chow 51.262500 50.4800 NaN 53.497500 54.413333
Dachshund 21.186667 19.7250 NaN 19.375000 20.660000
Labrador 57.125000 NaN NaN 55.190000 55.310000
Poodle 48.036000 57.1300 56.645000 NaN 44.740000
St. Bernard 63.920000 65.8825 67.640000 68.334000 67.495000


.loc[] + slicing is a power combo
dogs_height_by_breed_vs_color.loc["Chow Chow":"Poodle"]

color Black Brown Gray Tan White

breed
Chow Chow 51.262500 50.480 NaN 53.4975 54.413333
Dachshund 21.186667 19.725 NaN 19.3750 20.660000
Labrador 57.125000 NaN NaN 55.1900 55.310000
Poodle 48.036000 57.130 56.645 NaN 44.740000


The axis argument
dogs_height_by_breed_vs_color.mean(axis="index")

color
Black 43.973563
Brown 48.717917
Gray 48.107667
Tan 44.934738
White 44.465208
dtype: float64


Calculating summary stats across columns
dogs_height_by_breed_vs_color.mean(axis="columns")

breed
Beagle 36.362667
Boxer 59.358667
Chihuahua 19.561250
Chow Chow 52.413333
Dachshund 20.236667
Labrador 55.875000
Poodle 51.637750
St. Bernard 66.654300
dtype: float64
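
The `axis` argument is easiest to see on a tiny table. The sketch below uses a hypothetical two-by-two pivot table with made-up heights and one missing cell: `axis="index"` collapses the rows (one result per column), and `axis="columns"` collapses the columns (one result per row). Missing values are skipped by default in both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical mini pivot table: breeds as rows, colors as columns.
heights = pd.DataFrame(
    {"Black": [34.5, 18.5], "Brown": [36.5, np.nan]},
    index=pd.Index(["Beagle", "Chihuahua"], name="breed"),
)

# Mean down each column: one value per color.
per_color = heights.mean(axis="index")

# Mean across each row: one value per breed (the NaN cell is ignored).
per_breed = heights.mean(axis="columns")

print(per_color)
print(per_breed)
```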

Let's practice!

Visualizing your data

Maggie Matsui
Senior Content Developer at DataCamp

Histograms
import matplotlib.pyplot as plt

dog_pack["height_cm"].hist()

plt.show()


Histograms
dog_pack["height_cm"].hist(bins=20)
plt.show()

dog_pack["height_cm"].hist(bins=5)
plt.show()


Bar plots
avg_weight_by_breed = dog_pack.groupby("breed")["weight_kg"].mean()
print(avg_weight_by_breed)

breed
Beagle 10.636364
Boxer 30.620000
Chihuahua 1.491667
Chow Chow 22.535714
Dachshund 9.975000
Labrador 31.850000
Poodle 20.400000
St. Bernard 71.576923
Name: weight_kg, dtype: float64


Bar plots
avg_weight_by_breed.plot(kind="bar")
plt.show()

avg_weight_by_breed.plot(kind="bar",
                         title="Mean Weight by Dog Breed")
plt.show()


Line plots
sully.head()

date weight_kg
0 2019-01-31 36.1
1 2019-02-28 35.3
2 2019-03-31 32.0
3 2019-04-30 32.9
4 2019-05-31 32.0

sully.plot(x="date",
           y="weight_kg",
           kind="line")
plt.show()


Rotating axis labels
sully.plot(x="date", y="weight_kg", kind="line", rot=45)
plt.show()


Scatter plots
dog_pack.plot(x="height_cm", y="weight_kg", kind="scatter")
plt.show()


Layering plots
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()
plt.show()


Add a legend
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()
plt.legend(["F", "M"])
plt.show()


Transparency
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist(alpha=0.7)
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist(alpha=0.7)
plt.legend(["F", "M"])
plt.show()


Avocados
print(avocados)

date type year avg_price size nb_sold

0 2015-12-27 conventional 2015 0.95 small 9626901.09
1 2015-12-20 conventional 2015 0.98 small 8710021.76
2 2015-12-13 conventional 2015 0.93 small 9855053.66
... ... ... ... ... ... ...
1011 2018-01-21 organic 2018 1.63 extra_large 1490.02
1012 2018-01-14 organic 2018 1.59 extra_large 1580.01
1013 2018-01-07 organic 2018 1.51 extra_large 1289.07

[1014 rows x 6 columns]

Let's practice!

Missing values

Maggie Matsui
Senior Content Developer at DataCamp

What's a missing value?
Name Breed Color Height (cm) Weight (kg) Date of Birth
Bella Labrador Brown 56 25 2013-07-01
Charlie Poodle Black 43 23 2016-09-16
Lucy Chow Chow Brown 46 22 2014-08-25
Cooper Schnauzer Gray 49 17 2011-12-11
Max Labrador Black 59 29 2017-01-20
Stella Chihuahua Tan 18 2 2015-04-20
Bernie St. Bernard White 77 74 2018-02-27


What's a missing value?
Name Breed Color Height (cm) Weight (kg) Date of Birth
Bella Labrador Brown 56 ? 2013-07-01
Charlie Poodle Black 43 23 2016-09-16
Lucy Chow Chow Brown 46 22 2014-08-25
Cooper Schnauzer Gray 49 ? 2011-12-11
Max Labrador Black 59 29 2017-01-20
Stella Chihuahua Tan 18 2 2015-04-20
Bernie St. Bernard White 77 74 2018-02-27


Missing values in pandas DataFrames
print(dogs)

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 NaN 2013-07-01
1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
3 Cooper Schnauzer Gray 49 NaN 2011-12-11
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27

Detecting missing values
dogs.isna()

name breed color height_cm weight_kg date_of_birth

0 False False False False True False
1 False False False False False False
2 False False False False False False
3 False False False False True False
4 False False False False False False
5 False False False False False False
6 False False False False False False

Detecting any missing values
dogs.isna().any()

name False
breed False
color False
height_cm False
weight_kg True
date_of_birth False
dtype: bool

Counting missing values
dogs.isna().sum()

name 0
breed 0
color 0
height_cm 0
weight_kg 2
date_of_birth 0
dtype: int64
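The three detection steps can be tried end to end on a small frame. This is a sketch that rebuilds only three columns of the dogs table by hand; the row-wise filter with axis=1 is an addition for illustration and is not shown in the slides.

```python
import numpy as np
import pandas as pd

# Hand-built subset of the dogs table; NaN marks the two unknown weights.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [np.nan, 24.0, 24.0, np.nan, 29.0, 2.0, 74.0],
})

# Count of missing values in each column
missing_per_column = dogs.isna().sum()

# Names of the columns that contain at least one missing value
columns_with_missing = dogs.columns[dogs.isna().any()].tolist()

# Rows that contain at least one missing value (any() across columns)
rows_with_missing = dogs[dogs.isna().any(axis=1)]

print(missing_per_column)
print(columns_with_missing)
print(rows_with_missing)
```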

Plotting missing values
import matplotlib.pyplot as plt
dogs.isna().sum().plot(kind="bar")
plt.show()

Removing missing values
dogs.dropna()

name breed color height_cm weight_kg date_of_birth

1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27

Replacing missing values
dogs.fillna(0)

name breed color height_cm weight_kg date_of_birth

0 Bella Labrador Brown 56 0.0 2013-07-01
1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
3 Cooper Schnauzer Gray 49 0.0 2011-12-11
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27
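Filling with 0 is only one option, and a 0 kg dog would distort summary statistics such as the mean. As a hedged sketch on a hand-built subset of the dogs table, the code below drops rows based on a single column and, as an alternative not shown in the slides, fills the gaps with the column mean.

```python
import numpy as np
import pandas as pd

# Hand-built subset of the dogs table; NaN marks the two unknown weights.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "weight_kg": [np.nan, 24.0, 24.0, np.nan, 29.0, 2.0, 74.0],
})

# Drop only the rows where weight_kg is missing
dropped = dogs.dropna(subset=["weight_kg"])

# .mean() skips NaN by default, so this is the mean of the five known weights
mean_weight = dogs["weight_kg"].mean()

# Fill missing weights with the mean; a dict limits the fill to the named column
filled = dogs.fillna({"weight_kg": mean_weight})

print(dropped)
print(filled)
```

Both dropna() and fillna() return new DataFrames here, so dogs itself still contains the missing values.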

Let's practice!

Creating DataFrames

Maggie Matsui
Senior Content Developer at DataCamp
Dictionaries
my_dict = {
    "key1": value1,
    "key2": value2,
    "key3": value3
}

my_dict["key1"]

value1

my_dict = {
    "title": "Charlotte's Web",
    "author": "E.B. White",
    "published": 1952
}

my_dict["title"]

Charlotte's Web

Creating DataFrames
From a list of dictionaries From a dictionary of lists

Constructed row by row Constructed column by column

List of dictionaries - by row
name breed height (cm) weight (kg) date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

list_of_dicts = [
    {"name": "Ginger", "breed": "Dachshund", "height_cm": 22,
     "weight_kg": 10, "date_of_birth": "2019-03-14"},
    {"name": "Scout", "breed": "Dalmatian", "height_cm": 59,
     "weight_kg": 25, "date_of_birth": "2019-05-09"}
]

List of dictionaries - by row
name breed height (cm) weight (kg) date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

new_dogs = pd.DataFrame(list_of_dicts)
print(new_dogs)

name breed height_cm weight_kg date_of_birth

0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09

Dictionary of lists - by column
name breed height weight date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

dict_of_lists = {
    "name": ["Ginger", "Scout"],
    "breed": ["Dachshund", "Dalmatian"],
    "height_cm": [22, 59],
    "weight_kg": [10, 25],
    "date_of_birth": ["2019-03-14", "2019-05-09"]
}
new_dogs = pd.DataFrame(dict_of_lists)
Key = column name

Value = list of column values

Dictionary of lists - by column
name breed height (cm) weight (kg) date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

print(new_dogs)

name breed height_cm weight_kg date_of_birth

0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09
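Both construction styles can be run side by side to confirm that they produce the same DataFrame. The sketch below reuses the two dogs from the slides; the variable names by_row and by_column are chosen here for illustration.

```python
import pandas as pd

# Row by row: each dictionary is one dog
list_of_dicts = [
    {"name": "Ginger", "breed": "Dachshund", "height_cm": 22,
     "weight_kg": 10, "date_of_birth": "2019-03-14"},
    {"name": "Scout", "breed": "Dalmatian", "height_cm": 59,
     "weight_kg": 25, "date_of_birth": "2019-05-09"},
]

# Column by column: each key is a column name, each value a list of column values
dict_of_lists = {
    "name": ["Ginger", "Scout"],
    "breed": ["Dachshund", "Dalmatian"],
    "height_cm": [22, 59],
    "weight_kg": [10, 25],
    "date_of_birth": ["2019-03-14", "2019-05-09"],
}

by_row = pd.DataFrame(list_of_dicts)
by_column = pd.DataFrame(dict_of_lists)

# equals() compares values, dtypes, and labels
same = by_row.equals(by_column)
print(same)
print(by_row)
```

The row-wise form tends to suit data that arrives one record at a time, while the column-wise form is usually shorter to type when the columns are already available as lists.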

Let's practice!

Reading and writing CSVs

Maggie Matsui
Senior Content Developer at DataCamp
What's a CSV file?
CSV = comma-separated values

Designed for DataFrame-like data

Most database and spreadsheet programs can use them or create them

Example CSV file
new_dogs.csv

name,breed,height_cm,weight_kg,date_of_birth
Ginger,Dachshund,22,10,2019-03-14
Scout,Dalmatian,59,25,2019-05-09

CSV to DataFrame
import pandas as pd
new_dogs = pd.read_csv("new_dogs.csv")
print(new_dogs)

name breed height_cm weight_kg date_of_birth

0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09
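By default read_csv() leaves date columns as plain strings. A minimal sketch, assuming the file's last column is named date_of_birth and using an in-memory string in place of a file on disk, shows how the parse_dates argument converts that column to datetimes while reading.

```python
import io

import pandas as pd

# In-memory stand-in for new_dogs.csv, so the example needs no file on disk
csv_text = """name,breed,height_cm,weight_kg,date_of_birth
Ginger,Dachshund,22,10,2019-03-14
Scout,Dalmatian,59,25,2019-05-09
"""

# parse_dates converts the named column to datetime64 during the read
new_dogs = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_of_birth"])

print(new_dogs.dtypes)

# Datetime columns expose the .dt accessor, e.g. to extract the year
birth_years = new_dogs["date_of_birth"].dt.year.tolist()
print(birth_years)
```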

DataFrame manipulation
new_dogs["bmi"] = new_dogs["weight_kg"] / (new_dogs["height_cm"] / 100) ** 2
print(new_dogs)

name breed height_cm weight_kg date_of_birth bmi

0 Ginger Dachshund 22 10 2019-03-14 206.611570
1 Scout Dalmatian 59 25 2019-05-09 71.818443

DataFrame to CSV
new_dogs.to_csv("new_dogs_with_bmi.csv", index=False)

new_dogs_with_bmi.csv

name,breed,height_cm,weight_kg,date_of_birth,bmi
Ginger,Dachshund,22,10,2019-03-14,206.611570
Scout,Dalmatian,59,25,2019-05-09,71.818443
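By default, to_csv() also writes the row index as an unnamed first column; passing index=False leaves it out, which is what a file with no index column, like the one shown above, requires. The sketch below uses a two-column stand-in for new_dogs and writes to strings rather than files so that both variants can be compared directly.

```python
import io

import pandas as pd

# Two-column stand-in for new_dogs
new_dogs = pd.DataFrame({"name": ["Ginger", "Scout"], "weight_kg": [10, 25]})

# With no path argument, to_csv() returns the CSV text as a string
with_index = new_dogs.to_csv()
without_index = new_dogs.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the index-free text back recovers the original columns
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip)
```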

Let's practice!

Wrap-up

Maggie Matsui
Senior Content Developer at DataCamp
Recap
Chapter 1
Subsetting and sorting
Adding new columns

Chapter 2
Aggregating and grouping
Summary statistics

Chapter 3
Indexing
Slicing

Chapter 4
Visualizations
Reading and writing CSVs

More to learn
Joining Data with pandas

Streamlined Data Ingestion with pandas

Analyzing Police Activity with pandas

Analyzing Marketing Campaigns with pandas

Congratulations!