0% found this document useful (0 votes)
255 views

Data Manipulation With Pandas

The document introduces pandas and DataFrames for data manipulation. It discusses how pandas allows working with tabular, rectangular data through DataFrames. It provides various examples of exploring, sorting, filtering and selecting subsets of data from a DataFrame of dog information. The key methods demonstrated include .head() to view the top rows, .info() for metadata, .describe() for summaries, and slicing/indexing to subset rows and columns.

Uploaded by

Zhuo Yang
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
255 views

Data Manipulation With Pandas

The document introduces pandas and DataFrames for data manipulation. It discusses how pandas allows working with tabular, rectangular data through DataFrames. It provides various examples of exploring, sorting, filtering and selecting subsets of data from a DataFrame of dog information. The key methods demonstrated include .head() to view the top rows, .info() for metadata, .describe() for summaries, and slicing/indexing to subset rows and columns.

Uploaded by

Zhuo Yang
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 147

Introducing

DataFrames
D ATA M A N I P U L AT I O N W I T H PA N D A S

Richie Co on
Learning Solutions Architect at
DataCamp
What's the point of pandas?
Data Manipulation skill track

Data Visualization skill track

DATA MANIPULATION WITH PANDAS


Course outline
Chapter 1: DataFrames Chapter 3: Slicing and Indexing Data
Sorting and subse ing Subse ing using slicing

Creating new columns Indexes and subse ing using indexes

Chapter 2: Aggregating Data Chapter 4: Creating and Visualizing Data


Summary statistics Plo ing

Counting Handling missing data

Grouped summary statistics Reading data into a DataFrame

DATA MANIPULATION WITH PANDAS


pandas is built on NumPy and Matplotlib

DATA MANIPULATION WITH PANDAS


pandas is popular

1 h ps://pypistats.org/packages/pandas

DATA MANIPULATION WITH PANDAS


Rectangular data
Name Breed Color Height (cm) Weight (kg) Date of Birth
Bella Labrador Brown 56 25 2013-07-01
Charlie Poodle Black 43 23 2016-09-16
Lucy Chow Chow Brown 46 22 2014-08-25
Cooper Schnauzer Gray 49 17 2011-12-11
Max Labrador Black 59 29 2017-01-20
Stella Chihuahua Tan 18 2 2015-04-20
Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


pandas DataFrames
print(dogs)

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
4 Max Labrador Black 59 29 2017-01-20
5 Stella Chihuahua Tan 18 2 2015-04-20
6 Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Exploring a DataFrame: .head()
dogs.head()

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
4 Max Labrador Black 59 29 2017-01-20

DATA MANIPULATION WITH PANDAS


Exploring a DataFrame: .info()
dogs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 name 7 non-null object
1 breed 7 non-null object
2 color 7 non-null object
3 height_cm 7 non-null int64
4 weight_kg 7 non-null int64
5 date_of_birth 7 non-null object
dtypes: int64(2), object(4)
memory usage: 464.0+ bytes

DATA MANIPULATION WITH PANDAS


Exploring a DataFrame: .shape
dogs.shape

(7, 6)

DATA MANIPULATION WITH PANDAS


Exploring a DataFrame: .describe()
dogs.describe()

height_cm weight_kg
count 7.000000 7.000000
mean 49.714286 27.428571
std 17.960274 22.292429
min 18.000000 2.000000
25% 44.500000 19.500000
50% 49.000000 23.000000
75% 57.500000 27.000000
max 77.000000 74.000000

DATA MANIPULATION WITH PANDAS


Components of a DataFrame: .values
dogs.values

array([['Bella', 'Labrador', 'Brown', 56, 24, '2013-07-01'],


['Charlie', 'Poodle', 'Black', 43, 24, '2016-09-16'],
['Lucy', 'Chow Chow', 'Brown', 46, 24, '2014-08-25'],
['Cooper', 'Schnauzer', 'Gray', 49, 17, '2011-12-11'],
['Max', 'Labrador', 'Black', 59, 29, '2017-01-20'],
['Stella', 'Chihuahua', 'Tan', 18, 2, '2015-04-20'],
['Bernie', 'St. Bernard', 'White', 77, 74, '2018-02-27']],
dtype=object)

DATA MANIPULATION WITH PANDAS


Components of a DataFrame: .columns and .index
dogs.columns

Index(['name', 'breed', 'color', 'height_cm', 'weight_kg', 'date_of_birth'],


dtype='object')

dogs.index

RangeIndex(start=0, stop=7, step=1)

DATA MANIPULATION WITH PANDAS


pandas Philosophy
There should be one -- and preferably only one -- obvious way to do it.

- The Zen of Python by Tim Peters, Item 13

1 h ps://www.python.org/dev/peps/pep-0020/

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Sorting and
subsetting
D ATA M A N I P U L AT I O N W I T H PA N D A S

Richie Co on
Learning Solutions Architect at
DataCamp
Sorting
dogs.sort_values("weight_kg")

name breed color height_cm weight_kg date_of_birth


5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Sorting in descending order
dogs.sort_values("weight_kg", ascending=False)

name breed color height_cm weight_kg date_of_birth


6 Bernie St. Bernard White 77 74 2018-02-27
4 Max Labrador Black 59 29 2017-01-20
0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
3 Cooper Schnauzer Gray 49 17 2011-12-11
5 Stella Chihuahua Tan 18 2 2015-04-20

DATA MANIPULATION WITH PANDAS


Sorting by multiple variables
dogs.sort_values(["weight_kg", "height_cm"])

name breed color height_cm weight_kg date_of_birth


5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Sorting by multiple variables
dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])

name breed color height_cm weight_kg date_of_birth


5 Stella Chihuahua Tan 18 2 2015-04-20
3 Cooper Schnauzer Gray 49 17 2011-12-11
0 Bella Labrador Brown 56 24 2013-07-01
2 Lucy Chow Chow Brown 46 24 2014-08-25
1 Charlie Poodle Black 43 24 2016-09-16
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Subsetting columns
dogs["name"]

0 Bella
1 Charlie
2 Lucy
3 Cooper
4 Max
5 Stella
6 Bernie
Name: name, dtype: object

DATA MANIPULATION WITH PANDAS


Subsetting multiple columns
dogs[["breed", "height_cm"]] cols_to_subset = ["breed", "height_cm"]
dogs[cols_to_subset]

breed height_cm
0 Labrador 56 breed height_cm
1 Poodle 43 0 Labrador 56
2 Chow Chow 46 1 Poodle 43
3 Schnauzer 49 2 Chow Chow 46
4 Labrador 59 3 Schnauzer 49
5 Chihuahua 18 4 Labrador 59
6 St. Bernard 77 5 Chihuahua 18
6 St. Bernard 77

DATA MANIPULATION WITH PANDAS


Subsetting rows
dogs["height_cm"] > 50

0 True
1 False
2 False
3 False
4 True
5 False
6 True
Name: height_cm, dtype: bool

DATA MANIPULATION WITH PANDAS


Subsetting rows
dogs[dogs["height_cm"] > 50]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20
6 Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Subsetting based on text data
dogs[dogs["breed"] == "Labrador"]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
4 Max Labrador Black 59 29 2017-01-20

DATA MANIPULATION WITH PANDAS


Subsetting based on dates
dogs[dogs["date_of_birth"] > "2015-01-01"]

name breed color height_cm weight_kg date_of_birth


1 Charlie Poodle Black 43 24 2016-09-16
4 Max Labrador Black 59 29 2017-01-20
5 Stella Chihuahua Tan 18 2 2015-04-20
6 Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Subsetting based on multiple conditions
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"
dogs[is_lab & is_brown]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01

dogs[ (dogs["breed"] == "Labrador") & (dogs["color"] == "Brown") ]

DATA MANIPULATION WITH PANDAS


Subsetting using .isin()
is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
dogs[is_black_or_brown]

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 24 2013-07-01
1 Charlie Poodle Black 43 24 2016-09-16
2 Lucy Chow Chow Brown 46 24 2014-08-25
4 Max Labrador Black 59 29 2017-01-20

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
New columns
D ATA M A N I P U L AT I O N W I T H PA N D A S

Richie Co on
Learning Solutions Architect at
DataCamp
Adding a new column
dogs["height_m"] = dogs["height_cm"] / 100
print(dogs)

name breed color height_cm weight_kg date_of_birth height_m


0 Bella Labrador Brown 56 24 2013-07-01 0.56
1 Charlie Poodle Black 43 24 2016-09-16 0.43
2 Lucy Chow Chow Brown 46 24 2014-08-25 0.46
3 Cooper Schnauzer Gray 49 17 2011-12-11 0.49
4 Max Labrador Black 59 29 2017-01-20 0.59
5 Stella Chihuahua Tan 18 2 2015-04-20 0.18
6 Bernie St. Bernard White 77 74 2018-02-27 0.77

DATA MANIPULATION WITH PANDAS


Doggy mass index
BMI = weight in kg/(height in m)2

dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2


print(dogs.head())

name breed color height_cm weight_kg date_of_birth height_m bmi


0 Bella Labrador Brown 56 24 2013-07-01 0.56 76.530612
1 Charlie Poodle Black 43 24 2016-09-16 0.43 129.799892
2 Lucy Chow Chow Brown 46 24 2014-08-25 0.46 113.421550
3 Cooper Schnauzer Gray 49 17 2011-12-11 0.49 70.803832
4 Max Labrador Black 59 29 2017-01-20 0.59 83.309394

DATA MANIPULATION WITH PANDAS


Multiple manipulations
bmi_lt_100 = dogs[dogs["bmi"] < 100]
bmi_lt_100_height = bmi_lt_100.sort_values("height_cm", ascending=False)
bmi_lt_100_height[["name", "height_cm", "bmi"]]

name height_cm bmi


4 Max 59 83.309394
0 Bella 56 76.530612
3 Cooper 49 70.803832
5 Stella 18 61.728395

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Summary statistics
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Summarizing numerical data
.median() , .mode()
dogs["height_cm"].mean()
.min() , .max()

49.714285714285715 .var() , .std()

.sum()

.quantile()

DATA MANIPULATION WITH PANDAS


Summarizing dates
Oldest dog:

dogs["date_of_birth"].min()

'2011-12-11'

Youngest dog:

dogs["date_of_birth"].max()

'2018-02-27'

DATA MANIPULATION WITH PANDAS


The .agg() method
def pct30(column):
return column.quantile(0.3)

dogs["weight_kg"].agg(pct30)

22.599999999999998

DATA MANIPULATION WITH PANDAS


Summaries on multiple columns
dogs[["weight_kg", "height_cm"]].agg(pct30)

weight_kg 22.6
height_cm 45.4
dtype: float64

DATA MANIPULATION WITH PANDAS


Multiple summaries
def pct40(column):
return column.quantile(0.4)

dogs["weight_kg"].agg([pct30, pct40])

pct30 22.6
pct40 24.0
Name: weight_kg, dtype: float64

DATA MANIPULATION WITH PANDAS


Cumulative sum
dogs["weight_kg"] dogs["weight_kg"].cumsum()

0 24 0 24
1 24 1 48
2 24 2 72
3 17 3 89
4 29 4 118
5 2 5 120
6 74 6 194
Name: weight_kg, dtype: int64 Name: weight_kg, dtype: int64

DATA MANIPULATION WITH PANDAS


Cumulative statistics
.cummax()

.cummin()

.cumprod()

DATA MANIPULATION WITH PANDAS


Walmart
sales.head()

store type dept date weekly_sales is_holiday temp_c fuel_price unemp


0 1 A 1 2010-02-05 24924.50 False 5.73 0.679 8.106
1 1 A 2 2010-02-05 50605.27 False 5.73 0.679 8.106
2 1 A 3 2010-02-05 13740.12 False 5.73 0.679 8.106
3 1 A 4 2010-02-05 39954.04 False 5.73 0.679 8.106
4 1 A 5 2010-02-05 32229.38 False 5.73 0.679 8.106

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Counting
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Avoiding double counting

DATA MANIPULATION WITH PANDAS


Vet visits
print(vet_visits)

date name breed weight_kg


0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Labrador 28.35
2 2018-01-17 Stella Chihuahua 1.51
3 2019-10-19 Lucy Chow Chow 24.07
.. ... ... ... ...
71 2018-01-20 Stella Chihuahua 2.83
72 2019-06-07 Max Chow Chow 24.01
73 2018-08-20 Lucy Chow Chow 24.40
74 2019-04-22 Max Labrador 28.54

DATA MANIPULATION WITH PANDAS


Dropping duplicate names
vet_visits.drop_duplicates(subset="name")

date name breed weight_kg


0 2018-09-02 Bella Labrador 24.87
1 2019-06-07 Max Chow Chow 24.01
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98
(6 2019-06-07 Max Labrador 28.35)

DATA MANIPULATION WITH PANDAS


Dropping duplicate pairs
unique_dogs = vet_visits.drop_duplicates(subset=["name", "breed"])
print(unique_dogs)

date name breed weight_kg


0 2018-09-02 Bella Labrador 24.87
1 2019-03-13 Max Chow Chow 24.13
2 2019-03-19 Charlie Poodle 24.95
3 2018-01-17 Stella Chihuahua 1.51
4 2019-10-19 Lucy Chow Chow 24.07
6 2019-06-07 Max Labrador 28.35
7 2019-03-30 Cooper Schnauzer 16.91
10 2019-01-04 Bernie St. Bernard 74.98

DATA MANIPULATION WITH PANDAS


Easy as 1, 2, 3
unique_dogs["breed"].value_counts() unique_dogs["breed"].value_counts(sort=True)

Labrador 2 Labrador 2
Schnauzer 1 Chow Chow 2
St. Bernard 1 Schnauzer 1
Chow Chow 2 St. Bernard 1
Poodle 1 Poodle 1
Chihuahua 1 Chihuahua 1
Name: breed, dtype: int64 Name: breed, dtype: int64

DATA MANIPULATION WITH PANDAS


Proportions
unique_dogs["breed"].value_counts(normalize=True)

Labrador 0.250
Chow Chow 0.250
Schnauzer 0.125
St. Bernard 0.125
Poodle 0.125
Chihuahua 0.125
Name: breed, dtype: float64

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Grouped summary
statistics
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Summaries by group
dogs[dogs["color"] == "Black"]["weight_kg"].mean()
dogs[dogs["color"] == "Brown"]["weight_kg"].mean()
dogs[dogs["color"] == "White"]["weight_kg"].mean()
dogs[dogs["color"] == "Gray"]["weight_kg"].mean()
dogs[dogs["color"] == "Tan"]["weight_kg"].mean()

26.0
24.0
74.0
17.0
2.0

DATA MANIPULATION WITH PANDAS


Grouped summaries
dogs.groupby("color")["weight_kg"].mean()

color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0
Name: weight_kg, dtype: float64

DATA MANIPULATION WITH PANDAS


Multiple grouped summaries
dogs.groupby("color")["weight_kg"].agg([min, max, sum])

min max sum


color
Black 24 29 53
Brown 24 24 48
Gray 17 17 17
Tan 2 2 2
White 74 74 74

DATA MANIPULATION WITH PANDAS


Grouping by multiple variables
dogs.groupby(["color", "breed"])["weight_kg"].mean()

color breed
Black Chow Chow 25
Labrador 29
Poodle 24
Brown Chow Chow 24
Labrador 24
Gray Schnauzer 17
Tan Chihuahua 2
White St. Bernard 74
Name: weight_kg, dtype: int64

DATA MANIPULATION WITH PANDAS


Many groups, many summaries
dogs.groupby(["color", "breed"])[["weight_kg", "height_cm"]].mean()

weight_kg height_cm
color breed
Black Labrador 29 59
Poodle 24 43
Brown Chow Chow 24 46
Labrador 24 56
Gray Schnauzer 17 49
Tan Chihuahua 2 18
White St. Bernard 74 77

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Pivot tables
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Group by to pivot table
dogs.groupby("color")["weight_kg"].mean() dogs.pivot_table(values="weight_kg",
index="color")

color
Black 26 weight_kg
Brown 24 color
Gray 17 Black 26.5
Tan 2 Brown 24.0
White 74 Gray 17.0
Name: weight_kg, dtype: int64 Tan 2.0
White 74.0

DATA MANIPULATION WITH PANDAS


Different statistics
import numpy as np
dogs.pivot_table(values="weight_kg", index="color", aggfunc=np.median)

weight_kg
color
Black 26.5
Brown 24.0
Gray 17.0
Tan 2.0
White 74.0

DATA MANIPULATION WITH PANDAS


Multiple statistics
dogs.pivot_table(values="weight_kg", index="color", aggfunc=[np.mean, np.median])

mean median
weight_kg weight_kg
color
Black 26.5 26.5
Brown 24.0 24.0
Gray 17.0 17.0
Tan 2.0 2.0
White 74.0 74.0

DATA MANIPULATION WITH PANDAS


Pivot on two variables
dogs.groupby(["color", "breed"])["weight_kg"].mean()

dogs.pivot_table(values="weight_kg", index="color", columns="breed")

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard


color
Black NaN NaN 29.0 24.0 NaN NaN
Brown NaN 24.0 24.0 NaN NaN NaN
Gray NaN NaN NaN NaN 17.0 NaN
Tan 2.0 NaN NaN NaN NaN NaN
White NaN NaN NaN NaN NaN 74.0

DATA MANIPULATION WITH PANDAS


Filling missing values in pivot tables
dogs.pivot_table(values="weight_kg", index="color", columns="breed", fill_value=0)

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard


color
Black 0 0 29 24 0 0
Brown 0 24 24 0 0 0
Gray 0 0 0 0 17 0
Tan 2 0 0 0 0 0
White 0 0 0 0 0 74

DATA MANIPULATION WITH PANDAS


Summing with pivot tables
dogs.pivot_table(values="weight_kg", index="color", columns="breed",
fill_value=0, margins=True)

breed Chihuahua Chow Chow Labrador Poodle Schnauzer St. Bernard All
color
Black 0 0 29 24 0 0 26.500000
Brown 0 24 24 0 0 0 24.000000
Gray 0 0 0 0 17 0 17.000000
Tan 2 0 0 0 0 0 2.000000
White 0 0 0 0 0 74 74.000000
All 2 24 26 24 17 74 27.714286

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Explicit indexes
D ATA M A N I P U L AT I O N W I T H PA N D A S

Richie Co on
Learning Solutions Architect at
DataCamp
The dog dataset, revisited
print(dogs)

name breed color height_cm weight_kg


0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Gray 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74

DATA MANIPULATION WITH PANDAS


.columns and .index
dogs.columns

Index(['name', 'breed', 'color', 'height_cm', 'weight_kg'], dtype='object')

dogs.index

RangeIndex(start=0, stop=7, step=1)

DATA MANIPULATION WITH PANDAS


Setting a column as the index
dogs_ind = dogs.set_index("name")
print(dogs_ind)

breed color height_cm weight_kg


name
Bella Labrador Brown 56 25
Charlie Poodle Black 43 23
Lucy Chow Chow Brown 46 22
Cooper Schnauzer Grey 49 17
Max Labrador Black 59 29
Stella Chihuahua Tan 18 2
Bernie St. Bernard White 77 74

DATA MANIPULATION WITH PANDAS


Removing an index
dogs_ind.reset_index()

name breed color height_cm weight_kg


0 Bella Labrador Brown 56 25
1 Charlie Poodle Black 43 23
2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Grey 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74

DATA MANIPULATION WITH PANDAS


Dropping an index
dogs_ind.reset_index(drop=True)

breed color height_cm weight_kg


0 Labrador Brown 56 25
1 Poodle Black 43 23
2 Chow Chow Brown 46 22
3 Schnauzer Grey 49 17
4 Labrador Black 59 29
5 Chihuahua Tan 18 2
6 St. Bernard White 77 74

DATA MANIPULATION WITH PANDAS


Indexes make subsetting simpler
dogs[dogs["name"].isin(["Bella", "Stella"])]

name breed color height_cm weight_kg


0 Bella Labrador Brown 56 25
5 Stella Chihuahua Tan 18 2

dogs_ind.loc[["Bella", "Stella"]]

breed color height_cm weight_kg


name
Bella Labrador Brown 56 25
Stella Chihuahua Tan 18 2

DATA MANIPULATION WITH PANDAS


Index values don't need to be unique
dogs_ind2 = dogs.set_index("breed")
print(dogs_ind2)

name color height_cm weight_kg


breed
Labrador Bella Brown 56 25
Poodle Charlie Black 43 23
Chow Chow Lucy Brown 46 22
Schnauzer Cooper Grey 49 17
Labrador Max Black 59 29
Chihuahua Stella Tan 18 2
St. Bernard Bernie White 77 74

DATA MANIPULATION WITH PANDAS


Subsetting on duplicated index values
dogs_ind2.loc["Labrador"]

name color height_cm weight_kg


breed
Labrador Bella Brown 56 25
Labrador Max Black 59 29

DATA MANIPULATION WITH PANDAS


Multi-level indexes a.k.a. hierarchical indexes
dogs_ind3 = dogs.set_index(["breed", "color"])
print(dogs_ind3)

name height_cm weight_kg


breed color
Labrador Brown Bella 56 25
Poodle Black Charlie 43 23
Chow Chow Brown Lucy 46 22
Schnauzer Grey Cooper 49 17
Labrador Black Max 59 29
Chihuahua Tan Stella 18 2
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Subset the outer level with a list
dogs_ind3.loc[["Labrador", "Chihuahua"]]

name height_cm weight_kg


breed color
Labrador Brown Bella 56 25
Black Max 59 29
Chihuahua Tan Stella 18 2

DATA MANIPULATION WITH PANDAS


Subset inner levels with a list of tuples
dogs_ind3.loc[[("Labrador", "Brown"), ("Chihuahua", "Tan")]]

name height_cm weight_kg


breed color
Labrador Brown Bella 56 25
Chihuahua Tan Stella 18 2

DATA MANIPULATION WITH PANDAS


Sorting by index values
dogs_ind3.sort_index()

name height_cm weight_kg


breed color
Chihuahua Tan Stella 18 2
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Controlling sort_index
dogs_ind3.sort_index(level=["color", "breed"], ascending=[True, False])

name height_cm weight_kg


breed color
Poodle Black Charlie 43 23
Labrador Black Max 59 29
Brown Bella 56 25
Chow Chow Brown Lucy 46 22
Schanuzer Grey Cooper 49 17
Chihuahua Tan Stella 18 2
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Now you have two problems
Index values are just data

Indexes violate "tidy data" principles

You need to learn two syntaxes

DATA MANIPULATION WITH PANDAS


Temperature dataset
date city country avg_temp_c
0 2000-01-01 Abidjan Côte D'Ivoire 27.293
1 2000-02-01 Abidjan Côte D'Ivoire 27.685
2 2000-03-01 Abidjan Côte D'Ivoire 29.061
3 2000-04-01 Abidjan Côte D'Ivoire 28.162
4 2000-05-01 Abidjan Côte D'Ivoire 27.547

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Slicing and
subsetting with .loc
and .iloc
D ATA M A N I P U L AT I O N W I T H PA N D A S

Richie Co on
Learning Solutions Architect at
DataCamp
Slicing lists
breeds = ["Labrador", "Poodle", breeds[2:5]
"Chow Chow", "Schnauzer",
"Labrador", "Chihuahua",
['Chow Chow', 'Schnauzer', 'Labrador']
"St. Bernard"]

breeds[:3]
['Labrador',
'Poodle',
'Chow Chow', ['Labrador', 'Poodle', 'Chow Chow']
'Schnauzer',
'Labrador', breeds[:]
'Chihuahua',
'St. Bernard']
['Labrador','Poodle','Chow Chow','Schnauzer',
'Labrador','Chihuahua','St. Bernard']

DATA MANIPULATION WITH PANDAS


Sort the index before you slice
dogs_srt = dogs.set_index(["breed", "color"]).sort_index()
print(dogs_srt)

name height_cm weight_kg


breed color
Chihuahua Tan Stella 18 2
Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Slicing the outer index level
dogs_srt.loc["Chow Chow":"Poodle"] Full dataset

name height_cm weight_kg name height_cm weight_kg

breed color breed color

Chow Chow Brown Lucy 46 22 Chihuahua Tan Stella 18 2

Labrador Black Max 59 29 Chow Chow Brown Lucy 46 22

Brown Bella 56 25 Labrador Black Max 59 29

Poodle Black Charlie 43 23 Brown Bella 56 25


Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
The nal value "Poodle" is included
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Slicing the inner index levels badly
dogs_srt.loc["Tan":"Grey"] Full dataset

Empty DataFrame name height_cm weight_kg

Columns: [name, height_cm, weight_kg] breed color

Index: [] Chihuahua Tan Stella 18 2


Chow Chow Brown Lucy 46 22
Labrador Black Max 59 29
Brown Bella 56 25
Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Slicing the inner index levels correctly
dogs_srt.loc[ Full dataset
("Labrador", "Brown"):("Schnauzer", "Grey")]
name height_cm weight_kg

name height_cm weight_kg breed color

breed color Chihuahua Tan Stella 18 2

Labrador Brown Bella 56 25 Chow Chow Brown Lucy 46 22

Poodle Black Charlie 43 23 Labrador Black Max 59 29

Schnauzer Grey Cooper 49 17 Brown Bella 56 25


Poodle Black Charlie 43 23
Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Slicing columns
dogs_srt.loc[:, "name":"height_cm"] Full dataset

name height_cm name height_cm weight_kg

breed color breed color

Chihuahua Tan Stella 18 Chihuahua Tan Stella 18 2

Chow Chow Brown Lucy 46 Chow Chow Brown Lucy 46 22

Labrador Black Max 59 Labrador Black Max 59 29

Brown Bella 56 Brown Bella 56 25

Poodle Black Charlie 43 Poodle Black Charlie 43 23

Schnauzer Grey Cooper 49 Schnauzer Grey Cooper 49 17

St. Bernard White Bernie 77 St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Slice twice
dogs_srt.loc[ Full dataset
("Labrador", "Brown"):("Schnauzer", "Grey"),
"name":"height_cm"] name height_cm weight_kg
breed color

name height_cm Chihuahua Tan Stella 18 2

breed color Chow Chow Brown Lucy 46 22

Labrador Brown Bella 56 Labrador Black Max 59 29

Poodle Black Charlie 43 Brown Bella 56 25

Schanuzer Grey Cooper 49 Poodle Black Charlie 43 23


Schnauzer Grey Cooper 49 17
St. Bernard White Bernie 77 74

DATA MANIPULATION WITH PANDAS


Dog days
dogs = dogs.set_index("date_of_birth").sort_index()
print(dogs)

name breed color height_cm weight_kg


date_of_birth
2011-12-11 Cooper Schanuzer Grey 49 17
2013-07-01 Bella Labrador Brown 56 25
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23
2017-01-20 Max Labrador Black 59 29
2018-02-27 Bernie St. Bernard White 77 74

DATA MANIPULATION WITH PANDAS


Slicing by dates
# Get dogs with date_of_birth between 2014-08-25 and 2016-09-16
dogs.loc["2014-08-25":"2016-09-16"]

name breed color height_cm weight_kg


date_of_birth
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23

DATA MANIPULATION WITH PANDAS


Slicing by partial dates
# Get dogs with date_of_birth between 2014-01-01 and 2016-12-31
dogs.loc["2014":"2016"]

name breed color height_cm weight_kg


date_of_birth
2014-08-25 Lucy Chow Chow Brown 46 22
2015-04-20 Stella Chihuahua Tan 18 2
2016-09-16 Charlie Poodle Black 43 23

DATA MANIPULATION WITH PANDAS


Subsetting by row/column number
print(dogs.iloc[2:5, 1:4]) Full dataset

breed color height_cm name breed color height_cm weight_kg


2 Chow Chow Brown 46 0 Bella Labrador Brown 56 25
3 Schnauzer Grey 49 1 Charlie Poodle Black 43 23
4 Labrador Black 59 2 Lucy Chow Chow Brown 46 22
3 Cooper Schnauzer Grey 49 17
4 Max Labrador Black 59 29
5 Stella Chihuahua Tan 18 2
6 Bernie St. Bernard White 77 74

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Working with pivot
tables
D ATA M A N I P U L AT I O N W I T H PA N D A S

Richie Co on
Learning Solutions Architect at
DataCamp
A bigger dog dataset
print(dog_pack)

breed color height_cm weight_kg


0 Boxer Brown 62.64 30.4
1 Poodle Black 46.41 20.4
2 Beagle Brown 36.39 12.4
3 Chihuahua Tan 19.70 1.6
4 Labrador Tan 54.44 36.1
.. ... ... ... ...
87 Boxer Gray 58.13 29.9
88 St. Bernard White 70.13 69.4
89 Poodle Gray 51.30 20.4
90 Beagle White 38.81 8.8
91 Beagle Black 33.40 13.5

DATA MANIPULATION WITH PANDAS


Pivoting the dog pack
dogs_height_by_breed_vs_color = dog_pack.pivot_table(
"height_cm", index="breed", columns="color")
print(dogs_height_by_breed_vs_color)

color Black Brown Gray Tan White


breed
Beagle 34.500000 36.4500 36.313333 35.740000 38.810000
Boxer 57.203333 62.6400 58.280000 62.310000 56.360000
Chihuahua 18.555000 NaN 21.660000 20.096667 17.933333
Chow Chow 51.262500 50.4800 NaN 53.497500 54.413333
Dachshund 21.186667 19.7250 NaN 19.375000 20.660000
Labrador 57.125000 NaN NaN 55.190000 55.310000
Poodle 48.036000 57.1300 56.645000 NaN 44.740000
St. Bernard 63.920000 65.8825 67.640000 68.334000 67.495000

DATA MANIPULATION WITH PANDAS


.loc[] + slicing is a power combo
dogs_height_by_breed_vs_color.loc["Chow Chow":"Poodle"]

color Black Brown Gray Tan White


breed
Chow Chow 51.262500 50.480 NaN 53.4975 54.413333
Dachshund 21.186667 19.725 NaN 19.3750 20.660000
Labrador 57.125000 NaN NaN 55.1900 55.310000
Poodle 48.036000 57.130 56.645 NaN 44.740000

DATA MANIPULATION WITH PANDAS


The axis argument
dogs_height_by_breed_vs_color.mean(axis="index")

color
Black 43.973563
Brown 48.717917
Gray 48.107667
Tan 44.934738
White 44.465208
dtype: float64

DATA MANIPULATION WITH PANDAS


Calculating summary stats across columns
dogs_height_by_breed_vs_color.mean(axis="columns")

breed
Beagle 36.362667
Boxer 59.358667
Chihuahua 19.561250
Chow Chow 52.413333
Dachshund 20.236667
Labrador 55.875000
Poodle 51.637750
St. Bernard 66.654300
dtype: float64

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Visualizing your
data
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Histograms
import matplotlib.pyplot as plt

dog_pack["height_cm"].hist()

plt.show()

DATA MANIPULATION WITH PANDAS


Histograms
dog_pack["height_cm"].hist(bins=20) dog_pack["height_cm"].hist(bins=5)
plt.show() plt.show()

DATA MANIPULATION WITH PANDAS


Bar plots
avg_weight_by_breed = dog_pack.groupby("breed")["weight_kg"].mean()
print(avg_weight_by_breed)

breed
Beagle 10.636364
Boxer 30.620000
Chihuahua 1.491667
Chow Chow 22.535714
Dachshund 9.975000
Labrador 31.850000
Poodle 20.400000
St. Bernard 71.576923
Name: weight_kg, dtype: float64

DATA MANIPULATION WITH PANDAS


Bar plots
avg_weight_by_breed.plot(kind="bar") avg_weight_by_breed.plot(kind="bar",
plt.show() title="Mean Weight by Dog Breed")
plt.show()

DATA MANIPULATION WITH PANDAS


Line plots
sully.head() sully.plot(x="date",
y="weight_kg",

date weight_kg kind="line")

0 2019-01-31 36.1 plt.show()

1 2019-02-28 35.3
2 2019-03-31 32.0
3 2019-04-30 32.9
4 2019-05-31 32.0

DATA MANIPULATION WITH PANDAS


Rotating axis labels
sully.plot(x="date", y="weight_kg", kind="line", rot=45)
plt.show()

DATA MANIPULATION WITH PANDAS


Scatter plots
dog_pack.plot(x="height_cm", y="weight_kg", kind="scatter")
plt.show()

DATA MANIPULATION WITH PANDAS


Layering plots
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()
plt.show()

DATA MANIPULATION WITH PANDAS


Add a legend
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()
plt.legend(["F", "M"])
plt.show()

DATA MANIPULATION WITH PANDAS


Transparency
dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist(alpha=0.7)
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist(alpha=0.7)
plt.legend(["F", "M"])
plt.show()

DATA MANIPULATION WITH PANDAS


Avocados
print(avocados)

date type year avg_price size nb_sold


0 2015-12-27 conventional 2015 0.95 small 9626901.09
1 2015-12-20 conventional 2015 0.98 small 8710021.76
2 2015-12-13 conventional 2015 0.93 small 9855053.66
... ... ... ... ... ... ...
1011 2018-01-21 organic 2018 1.63 extra_large 1490.02
1012 2018-01-14 organic 2018 1.59 extra_large 1580.01
1013 2018-01-07 organic 2018 1.51 extra_large 1289.07

[1014 rows x 6 columns]

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Missing values
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
What's a missing value?
Name Breed Color Height (cm) Weight (kg) Date of Birth
Bella Labrador Brown 56 25 2013-07-01
Charlie Poodle Black 43 23 2016-09-16
Lucy Chow Chow Brown 46 22 2014-08-25
Cooper Schnauzer Gray 49 17 2011-12-11
Max Labrador Black 59 29 2017-01-20
Stella Chihuahua Tan 18 2 2015-04-20
Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


What's a missing value?
Name Breed Color Height (cm) Weight (kg) Date of Birth
Bella Labrador Brown 56 ? 2013-07-01
Charlie Poodle Black 43 23 2016-09-16
Lucy Chow Chow Brown 46 22 2014-08-25
Cooper Schnauzer Gray 49 ? 2011-12-11
Max Labrador Black 59 29 2017-01-20
Stella Chihuahua Tan 18 2 2015-04-20
Bernie St. Bernard White 77 74 2018-02-27

DATA MANIPULATION WITH PANDAS


Missing values in pandas DataFrames
print(dogs)

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 NaN 2013-07-01
1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
3 Cooper Schnauzer Gray 49 NaN 2011-12-11
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27

DATA MANIPULATION WITH PANDAS


Detecting missing values
dogs.isna()

name breed color height_cm weight_kg date_of_birth


0 False False False False True False
1 False False False False False False
2 False False False False False False
3 False False False False True False
4 False False False False False False
5 False False False False False False
6 False False False False False False

DATA MANIPULATION WITH PANDAS


Detecting any missing values
dogs.isna().any()

name False
breed False
color False
height_cm False
weight_kg True
date_of_birth False
dtype: bool

DATA MANIPULATION WITH PANDAS


Counting missing values
dogs.isna().sum()

name 0
breed 0
color 0
height_cm 0
weight_kg 2
date_of_birth 0
dtype: int64

DATA MANIPULATION WITH PANDAS


Plotting missing values
import matplotlib.pyplot as plt
dogs.isna().sum().plot(kind="bar")
plt.show()

DATA MANIPULATION WITH PANDAS


Removing missing values
dogs.dropna()

name breed color height_cm weight_kg date_of_birth


1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27

DATA MANIPULATION WITH PANDAS


Replacing missing values
dogs.fillna(0)

name breed color height_cm weight_kg date_of_birth


0 Bella Labrador Brown 56 0.0 2013-07-01
1 Charlie Poodle Black 43 24.0 2016-09-16
2 Lucy Chow Chow Brown 46 24.0 2014-08-25
3 Cooper Schnauzer Gray 49 0.0 2011-12-11
4 Max Labrador Black 59 29.0 2017-01-20
5 Stella Chihuahua Tan 18 2.0 2015-04-20
6 Bernie St. Bernard White 77 74.0 2018-02-27

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Creating
DataFrames
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Dictionaries
my_dict = { my_dict = {
"key1": value1, "title": "Charlotte's Web",
"key2": value2, "author": "E.B. White",
"key3": value3 "published": 1952
} }

my_dict["key1"] my_dict["title"]

value1 Charlotte's Web

DATA MANIPULATION WITH PANDAS


Creating DataFrames
From a list of dictionaries From a dictionary of lists

Constructed row by row Constructed column by column

DATA MANIPULATION WITH PANDAS


List of dictionaries - by row
name breed height (cm) weight (kg) date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

list_of_dicts = [
{"name": "Ginger", "breed": "Dachshund", "height_cm": 22,
"weight_kg": 10, "date_of_birth": "2019-03-14"},
{"name": "Scout", "breed": "Dalmatian", "height_cm": 59,
"weight_kg": 25, "date_of_birth": "2019-05-09"}
]

DATA MANIPULATION WITH PANDAS


List of dictionaries - by row
name breed height (cm) weight (kg) date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

new_dogs = pd.DataFrame(list_of_dicts)
print(new_dogs)

name breed height_cm weight_kg date_of_birth


0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09

DATA MANIPULATION WITH PANDAS


Dictionary of lists - by column
date dict_of_lists = {
name breed height weight of
birth "name": ["Ginger", "Scout"],
"breed": ["Dachshund", "Dalmatian"],
2019-
Ginger Dachshund 22 10
03-14 "height_cm": [22, 59],

2019- "weight_kg": [10, 25],


Scout Dalmatian 59 25 05- "date_of_birth": ["2019-03-14",
09
"2019-05-09"]
}
new_dogs = pd.DataFrame(dict_of_lists)
Key = column name

Value = list of column values

DATA MANIPULATION WITH PANDAS


Dictionary of lists - by column
name breed height (cm) weight (kg) date of birth
Ginger Dachshund 22 10 2019-03-14
Scout Dalmatian 59 25 2019-05-09

print(new_dogs)

name breed height_cm weight_kg date_of_birth


0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Reading and writing
CSVs
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
What's a CSV file?
CSV = comma-separated values

Designed for DataFrame-like data

Most database and spreadsheet programs can use them or create them

DATA MANIPULATION WITH PANDAS


Example CSV file
new_dogs.csv

name,breed,height_cm,weight_kg,d_o_b
Ginger,Dachshund,22,10,2019-03-14
Scout,Dalmatian,59,25,2019-05-09

DATA MANIPULATION WITH PANDAS


CSV to DataFrame
import pandas as pd
new_dogs = pd.read_csv("new_dogs.csv")
print(new_dogs)

name breed height_cm weight_kg date_of_birth


0 Ginger Dachshund 22 10 2019-03-14
1 Scout Dalmatian 59 25 2019-05-09

DATA MANIPULATION WITH PANDAS


DataFrame manipulation
new_dogs["bmi"] = new_dogs["weight_kg"] / (new_dogs["height_cm"] / 100) ** 2
print(new_dogs)

name breed height_cm weight_kg date_of_birth bmi


0 Ginger Dachshund 22 10 2019-03-14 206.611570
1 Scout Dalmatian 59 25 2019-05-09 71.818443

DATA MANIPULATION WITH PANDAS


DataFrame to CSV
new_dogs.to_csv("new_dogs_with_bmi.csv")

new_dogs_with_bmi.csv

name,breed,height_cm,weight_kg,d_o_b,bmi
Ginger,Dachshund,22,10,2019-03-14,206.611570
Scout,Dalmatian,59,25,2019-05-09,71.818443

DATA MANIPULATION WITH PANDAS


Let's practice!
D ATA M A N I P U L AT I O N W I T H PA N D A S
Wrap-up
D ATA M A N I P U L AT I O N W I T H PA N D A S

Maggie Matsui
Senior Content Developer at DataCamp
Recap
Chapter 1 Chapter 3
Subse ing and sorting Indexing

Adding new columns Slicing

Chapter 2 Chapter 4
Aggregating and grouping Visualizations

Summary statistics Reading and writing CSVs

DATA MANIPULATION WITH PANDAS


More to learn
Joining Data with pandas

Streamlined Data Ingestion with pandas

Analyzing Police Activity with pandas

Analyzing Marketing Campaigns with


pandas

DATA MANIPULATION WITH PANDAS


Congratulations!
D ATA M A N I P U L AT I O N W I T H PA N D A S

You might also like