
3 Pandas Basic III

The document discusses important data formatting methods in Pandas including merge, sort, reset_index, and fillna. It provides examples of merging two zoo dataframes on the animal column, sorting the zoo dataframe by water_need ascending and descending, resetting the index after sorting to remove old indexes, and using fillna to replace NaN values with "unknown" after a left merge. It also discusses merging two additional datasets, article_read and blog_buy, and provides self-test questions to practice these methods.

Uploaded by

Kurnia Setiyawan

Important Data Formatting Methods

(merge, sort, reset_index, fillna)


Let’s start with
our zoo dataset
again.
Pandas Merge
(a.k.a. “joining” dataframes)
Let’s say that we have another dataframe, zoo_eats,
that contains information about the food requirements
for each species.
zoo_eats.csv data (make the zoo_eats.csv file):

animal,food
elephant,vegetables
tiger,meat
kangaroo,vegetables
zebra,meat
giraffe,vegetables

Loading the data:


zoo_eats = pd.read_csv('zoo_eats.csv')
zoo_eats
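If you don't have the file on disk yet, here is a minimal sketch that loads the same rows from a string instead. The use of io.StringIO is only a convenience for this example; with a real file, pd.read_csv('zoo_eats.csv') as shown above does the same job.

```python
import io
import pandas as pd

# The rows of zoo_eats.csv, held in a string so the sketch runs without a file.
csv_text = """animal,food
elephant,vegetables
tiger,meat
kangaroo,vegetables
zebra,meat
giraffe,vegetables
"""

# read_csv accepts any file-like object, not only a path.
zoo_eats = pd.read_csv(io.StringIO(csv_text))

print(zoo_eats.shape)
print(list(zoo_eats.columns))
```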
Let’s merge
these two pandas
dataframes 
zoo.merge(zoo_eats)  
 
Try:
zoo_eats.merge(zoo)
zoo.merge(zoo_eats)

Is it the same?
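A small sketch to compare the two calls. The zoo dataframe here is a hypothetical miniature with made-up values (the real zoo dataset is bigger); only the column names animal and water_need come from this deck, and uniq_id is an assumed extra column.

```python
import pandas as pd

# Hypothetical miniature of the zoo dataframe (values are made up).
zoo = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "lion"],
    "uniq_id": [1001, 1002, 1003, 1004, 1005],
    "water_need": [500, 300, 410, 220, 600],
})
zoo_eats = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "giraffe"],
    "food": ["vegetables", "meat", "vegetables", "meat", "vegetables"],
})

a = zoo.merge(zoo_eats)   # left columns first, then the new ones
b = zoo_eats.merge(zoo)   # same idea, but zoo_eats is now the left side

print(list(a.columns))
print(list(b.columns))
print(sorted(a["animal"]) == sorted(b["animal"]))
```

In this sketch both results hold the same animals, and only the column order differs. Lions and the giraffe are missing from both, because the default merge keeps only the animals found in both dataframes.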

Where are all the lions?

Calm down… the lions will come back.


Pandas Merge…
But how?
Inner, outer, left or right?
HOW DO YOU WANT
TO MERGE?
Let’s try this:
zoo.merge(zoo_eats, how = 'outer')

See?
Lions came back…
the giraffe came back…
Let’s try this too:
zoo.merge(zoo_eats, how = 'left')
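To see all four options side by side, here is a rough sketch that runs each one and counts the rows. The zoo values are made up for illustration (a hypothetical miniature of the real dataset), so the counts only apply to this toy data.

```python
import pandas as pd

# Hypothetical miniature of the zoo dataframe (values are made up).
zoo = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "lion"],
    "water_need": [500, 300, 410, 220, 600],
})
zoo_eats = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "giraffe"],
    "food": ["vegetables", "meat", "vegetables", "meat", "vegetables"],
})

row_counts = {}
for how in ["inner", "outer", "left", "right"]:
    merged = zoo.merge(zoo_eats, how=how)
    row_counts[how] = len(merged)
    print(how, len(merged), sorted(merged["animal"]))
```

On this toy data, inner keeps only the animals present in both dataframes, outer keeps everything (lion and giraffe included), left keeps every row of zoo, and right keeps every row of zoo_eats.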
Sorting in Pandas
Sorts In Ascending Order
zoo.sort_values(by=['water_need'])
Sorts In Descending Order
zoo.sort_values('water_need', ascending = False)
Sorts By Multiple Columns
zoo.sort_values(by = ['animal', 'water_need'])
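The three sorts above can be tried on a toy dataframe like this one. The values are made up (a hypothetical miniature of zoo). The last call also shows that ascending can take one value per column, and that sort_values returns a new dataframe instead of changing zoo.

```python
import pandas as pd

# Hypothetical miniature of the zoo dataframe (values are made up).
zoo = pd.DataFrame({
    "animal": ["elephant", "tiger", "elephant", "lion", "tiger"],
    "water_need": [500, 300, 600, 550, 320],
})

asc = zoo.sort_values(by=["water_need"])
desc = zoo.sort_values("water_need", ascending=False)

# animal A to Z, and within each animal the biggest water_need first
mixed = zoo.sort_values(by=["animal", "water_need"], ascending=[True, False])

print(asc["water_need"].tolist())
print(desc["water_need"].tolist())
print(mixed)
print(zoo.index.tolist())  # the original dataframe is left untouched
```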
Reset Index
After sorting, each row keeps its old index label, so the index is no longer in order.
Wrong indexing can mess up your visualizations or even
your machine learning models.
zoo.sort_values(by = ['water_need'], ascending = False).reset_index()

As you can see, our new dataframe kept the old
indexes, too (in a new column called index). If you want to remove them, just add
the drop = True parameter to reset_index().
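Here is a small sketch of the difference, using a hypothetical miniature of zoo with made-up values.

```python
import pandas as pd

# Hypothetical miniature of the zoo dataframe (values are made up).
zoo = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "lion"],
    "water_need": [500, 300, 410, 220, 600],
})

sorted_zoo = zoo.sort_values(by=["water_need"], ascending=False)

kept = sorted_zoo.reset_index()              # old labels move into an 'index' column
dropped = sorted_zoo.reset_index(drop=True)  # old labels are thrown away

print(list(kept.columns))
print(kept["index"].tolist())
print(list(dropped.columns))
print(dropped.index.tolist())
```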
Fillna
Note: fillna is basically fill + na in one word.
Let’s rerun the left-merge method that we have used
above:
zoo.merge(zoo_eats, how = 'left')

The problem is that we have NaN values for lions.

Let’s replace them with something more meaningful:
zoo.merge(zoo_eats, how = 'left').fillna('unknown')
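One thing to keep in mind: fillna('unknown') fills every NaN in every column. If you only want to touch the food column, you can pass a dictionary instead. A minimal sketch on a hypothetical miniature of zoo (values are made up):

```python
import pandas as pd

# Hypothetical miniature of the zoo dataframe (values are made up).
zoo = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "lion"],
    "water_need": [500, 300, 410, 220, 600],
})
zoo_eats = pd.DataFrame({
    "animal": ["elephant", "tiger", "kangaroo", "zebra", "giraffe"],
    "food": ["vegetables", "meat", "vegetables", "meat", "vegetables"],
})

merged = zoo.merge(zoo_eats, how="left")            # lion has no food: NaN

filled_all = merged.fillna("unknown")               # fills NaN in every column
filled_food = merged.fillna({"food": "unknown"})    # fills NaN in 'food' only

print(merged["food"].isna().sum())
print(filled_all[filled_all["animal"] == "lion"])
print(filled_food[filled_food["animal"] == "lion"])
```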
Let’s get back to our article_read dataset.
This dataset holds the data of a travel blog.
Download another dataset from:
46.101.230.157/dilan/pandas_tutorial_buy.csv
and name it blog_buy.
There are 4 variables in blog_buy; name them, in order,
'my_date_time', 'event', 'user_id' and 'amount'.
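Naming the columns can be done while loading, with the names parameter of read_csv. The sketch below uses two made-up rows in a string instead of the real file. It assumes the file has no header row and uses ';' as the separator; both are assumptions, so check the downloaded file and adjust sep if needed.

```python
import io
import pandas as pd

# Two made-up rows standing in for pandas_tutorial_buy.csv.
# Assumption: no header row, ';' as the separator.
sample = """2018-01-01 04:04:59;buy;1001;8
2018-01-02 11:30:00;buy;1002;16
"""

blog_buy = pd.read_csv(
    io.StringIO(sample),   # replace with the path of the downloaded file
    sep=";",
    names=["my_date_time", "event", "user_id", "amount"],
)

print(list(blog_buy.columns))
print(blog_buy.shape)
```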

Test Yourself #1!

Merge article_read and blog_buy.
Test Yourself #2!
What’s the average (mean) revenue
between 2018-01-01 and 2018-01-07 from
the users in the article_read dataframe?
Test Yourself #3!
Print the top 3 countries by total revenue
between 2018-01-01 and 2018-01-07! (Obviously,
this concerns the users in
the article_read dataframe again.)
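If you get stuck, here is one possible approach sketched on toy data, not on the real files. The two dataframes below are made up; the country column in article_read and the country names are assumptions for illustration, and all toy rows are assumed to fall inside the week in question, so no date filter is shown.

```python
import pandas as pd

# Made-up stand-ins for the real dataframes.
article_read = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "country": ["country_1", "country_2", "country_1", "country_3"],
})
blog_buy = pd.DataFrame({
    "user_id": [1, 1, 3, 4],
    "amount": [8, 16, 8, 8],
})

# Left merge keeps every reader, including those who bought nothing.
merged = article_read.merge(blog_buy, how="left", on="user_id")

# Readers without a purchase get NaN; count them as 0 revenue.
merged = merged.fillna({"amount": 0})

mean_revenue = merged["amount"].mean()
top_countries = (
    merged.groupby("country")["amount"]
    .sum()
    .sort_values(ascending=False)
    .head(3)
)

print(mean_revenue)
print(top_countries)
```

The left merge plus fillna is a design choice here: with an inner merge, readers who never bought anything would disappear and the mean would be too high.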
 
