Lecture 15 (DS) - Pandas - DataFrame Merging, String Operations

Pandas library in Python

Uploaded by

anayabutt658

Data Science

Lecture # 16
Pandas – Merging DataFrames
• Often we need to work with data from multiple frames
• A common practice is to merge two frames into one, like a join operation
• By the end of this lecture, you should be able to:

• Explain that data is usually distributed across different locations and tables
• Combine data from distinct DataFrames
• Distinguish among different ways to combine data sets
Note: All Images are taken from edx.org
Example DataFrames

• We’ll use the above data frames for this lecture
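The slide image with the example frames did not survive extraction, so the frames below are hypothetical stand-ins rather than the originals. The names left and right follow the later slides; the column names and values are assumptions chosen so that some keys overlap and some do not.

```python
import pandas as pd

# Hypothetical stand-ins for the two example frames (not the slide's data).
left = pd.DataFrame({
    "key": [0, 1, 2, 3],
    "left_value": ["a", "b", "c", "d"],
})
right = pd.DataFrame({
    "key": [2, 3, 4, 5],
    "right_value": ["w", "x", "y", "z"],
})

print(left)
print(right)
```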




pandas.concat() : Stack DataFrames
• The concat function is used to create a new DataFrame out of two
• Here the DataFrame called left is concatenated with itself
• The index of the resulting DataFrame preserves the index values from the original table, so they repeat
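A minimal sketch of this behaviour, assuming a hypothetical left frame (the slide's own data is not available). It also shows ignore_index=True, an option not mentioned on the slide, which renumbers the rows instead of keeping the repeated labels.

```python
import pandas as pd

# Hypothetical frame standing in for the slide's "left".
left = pd.DataFrame({"key": [0, 1, 2, 3], "left_value": ["a", "b", "c", "d"]})

# Stack left on top of itself; the original index labels are kept.
stacked = pd.concat([left, left])
print(stacked.index.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]

# Optional: build a fresh 0..n-1 index instead.
renumbered = pd.concat([left, left], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```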
pandas.concat() : Stack DataFrames

• Here is an example of concatenating two different data frames
• Note the appearance of NaN (missing values) wherever a column is missing from one of the frames
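A short sketch of where the NaN values come from, using hypothetical frames in place of the ones on the slide. Each frame lacks one of the other's columns, so concat fills the gap with NaN.

```python
import pandas as pd

# Hypothetical frames; only "key" is shared between them.
left = pd.DataFrame({"key": [0, 1, 2, 3], "left_value": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 3, 4, 5], "right_value": ["w", "x", "y", "z"]})

# Default is a vertical stack that keeps the union of the columns.
stacked = pd.concat([left, right])
print(stacked)

# Rows that came from left have no right_value, and vice versa.
print(stacked["right_value"].isna().sum())  # 4
print(stacked["left_value"].isna().sum())   # 4
```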



Inner Join using pandas.concat()

• Instead of having extra rows with missing values, we can use an inner join
• In the previous slide, the concatenated DataFrames were stacked vertically
• Here they are placed next to each other horizontally
• This is also not perfect because the key columns have been duplicated
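One way this could look, assuming the slide used axis=1 together with join="inner" (the original call is not visible in the extracted text). The frames are hypothetical. Note that concat lines rows up by index label, not by the key column, which is why both key columns survive.

```python
import pandas as pd

# Hypothetical frames; both use the default index 0..3.
left = pd.DataFrame({"key": [0, 1, 2, 3], "left_value": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 3, 4, 5], "right_value": ["w", "x", "y", "z"]})

# Place the frames side by side, keeping only index labels present in both.
side_by_side = pd.concat([left, right], axis=1, join="inner")

# The key column appears twice, once from each frame.
print(side_by_side.columns.tolist())  # ['key', 'left_value', 'key', 'right_value']
```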



Stack DataFrames using append()

• An alternative to concat is append
• It behaves similarly to the concat function
• But it is a method of the DataFrame itself
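A caution for current pandas versions: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the slide's call only runs on older installations. The sketch below, on hypothetical frames, shows the legacy spelling in a comment and the concat call that gives the same stacked result.

```python
import pandas as pd

# Hypothetical frames standing in for the slide's data.
left = pd.DataFrame({"key": [0, 1, 2, 3], "left_value": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 3, 4, 5], "right_value": ["w", "x", "y", "z"]})

# Legacy spelling, pandas < 2.0 only:
#     stacked = left.append(right)

# Equivalent call that works on old and new pandas alike.
stacked = pd.concat([left, right])
print(stacked)
```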



Inner Join using merge()

• The operation that gives us a true combination of these two frames is called merge
• This function eliminates the duplicate columns

• Discuss Case Study: Movie Data Analysis
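A minimal sketch of an inner join with merge, assuming hypothetical frames that share a column named key. Only keys present in both frames survive, and the key column appears once in the result.

```python
import pandas as pd

# Hypothetical frames; keys 2 and 3 appear in both.
left = pd.DataFrame({"key": [0, 1, 2, 3], "left_value": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 3, 4, 5], "right_value": ["w", "x", "y", "z"]})

# Match rows on the key column; how="inner" is also the default.
merged = pd.merge(left, right, on="key", how="inner")
print(merged)

print(merged.columns.tolist())  # ['key', 'left_value', 'right_value']
print(merged["key"].tolist())   # [2, 3]
```

Changing how to "left", "right" or "outer" keeps unmatched rows from one or both frames and fills the gaps with NaN.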


Frequent String Operations
• String is a commonly used data type because we often need to read text data
• We’ll now review a few useful string operations in Pandas
• By the end of this lecture, you should be able to:

• Describe what operations the string methods can perform
• Navigate your way to find the right string method for you
• Perform basic string operations in Pandas


str.split()
• One of the most important string operations is split
• It helps with separating data into pieces around a character
• It returns an object-dtype Series whose elements are lists of strings
• In this example the city field now contains a list of strings rather than just the string
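A small sketch of str.split, assuming a hypothetical city column whose values use an underscore as the separator (the slide's actual data and separator are not available).

```python
import pandas as pd

# Hypothetical data: city and region joined by an underscore.
df = pd.DataFrame({"city": ["Lahore_Punjab", "Karachi_Sindh"]})

# Each element of the result is a Python list of the pieces.
parts = df["city"].str.split("_")
print(parts.tolist())  # [['Lahore', 'Punjab'], ['Karachi', 'Sindh']]

# .str[0] picks the first piece of every list.
first = parts.str[0]
print(first.tolist())  # ['Lahore', 'Karachi']

# expand=True spreads the pieces into separate columns instead.
expanded = df["city"].str.split("_", expand=True)
print(expanded.shape)  # (2, 2)
```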



str.contains()

• This function checks whether each string contains a given substring or pattern
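A brief sketch of str.contains on hypothetical values. The result is a boolean mask, which is what makes it useful for filtering rows; na=False is an added choice here so that missing entries count as non-matches.

```python
import pandas as pd

# Hypothetical values, including one missing entry.
cities = pd.Series(["Lahore_Punjab", "Karachi_Sindh", None])

# True where the text contains "Punjab"; missing values become False.
mask = cities.str.contains("Punjab", na=False)
print(mask.tolist())  # [True, False, False]

# The mask can be used directly to select the matching rows.
print(cities[mask].tolist())  # ['Lahore_Punjab']
```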



str.extract() – Returns first match found
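The slide's example image is missing, so here is a sketch under assumptions: a hypothetical title column in which a four-digit year sits in parentheses. str.extract takes a regular expression with a capture group and returns the first match for each row, or NaN when nothing matches.

```python
import pandas as pd

# Hypothetical titles; the last one contains two bracketed years.
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "No Year", "Two Years (1990) (2000)"],
})

# One capture group for four digits inside parentheses.
# expand=False returns a Series rather than a one-column DataFrame.
movies["year"] = movies["title"].str.extract(r"\((\d{4})\)", expand=False)
print(movies)
```

For the third title only "1990" is returned, because extract stops at the first match; str.extractall is the method to reach for when every match is needed.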



Summary

• In summary, string operations are very handy in data cleaning
• Explore more:

• http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
• Discuss Case Study: Movie Data Analysis


Unix time / POSIX time / epoch time

• Unix time tracks time by counting the seconds elapsed since a specific instant
• That instant is the start of the year 1970 in the UTC time zone (00:00:00 on 1 January 1970)
• The value is an integer, so we have to convert it into a readable date and time
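A quick illustration with Python's standard library only, to show what the integer means before pandas gets involved. The two sample values are chosen for illustration.

```python
from datetime import datetime, timezone

# 0 seconds after the epoch is the epoch itself.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00

# 86,400 seconds is exactly one day later.
one_day_later = datetime.fromtimestamp(86_400, tz=timezone.utc)
print(one_day_later.isoformat())  # 1970-01-02T00:00:00+00:00
```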



Data Types for Timestamps

• datetime64[ns] is a general data type for datetime

• We can convert a timestamp to a datetime using the to_datetime function
• unit='s' declares that the unit is seconds
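A sketch of the conversion, assuming the function meant here is pandas.to_datetime and that the data has an integer column named timestamp. Both the frame and the new column name parsed_time are hypothetical.

```python
import pandas as pd

# Hypothetical ratings table with Unix timestamps in seconds.
ratings = pd.DataFrame({"timestamp": [0, 86_400, 1_000_000_000]})

# unit="s" tells pandas the integers count seconds since the epoch.
ratings["parsed_time"] = pd.to_datetime(ratings["timestamp"], unit="s")

print(ratings)
print(ratings.dtypes)  # parsed_time is a datetime64 column
```

Without unit="s" pandas would read the integers as nanoseconds, which puts every value within a few seconds of 1 January 1970.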
Select Rows Based on Timestamps

• Once the time is converted to a datetime, we can use it to create filters
• We can also leverage the timestamp to sort data in ascending or descending order, as shown in the next slide
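A minimal sketch of a timestamp filter on a hypothetical ratings table. The cutoff date and the column name parsed_time are assumptions made for the example.

```python
import pandas as pd

# Hypothetical table with Unix timestamps converted to datetimes.
ratings = pd.DataFrame({"timestamp": [0, 86_400, 1_000_000_000]})
ratings["parsed_time"] = pd.to_datetime(ratings["timestamp"], unit="s")

# Boolean mask: True for rows rated after the cutoff.
cutoff = pd.Timestamp("2000-01-01")
recent = ratings[ratings["parsed_time"] > cutoff]
print(recent)
```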



Sort Table in Chronological Order
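The slide's screenshot is not available; a sketch of the idea with sort_values on a hypothetical ratings table, sorting on an assumed datetime column named parsed_time.

```python
import pandas as pd

# Hypothetical table, deliberately out of chronological order.
ratings = pd.DataFrame({"timestamp": [1_000_000_000, 0, 86_400]})
ratings["parsed_time"] = pd.to_datetime(ratings["timestamp"], unit="s")

# Oldest first (ascending is the default).
oldest_first = ratings.sort_values(by="parsed_time")
print(oldest_first["timestamp"].tolist())  # [0, 86400, 1000000000]

# Newest first.
newest_first = ratings.sort_values(by="parsed_time", ascending=False)
print(newest_first["timestamp"].tolist())  # [1000000000, 86400, 0]
```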



Summary of Movie Rated Notebook
• First we talked about data ingestion, in which we reviewed how to ingest data in multiple formats and basic read operations related to these formats
• We also talked about Series and DataFrame as the two fundamental data structures in Pandas
• Then we discussed basic statistical operations on Series and DataFrames
• We discussed combined descriptive statistics and individual functions for generating min, max, etc.
• We went through data preparation and exploration options in Pandas, like the isnull, any and dropna functions

• We also gave an overview of data visualization
• We saw examples of inline plots, box plots and histograms using Pandas’ plot function

• We talked about slicing out rows and filtering data frames, as well as aggregating data using the groupby operation

• We also talked about merging or joining data from multiple data frames using inner joins and other operations

• We talked about three main string operations called split, contains and extract

• Finally, we talked about how to work with timestamps
