
Lecture 15 (DS) - Pandas - DataFrame Merging, String Operations

Pandas library in Python

Uploaded by

anayabutt658
Copyright
© All Rights Reserved

Data Science

Lecture # 16
Pandas DataFrame Merging
• Often we need to work with data from multiple frames
• A common practice is to merge two frames into one, like a join operation
• By the end of this lecture, you should be able to:
  • Explain that data is usually distributed across different locations and tables
  • Combine data from distinct DataFrames
  • Distinguish among different ways to combine 2 data sets
Note: All Images are taken from edx.org
Example DataFrames

• We’ll use the example DataFrames from this slide throughout the lecture



pandas.concat(): Stack DataFrames
• The concat function is used to create a new DataFrame out of two
• Here the DataFrame called left is being concatenated with itself
• The index of the resulting DataFrame preserves the indexes from the original table, so each index label now appears twice

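A minimal sketch of stacking a frame on itself with pd.concat(). The slide images are not reproduced, so the left frame here is a hypothetical stand-in with made-up key, A and B columns.

```python
import pandas as pd

# Hypothetical stand-in for the "left" frame from the slides
left = pd.DataFrame({
    "key": ["K0", "K1", "K2", "K3"],
    "A": ["A0", "A1", "A2", "A3"],
    "B": ["B0", "B1", "B2", "B3"],
})

# Stack the frame on top of itself (axis=0, i.e. vertically, is the default)
stacked = pd.concat([left, left])

# The original row labels are preserved, so they repeat: 0..3 then 0..3
print(stacked.index.tolist())
```

If a fresh 0 to n-1 index is preferred, pd.concat() also accepts ignore_index=True.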
pandas.concat(): Stack DataFrames

• Here is an example of concatenating two different DataFrames
• Note the appearance of NaN (missing) values wherever a column is missing from one of the frames
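A sketch of concatenating two frames whose columns only partly overlap, assuming hypothetical left and right frames that share just the key column. The result has the union of the columns, and each cell with no source value is filled with NaN.

```python
import pandas as pd

# Hypothetical frames: they share only the "key" column
left = pd.DataFrame({"key": ["K0", "K1"], "A": ["A0", "A1"]})
right = pd.DataFrame({"key": ["K2", "K3"], "C": ["C0", "C1"]})

# Columns are the union of both frames; rows from left have no "C" value
# and rows from right have no "A" value, so those cells become NaN
stacked = pd.concat([left, right])
print(stacked)
```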


Inner Join using pandas.concat()

• Instead of getting extra rows with missing values, we can use an inner join
• In the previous slide, the concatenated DataFrames were stacked vertically
• Here they are placed next to each other horizontally
• This is still not perfect, because the key columns have been duplicated
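A sketch of the horizontal, inner version, assuming a hypothetical right frame with one row fewer than left. With axis=1 and join="inner", pd.concat() aligns rows by index label (not by the values in key), keeps only labels present in both frames, and repeats the key column.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"]})
# Hypothetical right frame with only three rows (index labels 0, 1, 2)
right = pd.DataFrame({"key": ["K0", "K1", "K2"], "C": ["C0", "C1", "C2"]})

# axis=1 places the frames side by side; join="inner" keeps only index
# labels found in both frames, so row 3 of left is dropped
side_by_side = pd.concat([left, right], axis=1, join="inner")
print(side_by_side)
```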


Stack DataFrames using append()

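• An alternative to concat is append
• It behaves similarly to the concat function
• But it is a method of the DataFrame itself
• Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat() is the way to stack frames in current versions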
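A sketch of the same append-style stacking written with pd.concat(), which also works on pandas versions where append() is unavailable. The frames are hypothetical; the comment shows the older append() form for comparison.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1"], "A": ["A0", "A1"]})
right = pd.DataFrame({"key": ["K2", "K3"], "A": ["A2", "A3"]})

# Older pandas style: left.append(right, ignore_index=True)
# Equivalent call that works in current pandas:
appended = pd.concat([left, right], ignore_index=True)
print(appended)
```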


Inner Join using merge()

• The operation that gives us a true combination of these two frames is called merge
• This function eliminates the duplicate key column
• Discuss Case Study: Movie Data Analysis
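A minimal sketch of an inner merge on the key column, using hypothetical frames that share only the keys K0 and K1. Unlike the concat version, rows are matched on the values in key, and the result contains a single key column.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K4"], "C": ["C0", "C1", "C4"]})

# Match rows on the values of "key"; how="inner" keeps only keys present in both
merged = pd.merge(left, right, how="inner", on="key")
print(merged)
#   key   A   C
# 0  K0  A0  C0
# 1  K1  A1  C1
```

Other join types are selected with how="left", how="right" or how="outer".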

Frequent String Operations
• String is a commonly used data type because we often need to read text data
• We’ll now review a few useful string operations in Pandas
• By the end of this lecture, you should be able to:
  • Describe what operations the string methods can perform
  • Navigate your way to find the right string method for you
  • Perform basic string operations in Pandas

str.split()
• One of the most important string operations is split
• It helps separate data into pieces around a character
• It returns a Series whose elements are lists of strings
• In this example, the city field now contains an array of strings rather than just a string
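A sketch of str.split() on a hypothetical city column stored as "City, ST" strings; after the split, each element is a Python list.

```python
import pandas as pd

# Hypothetical data: city names stored as "City, ST" strings
df = pd.DataFrame({"city": ["San Diego, CA", "Austin, TX"]})

# Split each string around ", "; every element becomes a list of strings
df["city"] = df["city"].str.split(", ")
print(df["city"].tolist())  # [['San Diego', 'CA'], ['Austin', 'TX']]
```

Passing expand=True instead returns a DataFrame with one column per piece.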


str.contains()

• This function checks whether each string contains a given character or pattern, returning one Boolean value per row
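A sketch of str.contains() on hypothetical pipe-separated genre strings. By default the pattern is treated as a regular expression, so a literal "|" needs regex=False (with the default, "|" matches the empty string and every row comes out True).

```python
import pandas as pd

# Hypothetical genre strings in an "A|B" style
genres = pd.Series(["Adventure|Comedy", "Drama", "Comedy|Romance"])

# Boolean mask: True where the pattern occurs anywhere in the string
is_comedy = genres.str.contains("Comedy")
print(is_comedy.tolist())  # [True, False, True]

# regex=False treats the pattern literally, needed because "|" is a regex operator
has_pipe = genres.str.contains("|", regex=False)
print(has_pipe.tolist())  # [True, False, True]

# The mask can be used directly to filter rows
comedies = genres[is_comedy]
```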


str.extract() – Returns first match found

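A sketch of str.extract() on hypothetical movie titles. The first pattern grabs the first four-digit run it finds, which for "2001: A Space Odyssey (1968)" is the 2001 in the title rather than the year; a pattern anchored to the parentheses picks out the year instead. expand=False returns a Series because there is a single capture group.

```python
import pandas as pd

# Hypothetical titles in a "Title (Year)" style
titles = pd.Series(["Toy Story (1995)", "2001: A Space Odyssey (1968)"])

# Capture the first run of four digits; only the first match per string is returned
first_digits = titles.str.extract(r"(\d{4})", expand=False)
print(first_digits.tolist())  # ['1995', '2001']

# Anchoring the pattern to the parentheses picks out the release year instead
years = titles.str.extract(r"\((\d{4})\)", expand=False)
print(years.tolist())  # ['1995', '1968']
```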


Summary

• In summary, string operations are very handy in data cleaning
• Explore more:
  • http://pandas.pydata.org/pandas-docs/stable/text.html#text-sting-methods
• Discuss Case Study: Movie Data Analysis


Unix time / POSIX time / epoch time

• Unix time tracks time by counting the seconds elapsed since a specific instant
• That instant is the start of the year 1970 (00:00:00 on 1 January 1970, UTC)
• This is an integer, so we have to convert it into a readable date and time
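A minimal sketch converting Unix times to readable timestamps with pd.to_datetime(..., unit="s"). The values 0 and 1,000,000,000 are chosen only as easy-to-check examples.

```python
import pandas as pd

# Integers count seconds since 1970-01-01 00:00:00 UTC
epoch_start = pd.to_datetime(0, unit="s")
one_billion = pd.to_datetime(1_000_000_000, unit="s")
print(epoch_start)  # 1970-01-01 00:00:00
print(one_billion)  # 2001-09-09 01:46:40
```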


Data Types for Timestamps

• datetime64[ns] is the general data type pandas uses for datetime values
• We can convert a timestamp to Python format using the to_datetime function
• unit='s' declares that the unit is seconds
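A sketch of converting a whole integer column, assuming a hypothetical tags table whose timestamp column holds Unix seconds. The new parsed_time column gets a datetime64 dtype.

```python
import pandas as pd

# Hypothetical table with a raw integer "timestamp" column (Unix seconds)
tags = pd.DataFrame({"timestamp": [1_000_000_000, 1_100_000_000]})

# unit="s" declares that the integers count seconds
tags["parsed_time"] = pd.to_datetime(tags["timestamp"], unit="s")
print(tags.dtypes)
```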
Select Rows Based on Timestamps

• Once the time is converted to Python format, we can use it to create filters
• We can also use the timestamp to sort data in ascending or descending order, as shown on the next slide
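A sketch of a timestamp-based filter on a hypothetical table; comparing a datetime64 column with a date string yields a Boolean mask that selects rows.

```python
import pandas as pd

# Hypothetical table of Unix-second timestamps
tags = pd.DataFrame({"timestamp": [1_000_000_000, 1_100_000_000, 1_200_000_000]})
tags["parsed_time"] = pd.to_datetime(tags["timestamp"], unit="s")

# Boolean filter: keep rows recorded after 1 January 2003
recent = tags[tags["parsed_time"] > "2003-01-01"]
print(recent)
```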


Sort Table in Chronological Order

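A sketch of chronological sorting with sort_values() on a hypothetical, deliberately unordered table; ascending=False would put the newest rows first.

```python
import pandas as pd

# Hypothetical rows stored out of order
tags = pd.DataFrame({"timestamp": [1_200_000_000, 1_000_000_000, 1_100_000_000]})
tags["parsed_time"] = pd.to_datetime(tags["timestamp"], unit="s")

# Oldest first; pass ascending=False for newest first
chronological = tags.sort_values(by="parsed_time", ascending=True)
print(chronological)
```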


Summary of Movie Rated Notebook
• First, we talked about data ingestion, reviewing how to ingest data in multiple formats and the basic read operations related to these formats
• We also talked about Series and DataFrame, the two fundamental data structures in Pandas
• Then we discussed basic statistical operations on Series and DataFrames
• We discussed combined descriptive statistics as well as individual functions for computing min, max, etc.


• We went through data preparation and exploration options in Pandas, such as the isnull, any and dropna functions


• We also gave an overview of data visualization
• We saw examples of inline plots, box plots and histograms using Pandas’ plot function


• We talked about slicing out rows and filtering DataFrames, as well as aggregating data using the groupby operation


• We also talked about merging or joining data from multiple DataFrames using inner joins and other operations


• We talked about three main string operations: split, contains and extract


• Finally, we talked about how to work with timestamps
