Unit III - Pandas - Data Manipulation Using Python
Pandas
Three fundamental Pandas data structures: the Series, DataFrame, and Index.
import numpy as np
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data.values
array([ 0.25, 0.5 , 0.75, 1. ])
data.index
RangeIndex(start=0, stop=4, step=1)
data[1]
0.5
data[1:3]
1 0.50
2 0.75
dtype: float64
Series as generalized NumPy array
We can use strings as an index:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
data
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
data['b']
0.5
Series as specialized dictionary
A Series can use a non-contiguous or non-sequential index:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data[5]
0.5
population_dict = {'California': 38332521, 'Texas': 26448193,
                   'New York': 19651127, 'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population['California']
38332521
population['California':'Illinois']
California 38332521
Florida 19552860
Illinois 12882135
dtype: int64
Constructing Series objects
The general form is pd.Series(data, index=index), where index is an optional argument
and data can be one of many entities.
pd.Series([2, 4, 6])
0 2
1 4
2 6
dtype: int64
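For instance, data can be a scalar (repeated across the index) or a dictionary (whose keys become the index); a short sketch with illustrative values:

```python
import pandas as pd

# A scalar is broadcast to fill the specified index
s1 = pd.Series(5, index=[100, 200, 300])

# A dictionary's keys become the index
s2 = pd.Series({2: 'a', 1: 'b', 3: 'c'})

# An explicit index selects (and orders) a subset of the dict's keys
s3 = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])

print(s1)
print(s2)
print(s3)
```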
A DataFrame is an analog of a two-dimensional array with both flexible row indices and
flexible column names.
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
states = pd.DataFrame({'population': population, 'area': area})
states.index
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')
states.columns
Index(['area', 'population'], dtype='object')
pd.DataFrame(population, columns=['population'])
            population
California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
From a list of dicts
data = [{'a': i, 'b': 2 * i}
for i in range(3)]
pd.DataFrame(data)
a b
0 0 0
1 1 2
2 2 4
From a two-dimensional NumPy array
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])
        foo       bar
a  0.865257  0.213169
b  0.442759  0.108267
c  0.047110  0.905718
From a NumPy structured array
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
pd.DataFrame(A)
   A    B
0  0  0.0
1  0  0.0
2  0  0.0
Index as immutable array
ind = pd.Index([2, 3, 5, 7, 11])
ind[::2]
Int64Index([2, 5, 11], dtype='int64')
One difference between Index objects and NumPy arrays is that indices are immutable–
that is, they cannot be modified via the normal means:
ind[1] = 0   # raises TypeError: Index does not support mutable operations
Index as ordered set
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
indA & indB # intersection
Int64Index([3, 5, 7], dtype='int64')
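Note that in recent pandas versions the &, |, and ^ operators on Index objects are deprecated or removed; the method spellings below are the stable way to write these set operations:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Method-based set operations on Index objects
print(indA.intersection(indB))          # elements in both
print(indA.union(indB))                 # elements in either
print(indA.symmetric_difference(indB))  # elements in exactly one
```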
First, the loc attribute allows indexing and slicing that always references the explicit
index:
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data.loc[1]
'a'
data.loc[1:3]
1 a
3 b
dtype: object
The iloc attribute allows indexing and slicing that always references the implicit Python-
style index:
data.iloc[1]
'b'
data.iloc[1:3]
3 b
5 c
dtype: object
A third indexing attribute, ix, is a hybrid of the two, and for Series objects is equivalent to
standard []-based indexing. The purpose of the ix indexer becomes more apparent in the
context of DataFrame objects. (Note: ix has been deprecated and removed in recent
versions of pandas; use loc and iloc instead.)
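To contrast loc and iloc on a DataFrame, here is a small sketch (this two-state table is illustrative, not from the text):

```python
import pandas as pd

# Hypothetical example DataFrame to contrast explicit vs. implicit indexing
data = pd.DataFrame({'area': [423967, 695662],
                     'pop': [38332521, 26448193]},
                    index=['California', 'Texas'])

# loc references the explicit row and column labels
print(data.loc['California', 'pop'])   # 38332521

# iloc references implicit Python-style integer positions
print(data.iloc[0, 1])                 # 38332521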
One of the essential pieces of NumPy is the ability to perform quick element-wise
operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and with
more sophisticated operations (trigonometric functions, exponential and logarithmic
functions, etc.).
Pandas inherits much of this functionality from NumPy through its ufuncs.
i. For unary operations like negation and trigonometric functions, these ufuncs
will preserve index and column labels in the output.
ii. For binary operations such as addition and multiplication, Pandas will
automatically align indices when passing the objects to the ufunc.
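For example, dividing two Series with different indices aligns them on the union of labels, marking entries missing from either input as NaN (a small sketch with illustrative state figures):

```python
import pandas as pd
import numpy as np

area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967})
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127})

# Indices are aligned on the union of labels; unmatched entries become NaN
density = population / area
print(density)
```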
In particular, many interesting datasets will have some amount of data missing.
Here we will see some general considerations for missing data, and how Pandas chooses
to represent it.
Demonstrate some built-in Pandas tools for handling missing data in Python.
We refer to missing data in general as null, NaN, or NA values.
There are a number of schemes that have been developed to indicate the presence of
missing data in a table or DataFrame.
There are two common strategies: using a mask that globally indicates missing values, or
choosing a sentinel value (e.g., -9999, NaN, or None) that indicates a missing entry.
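Pandas uses the sentinel approach: NaN serves as the floating-point sentinel, and None is converted to NaN when it appears in a numeric Series. A small sketch:

```python
import pandas as pd
import numpy as np

# None is upcast to NaN in a numeric Series, so the dtype becomes float64
s = pd.Series([1, np.nan, 2, None])
print(s)
print(s.isnull())
```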
To facilitate this convention, there are several useful methods for detecting, removing,
and replacing null values in Pandas data structures. They are:
isnull(): Generate a boolean mask indicating missing values
notnull(): Opposite of isnull()
dropna(): Return a filtered version of the data
fillna(): Return a copy of the data with missing values filled or imputed
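A short sketch of these four methods in action (the values are illustrative):

```python
import pandas as pd
import numpy as np

data = pd.Series([1, np.nan, 'hello', None])

print(data.isnull())          # boolean mask marking the missing entries
print(data[data.notnull()])   # select only the valid entries
print(data.dropna())          # drop the missing entries
print(data.fillna(0))         # replace missing entries with 0
```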
Hierarchical Indexing
Hierarchical indexing (also known as multi-indexing) incorporates multiple index levels
within a single index. In this way, higher-dimensional data can be compactly represented
within the familiar one-dimensional Series and two-dimensional DataFrame objects.
Pandas MultiIndex
import pandas as pd
import numpy as np
index = [('California', 2000), ('California', 2010),
('New York', 2000), ('New York', 2010),
('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
18976457, 19378102,
20851820, 25145561]
pop = pd.Series(populations, index=index)
pop
(California, 2000) 33871648
(California, 2010) 37253956
(New York, 2000) 18976457
(New York, 2010) 19378102
(Texas, 2000) 20851820
(Texas, 2010) 25145561
dtype: int64
index = pd.MultiIndex.from_tuples(index)
index
MultiIndex(levels=[['California', 'New York', 'Texas'], [2000, 2010]],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
The MultiIndex contains multiple levels of indexing–in this case, the state names and the
years, as well as multiple labels for each data point which encode these levels.
If we re-index our series with this MultiIndex, we see the hierarchical representation of
the data:
pop = pop.reindex(index)
pop
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
pop[:, 2010]
California 37253956
New York 19378102
Texas 25145561
dtype: int64
pop_df = pop.unstack()
pop_df
                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561
The stack() method provides the opposite operation:
pop_df.stack()
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
We can add another column to the DataFrame:
pop_df = pd.DataFrame({'total': pop,
                       'under18': [9267089, 9284094,
                                   4687374, 4318033,
                                   5906301, 6879014]})
pop_df
                    total  under18
California 2000  33871648  9267089
           2010  37253956  9284094
New York   2000  18976457  4687374
           2010  19378102  4318033
Texas      2000  20851820  5906301
           2010  25145561  6879014
df = pd.DataFrame(np.random.rand(4, 2),
                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                  columns=['data1', 'data2'])
df
        data1     data2
a 1  0.554233  0.356072
  2  0.925244  0.219474
b 1  0.441759  0.610054
  2  0.171495  0.886688
data = {('California', 2000): 33871648,
('California', 2010): 37253956,
('Texas', 2000): 20851820,
('Texas', 2010): 25145561,
('New York', 2000): 18976457,
('New York', 2010): 19378102}
pd.Series(data)
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
Texas 2000 20851820
2010 25145561
dtype: int64
pop.index.names = ['state', 'year']
pop.loc['California':'New York']
state year
California 2000 33871648
2010 37253956
New York 2000 18976457
2010 19378102
dtype: int64
Combining Datasets:
Concat and Append
import pandas as pd
import numpy as np
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
np.concatenate([x, y, z])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
x = [[1, 2],
[3, 4]]
np.concatenate([x, x], axis=1)
array([[1, 2, 1, 2],
[3, 4, 3, 4]])
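pd.concat() performs the same operation for Pandas objects; a minimal sketch (these Series and DataFrames are illustrative):

```python
import pandas as pd

ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])

# pd.concat works like np.concatenate, but preserves the indices
print(pd.concat([ser1, ser2]))

df1 = pd.DataFrame({'x': [1, 2]})
df2 = pd.DataFrame({'y': [3, 4]})

# axis=1 concatenates column-wise instead of row-wise
wide = pd.concat([df1, df2], axis=1)
print(wide)
```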
Given two DataFrames df5 and df6 whose columns only partially overlap, passing
join='inner' keeps just the shared columns:
pd.concat([df5, df6], join='inner')
Categories of Joins
The pd.merge() function implements a number of types of joins: one-to-one, many-to-one,
and many-to-many joins.
All three types of joins are accessed via an identical call to the pd.merge() interface; the
type of join performed depends on the form of the input data.
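A brief sketch of the first two join types, using hypothetical employee tables (not from the text):

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering',
                              'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# One-to-one join: each 'employee' key appears exactly once in both tables
df3 = pd.merge(df1, df2)
print(df3)

# Many-to-one join: 'group' repeats in df3 but is unique in df4,
# so each supervisor is duplicated across the matching rows
df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],
                    'supervisor': ['Carly', 'Guido', 'Steve']})
merged = pd.merge(df3, df4)
print(merged)
```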