0% found this document useful (0 votes)

10 views38 pages

Data Manipulation With Pandas

The document introduces Pandas, a data manipulation library in Python, highlighting its three fundamental data structures: Series, DataFrame, and Index. It explains the creation and functionality of Series as a one-dimensional array with labeled indices, and DataFrame as a two-dimensional array with flexible row and column labels. Additionally, it discusses the immutability and ordered set properties of the Index object, along with the ability to perform element-wise operations using NumPy's universal functions.

Uploaded by

22eg112a15

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views38 pages

Data Manipulation With Pandas

Uploaded by

22eg112a15

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 38

Data Manipulation with Pandas

Introducing Pandas Objects

• Pandas objects are enhanced versions of NumPy structured arrays in which
the rows and columns are identified with labels rather than simple integer
indices.
• Pandas provides a host of useful tools, methods, and functionality on top of
the basic data structures.
• Three fundamental Pandas data structures:
1. Series 2. DataFrame and 3.Index
In[1]: import numpy as np
import pandas as pd
1. The Pandas Series Object
A Pandas Series is a one-dimensional array of indexed data. It can be created from
a list or array as follows:
In[2]:data = pd.Series([0.25, 0.5, 0.75, 1.0])
data
Out[2]: 0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64

the Series wraps both a sequence of values and a sequence of indices, which we can
access with the values and index attributes. The values are simply a familiar
NumPy array:
In[3]: data.values
Out[3]: array([ 0.25, 0.5 , 0.75, 1. ])

The index is an array-like object of type pd.Index

In[4]: data.index
Out[4]: RangeIndex(start=0, stop=4, step=1)

Data can be accessed by the associated index via the familiar Python square-bracket notation:
In[5]: data[1]
Out[5]: 0.5
In[6]: data[1:3]
Out[6]: 1 0.50
2 0.75
dtype: float64

The Pandas Series is much more general and flexible than the one-dimensional
NumPy array that it emulates.
Series as generalized NumPy array
Series object is basically inter‐changeable with a one-dimensional NumPy array.
The essential difference is the presence of the index: while the NumPy array has an
implicitly defined integer index used to access the values, the Pandas Series has an
explicitly defined index associated with the values.

This explicit index definition gives the Series object additional capabilities.
For example, the index need not be an integer, but can consist of values of any desired
type. For example, if we wish, we can use strings as an index:
In[7]: data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
data
Out[7]: a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64

And the item access works as expected:

In[8]: data['b']
Out[8]: 0.5

We can even use noncontiguous or nonsequential indices:

In[9]: data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=[2,5,3, 7])
data
Out[9]: 2 0.25
5 0.50
3 0.75
7 1.00
dtype: float64
In[10]: data[5]
Out[10]: 0.5

Series as specialized dictionary

A Pandas Series is like a specialization of a Python dictionary.
A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a
Series is a structure that maps typed keys to a set of typed values.
We can make the Series-as-dictionary analogy even more clear by constructing a
Series object directly from a Python dictionary:
In[11]: population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population
Out[11]: California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64

By default, a Series will be created where the index is drawn from the sorted keys.
From here, typical dictionary-style item access can be performed:
In[12]: population['California']
Out[12]: 38332521

Series also supports array-style operations such as slicing:

In[13]: population['California':'Illinois']
Out[13]: California 38332521
Florida 19552860
Illinois 12882135
dtype: int64

Constructing Series objects

We’ve already seen a few ways of constructing a Pandas Series from scratch; all
of them are some version of the following:
>>> pd.Series(data, index=index)

where index is an optional argument, and data can be one of many entities.
For example, data can be a list or NumPy array, in which case index defaults to
an integer sequence:
In[14]: pd.Series([2, 4, 6])
Out[14]: 0 2
1 4
2 6
dtype: int64

Data can be a scalar, which is repeated to fill the specified index:

In[15]: pd.Series(5, index=[100, 200, 300])
Out[15]: 100 5
200 5
300 5
dtype: int64

data can be a dictionary, in which index defaults to the sorted dictionary keys:
In[16]: pd.Series({2:'a', 1:'b', 3:'c'})
Out[16]: 1 b
2 a
3 c
4 dtype:
object
the index can be explicitly set if a different result is preferred:
In[17]: pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])
Out[17]: 3 c
2 a dtype:
object

Notice that in this case, the Series is populated only with the explicitly identified
keys.

2. The Pandas DataFrame Object

The DataFrame can be thought of either as a generalization of a NumPy array, or as
a specialization of a Python dictionary.

DataFrame as a generalized NumPy array

If a Series is an analog of a one-dimensional array with flexible indices, a
DataFrame is an analog of a two-dimensional array with both flexible row indices
and flexible column names.
DataFrame as a sequence of aligned Series objects, “aligned” means they share the
same index.
a new Series listing the area of each of the five states discussed in the previous
section:
In[18]:
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
'Florida': 170312, 'Illinois': 149995}

area = pd.Series(area_dict)
area
Out[18]: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
dtype: int64

Now that we have this along with the population Series from before, we can use a
dictionary to construct a single two-dimensional object containing this information:
In[19]: states = pd.DataFrame({'population': population,
'area': area})
states
Out[19]: area population
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
New York 141297 19651127
Texas 695662 26448193

Like the Series object, the DataFrame has an index attribute that gives access to
the index labels:
In[20]: states.index
Out[20]:
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

Additionally, the DataFrame has a columns attribute, which is an Index object

holding the column labels:
In[21]: states.columns
Out[21]: Index(['area', 'population'], dtype='object')

Thus the DataFrame can be thought of as a generalization of a two-dimensional

NumPy array, where both the rows and columns have a generalized index for access‐
ing the data.

DataFrame as specialized dictionary

A dictionary maps a key to a value, a DataFrame maps a column name to a Series
of column data. For example, asking for the 'area' attribute returns the Series
object containing the areas we saw earlier:
In[22]: states['area']
Out[22]: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64

Constructing DataFrame objects

A Pandas DataFrame can be constructed in a variety of ways.

From a single Series object. A DataFrame is a collection of Series objects,

and a single-column DataFrame can be constructed from a single Series:
In[23]: pd.DataFrame(population, columns=['population'])
Out[23]: population
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193

From a list of dicts. Any list of dictionaries can be made into a DataFrame.
Use a simple list comprehension to create some data:
In[24]: data = [{'a': i, 'b': 2 * i}
for i in range(3)]
pd.DataFrame(data)
Out[24]: a b
0 0 0
1 1 2
2 2 4
Even if some keys in the dictionary are missing, Pandas will fill them in with NaN
(i.e., “not a number”) values:
In[25]: pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
Out[25]: a b c
0 1.0 2 NaN
1 NaN 3 4.0

From a dictionary of Series objects. a DataFrame can be constructed from

a dictionary of Series objects as well:
In[26]: pd.DataFrame({'population': population,
'area': area})
Out[26]: area population
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
New York 141297 19651127
Texas 695662 26448193

From a two-dimensional NumPy array. Given a two-dimensional array of

data, we can create a DataFrame with any specified column and index names. If
omitted, an integer index will be used for each:
In[27]: pd.DataFrame(np.random.rand(3,

2), columns=['foo',

'bar'], index=['a',

'b', 'c'])

Out[27]: foo bar

a 0.865257 0.213169
b 0.442759 0.108267
c 0.047110 0.905718

From a NumPy structured array. A Pandas DataFrame operates much like

a structured array, and can be created directly from one:
In[28]: A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A
Out[28]: array([(0, 0.0), (0, 0.0), (0, 0.0)],
dtype=[('A', '<i8'), ('B', '<f8')])

In[29]: pd.DataFrame(A)
Out[29]: A B
0 0 0.0
1 0 0.0
2 0 0.0

3. The Pandas Index Object

Both the Series and DataFrame objects contain an explicit index that lets you reference
and modify data. This Index object is an interesting structure in itself, and it can be
thought of either as an immutable array or as an ordered set (technically a multiset, as
Index objects may contain repeated values). Those views have some interesting
consequences in the operations available on Index objects. As a simple example, let’s
construct an Index from a list of integers:
In[30]: ind = pd.Index([2, 3, 5, 7, 11])
ind
Out[30]: Int64Index([2, 3, 5, 7, 11], dtype='int64')

Index as immutable array

The Index object in many ways operates like an array. For example, we can use
standard Python indexing notation to retrieve values or slices:
In[31]: ind[1]
Out[31]: 3
In[32]: ind[::2]
Out[32]: Int64Index([2, 5, 11], dtype='int64')

Index objects also have many of the attributes familiar from NumPy arrays:
In[33]: print(ind.size, ind.shape, ind.ndim, ind.dtype)
5 (5,) 1 int64

One difference between Index objects and NumPy arrays is that indices are
immutable—that is, they cannot be modified via the normal means:
In[34]: ind[1] = 0
---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-34-40e631c82e8a> in <module>()
----> 1 ind[1] = 0

/Users/jakevdp/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py ...
1243
1244 def __setitem__(self, key, value):
-> 1245 raise TypeError("Index does not support mutable
operations") 1246
1247 def __getitem__(self, key):

TypeError: Index does not support mutable operations

This immutability makes it safer to share indices between multiple DataFrames and
arrays, without the potential for side effects from inadvertent index modification.

Index as ordered set

The Index object follows many of the conventions used by Python’s built-in set
data structure, so that unions, intersections, differences, and other combinations can be
computed in a familiar way:
In[35]: indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In[36]: indA & indB # intersection

Out[36]: Int64Index([3, 5, 7], dtype='int64')

In[37]: indA | indB # union

Out[37]: Int64Index([1, 2, 3, 5, 7, 9, 11],

dtype='int64')

In[38]: indA ^ indB # symmetric difference

Out[38]: Int64Index([1, 2, 9, 11], dtype='int64')

These operations may also be accessed via object methods—for example,

indA.intersection(indB)

Operating on Data in Pandas

One of the essential pieces of NumPy is the ability to perform quick element-wise
operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and
with more sophisticated operations (trigonometric functions, exponential and loga‐
rithmic functions, etc.). Pandas inherits much of this functionality from NumPy, and
the ufuncs Universal Functions.
We see that there are well-defined operations between one-dimensional Series
structures and two-dimensional DataFrame structures.

Ufuncs: Index Preservation

Because Pandas is designed to work with NumPy, any NumPy ufunc will work on
Pandas Series and DataFrame objects. Let’s start by defining a simple Series
and DataFrame on which to demonstrate this:
In[1]: import pandas as pd
import numpy as np
In[2]: rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser
Out[2]: 0 6
1 3
2 7
3 4
dtype: int64

In[3]: df = pd.DataFrame(rng.randint(0, 10, (3, 4)),

columns=['A', 'B', 'C', 'D'])
df
Out[3]: A B C D
0 6 9 2 6
1 7 4 3 7
2 7 2 5 4

If we apply a NumPy ufunc on either of these objects, the result will be another Pan‐
das object with the indices preserved:
In[4]: np.exp(ser)

Out[4]: 0 403.428793
1 20.085537
2 1096.633158
3 54.598150
dtype: float64

Or, for a slightly more complex calculation:

In[5]: np.sin(df * np.pi / 4)
Out[5]: A B C D
0 -1.000000 7.071068e-01 1.000000 -1.000000e+00
1 -0.707107 1.224647e-16 0.707107 -7.071068e-01
2 -0.707107 1.000000e+00 -0.707107 1.224647e-16

UFuncs: Index Alignment

For binary operations on two Series or DataFrame objects, Pandas will align
indices in the process of performing the operation. This is very convenient when you
are working with incomplete data

Index alignment in Series

As an example, suppose we are combining two different data sources, and find only
the top three US states by area and the top three US states by population:
In[6]: area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
'New York': 19651127}, name='population')

Let’s see what happens when we divide these to compute the population density:
In[7]: population / area
Out[7]: Alaska NaN
California 90.413926
New York NaN
Texas 38.018740
dtype: float64
The resulting array contains the union of indices of the two input arrays, which we
could determine using standard Python set arithmetic on these indices:
In[8]: area.index | population.index
Out[8]: Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

Any item for which one or the other does not have an entry is marked with NaN, or “Not
a Number. This index matching is implemented this way for any of Python’s built-in
arithmetic expressions; any missing val‐ ues are filled in with NaN by default:
In[9]: A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
Out[9]: 0 NaN
1 5.0
2 9.0
3 NaN
dtype: float64

If using NaN values is not the desired behavior, we can modify the fill value using
appropriate object methods in place of the operators. For example, calling A.add(B)
is equivalent to calling A + B, but allows optional explicit specification of the fill
value for any elements in A or B that might be missing:
In[10]: A.add(B, fill_value=0)
Out[10]: 0 2.0
1 5.0
2 9.0
3 5.0
dtype: float64

Index alignment in DataFrame

A similar type of alignment takes place for both columns and indices when you are
performing operations on DataFrames:
In[11]: A = pd.DataFrame(rng.randint(0, 20, (2,
2)), columns=list('AB'))
A
Out[11]: A B
0 1 11
1 5 1
In[12]: B = pd.DataFrame(rng.randint(0, 10, (3,
3)), columns=list('BAC'))
B
Out[12]: B A C
0 4 0 9
1 5 8 0
2 9 2 6

A
B C
0 1.0 15.0 NaN
1 13.0 6.0 NaN
2 NaN NaN NaN

Notice that indices are aligned correctly irrespective of their order in the two objects,
and indices in the result are sorted. As was the case with Series, we can use the asso‐
ciated object’s arithmetic method and pass any desired fill_value to be used in
place of missing entries. Here we’ll fill with the mean of all values in A (which we
compute by first stacking the rows of A):
In[14]: fill = A.stack().mean()
A.add(B, fill_value=fill)
Out[14]: A B C
0 1.0 15.0 13.5
1 13.0 6.0 4.5
2 6.5 13.5 10.5

Table 3-1 lists Python operators and their equivalent Pandas object methods.

Table 3-1. Mapping between Python operators and Pandas methods

Python operator Pandas method(s)

+ add()
- sub(), subtract()
* mul(), multiply()
/ truediv(), div(), divide()
// floordiv()
% mod()
**pow()

Ufuncs: Operations Between DataFrame and Series

According to NumPy’s broadcasting rules, subtraction between a two-dimensional

array and one of its rows is applied row-wise.
In Pandas, the convention similarly operates row-wise by default:
In[17]: df = pd.DataFrame(A, columns=list('QRST'))
df - df.iloc[0]
Out[17]: Q R S T
0 0 0 0 0
1 -1 -2 2 4
2 3 -7 1 4

If you would instead like to operate column-wise, you can use the object methods
mentioned earlier, while specifying the axis keyword:
In[18]: df.subtract(df['R'], axis=0)
Out[18]: Q R S T
0 -5 0 -6 -4
1 -4 0 -2 2
2 5 0 2 7

Note that these DataFrame/Series operations, like the operations discussed

before, will automatically align indices between the two elements:
In[19]: halfrow = df.iloc[0, ::2]
halfrow
Out[19]: Q 3
S 2
Name: 0, dtype: int64
In[20]: df - halfrow
Out[20]: Q R S T
0 0.0 NaN 0.0 NaN
1 -1.0 NaN 2.0 NaN
2 3.0 NaN 1.0 NaN

This preservation and alignment of indices and columns means that operations on data
in Pandas will always maintain the data context, which prevents the types of silly errors
that might come up when you are working with heterogeneous and/or mis‐ aligned
data in raw NumPy arrays.
Data Indexing and Selection
Learning for accessing and modifying values in Pandas Series and DataFrame objects.

Data Selection in Series

A Series object acts in many ways like a one-dimensional NumPy array, and in
many ways like a standard Python dictionary.
This help us to understand the patterns of data indexing and selection in these arrays.

Series as dictionary
Like a dictionary, the Series object provides a mapping from a collection of keys to
a collection of values:
In[1]: import pandas as pd
data = pd.Series([0.25, 0.5,
0.75, 1.0],
index=['a', 'b', 'c',
'd'])
data
Out[1]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
In[2]: data['b']
Out[2]: 0.5
In[3]: 'a' in data
Out[3]: True
In[4]: data.keys()
Out[4]: Index(['a', 'b', 'c', 'd'], dtype='object')
In[5]: list(data.items())
Out[5]: [('a', 0.25), ('b', 0.5), ('c', 0.75), ('d',
1.0)]
A Series can be assigned to a new index value:
In[6]: data['e'] = 1.25
data
Out[6]:
a 0.25
b 0.50
c 0.75
d 1.00
e 1.25
dtype: float64

Series as one-dimensional array

A Series builds slices, masking, and fancy indexing.
In[7]: # slicing by explicit index
data['a':'c']
Out[7]:
a 0.25
b 0.50
c 0.75
dtype: float64
In[8]: # slicing by implicit integer index
data[0:2]
Out[8]:
a 0.25
b 0.50
dtype: float64
In[9]: # masking
data[(data > 0.3) & (data < 0.8)]
Out[9]:
b 0.50
c 0.75
dtype: float64
In[10]: # fancy
indexing
data[['a', 'e']]
Out[10]:
a 0.25
e 1.25
dtype: float64

Indexers: loc, iloc, and ix

These slicing and indexing conventions can be a source of confusion. For example, if
your Series has an explicit integer index, an indexing operation such as data[1]
will use the explicit indices, while a slicing operation like data[1:3] will use the
implicit Python-style index.
In[11]: data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data
Out[11]: 1 a
3 b
5 c dtype:
object

In[12]: # explicit index when indexing

data[1]
Out[12]: 'a'
In[13]: # implicit index when slicing
data[1:3]
Out[13]: 3 b
5 c dtype:
object

Because of this potential confusion in the case of integer indexes, Pandas provides some
special indexer attributes that explicitly expose certain indexing schemes. These
are not functional methods, but attributes that expose a particular slicing interface to
the data in the Series.
First, the loc attribute allows indexing and slicing that always references the
explicit index:
In[14]: data.loc[1]
Out[14]: 'a'
In[15]: data.loc[1:3]
Out[15]: 1 a
3 b dtype:
object

The iloc attribute allows indexing and slicing that always references the implicit
Python-style index:
In[16]: data.iloc[1]
Out[16]: 'b'
In[17]: data.iloc[1:3]
Out[17]: 3 b
5 c dtype:
object

A third indexing attribute, ix, is a hybrid of the two, and for Series objects is equiva‐
lent to standard []-based indexing. The purpose of the ix indexer will become more
apparent in the context of DataFrame objects, which we will discuss in a moment.
One guiding principle of Python code is that “explicit is better than implicit.” The
explicit nature of loc and iloc make them very useful in maintaining clean and
readable code; especially in the case of integer indexes, I recommend using these both
to make code easier to read and understand, and to prevent subtle bugs due to the mixed
indexing/slicing convention.

Data Selection in DataFrame

A DataFrame acts a two-dimensional or structured array, like a dictionary of Series
structures sharing the same index.

DataFrame as a dictionary
Let’s return to our example of areas and populations of states:
In[18]: area = pd.Series({'California': 423967, 'Texas': 695662,
'New York': 141297, 'Florida': 170312,
'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
'New York': 19651127, 'Florida': 19552860,
'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data
Out[18]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135
New York 141297 19651127
Texas 695662 26448193

# dictionary-style indexing of the column name:

In[19]: data['area']
Out[19]: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64

# attribute-style access with column names that are strings:

In[20]: data.area
Out[20]: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64

attribute-style accesses the exact same object as the dictionary-style access:

In[21]: data.area is data['area']
Out[21]: True

it does not work for all cases! For example, if the column names are not strings, or if
the column names conflict with methods of the DataFrame, this attribute-style access
is not possible. For example, the DataFrame has a pop() method, so data.pop
will point to this rather than the "pop" column:
In[22]: data.pop is data['pop']
Out[22]: False
In particular, you should avoid the temptation to try column assignment via attribute
(i.e., use data['pop'] = z rather than data.pop = z).
Like with the Series objects discussed earlier, this dictionary-style syntax can also
be used to modify the object, in this case to add a new column:
In[23]: data['density'] = data['pop'] /
data['area'] data
Out[23]: area pop density
California 423967 38332521 90.413926
Florida 170312 19552860 114.806121
Illinois 149995 12882135 85.883763
New York 141297 19651127 139.076746
Texas 695662 26448193 38.018740

DataFrame as two-dimensional array

The DataFrame represented as an enhanced two-dimensional array. We can examine
data array using the values attribute:
In[24]: data.values
Out[24]: array([[ 4.23967000e+05, 3.83325210e+07, 9.04139261e+01],
[ 1.70312000e+05, 1.95528600e+07, 1.14806121e+02],
[ 1.49995000e+05, 1.28821350e+07, 8.58837628e+01],
[ 1.41297000e+05, 1.96511270e+07, 1.39076746e+02],
[ 6.95662000e+05, 2.64481930e+07, 3.80187404e+01]])

we can transpose the full DataFrame to swap rows and columns:

In[25]: data.T
Out[25]:
California Florida Illinois New York Texas
area 4.239670e+05 1.703120e+05 1.499950e+05 1.412970e+05 6.956620e+05
pop 3.833252e+07 1.955286e+07 1.288214e+07 1.965113e+07 2.644819e+07
density 9.041393e+01 1.148061e+02 8.588376e+01 1.390767e+02 3.801874e+01

Passing a single index to an array accesses a row:

In[26]: data.values[0]
Out[26]: array([ 4.23967000e+05, 3.83325210e+07,
9.04139261e+01])
and passing a single “index” to a DataFrame accesses a column:
In[27]: data['area']
Out[27]: California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64

Here Pandas again uses the loc, iloc, and ix indexers mentioned earlier. Using
the iloc indexer, we can index the underlying array as if it is a simple NumPy array
(using the implicit Python-style index), but the DataFrame index and column labels
are maintained in the result:
In[28]: data.iloc[:3, :2]
Out[28]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135

In[29]: data.loc[:'Illinois', :'pop']

Out[29]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135

The ix indexer allows a hybrid of these two approaches:

In[30]: data.ix[:3, :'pop']
Out[30]: area pop
California 423967 38332521
Florida 170312 19552860
Illinois 149995 12882135

Keep in mind that for integer indices, the ix indexer is subject to the same potential
sources of confusion as discussed for integer-indexed Series objects.
Any of the familiar NumPy-style data access patterns can be used within these index‐
ers. For example, in the loc indexer we can combine masking and fancy
indexing as in the following:
In[31]: data.loc[data.density > 100, ['pop', 'density']]
Out[31]: pop density
Florida 19552860 114.806121
New York 19651127 139.076746

Any of these indexing conventions may also be used to set or modify values; this is
done in the standard way that you might be accustomed to from working with NumPy:
In[32]: data.iloc[0, 2] = 90
data
Out[32]: area pop density
California 423967 38332521 90.000000
Florida 170312 19552860 114.806121
Illinois 149995 12882135 85.883763
New York 141297 19651127 139.076746
Texas 695662 26448193 38.018740

To build up your fluency in Pandas data manipulation, I suggest spending some time
with a simple DataFrame and exploring the types of indexing, slicing, masking, and
fancy indexing that are allowed by these various indexing approaches.

Additional indexing conventions

indexing refers to columns, slicing refers to rows:
In[33]: data['Florida':'Illinois']
Out[33]: area pop density
Florida 170312 19552860 114.806121
Illinois 149995 12882135 85.883763

Such slices can also refer to rows by number rather than by index:
In[34]: data[1:3]
Out[34]: area pop density
Florida 170312 19552860 114.806121
Illinois 149995 12882135 85.883763

Similarly, direct masking operations are also interpreted row-wise rather than column-
wise:
In[35]: data[data.density > 100]
Out[35]: area pop density
Florida 170312 19552860 114.806121
New York 141297 19651127 139.076746
One of the essential pieces of NumPy is the ability to perform quick element-wise
operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and
with more sophisticated operations (trigonometric functions, exponential and loga‐
rithmic functions, etc.). Pandas inherits much of this functionality from NumPy, and
the ufuncs.

Ufuncs: Index Preservation

ufunc will work on Pandas Series and DataFrame objects. Let’s start by defining
a simple Series and DataFrame on which to demonstrate this:
In[1]: import pandas as pd
import numpy as np
In[2]: rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
ser
Out[2]: 0 6
4 3
5 7
6 4
dtype: int64

In[3]: df = pd.DataFrame(rng.randint(0, 10, (3, 4)),

columns=['A', 'B', 'C', 'D'])
df
Out[3]: A B C D
0 6 9 2 6
1 7 4 3 7
2 7 2 5 4

If we apply a NumPy ufunc on either of these objects, the result will be another Pan‐
das object with the indices preserved:
In[4]: np.exp(ser)
Out[4]: 0 403.428793
4 20.085537
5 1096.633158
6 54.598150
dtype: float64

Or, for a slightly more complex calculation:

In[5]: np.sin(df * np.pi / 4)
Out[5]: A B C D
0 -1.000000 7.071068e-01 1.000000 -1.000000e+00
1 -0.707107 1.224647e-16 0.707107 -7.071068e-01
2 -0.707107 1.000000e+00 -0.707107 1.224647e-16

Any of the ufuncs discussed in “Computation on NumPy Arrays: Universal Func‐

tions” on page 50 can be used in a similar manner.

UFuncs: Index Alignment

For binary operations on two Series or DataFrame objects, Pandas will align
indices in the process of performing the operation. This is very convenient when you
are working with incomplete data, as we’ll see in some of the examples that follow.

Index alignment in Series

Let’s see what happens when we divide these to compute the population density:
In[7]: population / area
Out[7]: Alaska NaN
California 90.413926
New York NaN
Texas 38.018740
dtype: float64

The resulting array contains the union of indices of the two input arrays, which we
could determine using standard Python set arithmetic on these indices:
In[8]: area.index | population.index
Out[8]: Index(['Alaska', 'California', 'New York', 'Texas'], dtype='object')

Any item for which one or the other does not have an entry is marked with NaN, or “Not a
Number,” which is how Pandas marks missing data (see further discussion of missing data
in “Handling Missing Data” on page 119). This index matching is imple‐
mented this way for any of Python’s built-in arithmetic expressions; any missing val‐
ues are filled in with NaN by default:
In[9]: A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A + B
Out[9]: 0 NaN
4 5.0
5 9.0
6 NaN
dtype: float64

Index alignment in DataFrame

In[13]: A +
B Out[13]: A B C
3 1.0 15.0 NaN
4 13.0 6.0 NaN
5 NaN NaN NaN
Notice that indices are aligned correctly irrespective of their order in the two objects,
and indices in the result are sorted. As was the case with Series, we can use the asso‐
ciated object’s arithmetic method and pass any desired fill_value to be used in
place of missing entries. Here we’ll fill with the mean of all values in A (which we
compute by first stacking the rows of A):
In[14]: fill = A.stack().mean()
A.add(B, fill_value=fill)
Out[14]: A B C
0 1.0 15.0 13.5
1 13.0 6.0 4.5
2 6.5 13.5 10.5

Table 3-1 lists Python operators and their equivalent Pandas object methods.

Table 3-1. Mapping between Python operators and Pandas methods

Python operator Pandas method(s)

+ add()
- sub(), subtract()
* mul(), multiply()
/ truediv(), div(), divide()
// floordiv()
% mod()
** pow()

Ufuncs: Operations Between DataFrame and Series

When you are performing operations between a DataFrame and a Series, the index
and column alignment is similarly maintained. Operations between a DataFrame and
a Series are similar to operations between a two-dimensional and one-dimensional
NumPy array. Consider one common operation, where we find the difference of a two-
dimensional array and one of its rows:
In[15]: A = rng.randint(10, size=(3, 4))
A
Out[15]: array([[3, 8, 2, 4],
[2, 6, 4, 8],
[6, 1, 3, 8]])
In[16]: A - A[0]
Out[16]: array([[ 0, 0, 0, 0],
[-1, -2, 2, 4],
[ 3, -7, 1, 4]])
According to NumPy’s broadcasting rules (see “Computation on Arrays: Broadcast‐
ing” on page 63), subtraction between a two-dimensional array and one of its rows is
applied row-wise.
In Pandas, the convention similarly operates row-wise by default:
In[17]: df = pd.DataFrame(A, columns=list('QRST'))
df - df.iloc[0]
Out[17]: Q R S T
0 0 0 0 0
1 -1 -2 2 4
2 3 -7 1 4

Note that these DataFrame/Series operations, like the operations discussed

before, will automatically align indices between the two elements:
In[19]: halfrow = df.iloc[0, ::2]
halfrow
Out[19]: Q 3
S 2
Name: 0, dtype: int64
In[20]: df - halfrow
Out[20]: Q R S T
3 0.0 NaN 0.0 NaN
4 -1.0 NaN 2.0 NaN
5 3.0 NaN 1.0 NaN

Handling Missing Data

Real-world data is rarely clean and homogeneous.
Many interesting datasets will have some amount of data missing.
Different data sources may indicate missing data in different ways.
Missing data in general represented as null, NaN, or NA values in Pandas.

Trade-Offs in Missing Data Conventions

Two strategies:
using a 1. mask that globally indicates missing values, or
the mask might be an entirely separate Boolean array, or it may involve
appropriation of one bit in the data representation to locally indicate the null
status of a value.

Use of a separate mask array requires allocation of an additional Boolean

array, which adds overhead in both storage and computation.

choosing a 2. sentinel value that indicates a missing entry.

the sentinel value could be some data-specific convention, such as indicating
a missing integer value with –9999 or some rare bit pattern, or it could be a
more global convention, such as indicating a missing floating-point value with
NaN (Not a Number)

A sentinel value reduces the range of valid values that can be represented, and
may require extra (often non-optimized) logic in CPU and GPU arithmetic.
Common special values like NaN are not available for all data types.

Missing Data in Pandas

Pandas use sentinels for missing data, and further chose to use two already-existing
Python null values: the special floating-point NaN value, and the Python None
object.

None: Pythonic missing data

The first sentinel value used by Pandas is None, a Python singleton object that is
often used for missing data in Python code. Because None is a Python object, it
cannot be used in any arbitrary NumPy/Pandas array, but only in arrays with data type
'object' (i.e., arrays of Python objects):

In[1]: import numpy as np

import pandas as pd
In[2]: vals1 = np.array([1, None, 3, 4])
vals1
Out[2]: array([1, None, 3, 4], dtype=object)

This dtype=object means that the best common type representation NumPy could
infer for the contents of the array is that they are Python objects.
NaN: Missing numerical data
The other missing data representation, NaN (acronym for Not a Number), is different;
it is a special floating-point value recognized by all systems that use the standard IEEE
floating-point representation:
In[5]: vals2 = np.array([1, np.nan, 3, 4])
vals2.dtype
Out[5]: dtype('float64')

Regardless of the operation, the result of arithmetic with NaN will be another NaN:
In[6]: 1 + np.nan
Out[6]: nan
In[7]: 0 * np.nan
Out[7]: nan

Aggregates over the values are well defined (i.e., they don’t result in an error) but not
always useful:
In[8]: vals2.sum(), vals2.min(), vals2.max()
Out[8]: (nan, nan, nan)
NumPy does provide some special aggregations that will ignore these missing values.

In[9]: np.nansum(vals2), np.nanmin(vals2), np.nanmax(vals2)

Out[9]: (8.0, 1.0, 4.0)
Keep in mind that NaN is specifically a floating-point value; there is no equivalent
NaN value for integers, strings, or other types.

NaN and None in Pandas

NaN and None both have their place, and Pandas is built to handle the two of them
nearly interchangeably, converting between them where appropriate:
In[10]: pd.Series([1, np.nan, 2, None])
Out[10]: 0 1.0
1 NaN
2 2.0
3 NaN
dtype: float64

Pandas automatically type-casts when NA values are present. For example, if we set a
value in an integer array to np.nan, it will automatically be upcast to a floating-point
type to accommodate the NA:
In[11]: x = pd.Series(range(2), dtype=int)
x
Out[11]: 0 0
1 1
dtype: int64

In[12]: x[0] = None

x
Out[12]: 0 NaN
1 1.0
dtype: float64
Notice that in addition to casting the integer array to floating point, Pandas
automatically converts the None to a NaN value.

Table 3-2. Pandas handling of NAs by type

Typeclass Conversion when storing NAs NA sentinel value
floating No change np.nan
object No change None or np.nan
integer Cast to float64 np.nan
boolean Cast to object None or np.nan

Keep in mind that in Pandas, string data is always stored with an object dtype.

Operating on Null Values

Pandas treats None and NaN as essentially interchangeable for indicating missing or
null values. To facilitate this convention, there are several useful methods for
detecting, removing, and replacing null values in Pandas data structures. They are:
isnull()
Generate a Boolean mask indicating missing values
notnull()
Opposite of isnull()
dropna()
Return a filtered version of the data
fillna()
Return a copy of the data with missing values filled or imputed

Detecting null values

Pandas data structures have two useful methods for detecting null data: isnull()
and notnull(). Either one will return a Boolean mask over the data. For example:
In[13]: data = pd.Series([1, np.nan, 'hello', None])
In[14]: data.isnull()
Out[14]: 0 False
1 True
2 False
3 True
dtype: bool

In[15]: data[data.notnull()]
Out[15]: 0 1
2 hello
dtype: object

The isnull() and notnull() methods produce similar Boolean results for
Data Frames.
Dropping null values
dropna() (which removes NA values) and fillna() (which fills in NA values).
For a Series, the result is straightforward:
In[16]: data.dropna()
Out[16]: 0 1
2 hello
dtype: object

For a DataFrame, there are more options. Consider the following DataFrame:
In[17]: df = pd.DataFrame([[1, np.nan, 2],
[2, 3, 5],
[np.nan, 4, 6]])
df
Out[17]: 0 1 2
0 1.0 NaN 2
1 2.0 3.0 5
2 NaN 4.0 6

We cannot drop single values from a DataFrame; we can only drop full rows or full
columns.
dropna() will drop all rows in which any null value is present:
In[18]: df.dropna()
Out[18]: 0 1 2
1 2.0 3.0 5

drop NA values along a different axis; axis=1 drops all columns containing a null
value:
In[19]: df.dropna(axis='columns')
Out[19]: 2
0 2
1 5
2 6
But this drops some good data as well; you might rather be interested in dropping rows
or columns with all NA values, or a majority of NA values.
how or thresh parameters, allow fine control of the number of nulls to allow.
how='any', allows row or column (depending on the axis key word) containing a
null value will be dropped.
how='all', which will only drop rows/columns that are all null values:
In[20]: df[3] = np.nan
df
Out[20]: 0 1 2 3
0 1.0 NaN 2 NaN
1 2.0 3.0 5 NaN
2 NaN 4.0 6 NaN
In[21]: df.dropna(axis='columns', how='all')
Out[21]: 0 1 2
0 1.0 NaN 2
1 2.0 3.0 5
2 NaN 4.0 6

thresh parameter specify a minimum number of non-null values for the row/column
to be kept:
In[22]: df.dropna(axis='rows', thresh=3)
Out[22]: 0 1 2 3
1 2.0 3.0 5 NaN

Here the first and last row have been dropped, because they contain only two non-null
values.

Filling null values

Sometimes rather than dropping NA values, replace them with a valid value. This value
might be a single number like zero, or it might be some sort of imputation or
interpolation from the good values.
isnull() method as a mask, but because it is such a common operation Pandas
provides the fillna() method, which returns a copy of the array with the null values
replaced.
Consider the following Series:
In[23]: data = pd.Series([1, np.nan, 2, None, 3],
index=list('abcde')) data
Out[23]: a 1.0
b NaN
c 2.0
d NaN
e 3.0
dtype: float64

fill NA entries with a single value, such as zero:

In[24]: data.fillna(0)
Out[24]: a 1.0
b 0.0
c 2.0
d 0.0
e 3.0
dtype: float64

We can specify a forward-fill to propagate the previous value forward:

In[25]: # forward-fill
data.fillna(method='ffill')
Out[25]: a 1.0
b 1.0
c 2.0
d 2.0
e 3.0
dtype: float64

Or we can specify a back-fill to propagate the next values backward:

In[26]: # back-fill
data.fillna(method='bfill')
Out[26]: a 1.0
b 2.0
c 2.0
d 3.0
e 3.0
dtype: float64

For DataFrames, the options are similar, but we can also specify an axis along
which the fills take place:
In[27]: df
Out[27]: 0 1 2 3
0 1.0 NaN 2 NaN
1 2.0 3.0 5 NaN
2 NaN 4.0 6 NaN
In[28]: df.fillna(method='ffill', axis=1)
Out[28]: 0 1 2 3
0 1.0 1.0 2.0 2.0
1 2.0 3.0 5.0 5.0
2 NaN 4.0 6.0 6.0

Notice that if a previous value is not available during a forward fill, the NA
value remains.
Hierarchical Indexing
Create a pandas.Series object with a MultiIndex. Let's break it down:
1. Data: Randomly generated numbers using np.random.uniform(size=9),
resulting in 9 floating-point values.
2. Index: A MultiIndex with two levels:
o The first level has the values: ["a", "a", "a", "b", "b", "c", "c", "d",
"d"].
o The second level has the values: [1, 2, 3, 1, 3, 1, 2, 2, 3].

data =
pd.Series(np.random.uniform(size=9),index=[["a",
"a", "a", "b", "b", "c", "c", "d", "d"],
[1, 2, 3, 1, 3, 1, 2, 2, 3]])
a 1 0.412178
2 0.158417
3 0.111628
b 1 0.394864
3 0.001674
c 1 0.498256
2 0.936531
d 2 0.096640
3 0.746740
dtype: float64

data.index
MultiIndex([('a', 1),
('a', 2),
('a', 3),
('b', 1),
('b', 3),
('c', 1),
('c', 2),
('d', 2),
('d', 3)],
)
A hierarchically indexed object (partial indexing) is used to select subsets of the
data:

data["b"]
1 0.963700
3 0.042998
dtype: float64
data["b":"c"]
b 1 0.963700
3 0.042998
c 1 0.111421
2 0.131365
dtype: float64

data.loc[["b", "d"]]
b 1 0.963700
3 0.042998
d 2 0.943148
3 0.744674
dtype: float64

Selection is even possible from an “inner” level

select all of the values having the value 2 from the second index level:

data.loc[:, 2]
a 0.713453
c 0.131365
d 0.943148
dtype: float64

Hierarchical indexing plays an important role in reshaping data and in group-

based operations like forming a pivot table.

Rearrange this data into a DataFrame using its unstack method:

data.unstack()

The inverse operation of unstack is stack:

data.unstack().stack()
With a DataFrame, either axis can have a hierarchical index:
frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
columns=[["Ohio", "Ohio", "Colorado"],
["Green", "Red", "Green"]])
Frame

The hierarchical levels can have names as strings or any Python objects

frame.index.names = ["key1", "key2"]

frame

frame.columns.names = ["state", "color"]

frame
how many levels an index has by accessing its nlevels attribute:

frame.index.nlevels
2
With partial column indexing you can similarly select groups of columns:

frame["Ohio"]

A MultiIndex can be created by itself and then reused

pd.MultiIndex.from_arrays([["Ohio", "Ohio", "Colorado"],

["Green", "Red", "Green"]],
names=["state", "color"])

Reordering and Sorting Levels

frame.swaplevel("key1", "key2")
sort_index by default sorts the data lexicographically using all the index levels,
but you can choose to use only a single level or a subset of levels to sort by passing
the level argument.
frame.sort_index(level=1)

frame.swaplevel(0, 1).sort_index(level=0)

Summary Statistics by Level

Many descriptive and summary statistics on DataFrame and Series have a level
option in which we can specify the level to aggregate by on a particular axis.
frame.groupby(level="key2").sum()

frame.groupby(level="color", axis="columns").sum()
Indexing with a DataFrame’s columns
frame = pd.DataFrame({"a": range(7), "b": range(7, 0, -1),
"c": ["one", "one", "one", "two", "two",
"two", "two"],
"d": [0, 1, 2, 0, 1, 2, 3]})

DataFrame’s set_index function will create a new DataFrame using one or more of
its columns as the index:

frame2 = frame.set_index(["c", "d"])

frame2
By default, the columns are removed from the DataFrame, though you can leave
them in by passing drop=False to set_index:

frame.set_index(["c", "d"], drop=False)

reset_index, on the other hand, does the opposite of set_index; the hierarchical index
levels are moved into the columns:
frame2.reset_index()

Data Manipulation With Pandas
No ratings yet
Data Manipulation With Pandas
38 pages
Data Manipulation With Pandas
No ratings yet
Data Manipulation With Pandas
138 pages
Solution Manual For Data Abstraction & Problem Solving With C++: Walls and Mirrors, 6/E, Frank M. Carrano Timothy Henrypdf Download
100% (2)
Solution Manual For Data Abstraction & Problem Solving With C++: Walls and Mirrors, 6/E, Frank M. Carrano Timothy Henrypdf Download
52 pages
XII IP CH 1 Python Pandas - I Series
No ratings yet
XII IP CH 1 Python Pandas - I Series
45 pages
MATLAB Programming Fundamentals-MathWorks (2023)
No ratings yet
MATLAB Programming Fundamentals-MathWorks (2023)
1,602 pages
Unit I: Data Handling Using Pandas and Data Visualization: Marks:25
No ratings yet
Unit I: Data Handling Using Pandas and Data Visualization: Marks:25
135 pages
B.Tech - R23 - II - Year - AIML - Syllabus - FINAL
No ratings yet
B.Tech - R23 - II - Year - AIML - Syllabus - FINAL
44 pages
Class 12 IP Ch-1, 2 3
No ratings yet
Class 12 IP Ch-1, 2 3
28 pages
Pandas
No ratings yet
Pandas
57 pages
Pandas
No ratings yet
Pandas
163 pages
Data Structure Book by Yandyesh
No ratings yet
Data Structure Book by Yandyesh
159 pages
12 IP Questions
No ratings yet
12 IP Questions
181 pages
Data Analytics Pandas
No ratings yet
Data Analytics Pandas
33 pages
Unit 1: Introduction To Python
100% (1)
Unit 1: Introduction To Python
184 pages
Unit - 1 - Python Pandas
No ratings yet
Unit - 1 - Python Pandas
176 pages
PYTHON Unit 2
No ratings yet
PYTHON Unit 2
72 pages
Python Ques
No ratings yet
Python Ques
73 pages
Xii CS - 2024-25 - Practical File
No ratings yet
Xii CS - 2024-25 - Practical File
77 pages
Phyton 3
No ratings yet
Phyton 3
81 pages
Pandas Fundamentals
No ratings yet
Pandas Fundamentals
90 pages
SR Ip Pandas I Full Notes
No ratings yet
SR Ip Pandas I Full Notes
30 pages
Notes - EDA-Unit2
No ratings yet
Notes - EDA-Unit2
43 pages
Unit III Part 2 1725700061785
No ratings yet
Unit III Part 2 1725700061785
85 pages
Data Handlinng Using Pandas
No ratings yet
Data Handlinng Using Pandas
46 pages
DV
No ratings yet
DV
53 pages
Leip 102
No ratings yet
Leip 102
36 pages
Unit 2
No ratings yet
Unit 2
81 pages
UNIT 3 (Chapter 2) Pandas
No ratings yet
UNIT 3 (Chapter 2) Pandas
43 pages
Pandas Shan Ver2
No ratings yet
Pandas Shan Ver2
25 pages
Lecture Intro Python
No ratings yet
Lecture Intro Python
26 pages
Pandas
No ratings yet
Pandas
82 pages
Pandas
No ratings yet
Pandas
36 pages
Introduction To Databases Part 1
No ratings yet
Introduction To Databases Part 1
78 pages
1 IP 12 NOTES PythonPandas 2022 PDF
100% (3)
1 IP 12 NOTES PythonPandas 2022 PDF
66 pages
Ncert Pandas
No ratings yet
Ncert Pandas
36 pages
Python Data Processing
No ratings yet
Python Data Processing
36 pages
09 - Pandas Slides
No ratings yet
09 - Pandas Slides
33 pages
Python NOTES Unit-4
No ratings yet
Python NOTES Unit-4
163 pages
Python UnitIV
No ratings yet
Python UnitIV
20 pages
Python Pandas
100% (1)
Python Pandas
35 pages
Pandas Notes
No ratings yet
Pandas Notes
19 pages
Introduction To Pandas
No ratings yet
Introduction To Pandas
9 pages
P Unit-4 NP
No ratings yet
P Unit-4 NP
30 pages
Python Pandas ch-2
No ratings yet
Python Pandas ch-2
56 pages
2 - Data Types in Python
No ratings yet
2 - Data Types in Python
44 pages
Python Pandas - Series Notes
No ratings yet
Python Pandas - Series Notes
13 pages
Python Pandas-Series-neww
100% (1)
Python Pandas-Series-neww
80 pages
Leetcode 75 Blind - Part 2
No ratings yet
Leetcode 75 Blind - Part 2
8 pages
Unit II Notes Revision
No ratings yet
Unit II Notes Revision
20 pages
Chapter 1 and 2 Series and Data Frame
No ratings yet
Chapter 1 and 2 Series and Data Frame
45 pages
Dedupe Documentation: Release 2.0.0
No ratings yet
Dedupe Documentation: Release 2.0.0
60 pages
14 Pandas
No ratings yet
14 Pandas
25 pages
Data Science - Unit-3-Part-2
No ratings yet
Data Science - Unit-3-Part-2
32 pages
The Pandas Library
No ratings yet
The Pandas Library
39 pages
PPL Experiment No-8
No ratings yet
PPL Experiment No-8
7 pages
Introduction To Pythons Syntax
No ratings yet
Introduction To Pythons Syntax
13 pages
Bit Bank DS
No ratings yet
Bit Bank DS
14 pages
Pandas
No ratings yet
Pandas
21 pages
Java Collection Fraemework
No ratings yet
Java Collection Fraemework
18 pages
Pandas Basics
No ratings yet
Pandas Basics
21 pages
XII - Ip - Panda - I - Part - I - 2023 (1) 1 1
No ratings yet
XII - Ip - Panda - I - Part - I - 2023 (1) 1 1
25 pages
Working With Pandas Notes
No ratings yet
Working With Pandas Notes
27 pages
Unit I: Data Handling Using Pandas and Data Visualization: Marks:30
No ratings yet
Unit I: Data Handling Using Pandas and Data Visualization: Marks:30
75 pages
Comp 212
No ratings yet
Comp 212
9 pages
Pandas
No ratings yet
Pandas
20 pages
Introduction To Pandas & Data Structures
No ratings yet
Introduction To Pandas & Data Structures
11 pages
Apache Hive Tutorial
No ratings yet
Apache Hive Tutorial
139 pages
P03 Introduction To Pandas Ans
No ratings yet
P03 Introduction To Pandas Ans
45 pages
12ip 22 23
No ratings yet
12ip 22 23
188 pages
PYTHON UNIT-5 Part-C
No ratings yet
PYTHON UNIT-5 Part-C
4 pages
Unit III - Pandas - Data Manipulation Using Python
No ratings yet
Unit III - Pandas - Data Manipulation Using Python
15 pages
2.2 Data Indexing and Selection
No ratings yet
2.2 Data Indexing and Selection
8 pages
Python Overview CSC 201 (NOTE 4) - 1
No ratings yet
Python Overview CSC 201 (NOTE 4) - 1
16 pages
UNIT - 3 Pandas
No ratings yet
UNIT - 3 Pandas
21 pages
Ai Unit1 (List, Tuple, Set, Dictionary) PDF
No ratings yet
Ai Unit1 (List, Tuple, Set, Dictionary) PDF
15 pages
Hadoop Exams
No ratings yet
Hadoop Exams
14 pages
CSF211 Data Structures and Algorithms II Sem 2024-25 Handout 2
No ratings yet
CSF211 Data Structures and Algorithms II Sem 2024-25 Handout 2
4 pages
Python Pandas New Sylabus
No ratings yet
Python Pandas New Sylabus
53 pages
2.1 Pandas Objects
No ratings yet
2.1 Pandas Objects
10 pages
ML Lab8
No ratings yet
ML Lab8
28 pages
DIP Lab Manual No 01
No ratings yet
DIP Lab Manual No 01
20 pages
LAST MINUTES REVISION Pandas Series
No ratings yet
LAST MINUTES REVISION Pandas Series
6 pages
Dictionary Paper Python
No ratings yet
Dictionary Paper Python
1 page
Data Handling Using Pandas - 1-2-1
No ratings yet
Data Handling Using Pandas - 1-2-1
10 pages
Class12 Pandas Notes
No ratings yet
Class12 Pandas Notes
23 pages
Streaming Algorithms
No ratings yet
Streaming Algorithms
73 pages
04 Introduction To Python-1
No ratings yet
04 Introduction To Python-1
29 pages
Pandas Notoes For XII PDF
No ratings yet
Pandas Notoes For XII PDF
12 pages
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet