
AD3301 - Numpy - and - Pandas - Ipynb - Colaboratory

The document discusses NumPy arrays and broadcasting in Python. It introduces how to create 1D, 2D and 3D NumPy arrays. It shows various ways of initializing arrays using NumPy functions like ones, zeros, empty, full, arange and linspace. It demonstrates inspecting array attributes like shape, size, itemsize and nbytes. It explains broadcasting rules that allow arithmetic operations between arrays of different dimensions. In particular, it shows how broadcasting pads arrays with fewer dimensions and stretches arrays of shape 1 to match other array shapes.

Uploaded by

palaniappan.cse
Copyright © All Rights Reserved

9/7/22, 10:21 AM AD3301 - Numpy_and_Pandas.ipynb - Colaboratory

Python Basic operations

from IPython.core.interactiveshell import InteractiveShell

# Display the value of every bare expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = "all"

Numpy Array Basics

Importing NumPy and creating different types of NumPy arrays

# importing numpy
import numpy as np

# Defining 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)

# Defining and printing 2D array (commented out; it is defined again in the next cell)
# my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
# print(my2DArray)

# Defining and printing 3D array
# my3Darray = np.array([[[ 1, 2 , 3 , 4],[ 5 , 6 , 7 ,8]], [[ 1, 2, 3, 4],[ 9, 10, 11, 1…
# print(my3Darray)

[ 1 8 27 64]

# Defining and printing 2D array


my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

[[ 1 2 3 4]
[ 2 4 9 16]
[ 4 8 18 32]]
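The 3D array definition near the start of this section is truncated in the source and never runs. As a minimal sketch with made-up values (not the author's original data), a 3D array is built from a list of equally shaped 2D blocks, and ndim and shape confirm its structure:

```python
import numpy as np

# Hypothetical values: two 2x4 blocks stacked into one 3D array
my3DArray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
                      [[9, 10, 11, 12], [13, 14, 15, 16]]])
print(my3DArray)
print(my3DArray.ndim)    # number of axes: 3
print(my3DArray.shape)   # (blocks, rows, columns): (2, 2, 4)
```

Every inner list must have the same length; otherwise NumPy cannot form a regular 3D array.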

Array using numpy built-in functions

To create arrays using NumPy's built-in functions, we will use the following code:

https://colab.research.google.com/drive/1VBohDB5UrkcmuJIPcO8YNbAw4fIKW7DK#printMode=true 1/18

# Array of ones
print ("Arrays containing Ones\n")
ones = np.ones((2,2),int)
print(ones)
print ("\n\nArrays containing Zeros\n")

# Array of zeros
zeros = np.zeros((2,10),int)
print(zeros)

Arrays containing Ones

[[1 1]
[1 1]]

Arrays containing Zeros

[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]

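# Arrays with random values
# randint: integers from 1 (inclusive) to 10 (exclusive), shape (3, 5)
print(np.random.randint(1,10,(3,5)))
# random: floats in [0.0, 1.0), shape (3, 5)
print(np.random.random((3,5)))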

[[5 9 1 9 4]
[8 6 5 3 8]
[6 7 3 8 4]]
[[0.6253672 0.36754516 0.51695366 0.4776632 0.98696659]
[0.32946382 0.10104098 0.95875064 0.63990203 0.50221926]
[0.39447164 0.78698187 0.41467121 0.18306899 0.59477965]]

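# Empty array: memory is allocated but NOT initialized, so it holds whatever
# values were already there (use np.zeros if you need zeros)
emptyArray = np.empty((3,2))
print(emptyArray)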

[[2.1018035e-316 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]]

# Full array
fullArray = np.full((2,2),np.pi)
print(fullArray)

# Array of evenly-spaced values



evenSpacedArray = np.arange(10,25,4)
#evenSpacedArray = np.arange(12).reshape(3,4)
print(evenSpacedArray)

# Array of evenly-spaced values


evenSpacedArray2 = np.linspace(0,2,10)
print(evenSpacedArray2)

[[3.14159265 3.14159265]
[3.14159265 3.14159265]]
[10 14 18 22]
[0. 0.22222222 0.44444444 0.66666667 0.88888889 1.11111111
1.33333333 1.55555556 1.77777778 2. ]

arange lets you define the size of the step, and the stop value is excluded. linspace lets you define the number of points, and the stop value is included by default.
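A short sketch contrasting the two calls, reusing the arange arguments from above plus linspace calls with illustrative values; retstep=True asks linspace to also return the spacing it computed:

```python
import numpy as np

# arange: start, stop (excluded), step size
print(np.arange(10, 25, 4))        # [10 14 18 22]

# linspace: start, stop (included by default), number of points
print(np.linspace(0, 2, 5))        # [0.  0.5 1.  1.5 2. ]

# With retstep=True, linspace also returns the spacing between points
points, step = np.linspace(0, 2, 10, retstep=True)
print(step)                        # 2/9, about 0.2222
```

When you care about hitting an exact end value with floats, linspace is usually the safer choice, because a float step in arange can accumulate rounding error.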

Inspecting Numpy Arrays

NumPy Array Attributes

ndarray.flags: information about the memory layout of the array.

# Print the number of `my2DArray`'s dimensions
# print(my2DArray.ndim)

# Print the number of `my2DArray`'s elements
# print(my2DArray.size)

# Print information about `my2DArray`'s memory layout
# print(my2DArray.flags)

# Print the length of one array element in bytes


print(zeros.itemsize)

# itemsize returns the size (in bytes) of each element of a NumPy array;
# NumPy stores each Unicode character in 4 bytes

ch = np.array([['a','b','c'],['d','e','f']])
print(ch)
print(ch.itemsize)

# Print the total bytes consumed by `ch`'s elements (6 elements x 4 bytes = 24)


print(ch.nbytes)


8
[['a' 'b' 'c']
['d' 'e' 'f']]
4
24
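The ndim, size and flags calls above are disabled in the notebook. A minimal, self-contained sketch that re-creates my2DArray and ch and prints the main attributes; itemsize for the integer array depends on the platform's default integer type (often 8 bytes on 64-bit Linux), so no fixed value is assumed for it here:

```python
import numpy as np

my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray.ndim)                          # number of dimensions (axes)
print(my2DArray.shape)                         # size along each axis
print(my2DArray.size)                          # total number of elements
print(my2DArray.itemsize, my2DArray.nbytes)    # bytes per element, total bytes

ch = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])
print(ch.dtype)                                # Unicode strings of length 1
# nbytes is always size * itemsize
print(ch.nbytes == ch.size * ch.itemsize)      # True
```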


Broadcasting in NumPy
Broadcasting is a mechanism that permits NumPy to operate with arrays of different shapes
when performing arithmetic operations.

If two arrays have different shapes, element-by-element operations are not directly possible.
However, NumPy can still operate on arrays of different shapes because of its broadcasting
capability.

a = np.array([1,2,3,4])
print(a.shape)
b = np.array([10,20,30,40,50])
print(b.shape)
c = a + b
print(c.shape)
print (c)

(4,)
(5,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-161c6451a913> in <module>
3 b = np.array([10,20,30,40,50])
4 print(b.shape)
----> 5 c = a + b
6 print(c.shape)
7 print (c)

ValueError: operands could not be broadcast together with shapes (4,) (5,)


a = np.array([1,2,3,4])
print(a.shape)
b = 5

#b = np.array([5])

c = a + b


print(c.shape)
print (c)

(4,)
(4,)
[6 7 8 9]

We can think of this as an operation that stretches or duplicates the value 5 into the array
[5, 5, 5, 5] and then adds the results. The advantage of NumPy's broadcasting is that this
duplication of values does not actually take place, but it is a useful mental model as we think
about broadcasting.
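To make this mental model concrete, here is a small sketch using np.broadcast_to, which returns the "stretched" array as a read-only view. Its stride of 0 shows that NumPy re-reads the same value for every position instead of copying it:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Mental model: 5 is "stretched" to [5, 5, 5, 5] before adding
stretched = np.broadcast_to(5, a.shape)
print(stretched)                                # [5 5 5 5]
print(np.array_equal(a + 5, a + stretched))     # True

# No copies are made: a stride of 0 bytes means every element
# refers to the same single value in memory
print(stretched.strides)                        # (0,)
```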

We can similarly extend this to arrays of higher dimension. Observe the result when we add a
one-dimensional array to a two-dimensional array:
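A minimal sketch of that case, with illustrative arrays M and a that are not from the original notebook. The 1D array is added to every row of the 2D array:

```python
import numpy as np

M = np.ones((3, 3))     # shape (3, 3)
a = np.arange(3)        # shape (3,): treated as (1, 3), then stretched to (3, 3)
result = M + a
print(result)
# [[1. 2. 3.]
#  [1. 2. 3.]
#  [1. 2. 3.]]
print(result.shape)     # (3, 3)
```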

Broadcasting is the process of matching the dimensions of differently shaped arrays so that
further operations (e.g., per-element arithmetic) can be performed on them.

Rule 1: Two dimensions are compatible if they are equal. If the two arrays differ in their number
of dimensions, the shape of the one with fewer dimensions is first padded with ones on its
leading (left) side.

# Create an array of two dimension


A =np.ones((6, 8))

# Shape of A
print(A.shape)

# Create another array


B = np.random.random((6,8))

# Shape of B
print(B.shape)

# Sum of A and B; both matrices have the same shape, so no stretching is needed


print(A + B)

(6, 8)
(6, 8)
[[1.02235338 1.07426222 1.07796564 1.92165778 1.69896669 1.27116382
1.87724144 1.3361346 ]
[1.44088172 1.67502615 1.07147457 1.79199182 1.14112224 1.36919479
1.80727734 1.01625392]
[1.65668092 1.84113539 1.54138284 1.23198665 1.31228451 1.59306397
1.46382149 1.87437678]
[1.5552956 1.65484125 1.39042551 1.7380376 1.95208506 1.40201893
1.8552564 1.90180634]
[1.97609831 1.77141474 1.76127984 1.07468606 1.93739374 1.03612855
1.48747198 1.78455217]
[1.37723426 1.57121336 1.99178788 1.77909114 1.16735127 1.80668618
1.80688359 1.18839159]]


Rule 2: Two dimensions are also compatible when one of them is 1. If the shapes of the two
arrays do not match in some dimension, the array whose size in that dimension is 1 is stretched
to match the other shape.
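Both operands can be stretched at once. In this sketch (illustrative arrays, not from the notebook), a column of shape (3, 1) and a row of shape (3,) combine into a (3, 3) grid:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(3)                 # shape (3,), padded to (1, 3) by Rule 1
grid = col + row                   # both size-1 axes are stretched (Rule 2)
print(grid)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
print(grid.shape)                  # (3, 3)
```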

# Initialize `x`
x = np.ones((2, 5))
print(x)

# Check shape of `x`


print(x.shape)

# Initialize `y`
y = np.arange(5)
print(y)

# Check shape of `y`


print(y.shape)

# Subtract `x` and `y`


x-y

[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
(2, 5)
[0 1 2 3 4]
(5,)
array([[ 1., 0., -1., -2., -3.],
[ 1., 0., -1., -2., -3.]])

# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions

x = np.ones((1,2,8))
print("x = "+"\n", x)
print("shape of x =", x.shape)
y = np.random.random((2, 1, 1))
print("y = "+"\n", y)

print("\nthe output = ",x + y)

# Analytical question

# The dimensions of x(1,1,4) and y(3,2,4) are different. However, it is possible
# to add them. Why is that? Also, if you change x to (10,2,8) or y to (10,1,4),
# you get a ValueError. Can you find out why?

x =
[[[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]]]
shape of x = (1, 2, 8)
y =
[[[0.8065366 ]]

[[0.00414051]]]

the output = [[[1.8065366 1.8065366 1.8065366 1.8065366 1.8065366 1.8065366
1.8065366 1.8065366 ]
[1.8065366 1.8065366 1.8065366 1.8065366 1.8065366 1.8065366
1.8065366 1.8065366 ]]

[[1.00414051 1.00414051 1.00414051 1.00414051 1.00414051 1.00414051
1.00414051 1.00414051]
[1.00414051 1.00414051 1.00414051 1.00414051 1.00414051 1.00414051
1.00414051 1.00414051]]]
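One way to approach the analytical question is to apply the rules by hand. The helper below, broadcast_shape, is a hypothetical teaching function (not part of NumPy) that pads, compares and stretches shapes as described above; its result is cross-checked against an actual NumPy addition. NumPy 1.20 and later also provide np.broadcast_shapes for the same purpose.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape or raise ValueError."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # sizes are equal, or b is stretched
        elif da == 1:
            result.append(db)      # a is stretched (Rule 2)
        else:
            # Neither size is 1 and they differ: not compatible (Rule 3 fails)
            raise ValueError(f"incompatible sizes {da} and {db}")
    return tuple(result)

print(broadcast_shape((1, 1, 4), (3, 2, 4)))                 # (3, 2, 4)
print(broadcast_shape((1, 2, 8), (2, 1, 1)))                 # (2, 2, 8)
print((np.ones((1, 2, 8)) + np.ones((2, 1, 1))).shape)       # (2, 2, 8)

# The two changes suggested in the question both fail
for bad in [((10, 2, 8), (2, 1, 1)), ((1, 2, 8), (10, 1, 4))]:
    try:
        broadcast_shape(*bad)
    except ValueError as err:
        print(bad, "->", err)
```

In the first failing case the leading sizes 10 and 2 clash; in the second the trailing sizes 8 and 4 clash.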

Numpy and mathematics at work

x = np.array([[1, 2, 3], [2, 3, 4]])


y = np.array([[1, 4, 9], [2, 3, -2]])
print(x+y)

[[ 2 6 12]
[ 4 6 2]]

NumPy's ufuncs (universal functions, which operate element by element on arrays) feel very
natural to use because they make use of Python's native arithmetic operators. The standard
addition, subtraction, multiplication, and division can all be used:

The following table lists the arithmetic operators implemented in NumPy:


Operator   Equivalent ufunc   Description
+          np.add             Addition (e.g., 1 + 1 = 2)
-          np.subtract        Subtraction (e.g., 3 - 2 = 1)
-          np.negative        Unary negation (e.g., -2)
*          np.multiply        Multiplication (e.g., 2 * 3 = 6)
/          np.divide          Division (e.g., 3 / 2 = 1.5)
//         np.floor_divide    Floor division (e.g., 3 // 2 = 1)
**         np.power           Exponentiation (e.g., 2 ** 3 = 8)
%          np.mod             Modulus/remainder (e.g., 9 % 4 = 1)
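A quick sketch confirming that each operator in the table produces the same array as its ufunc, using the x and y arrays from this section. Exponentiation uses x ** 2 rather than x ** y, because NumPy refuses to raise integers to negative integer powers and y contains -2:

```python
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

# (operator result, ufunc result) for each row of the table
pairs = {
    "+": (x + y, np.add(x, y)),
    "-": (x - y, np.subtract(x, y)),
    "*": (x * y, np.multiply(x, y)),
    "/": (x / y, np.divide(x, y)),
    "//": (x // y, np.floor_divide(x, y)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % y, np.mod(x, y)),
}
for op, (via_operator, via_ufunc) in pairs.items():
    print(op, np.array_equal(via_operator, via_ufunc))   # True for every operator

# Unary negation
print("neg", np.array_equal(-x, np.negative(x)))        # True
```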

# Basic operations (+, -, *, /, %)
x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

# Add two arrays
add = np.add(x, y)
print(add)

# Subtract two arrays
sub = np.subtract(x, y)


print(sub)

# Multiply two arrays
mul = np.multiply(x, y)
print(mul)

# Divide x by y (true division always returns floats)
div = np.divide(x,y)
print(div)

# Calculate the remainder of x divided by y
rem = np.mod(x, y)
print(rem)

[[ 2 6 12]
[ 4 6 2]]
[[ 0 -2 -6]
[ 0 0 6]]
[[ 1 8 27]
[ 4 9 -8]]
[[ 1. 0.5 0.33333333]
[ 1. 1. -2. ]]
[[0 2 3]
[0 0 0]]

Subset, Slice, And Index Arrays

x = np.array([10, 20, 30, 40, 50])

# Select items at index 0 and 1
print("# Select items at index 0 and 1")
print(x[0:2])

# Output column 1 of the earlier 2D array y = [[1, 4, 9], [2, 3, -2]]
print("#Output the Columns")
print(y[:,1])

# Output row 1 of the same array
print("#Output the Rows")
print(y[1,:])

# Select column 1 from rows 0 to 2 of a new 2D array
print('#Select column 1 from rows 0 to 2 of a 2D array')
y = np.array([[ 1, 2, 3, 4], [ 9, 10, 11 ,12],[13,14,15,16]])
print(y)
print(y[0:3, 1])

# Specifying conditions: keep only the elements that are >= 2
print("# Specifying conditions")
atLeast2 = (y >= 2)
print(y[atLeast2])

# Select items at index 0 and 1
[10 20]
#Output the Columns
[4 3]
#Output the Rows
[ 2 3 -2]
#Select column 1 from rows 0 to 2 of a 2D array
[[ 1 2 3 4]
[ 9 10 11 12]
[13 14 15 16]]
[ 2 10 14]
# Specifying conditions
[ 2 3 4 9 10 11 12 13 14 15 16]
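Conditions can also be combined and used in other ways. A hedged sketch on the same 3x4 array y: & and | combine masks element-wise (the parentheses are needed because & binds more tightly than comparisons such as > and <), a list of indices selects whole rows, and np.where replaces values while keeping the shape:

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [9, 10, 11, 12], [13, 14, 15, 16]])

# Combine conditions element-wise; parentheses are required
mask = (y > 2) & (y < 12)
print(y[mask])                   # [ 3  4  9 10 11]

# Fancy indexing: pick whole rows by a list of row indices
print(y[[0, 2]])

# np.where keeps the shape: values below 10 become 0, the rest are kept
print(np.where(y < 10, 0, y))
```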

Pandas

# Importing pandas
import pandas as pd

Can you set default parameters in Pandas?

print("Pandas Version:", pd.__version__)

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

Pandas Version: 1.3.5

Data structures of pandas

Series
DataFrames

The Pandas Series Object


A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or
array as follows:

series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])

print(series)


series1 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],index = [0,4,6,1,2,9,8,7])

print(series1)
# With an integer index, an integer slice is positional: [2:4] returns the rows at positions 2 and 3
print("series1[2:4] = ",series1[2:4])

series2 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],index = list('abcdefgh'))


print(series2)
print("series2['g'] = ",series2['g'])

0 2
1 3
2 7
3 11
4 13
5 17
6 19
7 23
dtype: int64
0 2
4 3
6 7
1 11
2 13
9 17
8 19
7 23
dtype: int64
series1[2:4] = 6 7
1 11
dtype: int64
a 2
b 3
c 7
d 11
e 13
f 17
g 19
h 23
dtype: int64
series2['g'] = 19
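Because series1 has an integer index, plain square brackets can be ambiguous between labels and positions. A minimal sketch of the explicit alternatives: .loc always uses index labels and .iloc always uses positions:

```python
import pandas as pd

series1 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23], index=[0, 4, 6, 1, 2, 9, 8, 7])

print(series1.loc[2])                # label 2 -> 13
print(series1.iloc[2])               # position 2 -> 7
print(series1.iloc[2:4].tolist())    # positions 2 and 3 -> [7, 11]
```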

series1.values
series1.keys()   # keys() is an alias for the index; without () it returns the bound method
series1.index

array([ 2, 3, 7, 11, 13, 17, 19, 23])
Int64Index([0, 4, 6, 1, 2, 9, 8, 7], dtype='int64')
Int64Index([0, 4, 6, 1, 2, 9, 8, 7], dtype='int64')


The Pandas DataFrame Object


The next fundamental structure in Pandas is the DataFrame. Like the Series object, the
DataFrame can be thought of either as a generalization of a NumPy array or as a specialization
of a Python dictionary.

# Creating a DataFrame from a list of dictionaries (keys become columns; missing keys become NaN)


dict_df = [{'A': 'Apple', 'B': 'Ball'},{'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)

A B C
0 Apple Ball NaN
1 Aeroplane Bat Cat

# Creating a DataFrame from a dictionary of Series, scalars and arrays
# (scalars such as 'B', 'F' and 'G' are repeated down every row)


series_df = pd.DataFrame({
    'A': range(1, 5),
    'B': pd.Timestamp('20190526'),
    'C': pd.Series(5, index=list(range(4)), dtype='float64'),
    'D': np.array([3] * 4, dtype='int64'),
    'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disorder"]),
    'F': 'Mental health',
    'G': 'is challenging'
})
print(series_df)

A B C D E F G
0 1 2019-05-26 5.0 3 Depression Mental health is challenging
1 2 2019-05-26 5.0 3 Social Anxiety Mental health is challenging
2 3 2019-05-26 5.0 3 Bipolar Disorder Mental health is challenging
3 4 2019-05-26 5.0 3 Eating Disorder Mental health is challenging

# Creating a DataFrame from a dictionary of lists (each list becomes a column)


sdf = {
    'County':['Østfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
    'ISO-Code':[1,2,3,4,5,6],
    'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
    'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer", "Drammen"]
}
sdf = pd.DataFrame(sdf)
print(sdf)

County ISO-Code Area Administrative centre


0 Østfold 1 4180.69 Sarpsborg
1 Hordaland 2 4917.94 Oslo
2 Oslo 3 454.07 City of Oslo

3 Hedmark 4 27397.76 Hamar


4 Oppland 5 25192.10 Lillehammer
5 Buskerud 6 14910.94 Drammen
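The cell above builds the DataFrame from a dictionary of lists. A DataFrame can also be created directly from a 2D ndarray; in this sketch the values, column names and index labels are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 3x2 ndarray becomes 3 rows and 2 columns
values = np.arange(6).reshape(3, 2)
arr_df = pd.DataFrame(values, columns=['x', 'y'], index=['r1', 'r2', 'r3'])
print(arr_df)
print(arr_df.shape)    # (3, 2)
```

Without the columns and index arguments, pandas falls back to the integer labels 0, 1, 2, ...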

Loading a dataset into Pandas DataFrame

import pandas as pd
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender',
           'capital_gain', 'capital_loss', 'hours_per_week', 'country_of_origin', 'income']
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.dat
df.head(10)

   age         workclass  fnlwgt     education  education_num         marital_status         occupation
0   39         State-gov   77516     Bachelors             13          Never-married       Adm-clerical
1   50  Self-emp-not-inc   83311     Bachelors             13     Married-civ-spouse    Exec-managerial
2   38           Private  215646       HS-grad              9               Divorced  Handlers-cleaners
3   53           Private  234721          11th              7     Married-civ-spouse  Handlers-cleaners
4   28           Private  338409     Bachelors             13     Married-civ-spouse     Prof-specialty
5   37           Private  284582       Masters             14     Married-civ-spouse    Exec-managerial
6   49           Private  160187           9th              5  Married-spouse-absent      Other-service
7   52  Self-emp-not-inc  209642       HS-grad              9     Married-civ-spouse    Exec-managerial
8   31           Private   45781       Masters             14          Never-married     Prof-specialty
9   42           Private  159449     Bachelors             13     Married-civ-spouse    Exec-managerial
(columns from relationship onward are cut off in the captured output)

# Displays the rows, columns, data types and memory used by the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----

0 age 32561 non-null int64


1 workclass 32561 non-null object
2 fnlwgt 32561 non-null int64
3 education 32561 non-null object
4 education_num 32561 non-null int64
5 marital_status 32561 non-null object
6 occupation 32561 non-null object
7 relationship 32561 non-null object
8 ethnicity 32561 non-null object
9 gender 32561 non-null object
10 capital_gain 32561 non-null int64
11 capital_loss 32561 non-null int64
12 hours_per_week 32561 non-null int64
13 country_of_origin 32561 non-null object
14 income 32561 non-null object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB

# Displays the number of rows (data points) and columns in the dataframe


df.shape

(32561, 15)

# Display all columns of the dataframe


df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education_num',


'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender',
'capital_gain', 'capital_loss', 'hours_per_week', 'country_of_origin',
'income'],
dtype='object')

# Displays summary statistics for each numerical column in the dataframe


df.describe()

age fnlwgt education_num capital_gain capital_loss hours_p

count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 32561

mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 40

std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 12

min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 1

25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 40

50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 40

75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 45

max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 99

Selecting rows and columns in the dataframe



# Selects the row at position 10
df.iloc[10]

age 37
workclass Private
fnlwgt 280464
education Some-college
education_num 10
marital_status Married-civ-spouse
occupation Exec-managerial
relationship Husband
ethnicity Black
gender Male
capital_gain 0
capital_loss 0
hours_per_week 80
country_of_origin United-States
income >50K
Name: 10, dtype: object
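iloc selects by position. Its label-based counterpart is loc, which also accepts a boolean mask and a column name. To keep the sketch self-contained, it uses a hypothetical three-row frame (values copied from the first rows of the dataset shown earlier) with a non-default index, so that labels and positions differ:

```python
import pandas as pd

# Hypothetical mini-frame: index labels 100-102 differ from positions 0-2
people = pd.DataFrame(
    {'age': [39, 50, 38], 'education': ['Bachelors', 'Bachelors', 'HS-grad']},
    index=[100, 101, 102],
)
print(people.iloc[0])                                 # first row, by position
print(people.loc[100])                                # the same row, by label
print(people.loc[people['age'] > 38, 'education'])    # boolean rows + named column
```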

# Selects the first 10 rows (positions 0 to 9)
df.iloc[0:10]

   age         workclass  fnlwgt     education  education_num         marital_status         occupation
0   39         State-gov   77516     Bachelors             13          Never-married       Adm-clerical
1   50  Self-emp-not-inc   83311     Bachelors             13     Married-civ-spouse    Exec-managerial
2   38           Private  215646       HS-grad              9               Divorced  Handlers-cleaners
3   53           Private  234721          11th              7     Married-civ-spouse  Handlers-cleaners
4   28           Private  338409     Bachelors             13     Married-civ-spouse     Prof-specialty
5   37           Private  284582       Masters             14     Married-civ-spouse    Exec-managerial
6   49           Private  160187           9th              5  Married-spouse-absent      Other-service
7   52  Self-emp-not-inc  209642       HS-grad              9     Married-civ-spouse    Exec-managerial
8   31           Private   45781       Masters             14          Never-married     Prof-specialty
9   42           Private  159449     Bachelors             13     Married-civ-spouse    Exec-managerial

# Selects a range of rows (positions 10 to 14)
df.iloc[10:15]

    age         workclass  fnlwgt     education  education_num         marital_status         occupation
10   37           Private  280464  Some-college             10     Married-civ-spouse    Exec-managerial
11   30         State-gov  141297     Bachelors             13     Married-civ-spouse     Prof-specialty
12   23           Private  122272     Bachelors             13          Never-married       Adm-clerical
13   32           Private  205019    Assoc-acdm             12          Never-married              Sales
14   40           Private  121772     Assoc-voc             11     Married-civ-spouse       Craft-repair

# Selects the last 2 rows


df.iloc[-2:]

       age         workclass  fnlwgt     education  education_num         marital_status         occupation
32559   22           Private  201490       HS-grad              9          Never-married       Adm-clerical
32560   52      Self-emp-inc  287927       HS-grad              9     Married-civ-spouse    Exec-managerial

# Selects every other row, and the columns at positions 3 and 4 (the end of 3:5 is excluded)
df.iloc[::2, 3:5].head()

   education  education_num
0  Bachelors             13
2    HS-grad              9
4  Bachelors             13
6        9th              5
8    Masters             14

Combine Pandas and Numpy

import pandas as pd
import numpy as np

np.random.seed(24)
df = pd.DataFrame({'F': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
axis=1)
df.iloc[::2, 3:5] = np.nan
df

F E D C B A

0 1.0 1.329212 -0.770033 NaN NaN -1.070816

1 2.0 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.0 0.678805 1.889273 NaN NaN -0.481165

3 4.0 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.0 -1.336936 0.562861 NaN NaN 0.121668

5 6.0 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.0 -0.385684 0.519818 NaN NaN 1.428984

7 8.0 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.0 1.264103 0.290035 NaN NaN 1.030550

9 10.0 0.118098 -0.021853 0.046841 -1.628753 -0.392361
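Pandas and NumPy treat the NaN values in this frame differently. A hedged sketch that rebuilds the same frame so it runs on its own, counts missing values per column, and compares the pandas mean (which skips NaN by default) with NumPy's mean and nanmean:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the cell above
np.random.seed(24)
df = pd.DataFrame({'F': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))], axis=1)
df.iloc[::2, 3:5] = np.nan      # positions 3 and 4 are columns C and B

# Number of missing values in each column (5 in C and B, 0 elsewhere)
print(df.isna().sum())

values_c = df['C'].to_numpy()
print(df['C'].mean())           # pandas skips NaN by default
print(np.mean(values_c))        # plain NumPy mean propagates NaN -> nan
print(np.nanmean(values_c))     # nanmean ignores NaN, matching pandas
```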

# Define a function that colors negative values red, positive values black,
# and anything else (zero or NaN) green
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        color = 'green'

    return 'color: %s' % color

# Styler.applymap applies the function element-wise (renamed Styler.map in pandas 2.1)
s = df.style.applymap(colorNegativeValueToRed, subset=['A','B','C','D','E'])
s

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361

# Highlight the max value in each column with an orange background and the min value with green
def highlightMax(s):
    isMax = s == s.max()
    return ['background-color: orange' if v else '' for v in isMax]

def highlightMin(s):
    isMin = s == s.min()
    return ['background-color: green' if v else '' for v in isMin]

# NaN cells are highlighted in red (from pandas 1.5 the argument is named `color` instead of `null_color`)
df.style.apply(highlightMax).apply(highlightMin).highlight_null(null_color='red')

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361

import seaborn as sns


cm = sns.light_palette("pink", as_cmap=True)

s = df.style.background_gradient(cmap=cm)
s

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361
