AD3301 - Numpy - and - Pandas - Ipynb - Colaboratory


9/7/22, 10:21 AM AD3301 - Numpy_and_Pandas.ipynb - Colaboratory

Python Basic operations

from IPython.core.interactiveshell import InteractiveShell


InteractiveShell.ast_node_interactivity = "all"

Numpy Array Basics

Importing NumPy and creating different types of NumPy arrays

# importing numpy
import numpy as np

# Defining 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)

"""
# Defining and printing 2D array
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

#Defining and printing 3D array


my3Darray = np.array([[[ 1, 2 , 3 , 4],[ 5 , 6 , 7 ,8]], [[ 1, 2, 3, 4],[ 9, 10, 11, 1
print(my3Darray)
"""

[ 1 8 27 64]
'\n# Defining and printing 2D array\nmy2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16
and printing 3D array\nmy3Darray = np.array([[[ 1, 2 , 3 , 4],[ 5 , 6 , 7 ,8]], [[ 1
y)\n'

# Defining and printing 2D array


my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

[[ 1 2 3 4]
[ 2 4 9 16]
[ 4 8 18 32]]

Array using numpy built-in functions

To create an array using built-in NumPy functions, we can use the following code:


# Array of ones
print ("Arrays containing Ones\n")
ones = np.ones((2,2),int)
print(ones)
print ("\n\nArrays containing Zeros\n")

# Array of zeros
zeros = np.zeros((2,10),int)
print(zeros)

Arrays containing Ones

[[1 1]
[1 1]]

Arrays containing Zeros

[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]

# Array with random values


print(np.random.randint(1,10,(3,5)))
print(np.random.random((3,5)))

[[5 9 1 9 4]
[8 6 5 3 8]
[6 7 3 8 4]]
[[0.6253672 0.36754516 0.51695366 0.4776632 0.98696659]
[0.32946382 0.10104098 0.95875064 0.63990203 0.50221926]
[0.39447164 0.78698187 0.41467121 0.18306899 0.59477965]]

# Empty array
emptyArray = np.empty((3,2))
print(emptyArray)

[[2.1018035e-316 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]]

# Full array
fullArray = np.full((2,2),np.pi)
print(fullArray)

# Array of evenly-spaced values



evenSpacedArray = np.arange(10,25,4)
#evenSpacedArray = np.arange(12).reshape(3,4)
print(evenSpacedArray)

# Array of evenly-spaced values


evenSpacedArray2 = np.linspace(0,2,10)
print(evenSpacedArray2)

[[3.14159265 3.14159265]
[3.14159265 3.14159265]]
[10 14 18 22]
[0. 0.22222222 0.44444444 0.66666667 0.88888889 1.11111111
1.33333333 1.55555556 1.77777778 2. ]

arange lets you define the size of the step; linspace lets you define the number of points.
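To make the difference concrete, here is a minimal sketch that reuses the two calls from the cells above. The variable names `by_step` and `by_count` are chosen only for this illustration, and `retstep=True` is used to ask linspace for the spacing it picked.

```python
import numpy as np

# arange: the step is fixed and the number of elements follows from it.
# The stop value (25) is excluded.
by_step = np.arange(10, 25, 4)

# linspace: the number of points is fixed and the step follows from it.
# The stop value (2) is included by default.
by_count, step = np.linspace(0, 2, 10, retstep=True)

print(by_step)        # [10 14 18 22]
print(len(by_count))  # 10
print(step)           # the spacing linspace chose, equal to 2/9
```

With arange the last element depends on the step; with linspace both end points are fixed and only the spacing changes.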

Inspecting Numpy Arrays

NumPy Array Attributes

ndarray.flags: information about the memory layout of the array.

'''
# Print the number of `my2DArray`'s dimensions
print(my2DArray.ndim)

# Print the number of `my2DArray`'s elements


print(my2DArray.size)

# Print information about `my2DArray`'s memory layout


print(my2DArray.flags)
'''

# Print the length of one array element in bytes


print(zeros.itemsize)

#itemsize returns the size (in bytes) of each element of a NumPy array

ch = np.array([['a','b','c'],['d','e','f']])
print(ch)
print(ch.itemsize)

# Print the total bytes consumed by `ch`'s elements


print(ch.nbytes)


8
[['a' 'b' 'c']
['d' 'e' 'f']]
4
24
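The last two numbers in the output are related: the total number of bytes is the number of elements times the size of one element. The short sketch below checks this for the same character array; it assumes NumPy's default fixed-width Unicode string type, which stores each character in 4 bytes.

```python
import numpy as np

ch = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])

print(ch.dtype)      # a one-character Unicode string type
print(ch.size)       # 6 elements
print(ch.itemsize)   # 4 bytes per element
print(ch.nbytes)     # 24 bytes in total

# nbytes is simply size * itemsize
print(ch.nbytes == ch.size * ch.itemsize)   # True
```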


Broadcasting in NumPy
Broadcasting is a mechanism that permits NumPy to operate with arrays of different shapes
when performing arithmetic operations.

If two arrays have different shapes, a direct element-by-element operation is not defined.
However, NumPy can still operate on such arrays when their shapes are compatible, because of
its broadcasting capability.

a = np.array([1,2,3,4])
print(a.shape)
b = np.array([10,20,30,40,50])
print(b.shape)
c = a + b
print(c.shape)
print (c)

(4,)
(5,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-161c6451a913> in <module>
3 b = np.array([10,20,30,40,50])
4 print(b.shape)
----> 5 c = a + b
6 print(c.shape)
7 print (c)

ValueError: operands could not be broadcast together with shapes (4,) (5,)


a = np.array([1,2,3,4])
print(a.shape)
b = 5

#b = np.array([5])

c = a + b


print(c.shape)
print (c)

(4,)
(4,)
[6 7 8 9]

We can think of this as an operation that stretches or duplicates the value 5 into the array
[5, 5, 5, 5], and adds the results. The advantage of NumPy's broadcasting is that this
duplication of values does not actually take place, but it is a useful mental model as we
think about broadcasting.
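One way to see that no duplication takes place is to look at the strides of a broadcast view. The sketch below uses np.broadcast_to, which is not used elsewhere in this notebook, so treat it as an illustrative aside.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# broadcast_to returns a read-only view with the shape of `a`
# in which every position refers to the same stored value.
stretched = np.broadcast_to(5, a.shape)

print(stretched)          # [5 5 5 5]
print(stretched.strides)  # (0,): stepping along the axis moves 0 bytes
print(a + stretched)      # [6 7 8 9]
```

A stride of 0 means the four "copies" of 5 all share one memory location, which matches the mental model described above.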

We can similarly extend this to arrays of higher dimension. Observe the result when we add a
one-dimensional array to a two-dimensional array:
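A minimal sketch of that case follows. The names `M` and `v` and the shapes are chosen only for this illustration.

```python
import numpy as np

M = np.ones((3, 3))   # two-dimensional, shape (3, 3)
v = np.arange(3)      # one-dimensional, shape (3,)

# v is treated as shape (1, 3) and then stretched along the
# first axis, so it is added to every row of M.
result = M + v

print(result.shape)   # (3, 3)
print(result)
```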

Broadcasting is the operation of matching the dimensions of differently shaped arrays so that
further operations (e.g. per-element arithmetic) can be performed on those arrays.

Rule 1: Two dimensions are compatible if they are equal. If the two arrays differ in their
number of dimensions, the shape of the one with fewer dimensions is first padded with ones on
its leading (left) side.

# Create an array of two dimension


A =np.ones((6, 8))

# Shape of A
print(A.shape)

# Create another array


B = np.random.random((6,8))

# Shape of B
print(B.shape)

# Sum of A and B, here the shape of both the matrix is same.


print(A + B)

(6, 8)
(6, 8)
[[1.02235338 1.07426222 1.07796564 1.92165778 1.69896669 1.27116382
1.87724144 1.3361346 ]
[1.44088172 1.67502615 1.07147457 1.79199182 1.14112224 1.36919479
1.80727734 1.01625392]
[1.65668092 1.84113539 1.54138284 1.23198665 1.31228451 1.59306397
1.46382149 1.87437678]
[1.5552956 1.65484125 1.39042551 1.7380376 1.95208506 1.40201893
1.8552564 1.90180634]
[1.97609831 1.77141474 1.76127984 1.07468606 1.93739374 1.03612855
1.48747198 1.78455217]
[1.37723426 1.57121336 1.99178788 1.77909114 1.16735127 1.80668618
1.80688359 1.18839159]]


Rule 2: Two dimensions are also compatible when one of them is 1. If the shapes of the two
arrays do not match in some dimension, the array whose size is 1 in that dimension is
stretched to match the other shape.

# Initialize `x`
x = np.ones((2, 5))
print(x)

# Check shape of `x`


print(x.shape)

# Initialize `y`
y = np.arange(5)
print(y)

# Check shape of `y`


print(y.shape)

# Subtract `x` and `y`


x-y

[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
(2, 5)
[0 1 2 3 4]
(5,)
array([[ 1., 0., -1., -2., -3.],
[ 1., 0., -1., -2., -3.]])

# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions

x = np.ones((1,2,8))
print("x = "+"\n", x)
print("shape of x = ")
y = np.random.random((2, 1, 1))
print("y = "+"\n", y)

print("\nthe output = ",x + y)

# Analytical question

# The shapes of x (1, 2, 8) and y (2, 1, 1) are different. However, it is possible
# to add them. Why is that? Also, changing x to shape (10, 2, 8) or y to shape (10, 1, 4)
# gives a ValueError. Can you find out why?

x =
[[[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]]]
shape of x =
y =

[[[0.8065366 ]]

[[0.00414051]]]

the output = [[[1.8065366 1.8065366 1.8065366 1.8065366 1.8065366 1.8065366
1.8065366 1.8065366 ]
[1.8065366 1.8065366 1.8065366 1.8065366 1.8065366 1.8065366
1.8065366 1.8065366 ]]

[[1.00414051 1.00414051 1.00414051 1.00414051 1.00414051 1.00414051
1.00414051 1.00414051]
[1.00414051 1.00414051 1.00414051 1.00414051 1.00414051 1.00414051
1.00414051 1.00414051]]]
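The three rules can be applied by hand to predict the result shape before running the addition. The function below is a hypothetical helper written for this illustration (it is not part of NumPy); it pads the shorter shape with ones and then compares the shapes dimension by dimension.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the broadcasting rules by hand (hypothetical helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with ones on its left side.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for m, n in zip(a, b):
        if m == n or n == 1:
            out.append(m)      # equal sizes, or b is stretched
        elif m == 1:
            out.append(n)      # a is stretched
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(out)

print(broadcast_shape((1, 2, 8), (2, 1, 1)))             # (2, 2, 8)
print((np.ones((1, 2, 8)) + np.ones((2, 1, 1))).shape)   # (2, 2, 8)

# The two variations from the analytical question both fail.
for bad in [((10, 2, 8), (2, 1, 1)), ((1, 2, 8), (10, 1, 4))]:
    try:
        broadcast_shape(*bad)
    except ValueError as err:
        print(err)
```

In the first failing pair the leading sizes 10 and 2 differ and neither is 1; in the second the trailing sizes 8 and 4 differ and neither is 1.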

Numpy and mathematics at work

x = np.array([[1, 2, 3], [2, 3, 4]])


y = np.array([[1, 4, 9], [2, 3, -2]])
print(x+y)

[[ 2 6 12]
[ 4 6 2]]

NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic
operators. The standard addition, subtraction, multiplication, and division can all be used:

The following table lists the arithmetic operators implemented in NumPy:


Operator   Equivalent ufunc   Description
+          np.add             Addition (e.g., 1 + 1 = 2)
-          np.subtract        Subtraction (e.g., 3 - 2 = 1)
-          np.negative        Unary negation (e.g., -2)
*          np.multiply        Multiplication (e.g., 2 * 3 = 6)
/          np.divide          Division (e.g., 3 / 2 = 1.5)
//         np.floor_divide    Floor division (e.g., 3 // 2 = 1)
**         np.power           Exponentiation (e.g., 2 ** 3 = 8)
%          np.mod             Modulus/remainder (e.g., 9 % 4 = 1)
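As a quick check of the table, the sketch below evaluates each operator and the ufunc listed next to it and compares the results. The dictionary `pairs` is only a convenience for this illustration. The exponent is fixed at 2 because integer arrays do not accept negative integer exponents, and `y` contains -2.

```python
import numpy as np

x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

# Each entry pairs an operator expression with the ufunc call from the table.
pairs = {
    "+": (x + y, np.add(x, y)),
    "-": (x - y, np.subtract(x, y)),
    "unary -": (-x, np.negative(x)),
    "*": (x * y, np.multiply(x, y)),
    "/": (x / y, np.divide(x, y)),
    "//": (x // y, np.floor_divide(x, y)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % y, np.mod(x, y)),
}

for symbol, (via_operator, via_ufunc) in pairs.items():
    print(symbol, np.array_equal(via_operator, via_ufunc))
```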

# Basic operations (+, -, *, /, %)
x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

# Add two arrays
add = np.add(x, y)
print(add)

# Subtract two arrays
sub = np.subtract(x, y)
print(sub)

# Multiply two arrays
mul = np.multiply(x, y)
print(mul)

# Divide x by y
div = np.divide(x, y)
print(div)

# Calculate the remainder of x divided by y
rem = np.mod(x, y)
print(rem)

[[ 2 6 12]
[ 4 6 2]]
[[ 0 -2 -6]
[ 0 0 6]]
[[ 1 8 27]
[ 4 9 -8]]
[[ 1. 0.5 0.33333333]
[ 1. 1. -2. ]]
[[0 2 3]
[0 0 0]]

Subset, Slice, And Index Arrays

x = np.array([10, 20, 30, 40, 50])

# Select items at index 0 and 1


print("# Select items at index 0 and 1")
print(x[0:2])

#Output the Columns


print("#Output the Columns")
print(y[:,1])

#Output the Rows


print("#Output the Rows")
print(y[1,:])

# Select the items in column 1 of rows 0, 1 and 2 of the 2D array (the slice 0:3 covers three rows)


print('#Select item at row 0 and 1 and column 1 from 2D array')
y = np.array([[ 1, 2, 3, 4], [ 9, 10, 11 ,12],[13,14,15,16]])
print(y)
print(y[0:3, 1])

# Specifying conditions
print("# Specifying conditions")


biggerThan2 = (y >= 2)
print(y[biggerThan2])

# Select items at index 0 and 1


[10 20]
#Output the Columns
[4 3]
#Output the Rows
[ 2 3 -2]
#Select item at row 0 and 1 and column 1 from 2D array
[[ 1 2 3 4]
[ 9 10 11 12]
[13 14 15 16]]
[ 2 10 14]
# Specifying conditions
[ 2 3 4 9 10 11 12 13 14 15 16]
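Conditions like the one above produce a boolean array of the same shape as `y`, and several conditions can be combined with `&` and `|`. The sketch below is an illustration with a second, made-up condition (even values); each comparison needs its own parentheses.

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [9, 10, 11, 12], [13, 14, 15, 16]])

# Keep the values that are at least 2 and also even.
mask = (y >= 2) & (y % 2 == 0)

print(mask.shape)   # (3, 4), the same shape as y
print(y[mask])      # [ 2  4 10 12 14 16]
```

Indexing with a boolean mask always returns a one-dimensional array, which is why the output above is flat.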

Pandas

# Importing pandas
import pandas as pd

Can you set default parameters in Pandas?

print("Pandas Version:", pd.__version__)

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

Pandas Version: 1.3.5

Data structure of pandas


Series
DataFrames

The Pandas Series Object


A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or
array as follows:

series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])

print(series)


series1 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],index = [0,4,6,1,2,9,8,7])

print(series1)
print("series1[2] = ",series1[2:4])

series2 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],index = list('abcdefgh'))


print(series2)
print("series2['g'] = ",series2['g'])

0 2
1 3
2 7
3 11
4 13
5 17
6 19
7 23
dtype: int64
0 2
4 3
6 7
1 11
2 13
9 17
8 19
7 23
dtype: int64
series1[2] = 6 7
1 11
dtype: int64
a 2
b 3
c 7
d 11
e 13
f 17
g 19
h 23
dtype: int64
series2['g'] = 19
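With an integer index that is not in order, such as the one on `series1`, a label and a position can refer to different items. In the output above, the slice `series1[2:4]` returned the items at positions 2 and 3, not the item labelled 2. A small sketch using the explicit accessors `loc` (labels) and `iloc` (positions) removes the ambiguity:

```python
import pandas as pd

series1 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],
                    index=[0, 4, 6, 1, 2, 9, 8, 7])

print(series1.loc[2])     # item with label 2 -> 13
print(series1.iloc[2])    # item at position 2 -> 7
print(series1.iloc[2:4])  # positions 2 and 3 -> values 7 and 11
```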

series1.values
# keys() is a method; it returns the same index object as .index
series1.keys()
series1.index

array([ 2, 3, 7, 11, 13, 17, 19, 23])
Int64Index([0, 4, 6, 1, 2, 9, 8, 7], dtype='int64')
Int64Index([0, 4, 6, 1, 2, 9, 8, 7], dtype='int64')


The Pandas DataFrame Object


The next fundamental structure in Pandas is the DataFrame. Like the Series object, the
DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization
of a Python dictionary.
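Both views can be seen on a small frame. The sketch below uses a made-up two-column DataFrame named `frame`; the column names and values are only for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

# Dictionary-like view: a column name maps to a Series.
print(type(frame['A']).__name__)   # Series
print(list(frame.keys()))          # ['A', 'B']

# Array-like view: the values form a two-dimensional NumPy array.
values = frame.to_numpy()
print(type(values).__name__)       # ndarray
print(values.shape)                # (3, 2)
```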

# Creating a dataframe from a list of dictionaries


dict_df = [{'A': 'Apple', 'B': 'Ball'},{'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)

A B C
0 Apple Ball NaN
1 Aeroplane Bat Cat

# Creating dataframe from Series


series_df = pd.DataFrame({
'A': range(1, 5),
'B': pd.Timestamp('20190526'),
'C': pd.Series(5, index=list(range(4)), dtype='float64'),
'D': np.array([3] * 4, dtype='int64'),
'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disor
'F': 'Mental health',
'G': 'is challenging'
})
print(series_df)

A B C D E F G
0 1 2019-05-26 5.0 3 Depression Mental health is challenging
1 2 2019-05-26 5.0 3 Social Anxiety Mental health is challenging
2 3 2019-05-26 5.0 3 Bipolar Disorder Mental health is challenging
3 4 2019-05-26 5.0 3 Eating Disorder Mental health is challenging

# Creating a dataframe from a dictionary of lists


sdf = {
'County':['Østfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
'ISO-Code':[1,2,3,4,5,6],
'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer",
}
sdf = pd.DataFrame(sdf)
print(sdf)

      County  ISO-Code      Area Administrative centre
0    Østfold         1   4180.69             Sarpsborg
1  Hordaland         2   4917.94                  Oslo
2       Oslo         3    454.07          City of Oslo
3    Hedmark         4  27397.76                 Hamar
4    Oppland         5  25192.10           Lillehammer
5   Buskerud         6  14910.94               Drammen

Loading a dataset into Pandas DataFrame

import pandas as pd
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender','capital_gain','
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.dat
df.head(10)

   age  workclass         fnlwgt  education  education_num  marital_status         occupation         ...
0  39   State-gov         77516   Bachelors  13             Never-married          Adm-clerical       ...
1  50   Self-emp-not-inc  83311   Bachelors  13             Married-civ-spouse     Exec-managerial    ...
2  38   Private           215646  HS-grad    9              Divorced               Handlers-cleaners  ...
3  53   Private           234721  11th       7              Married-civ-spouse     Handlers-cleaners  ...
4  28   Private           338409  Bachelors  13             Married-civ-spouse     Prof-specialty     ...
5  37   Private           284582  Masters    14             Married-civ-spouse     Exec-managerial    ...
6  49   Private           160187  9th        5              Married-spouse-absent  Other-service      ...
7  52   Self-emp-not-inc  209642  HS-grad    9              Married-civ-spouse     Exec-managerial    ...
8  31   Private           45781   Masters    14             Never-married          Prof-specialty     ...
9  42   Private           159449  Bachelors  13             Married-civ-spouse     Exec-managerial    ...

# Displays the rows, columns, data types and memory used by the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----

0 age 32561 non-null int64


1 workclass 32561 non-null object
2 fnlwgt 32561 non-null int64
3 education 32561 non-null object
4 education_num 32561 non-null int64
5 marital_status 32561 non-null object
6 occupation 32561 non-null object
7 relationship 32561 non-null object
8 ethnicity 32561 non-null object
9 gender 32561 non-null object
10 capital_gain 32561 non-null int64
11 capital_loss 32561 non-null int64
12 hours_per_week 32561 non-null int64
13 country_of_origin 32561 non-null object
14 income 32561 non-null object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB

# Displays the number of rows (data points) and columns in the dataframe


df.shape

(32561, 15)

# Display all columns of the dataframe


df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education_num',


'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender',
'capital_gain', 'capital_loss', 'hours_per_week', 'country_of_origin',
'income'],
dtype='object')

# Displays summary statistics for each numerical column in the dataframe


df.describe()

age fnlwgt education_num capital_gain capital_loss hours_p

count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 32561

mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 40

std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 12

min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 1

25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 40

50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 40

75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 45

max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 99

Selecting rows and columns in the dataframe



# Selects a row
df.iloc[10]

age 37
workclass Private
fnlwgt 280464
education Some-college
education_num 10
marital_status Married-civ-spouse
occupation Exec-managerial
relationship Husband
ethnicity Black
gender Male
capital_gain 0
capital_loss 0
hours_per_week 80
country_of_origin United-States
income >50K
Name: 10, dtype: object

# Selects the first 10 rows
df.iloc[0:10]

   age  workclass         fnlwgt  education  education_num  marital_status         occupation         ...
0  39   State-gov         77516   Bachelors  13             Never-married          Adm-clerical       ...
1  50   Self-emp-not-inc  83311   Bachelors  13             Married-civ-spouse     Exec-managerial    ...
2  38   Private           215646  HS-grad    9              Divorced               Handlers-cleaners  ...
3  53   Private           234721  11th       7              Married-civ-spouse     Handlers-cleaners  ...
4  28   Private           338409  Bachelors  13             Married-civ-spouse     Prof-specialty     ...
5  37   Private           284582  Masters    14             Married-civ-spouse     Exec-managerial    ...
6  49   Private           160187  9th        5              Married-spouse-absent  Other-service      ...
7  52   Self-emp-not-inc  209642  HS-grad    9              Married-civ-spouse     Exec-managerial    ...
8  31   Private           45781   Masters    14             Never-married          Prof-specialty     ...
9  42   Private           159449  Bachelors  13             Married-civ-spouse     Exec-managerial    ...

# Selects a range of rows
df.iloc[10:15]

    age  workclass  fnlwgt  education     education_num  marital_status      occupation       ...
10  37   Private    280464  Some-college  10             Married-civ-spouse  Exec-managerial  ...
11  30   State-gov  141297  Bachelors     13             Married-civ-spouse  Prof-specialty   ...
12  23   Private    122272  Bachelors     13             Never-married       Adm-clerical     ...
13  32   Private    205019  Assoc-acdm    12             Never-married       Sales            ...
14  40   Private    121772  Assoc-voc     11             Married-civ-spouse  Craft-repair     ...

# Selects the last 2 rows


df.iloc[-2:]

       age  workclass     fnlwgt  education  education_num  marital_status      occupation       ...
32559  22   Private       201490  HS-grad    9              Never-married       Adm-clerical     ...
32560  52   Self-emp-inc  287927  HS-grad    9              Married-civ-spouse  Exec-managerial  ...
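The iloc selections above are purely positional and, like Python slices, exclude the end position. The sketch below illustrates this on a small made-up frame named `small` (so that it runs without downloading the dataset) and contrasts it with label-based `loc`, which includes the end label.

```python
import pandas as pd

small = pd.DataFrame({'x': range(6), 'y': list('abcdef')})

print(len(small.iloc[0:3]))             # 3 rows: positions 0, 1, 2 (end excluded)
print(len(small.loc[0:3]))              # 4 rows: labels 0, 1, 2, 3 (end included)
print(small.iloc[-2:].index.tolist())   # [4, 5], the last two rows
print(small.iloc[::2, 0:1].shape)       # (3, 1): every other row, first column only
```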

# Selects every other row in the columns at positions 3 and 4 (the end position 5 is excluded)


df.iloc[::2, 3:5].head()

   education  education_num
0  Bachelors  13
2  HS-grad    9
4  Bachelors  13
6  9th        5
8  Masters    14

Combine Pandas and Numpy

import pandas as pd
import numpy as np

np.random.seed(24)
df = pd.DataFrame({'F': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
axis=1)
df.iloc[::2, 3:5] = np.nan
df

F E D C B A

0 1.0 1.329212 -0.770033 NaN NaN -1.070816

1 2.0 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.0 0.678805 1.889273 NaN NaN -0.481165

3 4.0 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.0 -1.336936 0.562861 NaN NaN 0.121668

5 6.0 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.0 -0.385684 0.519818 NaN NaN 1.428984

7 8.0 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.0 1.264103 0.290035 NaN NaN 1.030550

9 10.0 0.118098 -0.021853 0.046841 -1.628753 -0.392361
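The NaN entries come from the assignment `df.iloc[::2, 3:5] = np.nan`, which targets every other row in the columns at positions 3 and 4, that is, columns C and B. A short sketch that rebuilds the same frame and counts the missing values per column:

```python
import numpy as np
import pandas as pd

np.random.seed(24)
df = pd.DataFrame({'F': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
               axis=1)
df.iloc[::2, 3:5] = np.nan

print(df.columns[3:5].tolist())   # ['C', 'B']
missing = df.isna().sum()
print(missing)                    # 5 missing values in C and in B, 0 elsewhere
print(df['C'].mean())             # NaN entries are skipped by default
```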

# Define a function that should color the values that are less than 0
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        color = 'green'

    return 'color: %s' % color

s = df.style.applymap(colorNegativeValueToRed, subset=['A','B','C','D','E'])
s

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361

# Let us highlight the max value in each column with an orange background and the min value with a green background
def highlightMax(s):
    isMax = s == s.max()
    return ['background-color: orange' if v else '' for v in isMax]

def highlightMin(s):
    isMin = s == s.min()
    return ['background-color: green' if v else '' for v in isMin]

df.style.apply(highlightMax).apply(highlightMin).highlight_null(null_color='red')

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361

import seaborn as sns


cm = sns.light_palette("pink", as_cmap=True)

s = df.style.background_gradient(cmap=cm)
s

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361
