AD3301 - Numpy - and - Pandas - Ipynb - Colaboratory
# importing numpy
import numpy as np
# Defining 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)
# Defining and printing 2D array
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)
[ 1  8 27 64]
[[ 1  2  3  4]
 [ 2  4  9 16]
 [ 4  8 18 32]]
To create an array using built-in NumPy functions, we will use the following code:
https://colab.research.google.com/drive/1VBohDB5UrkcmuJIPcO8YNbAw4fIKW7DK#printMode=true 1/18
9/7/22, 10:21 AM AD3301 - Numpy_and_Pandas.ipynb - Colaboratory
# Array of ones
print ("Arrays containing Ones\n")
ones = np.ones((2,2),int)
print(ones)
print ("\n\nArrays containing Zeros\n")
# Array of zeros
zeros = np.zeros((2,10),int)
print(zeros)
[[1 1]
[1 1]]
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
[[5 9 1 9 4]
[8 6 5 3 8]
[6 7 3 8 4]]
[[0.6253672 0.36754516 0.51695366 0.4776632 0.98696659]
[0.32946382 0.10104098 0.95875064 0.63990203 0.50221926]
[0.39447164 0.78698187 0.41467121 0.18306899 0.59477965]]
# Empty array
emptyArray = np.empty((3,2))
print(emptyArray)
[[2.1018035e-316 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]]
# Full array
fullArray = np.full((2,2),np.pi)
print(fullArray)
evenSpacedArray = np.arange(10,25,4)
#evenSpacedArray = np.arange(12).reshape(3,4)
print(evenSpacedArray)
[[3.14159265 3.14159265]
[3.14159265 3.14159265]]
[10 14 18 22]
[0. 0.22222222 0.44444444 0.66666667 0.88888889 1.11111111
1.33333333 1.55555556 1.77777778 2. ]
np.arange lets you specify the step size, while np.linspace lets you specify the number of evenly spaced values (both endpoints are included by default).
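The difference is easiest to see by building one array each way. This is a minimal sketch; the variable names `stepped` and `spaced` are illustrative and do not come from the notebook.

```python
import numpy as np

# arange: fix the step size; the stop value (25) is excluded
stepped = np.arange(10, 25, 4)
print(stepped)  # [10 14 18 22]

# linspace: fix the number of values; the stop value (2) is included
spaced = np.linspace(0, 2, 10)
print(spaced)
print(len(spaced))  # 10
```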
ndarray.flags ------>
'''
# Print the number of `my2DArray`'s dimensions
print(my2DArray.ndim)
#itemsize returns the size (in bytes) of each element of a NumPy array
ch = np.array([['a','b','c'],['d','e','f']])
print(ch)
print(ch.itemsize)
8
[['a' 'b' 'c']
['d' 'e' 'f']]
4
24
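The attributes used above can be inspected side by side. The sketch below reuses the `ch` array; the final value of 24 in the output is consistent with `ch.nbytes` (6 elements of 4 bytes each), although the notebook does not show which call produced it, so treat that link as an assumption.

```python
import numpy as np

ch = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])

print(ch.ndim)      # number of dimensions: 2
print(ch.shape)     # (2, 3)
print(ch.dtype)     # fixed-width Unicode string of length 1
print(ch.itemsize)  # bytes per element: 4
print(ch.nbytes)    # total bytes = size * itemsize: 24
```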
Broadcasting in NumPy
Broadcasting is the mechanism that lets NumPy perform arithmetic on arrays of different shapes.
Element-wise operations normally require both arrays to have the same shape.
When the shapes differ but are compatible, NumPy broadcasts the smaller array across the
larger one, so the operation can still be carried out. Incompatible shapes raise a ValueError.
a = np.array([1,2,3,4])
print(a.shape)
b = np.array([10,20,30,40,50])
print(b.shape)
c = a + b
print(c.shape)
print (c)
(4,)
(5,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-161c6451a913> in <module>
3 b = np.array([10,20,30,40,50])
4 print(b.shape)
----> 5 c = a + b
6 print(c.shape)
7 print (c)
ValueError: operands could not be broadcast together with shapes (4,) (5,)
a = np.array([1,2,3,4])
print(a.shape)
b = 5
#b = np.array([5])
c = a + b
print(c.shape)
print (c)
(4,)
(4,)
[6 7 8 9]
We can think of this as an operation that stretches or duplicates the value 5 into the array
[5, 5, 5, 5], and adds the results. The advantage of NumPy's broadcasting is that this
duplication of values does not actually take place, but it is a useful mental model as we think
about broadcasting.
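The mental model can be checked directly. In this sketch, `explicit`, `broadcast` and `view` are illustrative names; `np.broadcast_to` is used only to show that the stretched operand is a view with a stride of 0, meaning the single value is reused rather than copied.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Operand duplicated by hand (what the mental model describes)
explicit = a + np.array([5, 5, 5, 5])
# Scalar operand (what broadcasting does for us)
broadcast = a + 5
print(np.array_equal(explicit, broadcast))  # True

# A read-only broadcast view: stride 0 means no data was duplicated
view = np.broadcast_to(np.array(5), a.shape)
print(view, view.strides)
```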
We can similarly extend this to arrays of higher dimension. Observe the result when we add a
one-dimensional array to a two-dimensional array:
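The cell for this step is not present in the print-out, so the following is a minimal sketch with illustrative names (`M`, `row`), not the notebook's original code.

```python
import numpy as np

M = np.ones((3, 3))   # two-dimensional, shape (3, 3)
row = np.arange(3)    # one-dimensional, shape (3,)

# `row` is broadcast across every row of `M`
result = M + row
print(result)
```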
Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer
dimensions is padded with ones on its leading (left) side.
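Because the padding happens on the left, a length-2 vector is not automatically treated as a column. The sketch below, with illustrative names, shows the failure and one way around it using `np.newaxis`.

```python
import numpy as np

M = np.ones((2, 3))
v = np.array([10, 20])   # shape (2,)

# Rule 1 pads on the left: (2,) becomes (1, 2), which is incompatible with (2, 3)
try:
    M + v
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# Adding the new axis on the right gives shape (2, 1), which is compatible
col = v[:, np.newaxis]
result = M + col
print(result.shape)  # (2, 3)
```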
# Shape of A
print(A.shape)
# Shape of B
print(B.shape)
(6, 8)
(6, 8)
[[1.02235338 1.07426222 1.07796564 1.92165778 1.69896669 1.27116382
1.87724144 1.3361346 ]
[1.44088172 1.67502615 1.07147457 1.79199182 1.14112224 1.36919479
1.80727734 1.01625392]
[1.65668092 1.84113539 1.54138284 1.23198665 1.31228451 1.59306397
1.46382149 1.87437678]
[1.5552956 1.65484125 1.39042551 1.7380376 1.95208506 1.40201893
1.8552564 1.90180634]
[1.97609831 1.77141474 1.76127984 1.07468606 1.93739374 1.03612855
1.48747198 1.78455217]
[1.37723426 1.57121336 1.99178788 1.77909114 1.16735127 1.80668618
1.80688359 1.18839159]]
Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape
equal to 1 in that dimension is stretched to match the other shape.
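# Initialize `x`
x = np.ones((2, 5))
print(x)
print(x.shape)
# Initialize `y`
y = np.arange(5)
print(y)
print(y.shape)
# Subtract `y` from `x`; `y` is broadcast across both rows of `x`
x - y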
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
(2, 5)
[0 1 2 3 4]
(5,)
array([[ 1., 0., -1., -2., -3.],
[ 1., 0., -1., -2., -3.]])
# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions
# (in each dimension the sizes are equal, or one of them is 1)
x = np.ones((1,2,8))
print("x = "+"\n", x)
print("shape of x = ")
y = np.random.random((2, 1, 1))
print("y = "+"\n", y)
# Analytical question
x =
[[[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]]]
shape of x =
y =
[[[0.8065366 ]]
[[0.00414051]]]
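One possible answer to the analytical question is to compare the shapes axis by axis. This sketch uses the same shapes as the cell above; `np.broadcast` is used only to report the resulting shape.

```python
import numpy as np

x = np.ones((1, 2, 8))
y = np.random.random((2, 1, 1))

# Axis by axis: (1 vs 2), (2 vs 1), (8 vs 1); every pair is equal or contains a 1
print(np.broadcast(x, y).shape)  # (2, 2, 8)

z = x + y
print(z.shape)  # (2, 2, 8)
```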
[[ 2 6 12]
[ 4 6 2]]
NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic
operators. The standard addition, subtraction, multiplication, and division can all be used:
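The input arrays are not visible in this part of the print-out. The values of `x` and `y` below are an assumption, chosen because they reproduce the printed results; the sketch shows that each operator is shorthand for the corresponding ufunc.

```python
import numpy as np

# Assumed inputs (not shown in the notebook print-out)
x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

print(np.array_equal(x + y, np.add(x, y)))        # True
print(np.array_equal(x - y, np.subtract(x, y)))   # True
print(np.array_equal(x * y, np.multiply(x, y)))   # True
print(np.array_equal(x / y, np.divide(x, y)))     # True
print(np.array_equal(x % y, np.remainder(x, y)))  # True
```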
print(sub)
# Divide x, y
div = np.divide(x,y)
print(div)
[[ 2 6 12]
[ 4 6 2]]
[[ 0 -2 -6]
[ 0 0 6]]
[[ 1 8 27]
[ 4 9 -8]]
[[ 1. 0.5 0.33333333]
[ 1. 1. -2. ]]
[[0 2 3]
[0 0 0]]
# Specifying conditions
print("# Specifying conditions")
biggerThan2 = (y >= 2)
print(y[biggerThan2])
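A comparison on an array produces a Boolean mask of the same shape, and indexing with that mask returns the selected elements as a flat array. The `y` values in this sketch are assumed, since the defining cell is not visible here.

```python
import numpy as np

y = np.array([[1, 4, 9], [2, 3, -2]])  # assumed values

mask = (y >= 2)   # Boolean array with the same shape as `y`
print(mask)
selected = y[mask]
print(selected)   # [4 9 2 3]
```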
Pandas
# Importing pandas
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)
print(series)
print(series1)
print("series1[2] = ",series1[2:4])
0 2
1 3
2 7
3 11
4 13
5 17
6 19
7 23
dtype: int64
0 2
4 3
6 7
1 11
2 13
9 17
8 19
7 23
dtype: int64
series1[2] = 6 7
1 11
dtype: int64
a 2
b 3
c 7
d 11
e 13
f 17
g 19
h 23
dtype: int64
series2['g'] = 19
series1.values
series1.keys()
series1.index
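The cells that build these Series are missing from the print-out. The sketch below reconstructs them from the printed output, so the constructor calls are an assumption; it also contrasts position-based (`iloc`) and label-based (`loc`) access.

```python
import pandas as pd

values = [2, 3, 7, 11, 13, 17, 19, 23]
series1 = pd.Series(values, index=[0, 4, 6, 1, 2, 9, 8, 7])
series2 = pd.Series(values, index=list('abcdefgh'))

# Position-based: positions 2 and 3 carry the labels 6 and 1
print(series1.iloc[2:4])
# Label-based: the element whose label is 2
print(series1.loc[2])   # 13
print(series2['g'])     # 19

print(series1.values)   # underlying NumPy array
print(series1.index)    # index labels
print(series1.keys())   # alias for the index
```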
A B C
0 Apple Ball NaN
1 Aeroplane Bat Cat
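One way a frame like this could arise is from a list of dictionaries with unequal keys: pandas fills the missing entry with NaN. The code that produced the table is not shown, so `records` and `frame` are hypothetical names and this is only a sketch.

```python
import pandas as pd

# The first record has no 'C' key, so that cell becomes NaN
records = [
    {'A': 'Apple', 'B': 'Ball'},
    {'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'},
]
frame = pd.DataFrame(records)
print(frame)
```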
A B C D E F G
0 1 2019-05-26 5.0 3 Depression Mental health is challenging
1 2 2019-05-26 5.0 3 Social Anxiety Mental health is challenging
2 3 2019-05-26 5.0 3 Bipolar Disorder Mental health is challenging
3 4 2019-05-26 5.0 3 Eating Disorder Mental health is challenging
import pandas as pd
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender','capital_gain','
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.dat
df.head(10)
   age  workclass  fnlwgt  education  education_num  marital_status         occupation         re
2  38   Private    215646  HS-grad    9              Divorced               Handlers-cleaners  N
3  53   Private    234721  11th       7              Married-civ-spouse     Handlers-cleaners
4  28   Private    338409  Bachelors  13             Married-civ-spouse     Prof-specialty
5  37   Private    284582  Masters    14             Married-civ-spouse     Exec-managerial
6  49   Private    160187  9th        5              Married-spouse-absent  Other-service      N
8  31   Private    45781   Masters    14             Never-married          Prof-specialty     N
9  42   Private    159449  Bachelors  13             Married-civ-spouse     Exec-managerial
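Passing `names=` tells `read_csv` that the file has no header row and supplies the column labels. The sketch below avoids the network by parsing a single in-memory record; the column names and values are taken from the `df.iloc[10]` output further down, while `io.StringIO` and `skipinitialspace=True` are assumptions made for this illustration.

```python
import io
import pandas as pd

columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'ethnicity',
           'gender', 'capital_gain', 'capital_loss', 'hours_per_week',
           'country_of_origin', 'income']

# One header-less, comma-separated record
raw = ("37, Private, 280464, Some-college, 10, Married-civ-spouse, "
       "Exec-managerial, Husband, Black, Male, 0, 0, 80, United-States, >50K\n")

# skipinitialspace drops the blank that follows each comma
sample = pd.read_csv(io.StringIO(raw), names=columns, skipinitialspace=True)
print(sample.shape)  # (1, 15)
print(sample.dtypes)
```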
# Displays the rows, columns, data types and memory used by the dataframe
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
(32561, 15)
# Selects a row
df.iloc[10]
age 37
workclass Private
fnlwgt 280464
education Some-college
education_num 10
marital_status Married-civ-spouse
occupation Exec-managerial
relationship Husband
ethnicity Black
gender Male
capital_gain 0
capital_loss 0
hours_per_week 80
country_of_origin United-States
income >50K
Name: 10, dtype: object
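`iloc` always selects by integer position, for rows and columns alike. The sketch below uses a small stand-in frame built from a few values printed earlier (rows 2 to 6 of the dataset, re-indexed from 0); the name `small` is illustrative.

```python
import pandas as pd

small = pd.DataFrame({
    'age': [38, 53, 28, 37, 49],
    'education': ['HS-grad', '11th', 'Bachelors', 'Masters', '9th'],
    'education_num': [9, 7, 13, 14, 5],
})

row = small.iloc[1]                 # one row, returned as a Series
first_three = small.iloc[0:3]       # rows 0, 1, 2 (the end is excluded)
every_other = small.iloc[::2, 1:3]  # every second row, columns at positions 1 and 2

print(row)
print(first_three)
print(every_other)
```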
# Selects 10 rows
df.iloc[0:10]
    age  workclass         fnlwgt  education     education_num  marital_status         occupation         re
3   53   Private           234721  11th          7              Married-civ-spouse     Handlers-cleaners
4   28   Private           338409  Bachelors     13             Married-civ-spouse     Prof-specialty
5   37   Private           284582  Masters       14             Married-civ-spouse     Exec-managerial
6   49   Private           160187  9th           5              Married-spouse-absent  Other-service      N
7   52   Self-emp-not-inc  209642  HS-grad       9              Married-civ-spouse     Exec-managerial
8   31   Private           45781   Masters       14             Never-married          Prof-specialty     N
9   42   Private           159449  Bachelors     13             Married-civ-spouse     Exec-managerial

    age  workclass         fnlwgt  education     education_num  marital_status         occupation         re
10  37   Private           280464  Some-college  10             Married-civ-spouse     Exec-managerial
11  30   State-gov         141297  Bachelors     13             Married-civ-spouse     Prof-specialty
12  23   Private           122272  Bachelors     13             Never-married          Adm-clerical
13  32   Private           205019  Assoc-acdm    12             Never-married          Sales              N
14  40   Private           121772  Assoc-voc     11             Married-civ-spouse     Craft-repair
   education  education_num
0  Bachelors  13
2  HS-grad    9
4  Bachelors  13
6  9th        5
8  Masters    14
Combine Pandas and Numpy
import pandas as pd
import numpy as np
np.random.seed(24)
df = pd.DataFrame({'F': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
axis=1)
df.iloc[::2, 3:5] = np.nan
df
F E D C B A
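The slice `df.iloc[::2, 3:5]` is positional, so it helps to confirm which cells it blanks out. This sketch rebuilds the same frame under the hypothetical name `demo` and checks that only columns C and B receive NaN, on every second row.

```python
import numpy as np
import pandas as pd

np.random.seed(24)
demo = pd.DataFrame({'F': np.linspace(1, 10, 10)})
demo = pd.concat([demo, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
                 axis=1)
demo.iloc[::2, 3:5] = np.nan

# Column positions 3 and 4 are 'C' and 'B'
print(demo.columns[3:5].tolist())  # ['C', 'B']
# NaN count per column: 5 each for C and B, 0 elsewhere
print(demo.isna().sum())
```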
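# Define a function that colors values less than 0 in red
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        # Zero and NaN both end up here
        color = 'green'
    # The Styler expects a CSS declaration string for each cell
    return 'color: %s' % color
# Note: Styler.applymap was renamed to Styler.map in pandas 2.1
s = df.style.applymap(colorNegativeValueToRed, subset=['A','B','C','D','E'])
s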
F E D C B A
# Highlight the max value in each column with an orange background and the min value with a green background
def highlightMax(s):
    isMax = s == s.max()
    return ['background-color: orange' if v else '' for v in isMax]
def highlightMin(s):
    isMin = s == s.min()
    return ['background-color: green' if v else '' for v in isMin]
# Null cells are highlighted in red; the color is passed positionally because
# the keyword was renamed from null_color to color in later pandas versions
df.style.apply(highlightMax).apply(highlightMin).highlight_null('red')
F E D C B A
import seaborn as sns
cm = sns.light_palette("pink", as_cmap=True)
s = df.style.background_gradient(cmap=cm)
s
F E D C B A