2 Numpy Basics

The NumPy ndarray is a multidimensional array object that supports efficient vectorized computation. An ndarray represents a multidimensional grid of values that all share one data type (dtype). NumPy provides functions to create and manipulate arrays and to perform arithmetic on them efficiently. Boolean and fancy indexing select subsets of array values. Arrays also support basic operations such as slicing, transposing, and swapping axes.

Uploaded by

Girraj Dohare

NumPy Basics: Arrays and Vectorized Computation

The NumPy ndarray: A Multidimensional Array Object

import numpy as np
# Generate some random data
data = np.random.randn(2, 3)
data

array([[-0.5139784 , -1.27040773, 0.57550144],
       [ 0.37410083, 1.40450123, 0.08800731]])

data * 10
data + data

array([[-1.0279568 , -2.54081545, 1.15100289],
       [ 0.74820165, 2.80900247, 0.17601461]])

data.shape
data.dtype

dtype('float64')

Creating ndarrays

data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
[5, 6, 7, 8]])

arr2.ndim
arr2.shape

(2, 4)

arr1.dtype
arr2.dtype

dtype('int64')

np.zeros(10)
np.zeros((3, 6))
np.empty((2, 3, 2))

array([[[9.89217446e-317, 1.03977794e-312],
[2.12199579e-312, 2.56761491e-312],
[2.14321575e-312, 2.05833592e-312]],

[[2.41907520e-312, 9.76118064e-313],
[2.46151512e-312, 2.37663529e-312],
[4.99006302e-322, 0.00000000e+000]]])

np.arange(15)

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

Data Types for ndarrays

arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr1.dtype
arr2.dtype

dtype('int32')

arr = np.array([1, 2, 3, 4, 5])
arr.dtype
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr
arr.astype(np.int32)

array([ 3, -1, -2, 0, 12, 10], dtype=int32)

numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.bytes_ replaces np.string_, which NumPy 2.0 removed
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42. ])

Arithmetic with NumPy Arrays


arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
arr * arr
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

1 / arr
arr ** 0.5

array([[1. , 1.41421356, 1.73205081],
       [2. , 2.23606798, 2.44948974]])

arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2
arr2 > arr

array([[False, True, False],
       [ True, False, True]])

Basic Indexing and Slicing

arr = np.arange(10)
arr
arr[5]
arr[5:8]
arr[5:8] = 12
arr

array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])

arr_slice = arr[5:8]
arr_slice

array([12, 12, 12])

arr_slice[1] = 12345
arr

array([ 0, 1, 2, 3, 4, 12, 12345, 12, 8, 9])

arr_slice[:] = 64
arr

array([ 0, 1, 2, 3, 4, 64, 64, 64, 8, 9])
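
The cells above rely on basic slices being views onto the original array rather than copies. A minimal sketch of how one might check that, assuming `np.shares_memory` is an acceptable test and using the illustrative names `view` and `snapshot` (not from the notebook):

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: no data is copied.
view = arr[5:8]
print(np.shares_memory(arr, view))      # True

# An explicit .copy() owns its own buffer.
snapshot = arr[5:8].copy()
print(np.shares_memory(arr, snapshot))  # False

view[:] = 64       # writes through to arr
snapshot[:] = -1   # leaves arr untouched
print(arr)         # [ 0  1  2  3  4 64 64 64  8  9]
```

Calling `.copy()` is the usual way to keep a slice that should not change when the parent array is modified.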

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

array([7, 8, 9])

arr2d[0][2]
arr2d[0, 2]

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

array([[[ 1, 2, 3],
[ 4, 5, 6]],

[[ 7, 8, 9],
[10, 11, 12]]])

arr3d[0]

array([[1, 2, 3],
[4, 5, 6]])

old_values = arr3d[0].copy()
arr3d[0] = 42
print(arr3d)
arr3d[0] = old_values
print(arr3d)

[[[42 42 42]
[42 42 42]]

[[ 7 8 9]
[10 11 12]]]
[[[ 1 2 3]
[ 4 5 6]]

[[ 7 8 9]
[10 11 12]]]

arr3d[1, 0]

array([7, 8, 9])

x = arr3d[1]
print(x)
print(x[0])

[[ 7 8 9]
[10 11 12]]
[7 8 9]

Indexing with slices


print(arr)
print(arr[1:6])

[ 0 1 2 3 4 64 64 64 8 9]
[ 1 2 3 4 64]

print(arr2d)
print(arr2d[:2])

[[1 2 3]
[4 5 6]
[7 8 9]]
[[1 2 3]
[4 5 6]]

arr2d[:2, 1:]

array([[2, 3],
[5, 6]])

arr2d[1, :2]

array([4, 5])

arr2d[:2, 2]

array([3, 6])

arr2d[:, :1]

array([[1],
[4],
[7]])

arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
[4, 0, 0],
[7, 8, 9]])

Boolean Indexing

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
names
data

array([[ 0.20309709, 0.85111241, -0.98394599, 0.43167247],
       [ 1.82408046, -0.00523696, -1.23592936, -1.49246949],
       [ 0.86839758, -0.98189398, 2.87190707, -0.61232515],
       [-0.11227721, -0.54649122, 1.23762715, -0.72116203],
       [ 0.33307116, 0.92679623, 0.67911504, -0.99551941],
       [-0.0443392 , -0.65179067, -0.04891349, 0.08904668],
       [ 0.9447645 , -0.0926507 , -0.22921411, -1.04981254]])

names == 'Bob'

array([ True, False, False, True, False, False, False])

data[names == 'Bob']

array([[ 0.20309709, 0.85111241, -0.98394599, 0.43167247],
       [-0.11227721, -0.54649122, 1.23762715, -0.72116203]])

print(data[names == 'Bob', 2:])
print(data[names == 'Bob', 3])

[[-0.98394599 0.43167247]
[ 1.23762715 -0.72116203]]
[ 0.43167247 -0.72116203]

names != 'Bob'
data[~(names == 'Bob')]

array([[ 1.82408046, -0.00523696, -1.23592936, -1.49246949],
       [ 0.86839758, -0.98189398, 2.87190707, -0.61232515],
       [ 0.33307116, 0.92679623, 0.67911504, -0.99551941],
       [-0.0443392 , -0.65179067, -0.04891349, 0.08904668],
       [ 0.9447645 , -0.0926507 , -0.22921411, -1.04981254]])

cond = names == 'Bob'
data[~cond]

array([[ 1.82408046, -0.00523696, -1.23592936, -1.49246949],
       [ 0.86839758, -0.98189398, 2.87190707, -0.61232515],
       [ 0.33307116, 0.92679623, 0.67911504, -0.99551941],
       [-0.0443392 , -0.65179067, -0.04891349, 0.08904668],
       [ 0.9447645 , -0.0926507 , -0.22921411, -1.04981254]])

mask = (names == 'Bob') | (names == 'Will')
mask
data[mask]

array([[ 0.20309709, 0.85111241, -0.98394599, 0.43167247],
       [ 0.86839758, -0.98189398, 2.87190707, -0.61232515],
       [-0.11227721, -0.54649122, 1.23762715, -0.72116203],
       [ 0.33307116, 0.92679623, 0.67911504, -0.99551941]])

data[data < 0] = 0
data

array([[0.20309709, 0.85111241, 0. , 0.43167247],
       [1.82408046, 0. , 0. , 0. ],
       [0.86839758, 0. , 2.87190707, 0. ],
       [0. , 0. , 1.23762715, 0. ],
       [0.33307116, 0.92679623, 0.67911504, 0. ],
       [0. , 0. , 0. , 0.08904668],
       [0.9447645 , 0. , 0. , 0. ]])

data[names != 'Joe'] = 7
data

array([[7. , 7. , 7. , 7. ],
[1.82408046, 0. , 0. , 0. ],
[7. , 7. , 7. , 7. ],
[7. , 7. , 7. , 7. ],
[7. , 7. , 7. , 7. ],
[0. , 0. , 0. , 0.08904668],
[0.9447645 , 0. , 0. , 0. ]])

Fancy Indexing

arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

arr = np.arange(32).reshape((8, 4))
print(arr)
print(arr[[1, 5, 7, 2], [0, 3, 1, 2]])

[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]
[24 25 26 27]
[28 29 30 31]]
[ 4 23 29 10]

arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4, 7, 5, 6],
[20, 23, 21, 22],
[28, 31, 29, 30],
[ 8, 11, 9, 10]])
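
The chained expression above selects a rectangular block in two steps. A possible one-step alternative, sketched here on the assumption that `np.ix_` suits the use case (the names `rows`, `cols`, `block`, and `chained` are illustrative):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# np.ix_ builds an open mesh so the row and column lists broadcast
# against each other, selecting the full rectangular block at once.
block = arr[np.ix_(rows, cols)]

# Same result as the two-step form used in the notebook.
chained = arr[rows][:, cols]
print(np.array_equal(block, chained))  # True

# Unlike basic slicing, fancy indexing copies the data.
print(np.shares_memory(arr, block))    # False
```

Because the result is a copy, assigning into `block` does not modify `arr`.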

Transposing Arrays and Swapping Axes

arr = np.arange(15).reshape((3, 5))
print(arr)
print(arr.T)

[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
[[ 0 5 10]
[ 1 6 11]
[ 2 7 12]
[ 3 8 13]
[ 4 9 14]]

arr = np.random.randn(6, 3)
print(arr)
print(np.dot(arr.T, arr))

[[ 1.95427481 -1.45878156 0.67348335]
 [-0.61853959 -1.14599705 0.3259365 ]
 [-1.72807279 -0.09768532 -0.92022423]
 [-0.46576703 2.36466648 -1.33743177]
 [ 1.58203542 -0.8185547 -0.23942425]
 [-1.10188807 0.2609223 -1.46490921]]
[[11.12194915 -4.65708155 4.56310132]
 [-4.65708155 9.7806551 -4.61492065]
 [ 4.56310132 -4.61492065 5.39863377]]

arr = np.arange(16).reshape((2, 2, 4))
print(arr)
print(arr.transpose((1, 0, 2)))

[[[ 0 1 2 3]
[ 4 5 6 7]]

[[ 8 9 10 11]
[12 13 14 15]]]
[[[ 0 1 2 3]
[ 8 9 10 11]]

[[ 4 5 6 7]
[12 13 14 15]]]

print(arr)
print(arr.swapaxes(1, 2))

[[[ 0 1 2 3]
[ 4 5 6 7]]

[[ 8 9 10 11]
[12 13 14 15]]]
[[[ 0 4]
[ 1 5]
[ 2 6]
[ 3 7]]

[[ 8 12]
[ 9 13]
[10 14]
[11 15]]]

Universal Functions: Fast Element-Wise Array Functions

arr = np.arange(10)
print(arr)
print(np.sqrt(arr))
print(np.exp(arr))

[0 1 2 3 4 5 6 7 8 9]
[0. 1. 1.41421356 1.73205081 2. 2.23606798
2.44948974 2.64575131 2.82842712 3. ]
[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01
5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03
2.98095799e+03 8.10308393e+03]

x = np.random.randn(8)
y = np.random.randn(8)
print(x)
print(y)
print(np.maximum(x, y))

[ 0.15856676 -0.07415159 -1.33294673 0.48428409 0.47800204 0.90552637
 -0.74381882 0.99362125]
[-0.81825258 -1.08881846 1.24975138 0.11782414 -0.1913887 -2.21727141
 0.04640863 2.19171733]
[ 0.15856676 -0.07415159 1.24975138 0.48428409 0.47800204 0.90552637
 0.04640863 2.19171733]

arr = np.random.randn(7) * 5
print(arr)
remainder, whole_part = np.modf(arr)
print(remainder)
print(whole_part)

[ -8.52084722 -10.00400311 3.48000378 -8.69320327 1.79760053
 -3.10093711 0.3056864 ]
[-0.52084722 -0.00400311 0.48000378 -0.69320327 0.79760053 -0.10093711
 0.3056864 ]
[ -8. -10. 3. -8. 1. -3. 0.]

arr
print(np.sqrt(arr))
print(np.sqrt(arr, arr))
arr

[ nan nan 1.86547682 nan 1.34074626 nan
 0.55288914]
[ nan nan 1.86547682 nan 1.34074626 nan
 0.55288914]
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:2: RuntimeWarnin

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:3: RuntimeWarnin
This is separate from the ipykernel package so we can avoid doing imports u
array([ nan, nan, 1.86547682, nan, 1.34074626,
nan, 0.55288914])
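
The `nan` entries and RuntimeWarning lines above come from taking the square root of negative values. Two ways one might handle this are sketched below; the input values are made up for illustration, and the `where=`/`out=` combination and `np.errstate` are just two of several reasonable options:

```python
import numpy as np

# Illustrative values, not the notebook's random data.
arr = np.array([-8.5, 3.48, 1.8, -3.1, 0.31])

# Option 1: compute only where the input is non-negative and
# leave a pre-filled NaN everywhere else (no warning is raised).
safe = np.full_like(arr, np.nan)
np.sqrt(arr, out=safe, where=arr >= 0)

# Option 2: keep the plain call but silence the warning locally.
with np.errstate(invalid="ignore"):
    noisy = np.sqrt(arr)

print(safe)
print(np.allclose(safe, noisy, equal_nan=True))  # True
```

Note that `np.sqrt(arr, arr)` in the notebook passes `arr` as the output buffer, so it overwrites the input in place; the sketch writes into a separate array instead.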

Expressing Conditional Logic as Array Operations

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result

[1.1, 2.2, 1.3, 1.4, 2.5]

result = np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])

arr = np.random.randn(4, 4)
print(arr)
print(np.where(arr > 0, 2, -2))

[[-0.7861962 -0.13946245 0.68844061 -0.71467818]
 [-0.54832876 0.18226867 -1.6592476 0.77302508]
 [ 0.15499033 -0.47030354 -0.18876408 1.84634568]
 [-1.23079449 0.62283825 0.1975187 0.85509704]]
[[-2 -2 2 -2]
 [-2 2 -2 2]
 [ 2 -2 -2 2]
 [-2 2 2 2]]

np.where(arr > 0, 2, arr) # set only positive values to 2

array([[-0.7861962 , -0.13946245, 2. , -0.71467818],
       [-0.54832876, 2. , -1.6592476 , 2. ],
       [ 2. , -0.47030354, -0.18876408, 2. ],
       [-1.23079449, 2. , 2. , 2. ]])

Mathematical and Statistical Methods

arr = np.random.randn(5, 4)
print(arr)
print(arr.mean())
print(np.mean(arr))
print(arr.sum())

[[ 0.17956617 2.05098392 -1.61486814 2.5413988 ]
 [-0.71117579 -0.13974106 0.45235796 -1.11524969]
 [-0.836497 -1.08838655 1.46391792 -1.75634215]
 [-0.30571802 -0.70409783 -0.31632973 -0.16328416]
 [ 0.33176937 1.48871505 -0.08836423 -1.14083929]]
-0.07360922308197385
-0.07360922308197385
-1.4721844616394768

print(arr.mean(axis=1))
print(arr.sum(axis=0))

[ 0.78927018 -0.37845215 -0.55432695 -0.37235743 0.14782023]
[-1.34205527 1.60747353 -0.10328622 -1.6343165 ]
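
The `axis` argument above names the axis that gets collapsed. A small sketch with integer data, where the results are easy to check by hand (the array and the names `col_sums`, `row_means`, and `centered` are illustrative):

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))   # [[0 1 2], [3 4 5]]

col_sums = arr.sum(axis=0)     # collapse rows: one value per column
row_means = arr.mean(axis=1)   # collapse columns: one value per row
print(col_sums)    # [3 5 7]
print(row_means)   # [1. 4.]

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts back against the original array.
centered = arr - arr.mean(axis=1, keepdims=True)
print(centered)
```

With `keepdims=True` the row means have shape `(2, 1)`, which is what lets the subtraction line up row by row.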

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
print(arr.cumsum())

[ 0 1 3 6 10 15 21 28]

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(arr)
print(arr.cumsum(axis=0))
print(arr.cumprod(axis=1))

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0 1 2]
 [ 3 5 7]
 [ 9 12 15]]
[[ 0 0 0]
 [ 3 12 60]
 [ 6 42 336]]

Sorting

arr = np.random.randn(6)
print(arr)
arr.sort()
print(arr)

[-0.88881295 0.04444553 1.5204771 -2.58201627 0.05553665 -1.43352633]
[-2.58201627 -1.43352633 -0.88881295 0.04444553 0.05553665 1.5204771 ]

arr = np.random.randn(5, 3)
print(arr)
arr.sort(1)
print(arr)

[[-0.49561045 0.4421526 1.05353799]
 [ 0.23263088 0.4252005 0.10175446]
 [-0.99459786 0.14258819 0.06924503]
 [-1.5746287 -0.29423728 -1.80719126]
 [ 1.56217843 1.15934127 0.57719154]]
[[-0.49561045 0.4421526 1.05353799]
 [ 0.10175446 0.23263088 0.4252005 ]
 [-0.99459786 0.06924503 0.14258819]
 [-1.80719126 -1.5746287 -0.29423728]
 [ 0.57719154 1.15934127 1.56217843]]

Unique and Other Set Logic

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
print(np.unique(names))
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
print(np.unique(ints))

['Bob' 'Joe' 'Will']
[1 2 3 4]

sorted(set(names))

['Bob', 'Joe', 'Will']

values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])  # np.isin supersedes np.in1d, which is deprecated in NumPy 2.0

array([ True, False, False, True, True, False, True])

Linear Algebra

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
print(x)
print(y)
print(x.dot(y))

[[1. 2. 3.]
[4. 5. 6.]]
[[ 6. 23.]
[-1. 7.]
[ 8. 9.]]
[[ 28. 64.]
[ 67. 181.]]

np.dot(x, y)

array([[ 28., 64.],
       [ 67., 181.]])

np.dot(x, np.ones(3))

array([ 6., 15.])

x @ np.ones(3)

array([ 6., 15.])

from numpy.linalg import inv, qr
X = np.random.randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))
q, r = qr(mat)
print(q)
print(r)

[[-0.58004069 0.11200658 0.08368855 -0.31605658 -0.73763934]
 [-0.23378042 -0.91202995 -0.23232723 -0.21689983 0.1119223 ]
 [ 0.41480462 0.13313976 -0.80841698 -0.28352439 -0.27620073]
 [ 0.65890338 -0.32318967 0.48205523 -0.10811917 -0.46618428]
 [ 0.05179747 0.18297032 0.23044529 -0.87234561 0.38697155]]
[[-2.41236041 -2.4733672 1.87314495 3.87411293 2.15830329]
 [ 0. -7.77357008 2.30369337 -2.22309322 1.34600138]
 [ 0. 0. -2.05747106 2.97280974 2.27854315]
 [ 0. 0. 0. -2.08140469 -3.13479868]
 [ 0. 0. 0. 0. 0.27828391]]
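
The inverse and QR results above are hard to judge by eye. One way to sanity-check them numerically is sketched below. Two assumptions differ from the notebook: `X` is made taller (50 x 5) so that `mat` is comfortably well-conditioned, and the seed `0` is arbitrary:

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)     # arbitrary seed, for repeatability
X = rng.standard_normal((50, 5))   # taller than the notebook's 5 x 5
mat = X.T @ X                      # symmetric 5 x 5 matrix

# mat @ inv(mat) should equal the identity up to rounding error.
print(np.allclose(mat @ inv(mat), np.eye(5)))

q, r = qr(mat)
print(np.allclose(q @ r, mat))           # the factors reproduce mat
print(np.allclose(q.T @ q, np.eye(5)))   # q has orthonormal columns
print(np.allclose(r, np.triu(r)))        # r is upper triangular
```

`np.allclose` is used instead of `==` because floating-point products rarely match the exact identity or the exact input.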

Pseudorandom Number Generation

samples = np.random.normal(size=(4, 4))
samples

array([[ 1.27875166, 0.11922203, -0.11073966, 1.43658446],
       [ 0.65663682, -3.35054852, 0.43615499, -1.7629089 ],
       [-2.32263956, -0.50438609, 0.33691217, 0.57689917],
       [-1.00524953, 0.18329934, 2.47755654, 0.53509911]])

from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
%timeit np.random.normal(size=N)

959 ms ± 78.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
48.5 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

np.random.seed(1234)

rng = np.random.RandomState(1234)
rng.randn(10)

array([ 0.47143516, -1.19097569, 1.43270697, -0.3126519 , -0.72058873,
       0.88716294, 0.85958841, -0.6365235 , 0.01569637, -2.24268495])
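
`np.random.seed` and `np.random.RandomState` belong to NumPy's legacy random interface. A sketch of the same idea with the newer `Generator` interface follows, reusing the seed `1234` from above; the names `rng_a` and `rng_b` are illustrative, and the values drawn will differ from the `RandomState` output because a different bit generator is used:

```python
import numpy as np

# Two independent generators created from the same seed.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

a = rng_a.standard_normal(10)
b = rng_b.standard_normal(10)
print(np.array_equal(a, b))   # True: same seed, same stream

# Counterpart of np.random.normal(size=(4, 4)) on a local generator.
samples = rng_a.normal(loc=0.0, scale=1.0, size=(4, 4))
print(samples.shape)          # (4, 4)
```

Keeping the generator in a local variable avoids touching the global random state that `np.random.seed` modifies.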
