Data Science Programming
Introduction to NumPy – Part 2
Week 9
Program Studi Teknik Informatika
Fakultas Teknik – Universitas Surabaya
Create Your Own ufunc
• To create your own ufunc, define a function as you would any normal
Python function, then convert it into a NumPy ufunc with
frompyfunc().
• The frompyfunc() method takes the following arguments:
– function - the name of the function.
– inputs - the number of input arguments (arrays).
– outputs - the number of output arrays.
# Create your own ufunc for addition
import numpy as np

def myadd(x, y):
    return x + y

myadd = np.frompyfunc(myadd, 2, 1)
print(myadd([1, 2, 3, 4], [5, 6, 7, 8])) # Output: [6 8 10 12]
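• A minimal sketch of what frompyfunc() hands back. The name myadd_ufunc is chosen here for illustration only. It shows that the result is a real ufunc object and that its output array has dtype object, so a conversion step may be wanted before further numeric work.

```python
import numpy as np

def myadd(x, y):
    return x + y

# Keep the plain function and the ufunc under separate names (illustrative choice)
myadd_ufunc = np.frompyfunc(myadd, 2, 1)
result = myadd_ufunc([1, 2, 3, 4], [5, 6, 7, 8])

print(type(myadd_ufunc))  # <class 'numpy.ufunc'>
print(result.dtype)       # object

# Convert the object array to integers and compare with the built-in np.add
as_int = result.astype(np.int64)
print(np.array_equal(as_int, np.add([1, 2, 3, 4], [5, 6, 7, 8])))  # True
```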
Working with Boolean Arrays
Counting entries
• Given a Boolean array, there are a host of useful operations
you can do.
• To count the number of True entries in a Boolean array,
np.count_nonzero is useful
x = np.random.randint(10, size=(3, 4))
print(x)
# how many values less than 6?
print(np.count_nonzero(x < 6))
• Another way to get at this information is to use np.sum; in this
case, False is interpreted as 0, and True is interpreted as 1
print(np.sum(x < 6))
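• Because the array above is random, its counts change from run to run. The sketch below uses a fixed array (values chosen here purely for illustration) to show that np.count_nonzero and np.sum agree on a Boolean mask.

```python
import numpy as np

# Fixed illustrative data so the result is reproducible
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 6                      # Boolean array, True where the value is below 6
n_count = np.count_nonzero(mask)  # counts the True entries
n_sum = np.sum(mask)              # True counts as 1, False as 0

print(n_count, n_sum)  # 8 8
```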
• The benefit of np.sum() is that, as with other NumPy aggregation
functions, the summation can also be done along rows or columns.
# how many values less than 6 in each row?
print(np.sum(x < 6, axis=1)) # Output: [3 2 3]
• If we’re interested in quickly checking whether any or all the values are
true, we can use (you guessed it) np.any() or np.all()
# are there any values greater than 8?
print(np.any(x > 8)) #Output: True
# are all values less than 10?
print(np.all(x < 10)) #Output: True
# are all values in each row less than 8?
print(np.all(x < 8, axis=1)) #Output: [False False True]
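• A reproducible sketch of the axis argument, using a fixed array with values assumed here for illustration: axis=1 aggregates within each row, axis=0 within each column.

```python
import numpy as np

# Fixed illustrative data
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

per_row = np.sum(x < 6, axis=1)        # values below 6 in each row
per_col = np.sum(x < 6, axis=0)        # values below 6 in each column
any_gt_8 = np.any(x > 8)               # is any value greater than 8?
all_lt_8_rows = np.all(x < 8, axis=1)  # per row: are all values below 8?

print(per_row.tolist())        # [4, 2, 2]
print(per_col.tolist())        # [2, 2, 2, 2]
print(bool(any_gt_8))          # True
print(all_lt_8_rows.tolist())  # [True, False, True]
```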
Fancy Indexing
Exploring Fancy Indexing
• Fancy indexing is like the simple indexing we’ve already seen, but we pass
arrays of indices in place of single scalars.
• This allows us to very quickly access and modify complicated subsets of an
array’s values.
• Fancy indexing is conceptually simple: it means passing an array of indices to
access multiple array elements at once.
• For example, consider the following array
x = np.random.randint(100, size=10)
print(x) # Output: [97 2 8 94 77 38 18 49 91 50]
• Suppose we want to access three different elements. We can pass a single list
or array of indices to obtain the result.
ind = [3, 7, 4]
print(x[ind]) # Output: [94 49 77]
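• As a small check, fancy indexing with a list of indices gives the same values as picking the elements one at a time. The array below copies the values printed above so that the result is reproducible.

```python
import numpy as np

# Same values as the printed example array
x = np.array([97, 2, 8, 94, 77, 38, 18, 49, 91, 50])
ind = [3, 7, 4]

one_by_one = np.array([x[3], x[7], x[4]])  # element-by-element access
fancy = x[ind]                             # a single fancy-indexing call

print(fancy.tolist())                     # [94, 49, 77]
print(np.array_equal(one_by_one, fancy))  # True
```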
• With fancy indexing, the shape of the result reflects the shape of the
index arrays rather than the shape of the array being indexed.
ind = np.array([[3, 7], [4, 5]])
print(x[ind])  # Output: [[94 49]
               #          [77 38]]
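• A short sketch of the shape rule, with an illustrative array assumed here: a one-dimensional array indexed by a 2x2 index array gives a 2x2 result.

```python
import numpy as np

x = np.arange(10) * 10            # illustrative 1-D array: 0, 10, ..., 90
ind = np.array([[3, 7], [4, 5]])  # 2x2 array of indices

picked = x[ind]
print(x.shape)          # (10,)
print(picked.shape)     # (2, 2), the shape of ind rather than of x
print(picked.tolist())  # [[30, 70], [40, 50]]
```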
• Fancy indexing also works in multiple dimensions. Consider the
following array.
X = np.arange(12).reshape((3, 4))
print(X)  # Output: [[ 0  1  2  3]
          #          [ 4  5  6  7]
          #          [ 8  9 10 11]]
• As with standard indexing, the first index refers to the row, and the
second to the column.
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
print(X[row, col]) # Output: [ 2 5 11]
• Notice that the first value in the result is X[0, 2], the second is X[1, 1],
and the third is X[2, 3].
• If we combine a column vector and a row vector within the indices, we
get a two-dimensional result.
print(X[row[:, np.newaxis], col])  # Output: [[ 2  1  3]
                                   #          [ 6  5  7]
                                   #          [10  9 11]]
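• The two behaviours above can be compared side by side. In this sketch, X[row, col] pairs the indices element by element, while the column-vector form is assumed to follow the usual broadcasting rules, which np.broadcast_arrays makes explicit.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Pairwise: (0, 2), (1, 1), (2, 3)
paired = X[row, col]
manual = np.array([X[r, c] for r, c in zip(row, col)])

# Column vector with row vector: the index arrays broadcast to shape (3, 3)
rows_b, cols_b = np.broadcast_arrays(row[:, np.newaxis], col)
grid = X[row[:, np.newaxis], col]

print(paired.tolist())             # [2, 5, 11]
print(rows_b.shape, cols_b.shape)  # (3, 3) (3, 3)
print(grid.tolist())               # [[2, 1, 3], [6, 5, 7], [10, 9, 11]]
```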
• For even more powerful operations, fancy indexing can be combined with the
other indexing schemes.
# We can combine fancy and simple indices
print(X[2, [2, 0, 1]]) # Output: [10 8 9]
# We can also combine fancy indexing with slicing
print(X[1:, [2, 0, 1]])  # Output: [[ 6  4  5]
                         #          [10  8  9]]
Modifying Values with Fancy Indexing
• Just as fancy indexing can be used to access parts of an array, it can
also be used to modify parts of an array.
• For example, imagine we have an array of indices and we’d like to set
the corresponding items in an array to some value.
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x) # Output: [ 0 99 99 3 99 5 6 7 99 9]
• We can use any assignment-type operator for this. For example:
x[i] -= 10
print(x) # Output: [ 0 89 89 3 89 5 6 7 89 9]
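• A further sketch, with values assumed here for illustration: the right-hand side of a fancy-index assignment can also be a sequence with one value per index, not only a single scalar.

```python
import numpy as np

x = np.zeros(10, dtype=int)
i = np.array([2, 1, 8, 4])

x[i] = [20, 10, 80, 40]  # one value per index: x[2]=20, x[1]=10, x[8]=80, x[4]=40
x[i] += 1                # in-place operators work on the same positions

print(x.tolist())  # [0, 11, 21, 0, 41, 0, 0, 0, 81, 0]
```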
Sorting, Searching, and Filtering
Sorting
• This section covers algorithms related to sorting values in NumPy
arrays.
• For example, a simple selection sort repeatedly finds the minimum
value from a list, and makes swaps until the list is sorted.
• We can code this in just a few lines of Python.
def selection_sort(x):
    for i in range(len(x)):
        # index of the smallest value in the unsorted tail x[i:]
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x

x = np.array([2, 1, 4, 3, 5])
print(selection_sort(x)) # Output: [1 2 3 4 5]
• The selection sort is useful for its simplicity, but is much too slow to be
useful for larger arrays.
• For a list of N values, it requires N loops, each of which does on the
order of N comparisons to find the swap value.
• In terms of the “big-O” notation often used to characterize these
algorithms, selection sort averages O(N^2).
• If you double the number of items in the list, the execution time will go
up by about a factor of four.
• Although Python has built-in sort and sorted functions to work with lists, we
won’t discuss them here because NumPy’s np.sort function turns out to be
much more efficient and useful for our purposes.
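• The quadratic growth of selection sort described above can be checked by counting comparisons instead of timing. This sketch assumes a plain linear scan for the minimum in each pass; the helper name is hypothetical.

```python
def selection_sort_comparisons(n):
    """Count the comparisons made by a selection sort on n items.

    Assumes pass i scans the n - i remaining items, which takes
    n - i - 1 comparisons to find the minimum.
    """
    comparisons = 0
    for i in range(n):
        comparisons += n - i - 1
    return comparisons

c100 = selection_sort_comparisons(100)
c200 = selection_sort_comparisons(200)

print(c100)         # 4950
print(c200)         # 19900
print(c200 / c100)  # roughly 4: doubling N about quadruples the work
```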
• By default, np.sort uses an O(N log N) quicksort algorithm, though
mergesort and heapsort are also available.
• For most applications, the default quicksort is more than sufficient.
x = np.array([2, 1, 4, 3, 5])
print(np.sort(x)) # Output: [1 2 3 4 5]
• A related function is argsort, which instead returns the indices of the
sorted elements.
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i) # Output: [1 0 3 2 4]
• The first element of that result gives the index of the smallest element, the
second value gives the index of the second smallest, and so on.
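• Since argsort returns indices, it combines naturally with fancy indexing. A minimal sketch: indexing the array with its argsort result reproduces the sorted array.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)        # indices that would sort x

sorted_via_index = x[i]  # fancy indexing with those indices

print(i.tolist())                 # [1, 0, 3, 2, 4]
print(sorted_via_index.tolist())  # [1, 2, 3, 4, 5]
```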
• A useful feature of NumPy’s sorting algorithms is the ability to sort
along specific rows or columns of a multidimensional array using the
axis argument.
X = np.random.randint(0, 10, (4, 6))
print(X)
# sort each column of X
print(np.sort(X, axis=0))
# sort each row of X
print(np.sort(X, axis=1))
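• A reproducible sketch of sorting along an axis, using a small fixed array assumed here for illustration. Each column (or row) is sorted independently, so any relationship between values in the same row (or column) is lost.

```python
import numpy as np

# Fixed illustrative data
X = np.array([[6, 3, 7],
              [4, 6, 9],
              [2, 6, 7]])

by_col = np.sort(X, axis=0)  # sort within each column
by_row = np.sort(X, axis=1)  # sort within each row

print(by_col.tolist())  # [[2, 3, 7], [4, 6, 7], [6, 6, 9]]
print(by_row.tolist())  # [[3, 6, 7], [4, 6, 9], [2, 6, 7]]
```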
• Sometimes we’re not interested in sorting the entire array, but simply want to
find the K smallest values in the array.
• NumPy provides this in the np.partition function.
• np.partition takes an array and a number K; the result is a new array with the
smallest K values to the left of the partition, and the remaining values to the
right, in arbitrary order.
x = np.array([7, 2, 3, 1, 6, 5, 4])
print(np.partition(x, 3)) # Output: [2 1 3 4 6 5 7]
• Note that the first three values in the resulting array are the three smallest in
the array, and the remaining array positions contain the remaining values.
• Within the two partitions, the elements have arbitrary order.
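• Because the order inside each partition is arbitrary, it is safer to check the partition property than to rely on one exact printed order. A minimal sketch:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3
part = np.partition(x, k)

smallest = np.sort(part[:k])  # the k smallest values, sorted for comparison
pivot = part[k]               # the element at index k is in its final sorted position

print(smallest.tolist())                # [1, 2, 3]
print(int(pivot))                       # 4
print(bool(np.all(part[k:] >= pivot)))  # True
```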
Searching
• We can search an array for a certain value and return the indexes
of the matches. To search an array, use the np.where() function.
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x) # Output: (array([3, 5, 6], dtype=int64),)
• Another example: Find the indexes where the values are even or
odd
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
y = np.where(arr%2 == 1)
print(x) # Output: (array([1, 3, 5, 7], dtype=int64),)
print(y) # Output: (array([0, 2, 4, 6], dtype=int64),)
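• The printed results above are tuples. As a sketch: np.where with a single condition returns one index array per dimension, so for a one-dimensional array the indexes are in element 0 of the tuple.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])
result = np.where(arr == 4)

idx = result[0]  # index array for the only dimension

print(type(result).__name__, len(result))  # tuple 1
print(idx.tolist())                        # [3, 5, 6]
print(arr[idx].tolist())                   # [4, 4, 4]
```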
• The searchsorted() function performs a binary search in a sorted
array and returns the index where the specified value would be
inserted to maintain the sort order.
arr = np.array([2, 7, 9, 12, 12])
# The number 8 should be inserted at index 2 to maintain the sort order.
# By default, the search starts from the left.
print(np.searchsorted(arr, 8)) # Output: 2
# Find the index where the value 10 should be inserted, starting from the right.
print(np.searchsorted(arr, 10, side='right')) # Output: 3
# Find the indexes where the values 2, 4, 6, and 11 should be inserted.
# The return value is an array: [0 1 1 3] containing the four indexes,
# where 2, 4, 6, 11 would be inserted in the original array to maintain the order.
print(np.searchsorted(arr, [2, 4, 6, 11])) # Output: [0 1 1 3]
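• The side argument only makes a difference when the searched value already occurs in the array. A sketch using the value 12, which appears twice:

```python
import numpy as np

arr = np.array([2, 7, 9, 12, 12])

left = np.searchsorted(arr, 12)                 # before the existing 12s
right = np.searchsorted(arr, 12, side='right')  # after the existing 12s

# Inserting at either position keeps the array sorted
inserted = np.insert(arr, left, 12)

print(int(left), int(right))  # 3 5
print(inserted.tolist())      # [2, 7, 9, 12, 12, 12]
```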
Filtering
• Getting some elements out of an existing array and creating a
new array out of them is called filtering.
• In NumPy, you filter an array using a boolean index list.
arr = np.array([41, 42, 43, 44])
x = arr[[True, False, True, False]]
print(x) # Output: [41 43]
• The example above returns [41 43]. Why? Because the new
array contains only the values where the filter array had the
value True, in this case at indexes 0 and 2.
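• Boolean filtering and fancy indexing are closely related. In this sketch, np.flatnonzero turns the filter into the indexes of its True entries, and indexing with those gives the same result.

```python
import numpy as np

arr = np.array([41, 42, 43, 44])
mask = np.array([True, False, True, False])  # one Boolean per element of arr

filtered = arr[mask]                # Boolean filtering
kept_indexes = np.flatnonzero(mask) # positions where mask is True

print(filtered.tolist())           # [41, 43]
print(kept_indexes.tolist())       # [0, 2]
print(arr[kept_indexes].tolist())  # [41, 43]
```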
• Another example:
# Create a filter array that will return only even elements from the original array
arr = np.array([1, 2, 3, 4, 5, 6, 7])
filter_arr = arr % 2 == 0
newarr = arr[filter_arr]
print(filter_arr) #Output: [False True False True False True False]
print(newarr) #Output: [2 4 6]
# Create a filter array that will return only values higher than 42
arr = np.array([41, 42, 43, 44])
filter_arr = arr > 42
newarr = arr[filter_arr]
print(filter_arr) #Output: [False False True True]
print(newarr) #Output: [43 44]
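• The two filters above can also be combined. This sketch uses the element-wise & operator, which the examples above do not use; each condition is assumed to need its own parentheses because & binds more tightly than the comparisons.

```python
import numpy as np

arr = np.array([41, 42, 43, 44, 45, 46])

# Keep values that are both higher than 42 and even
filter_arr = (arr > 42) & (arr % 2 == 0)
newarr = arr[filter_arr]

print(filter_arr.tolist())  # [False, False, False, True, False, True]
print(newarr.tolist())      # [44, 46]
```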
Questions??
Exercise
• Create NRP_Nickname_ExWeek9.ipynb file.
Question 1
Create a 5x2 integer array from the range 100 to 200 so that the
difference between consecutive elements is 10. Here is an example of
what it looks like:
Question 2
Consider the following NumPy array.
np.array([[11, 22, 33], [44, 55, 66], [77, 88, 99]])
Return an array of the items in the second column of all
rows. Here is the expected display:
Question 3
Consider the following NumPy array.
np.array([[3, 6, 9, 12], [15, 18, 21, 24], [27, 30, 33, 36],
          [39, 42, 45, 48], [51, 54, 57, 60]])
Return an array containing the odd rows and even columns of the
given array. Here is the expected display:
Question 4
Add the following two NumPy arrays:
arrayOne = np.array([[5, 6, 9], [21, 18, 27]])
arrayTwo = np.array([[15, 33, 24], [4, 7, 1]])
Then modify the resulting array by calculating the square root of each
element. Here is the expected display: