Topic 4 Aggregates

Aggregates

Pruthvish Rajput, Venus Patel


February 23, 2023

1 Aggregations: Min, Max, and Everything In Between


• A common first step in data processing is to compute summary statistics, such as:
– the mean and standard deviation
– the sum
– the product
– the median
– the minimum and maximum, quantiles, etc.
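
As a rough illustration of the statistics listed above, the following sketch computes each of them with NumPy. The array `data` and its values are made up for this example and do not come from the notebook.

```python
import numpy as np

# Small illustrative dataset (values chosen arbitrarily for this sketch).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(data)                    # arithmetic mean
std = np.std(data)                      # population standard deviation (ddof=0)
total = np.sum(data)                    # sum of all elements
product = np.prod(data)                 # product of all elements
median = np.median(data)                # middle value of the sorted data
lowest, highest = np.min(data), np.max(data)
quartiles = np.percentile(data, [25, 50, 75])  # 25th, 50th, 75th percentiles

print(mean, std, total, product, median, lowest, highest, quartiles)
```

Note that `np.std` defaults to the population formula; passing `ddof=1` gives the sample standard deviation instead.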

1.1 Summing the Values in an Array


As a quick example, consider computing the sum of all values in an array. Python itself can do this
using the built-in sum function:
[1]: import numpy as np

[2]: L = np.random.random(100)
sum(L)

[2]: 51.93544860115952

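The syntax is quite similar to that of NumPy’s sum function, and in the simplest case the result is
the same up to floating-point rounding (the last digit can differ because the two functions add the
values in a different order):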
[3]: np.sum(L)

[3]: 51.93544860115953

However, because it executes the operation in compiled code, NumPy’s version of the operation is
computed much more quickly:
[4]: big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

94.2 ms ± 5.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
845 µs ± 102 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
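• Be careful, though: the built-in sum function and the np.sum function are not identical. Their
optional arguments have different meanings (the second argument of sum is a start value, while the
second argument of np.sum is an axis), and np.sum is aware of multiple array dimensions.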
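
The difference between the two functions shows up on a small two-dimensional array. The sketch below uses a hypothetical 2×2 array named `grid`, which is not part of the original notebook.

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

# Built-in sum iterates over the first axis, so it adds the rows together.
builtin_result = sum(grid)                 # array([4, 6])

# np.sum aggregates over every element by default.
numpy_result = np.sum(grid)                # 10

# The second positional argument means different things:
builtin_with_start = sum([1, 2, 3], 10)    # start value: 10 + 1 + 2 + 3 = 16
numpy_with_axis = np.sum(grid, 1)          # axis=1: array([3, 7])

print(builtin_result, numpy_result, builtin_with_start, numpy_with_axis)
```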

1.2 Minimum and Maximum
Similarly, Python has built-in min and max functions, used to find the minimum value and maximum
value of any given array:
[5]: min(big_array), max(big_array)

[5]: (6.064240321013159e-08, 0.9999998126919177)

NumPy’s corresponding functions have similar syntax, and again operate much more quickly:
[6]: np.min(big_array), np.max(big_array)

[6]: (6.064240321013159e-08, 0.9999998126919177)

[7]: %timeit min(big_array)
%timeit np.min(big_array)

58.7 ms ± 7.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
570 µs ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
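
The %timeit command is an IPython magic and is not available in a plain Python script. As a minimal sketch of a similar comparison outside IPython, the standard-library timeit module can be used. The array size, the seed, and the repeat count here are arbitrary choices for this example, and the timings will vary by machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)       # seeded only for repeatability
big_array = rng.random(100_000)      # smaller than the notebook's array, to keep the run short

repeats = 5
# timeit.timeit returns the total number of seconds for `number` calls.
t_builtin = timeit.timeit(lambda: min(big_array), number=repeats)
t_numpy = timeit.timeit(lambda: np.min(big_array), number=repeats)

print(f"built-in min: {t_builtin / repeats * 1e3:.3f} ms per call")
print(f"np.min:       {t_numpy / repeats * 1e6:.1f} µs per call")
```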
For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the
array object itself:
[8]: print(big_array.min(), big_array.max(), big_array.sum())

6.064240321013159e-08 0.9999998126919177 500287.889621271


Whenever possible, make sure that you are using the NumPy version of these aggregates when
operating on NumPy arrays!

1.2.1 Multidimensional Aggregates


One common type of aggregation operation is an aggregate along a row or column. Say you have
some data stored in a two-dimensional array:
[9]: M = np.random.random((3, 4))
print(M)

[[0.28500679 0.68234357 0.06552604 0.04215306]
 [0.60573798 0.67655455 0.69527212 0.24607059]
 [0.89005827 0.13258705 0.35994861 0.97976416]]
By default, each NumPy aggregation function will return the aggregate over the entire array:
[10]: M.sum()

[10]: 5.661022791056153

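Aggregation functions take an additional argument specifying the axis along which the aggregate is
computed. The axis keyword names the dimension of the array that is collapsed, not the dimension
that is returned. For example, specifying axis=0 collapses the rows, which gives the minimum value
within each column: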

[11]: M.min(axis=0)

[11]: array([0.28500679, 0.13258705, 0.06552604, 0.04215306])

The function returns four values, corresponding to the four columns of numbers.
Similarly, we can find the maximum value within each row by specifying axis=1:
[12]: M.max(axis=1)

[12]: array([0.68234357, 0.69527212, 0.97976416])
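
Because the random values above change on every run, the axis behaviour can be easier to follow on a fixed array. The sketch below uses a hypothetical 3×4 array named `table` (not from the notebook) and checks the shape of each result. The keepdims argument is included only to show how the collapsed axis can be retained with length 1.

```python
import numpy as np

# 3 rows, 4 columns: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
table = np.arange(12).reshape(3, 4)

col_min = table.min(axis=0)   # axis 0 (rows) is collapsed: one value per column
row_max = table.max(axis=1)   # axis 1 (columns) is collapsed: one value per row

# keepdims=True keeps the collapsed axis with length 1, giving shape (3, 1).
row_sum = table.sum(axis=1, keepdims=True)

print(col_min, col_min.shape)   # [0 1 2 3] (4,)
print(row_max, row_max.shape)   # [ 3  7 11] (3,)
print(row_sum.shape)            # (3, 1)
```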

1.2.2 Other Aggregation Functions

Function Name    NaN-safe Version    Description

np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
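
To illustrate how the NaN-safe versions in the table differ from the standard ones, here is a small sketch. The array `values` is invented for this example. The standard aggregates propagate a missing value (NaN), while the NaN-safe versions ignore it.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, 5.0])

plain_mean = np.mean(values)             # nan: the NaN propagates
safe_mean = np.nanmean(values)           # 3.0: mean of 1.0, 3.0 and 5.0
safe_max_index = np.nanargmax(values)    # 3: index of 5.0, ignoring the NaN

# np.any and np.all work on boolean arrays, such as the result of a comparison.
has_nan = np.any(np.isnan(values))                       # True
all_positive = np.all(values[~np.isnan(values)] > 0)     # True

print(plain_mean, safe_mean, safe_max_index, has_nan, all_positive)
```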
