Numpy

Foundation of data science
Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64
float16       Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats

4.2 THE BASICS OF NUMPY ARRAYS

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas are built around the NumPy array. This section covers the following categories of basic array manipulation:

(a) Attributes of arrays: determining the size, shape, memory consumption, and data types of arrays.
(b) Indexing of arrays: getting and setting the values of individual array elements.
(c) Slicing of arrays: getting and setting smaller subarrays within a larger array.
(d) Reshaping of arrays: changing the shape of a given array.
(e) Joining and splitting of arrays: combining multiple arrays into one, and splitting one array into many.

(a) NumPy Array Attributes

First, define three random arrays (a one-dimensional, a two-dimensional, and a three-dimensional array) using NumPy's random number generator.
We seed the generator with a fixed value so that the same random arrays are generated on every run:

import numpy as np
np.random.seed(0)                           # seed for reproducibility
x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

Each array has the following attributes:

ndim: number of dimensions
shape: size of each dimension
size: total number of elements in the array
dtype: data type of the array elements
itemsize: size (in bytes) of each array element
nbytes: total size (in bytes) of the array, i.e. itemsize x size

The attributes of the array x3 can be printed using the following code:

print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

OUTPUT
x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes

(b) Array Indexing: Accessing Single Elements

Array indexing means accessing an array element by referring to its index number. Indexes in NumPy arrays start at 0, so the first element has index 0, the second has index 1, and so on.

In[5]: x1
Out[5]: array([5, 0, 3, 3, 7, 9])
In[6]: x1[0]
Out[6]: 5
In[7]: x1[4]
Out[7]: 7

Example: Get the third and fourth elements from the following array and add them.

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])

OUTPUT
7

To index from the end of the array, use negative indices:

In[8]: x1[-1]
Out[8]: 9
In[9]: x1[-2]
Out[9]: 7

In a multi-dimensional array, items are accessed using a comma-separated tuple of indices:

In[10]: x2
Out[10]: array([[3, 5, 2, 4],
                [7, 6, 8, 8],
                [1, 6, 7, 7]])
In[11]: x2[0, 0]    # element at row 0, column 0
Out[11]: 3
In[12]: x2[2, 0]    # element at row 2, column 0
Out[12]: 1
In[13]: x2[2, -1]   # element at row 2, last column
Out[13]: 7

Example: Access the third element of the second array of the first array:

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

OUTPUT
6

Elements can also be modified using any of the index notations above:

In[14]: x2[0, 0] = 20
        x2
Out[14]: array([[20,  5,  2,  4],
                [ 7,  6,  8,  8],
                [ 1,  6,  7,  7]])

NumPy arrays have a fixed type, so if you try to insert a floating-point value into an integer array, the fractional part is silently truncated:

In[15]: x1[0] = 3.14159   # this will be truncated!
        x1
Out[15]: array([3, 0, 3, 3, 7, 9])

(c) Array Slicing: Accessing Subarrays

Square brackets can also be used to access subarrays with slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use:

x[start:stop:step]

• If start is not given, it defaults to 0.
• If stop is not given, it defaults to the length of the array in that dimension.
• If step is not given, it defaults to 1.

One-dimensional subarrays

>>> x = np.arange(10)
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[:5]     # first five elements
array([0, 1, 2, 3, 4])
>>> x[5:]     # elements from index 5 onward
array([5, 6, 7, 8, 9])
>>> x[4:7]    # middle subarray
array([4, 5, 6])
>>> x[::2]    # every other element (default start and stop, step 2)
array([0, 2, 4, 6, 8])
>>> x[1::2]   # every other element, starting at index 1
array([1, 3, 5, 7, 9])

Use a negative index to count from the end. When the step value is negative, the defaults for start and stop are swapped, which gives a convenient way to reverse an array:

>>> x[::-1]   # all elements, reversed
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>> x[5::-2]  # every other element, reversed, starting from index 5
array([5, 3, 1])

Example: Slice from index 3 from the end up to (but not including) index 1 from the end:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(arr[-3:-1])   # from index -3 up to index -2; index -1 is excluded

OUTPUT
[6 7]

Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:

>>> x2
array([[20,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])
>>> x2[:2, :3]    # first two rows, first three columns
array([[20,  5,  2],
       [ 7,  6,  8]])
>>> x2[:3, ::2]   # all rows, every other column
array([[20,  2],
       [ 7,  8],
       [ 1,  7]])

Subarray dimensions can even be reversed together:

>>> x2[::-1, ::-1]
array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 20]])

Accessing array rows and columns

A single row or column of an array can be accessed by combining indexing and slicing, using an empty slice marked by a single colon (:):

>>> print(x2[:, 0])   # first column of x2
[20  7  1]
>>> print(x2[0, :])   # first row of x2
[20  5  2  4]
>>> print(x2[0])      # equivalent to x2[0, :]
[20  5  2  4]

Subarrays as no-copy views

Array slices return views rather than copies of the array data. This is where NumPy array slicing differs from Python list slicing: in lists, slices are copies. Consider the two-dimensional array from the previous example:

>>> print(x2)
[[20  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]

Let's extract a 2 x 2 subarray from it:

>>> x2_sub = x2[:2, :2]
>>> print(x2_sub)
[[20  5]
 [ 7  6]]

Changes made to the subarray are also reflected in the original array:

>>> x2_sub[0, 0] = 200
>>> print(x2_sub)
[[200   5]
 [  7   6]]
>>> print(x2)
[[200   5   2   4]
 [  7   6   8   8]
 [  1   6   7   7]]

This default behaviour is actually quite useful: with large datasets, we can access and process pieces of the data without needing to copy the underlying data buffer.

Creating copies of arrays

Despite the useful features of array views, it is sometimes necessary to explicitly copy the data within an array or a subarray. This can be done with the copy() method:

>>> x2_sub_copy = x2[:2, :2].copy()
>>> print(x2_sub_copy)
[[200   5]
 [  7   6]]

Changes made in this subarray will not affect the original array.
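The view/copy distinction can also be checked programmatically with np.shares_memory. This is a minimal sketch that re-creates x2 with the same seed so that it runs on its own; note that x2 here is the freshly generated array, before the modifications made in the text above.

```python
import numpy as np

np.random.seed(0)
x1 = np.random.randint(10, size=6)        # drawn first so x2 matches the text
x2 = np.random.randint(10, size=(3, 4))

view = x2[:2, :2]           # basic slicing returns a view
copy = x2[:2, :2].copy()    # .copy() allocates a new buffer

print(np.shares_memory(x2, view))   # True: same underlying data
print(np.shares_memory(x2, copy))   # False: independent data

view[0, 0] = -1    # writes through to x2
copy[1, 1] = -2    # does not touch x2
print(x2[0, 0], x2[1, 1])   # -1, then the unchanged original value
```

Checking shares_memory is more reliable than reasoning about the indexing expression by eye, especially once fancy indexing (which always copies) is mixed with slicing.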
For example:

>>> x2_sub_copy[0, 0] = 402
>>> print(x2_sub_copy)
[[402   5]
 [  7   6]]
>>> print(x2)
[[200   5   2   4]
 [  7   6   8   8]
 [  1   6   7   7]]

(d) Reshaping of Arrays

Another useful type of operation is reshaping of arrays, which is done with the reshape() method. For example, to put the numbers 1 through 9 in a 3 x 3 grid:

>>> grid = np.arange(1, 10).reshape((3, 3))
>>> print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Note:
• The size of the initial array must match the size of the reshaped array.
• Where possible, reshape returns a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.

Another common reshaping pattern is converting a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or with the np.newaxis keyword within a slice operation:

>>> x = np.array([1, 2, 3])
>>> x.reshape((1, 3))      # row vector via reshape
array([[1, 2, 3]])
>>> x[np.newaxis, :]       # row vector via newaxis
array([[1, 2, 3]])
>>> x.reshape((3, 1))      # column vector via reshape
array([[1],
       [2],
       [3]])
>>> x[:, np.newaxis]       # column vector via newaxis
array([[1],
       [2],
       [3]])

(e) Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It is also possible to combine multiple arrays into one and, conversely, to split a single array into multiple arrays.

Concatenation of arrays

Concatenation, or joining, of arrays in NumPy is done with the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as seen here:

In [ ]: x = np.array([1, 2, 3])
        y = np.array([3, 2, 1])
        np.concatenate([x, y])
Out[ ]: array([1, 2, 3, 3, 2, 1])

In [ ]: z = [100, 200, 300]
        print(np.concatenate([x, y, z]))   # concatenate three arrays
Out[ ]: [  1   2   3   3   2   1 100 200 300]

Concatenation of two-dimensional arrays

In [ ]: grid = np.array([[1, 2, 3],
                         [4, 5, 6]])
In [ ]: # concatenate along the first axis
        np.concatenate([grid, grid])
Out[ ]: array([[1, 2, 3],
               [4, 5, 6],
               [1, 2, 3],
               [4, 5, 6]])

In [ ]: # concatenate along the second axis (zero-indexed)
        np.concatenate([grid, grid], axis=1)
Out[ ]: array([[1, 2, 3, 1, 2, 3],
               [4, 5, 6, 4, 5, 6]])

Concatenation of arrays of mixed dimensions

For working with arrays of mixed dimensions, it can be clearer to use:
• np.vstack: vertical stack
• np.hstack: horizontal stack
• np.dstack: stack arrays along the third axis

In [ ]: x = np.array([10, 20, 30])
        grid = np.array([[9, 8, 7],
                         [6, 5, 4]])
        # vertically stack the arrays
        np.vstack([x, grid])
Out[ ]: array([[10, 20, 30],
               [ 9,  8,  7],
               [ 6,  5,  4]])

In [ ]: # horizontally stack the arrays
        y = np.array([[100],
                      [100]])
        np.hstack([grid, y])
Out[ ]: array([[  9,   8,   7, 100],
               [  6,   5,   4, 100]])

Splitting of arrays

Splitting is the opposite of concatenation: it breaks one array into multiple arrays. It is done with the functions np.split(), np.hsplit(), and np.vsplit(). For each of these, pass a list of indices giving the split points; N split points lead to N + 1 subarrays.

np.array_split() can also be used for splitting arrays: pass the array to split and the number of splits. Its return value is a list containing each split as an array, and each split can be accessed like any list element.

In [ ]: x = [1, 2, 3, 99, 99, 3, 2, 1]
        x1, x2, x3 = np.split(x, [3, 5])   # 3 and 5 are the split points
        print(x1, x2, x3)
Out[ ]: [1 2 3] [99 99] [3 2 1]

np.vsplit() splits an array into multiple sub-arrays vertically (row-wise). vsplit is equivalent to split with axis=0 (the default): the array is always split along the first axis, regardless of the array's dimension.

Syntax: numpy.vsplit(ary, indices_or_sections)

In [ ]: grid = np.arange(16).reshape((4, 4))
        grid
Out[ ]: array([[ 0,  1,  2,  3],
               [ 4,  5,  6,  7],
               [ 8,  9, 10, 11],
               [12, 13, 14, 15]])

In [ ]: upper, lower = np.vsplit(grid, [2])
        print(upper)
        print(lower)
Out[ ]: [[0 1 2 3]
         [4 5 6 7]]
        [[ 8  9 10 11]
         [12 13 14 15]]

np.hsplit() splits an array into multiple sub-arrays horizontally (column-wise). hsplit is equivalent to split with axis=1: the array is always split along the second axis, except for 1-D arrays, which are split along axis 0.

Syntax: numpy.hsplit(ary, indices_or_sections)

In [ ]: left, right = np.hsplit(grid, [2])
        print(left)
        print(right)
Out[ ]: [[ 0  1]
         [ 4  5]
         [ 8  9]
         [12 13]]
        [[ 2  3]
         [ 6  7]
         [10 11]
         [14 15]]

Similarly, np.dsplit splits arrays along the third axis.

4.2 AGGREGATIONS

Computing aggregations gives insight into the nature of a potentially large dataset. When processing large amounts of data, the first step is usually to compute summary statistics for the data under analysis. NumPy has many aggregation functions for working with single-dimensional or multi-dimensional arrays, such as sum, product, mean, average, median, variance, standard deviation, min, max, argmin, argmax, percentile, cumsum, cumprod, and corrcoef.

(a) Sum

Built-in sum function: the sum of all values in an array can be computed with Python's built-in sum function:

>>> import numpy as np
>>> L = np.random.random(100)
>>> sum(L)
55.61209116604941

NumPy's sum function: the same sum computed with NumPy's sum function (the exact value depends on the random draw):

>>> np.sum(L)
55.612091166049424

Python sum() vs NumPy sum()

In [ ]: big_array = np.random.rand(1000000)
        %timeit sum(big_array)
        %timeit np.sum(big_array)
67.9 ms ± 989 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
233 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

np.sum() executes the operation in compiled code, so it runs much more quickly. NumPy's sum function also accepts an optional axis argument, which computes the sum along a given axis of the original array.

Syntax: numpy.sum(a, axis, dtype, out)

This function returns the sum of the array elements over the specified axis.

a: input array.
axis: axis (or axes) along which to calculate the sum.
      The default, axis = None, sums all the elements; axis = 0 sums along the columns and axis = 1 sums along the rows.
dtype: data type of the returned array and of the accumulator in which the elements are summed (optional).
out: alternative array in which to place the result; it must have the same dimensions as the expected output. Default is None.
keepdims: if True, the reduced axes are kept in the result with size 1.
initial: [scalar, optional] starting value of the sum.
Return: the sum of the array elements (a scalar if axis is None), or an array of sum values along the specified axis.

# Python program for the numpy.sum() method
import numpy as np

# 1D array
arr = np.array([20, 2, 0.2, 10, 4])
print("Sum of arr :", np.sum(arr))
print("Sum of arr (uint8) :", np.sum(arr, dtype=np.uint8))
print("Sum of arr (float32) :", np.sum(arr, dtype=np.float32))

Output
Sum of arr : 36.2
Sum of arr (uint8) : 36
Sum of arr (float32) : 36.2

In the following example, axis = 0 and axis = 1 are used to find the sum of each column and each row of a NumPy array.

# Python program for the numpy.sum() method
import numpy as np

# 2D array
arr = np.array([[14, 17, 12, 33, 44],
                [15, 6, 27, 8, 19],
                [23, 2, 54, 1, 4]])
print("Sum of arr :", np.sum(arr))
print("Sum of arr (axis = 0) :", np.sum(arr, axis=0))
print("Sum of arr (axis = 1) :", np.sum(arr, axis=1))
print("Sum of arr (keepdims = True) :\n", np.sum(arr, axis=1, keepdims=True))

Output
Sum of arr : 279
Sum of arr (axis = 0) : [52 25 93 42 67]
Sum of arr (axis = 1) : [120  75  84]
Sum of arr (keepdims = True) :
 [[120]
 [ 75]
 [ 84]]

(b) Minimum and Maximum

Similarly, Python has built-in min and max functions, which find the minimum and maximum values of any given array:

In [ ]: min(big_array), max(big_array)
Out[ ]: (1.1717128136634614e-06, 0.9999976784968716)

NumPy's corresponding functions have similar syntax, and again operate much more quickly:

In [ ]: np.min(big_array), np.max(big_array)
Out[ ]: (1.1717128136634614e-06, 0.9999976784968716)

In [ ]: %timeit min(big_array)
        %timeit np.min(big_array)
10 loops, best of 3: 82.3 ms per loop
1000 loops, best of 3: 497 µs per loop

For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:

In [ ]: print(big_array.min(), big_array.max(), big_array.sum())
1.17171281366e-06 0.999997678497 499911.628197

Example program

# import numpy library
import numpy

# creating a numpy array of integers (1D array)
arr = numpy.array([10, 2, 40, 85, 32, 77])

# finding the maximum and minimum element in the array
max_element = numpy.max(arr)
min_element = numpy.min(arr)

# printing the result
print('maximum element:', max_element)
print('minimum element:', min_element)

Output
maximum element: 85
minimum element: 2

Multidimensional aggregates

• Aggregation can be done along a row or a column of a two-dimensional array.
• By default, each NumPy aggregation function returns the aggregate over the entire array.
• An additional axis argument of the aggregate function specifies the axis along which the aggregate is computed:
  - axis = 0: aggregate on each column; the values within each column are aggregated.
  - axis = 1: aggregate on each row; the values within each row are aggregated.

Example program for 2D array aggregation

# import numpy library
import numpy

# creating a two-dimensional numpy array of integers
a = numpy.array([[11, 22, 3], [4, 55, 16], [7, 88, 22]])

# finding the maximum and minimum element
max_element = numpy.max(a)
min_element = numpy.min(a)

# printing the result
print('maximum element:', max_element)
print('minimum element:', min_element)

Output
maximum element: 88
minimum element: 3

Example program to aggregate a 2D array by column and by row

# import numpy library
import numpy as np

# creating a two-dimensional numpy array of integers
arr = np.array([[11, 28, 3], [4, 55, 16], [7, 88, 22]])

# finding the maximum and minimum element in each column and row
max_element_column = np.max(arr, 0)    # column-wise (axis = 0)
max_element_row = np.max(arr, 1)       # row-wise (axis = 1)
min_element_column = np.amin(arr, 0)
min_element_row = np.amin(arr, 1)

# printing the result
print('maximum elements in each column:', max_element_column)
print('maximum elements in each row:', max_element_row)
print('minimum elements in each column:', min_element_column)
print('minimum elements in each row:', min_element_row)

Output
maximum elements in each column: [11 88 22]
maximum elements in each row: [28 55 88]
minimum elements in each column: [ 4 28  3]
minimum elements in each row: [3 4 7]

Other aggregation functions

NumPy provides many other aggregation functions; a few of them are listed below. Most aggregates have a NaN-safe counterpart that ignores missing values when computing the result.

Function name    NaN-safe version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true

We will see these aggregates often throughout the rest of the book.

Example program to compute aggregation values

import numpy as np
array1 = np.array([[10, 20, 30], [40, 50, 60]])
print("Mean: ", np.mean(array1))
print("Std: ", np.std(array1))
print("Var: ", np.var(array1))
print("Sum: ", np.sum(array1))
print("Prod: ", np.prod(array1))

OUTPUT
Mean:  35.0
Std:  17.07825127659933
Var:  291.6666666666667
Sum:  210
Prod:  720000000

4.3 COMPUTATION ON NUMPY ARRAYS: UNIVERSAL FUNCTIONS

• A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion.
• It is a "vectorized" wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.
• These functions include standard trigonometric functions, functions for arithmetic operations, functions for handling complex numbers, statistical functions, and more.

Challenges with Python Loops

Python's default implementation (known as CPython) does some operations very slowly. This is because of the dynamic, interpreted nature of the language: since types are flexible, sequences of operations cannot be compiled down to efficient machine code. At each operation, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.

import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

OUTPUT
1 loop, best of 3: 2.91 s per loop

Computing these million operations and storing the result takes several seconds. NumPy provides a convenient interface to this kind of statically typed, compiled routine: simply perform the operation on the array, and it is applied to each element. This vectorized approach pushes the loop into the compiled layer that underlies NumPy, which makes execution much faster.

4.3.1 Characteristics of ufuncs

• These functions operate on NumPy ndarrays.
• They implement fast element-wise array operations.
• They support various features such as array broadcasting and type casting.
• NumPy universal functions are objects of the numpy.ufunc class.
• Python functions can also be turned into universal functions using the np.frompyfunc library function.
• Some ufuncs are called automatically during array arithmetic. For example, np.add() is called internally when two arrays are added with the "+" operator.
• Computation on NumPy arrays is very fast because it uses vectorized operations implemented through NumPy's universal functions (ufuncs).

There are two types of ufuncs:
(i) Unary ufuncs, which operate on a single input.
(ii) Binary ufuncs, which operate on two inputs.

NumPy's ufuncs make use of Python's native arithmetic operators: the standard addition, subtraction, multiplication, and division can all be used. The following table lists the arithmetic operators implemented in NumPy:

Operator    Equivalent ufunc    Description
+           np.add              Addition (e.g., 1 + 1 = 2)
-           np.subtract         Subtraction (e.g., 3 - 2 = 1)
-           np.negative         Unary negation (e.g., -2)
*           np.multiply         Multiplication (e.g., 2 * 3 = 6)
/           np.divide           Division (e.g., 3 / 2 = 1.5)
//          np.floor_divide     Floor division (e.g., 3 // 2 = 1)
**          np.power            Exponentiation (e.g., 2 ** 3 = 8)
%           np.mod              Modulus/remainder (e.g., 9 % 4 = 1)

import numpy as np

# Array arithmetic
x = np.arange(4)
print("x      =", x)
print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("x // 2 =", x // 2)   # floor division

OUTPUT
x      = [0 1 2 3]
x + 5  = [5 6 7 8]
x - 5  = [-5 -4 -3 -2]
x * 2  = [0 2 4 6]
x / 2  = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]

Negation, exponentiation and modulus

• Unary -: negation (a unary ufunc)
• ** operator: exponentiation
• % operator: modulus

import numpy as np

# Array arithmetic
x = np.arange(4)
print("-x     =", -x)
print("x ** 2 =", x ** 2)
print("x % 2  =", x % 2)

OUTPUT
-x     = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2  = [0 1 0 1]

Trigonometric functions

These functions work in radians, so angles given in degrees must first be converted to radians by multiplying by pi/180; only then can the trigonometric functions be called. They take an array as their input argument.
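The degree-to-radian step described above can be sketched as follows. The angle values are made up for illustration; np.deg2rad and np.rad2deg are NumPy's built-in conversion ufuncs (np.radians and np.degrees are equivalent aliases).

```python
import numpy as np

# Angles in degrees (illustrative values, not from the text)
degrees = np.array([0.0, 30.0, 45.0, 90.0, 180.0])

# Manual conversion, as described above: multiply by pi/180
radians_manual = degrees * np.pi / 180

# The same conversion with NumPy's dedicated ufuncs, in both directions
radians_ufunc = np.deg2rad(degrees)
back_to_degrees = np.rad2deg(radians_ufunc)

print(np.allclose(radians_manual, radians_ufunc))   # True
print(np.sin(radians_ufunc))   # sine of 0, 30, 45, 90 and 180 degrees
print(back_to_degrees)
```

Comparing with np.allclose rather than == is deliberate: the two conversions can differ in the last bit of floating-point precision.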
Trigonometric ufuncs include the following:

Function                     Description
sin, cos, tan                Compute sine, cosine and tangent of angles
arcsin, arccos, arctan       Calculate inverse sine, cosine and tangent
hypot                        Calculate the hypotenuse of a given right triangle
sinh, cosh, tanh             Compute hyperbolic sine, cosine and tangent
arcsinh, arccosh, arctanh    Compute inverse hyperbolic sine, cosine and tangent
deg2rad                      Convert degrees into radians
rad2deg                      Convert radians into degrees

import numpy as np

# Trigonometric functions
theta = np.linspace(0, np.pi, 3)   # 3 elements between 0 and pi
print("theta      =", theta)
print("sin(theta) =", np.sin(theta))
print("cos(theta) =", np.cos(theta))
print("tan(theta) =", np.tan(theta))

OUTPUT
theta      = [ 0.          1.57079633  3.14159265]
sin(theta) = [ 0.00000000e+00   1.00000000e+00   1.22464680e-16]
cos(theta) = [ 1.00000000e+00   6.12323400e-17  -1.00000000e+00]
tan(theta) = [ 0.00000000e+00   1.63312394e+16  -1.22464680e-16]

Absolute value

The NumPy ufunc np.absolute, and its alias np.abs, can be used to find absolute values:

# Absolute value
import numpy as np
x = np.array([-2, -1, 0, 1, 2])
print(np.absolute(x))
print(np.abs(x))

OUTPUT
[2 1 0 1 2]
[2 1 0 1 2]

This ufunc can also handle complex data, for which the absolute value is the magnitude:

x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

Output: array([5., 5., 2., 1.])

Exponents and logarithms

import numpy as np
x = [10, 20, 30]
print("x   =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))

OUTPUT
x   = [10, 20, 30]
e^x = [2.20264658e+04 4.85165195e+08 1.06864746e+13]
2^x = [1.02400000e+03 1.04857600e+06 1.07374182e+09]
3^x = [          59049      3486784401 205891132094649]

The inverse of the exponentials, the logarithms, are also available. The basic np.log gives the natural logarithm; the base-2 and base-10 logarithms can be computed with the ufuncs np.log2 and np.log10.
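For bases other than e, 2 and 10, one common approach is the change-of-base identity log_b(x) = ln(x) / ln(b). The helper log_base below is hypothetical (written only for this illustration, not a NumPy function); the sketch checks it against np.log2 and np.log10, and also confirms that np.log undoes np.exp.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 10.0, 100.0])

def log_base(values, base):
    """Hypothetical helper: logarithm in any base via ln(x) / ln(base)."""
    return np.log(values) / np.log(base)

print(np.allclose(log_base(x, 2), np.log2(x)))      # True
print(np.allclose(log_base(x, 10), np.log10(x)))    # True
print(log_base(np.array([1.0, 3.0, 9.0, 27.0]), 3)) # approximately [0, 1, 2, 3]

# The logarithm is the inverse of the exponential
print(np.allclose(np.log(np.exp(x)), x))            # True
```

Dividing two ufunc results keeps the whole computation vectorized, so the helper works on arrays of any shape.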
For example:

import numpy as np
x = [1, 2, 4, 100]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

OUTPUT
x        = [1, 2, 4, 100]
ln(x)    = [0.         0.69314718 1.38629436 4.60517019]
log2(x)  = [0.         1.         2.         6.64385619]
log10(x) = [0.         0.30103    0.60205999 2.        ]

Specialized versions for maintaining precision with very small input

When x is very small, np.expm1 and np.log1p give more precise values than would be obtained by using the raw np.exp or np.log:

x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))

Output
exp(x) - 1 = [0.         0.0010005  0.01005017 0.10517092]
log(1 + x) = [0.         0.0009995  0.00995033 0.09531018]

Specialized ufuncs

NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, and more. Another source of more specialized and obscure ufuncs is the submodule scipy.special. If you want to compute some obscure mathematical function on your data, chances are it is implemented in scipy.special.

import numpy as np
from scipy import special

# Gamma functions (generalized factorials) and related functions
x = [1, 10, 100]
print("gamma(x)     =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2)   =", special.beta(x, 2))

OUTPUT
gamma(x)     = [1.00000000e+000 3.62880000e+005 9.33262154e+155]
ln|gamma(x)| = [  0.          12.80182748 359.13420537]
beta(x, 2)   = [5.00000000e-01 9.09090909e-03 9.90099010e-05]

# Error function (integral of Gaussian), its complement, and its inverse
x = np.array([0, 0.4, 0.7, 1.0])
print("erf(x)    =", special.erf(x))
print("erfc(x)   =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

OUTPUT
erf(x)    = [0.         0.42839236 0.67780119 0.84270079]
erfc(x)   = [1.         0.57160764 0.32219881 0.15729921]
erfinv(x) = [0.         0.37080716 0.73286908        inf]

Advanced Ufunc Features
Specifying output: the out argument

For all ufuncs, the out argument can be used to write the result of a computation directly to a chosen memory location, instead of creating a temporary array:

# out argument to store the output
import numpy as np
x = np.arange(10)
y = np.empty(10)
np.multiply(x, 2, out=y)
print(y)

OUTPUT
[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18.]

This can even be used with array views. For example, the result of a computation can be written to every other element of a specified array. For larger arrays, the memory savings from careful use of the out argument can be significant.

In [ ]: x = np.arange(5)
        y = np.zeros(10)
        np.power(2, x, out=y[::2])
        print("x:", x)
        print("y:", y)
Out[ ]: x: [0 1 2 3 4]
        y: [ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]

Aggregates

reduce method: reduce repeatedly applies a given operation to the elements of an array until only a single result remains.

# reduce on the add ufunc returns the sum of all elements in the array
x = np.arange(1, 6)
np.add.reduce(x)
OUTPUT: 15

# reduce on the multiply ufunc returns the product of all array elements
x = np.arange(1, 6)
np.multiply.reduce(x)
OUTPUT: 120

accumulate method: stores all the intermediate results of the computation.

x = np.arange(1, 6)
np.add.accumulate(x)
OUTPUT: array([ 1,  3,  6, 10, 15])

Outer products

outer method: a binary ufunc can compute the output for all pairs of two different inputs using the outer method. This creates output similar to a multiplication table:

x = np.arange(1, 5)
np.add.outer(x, x)
OUTPUT:
array([[2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7],
       [5, 6, 7, 8]])

4.3.2 Computations on Arrays: Broadcasting

• Arithmetic operations on arrays are usually done on corresponding elements. If two arrays have exactly the same shape, these operations are performed element by element.
• When the shapes differ, the smaller array is "broadcast" across the larger array so that they have compatible shapes.
• Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.
• It does this without making needless copies of data, which usually leads to efficient algorithm implementations.
• However, broadcasting sometimes leads to inefficient use of memory, which slows computation.

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

# arithmetic operation between arrays with the same dimensions
>>> x = np.array([10.0, 20.0, 30.0])
>>> y = np.array([2.0, 2.0, 2.0])
>>> x + y
array([12., 22., 32.])

Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules that determine the interaction between the two arrays:
• Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
• Rule 2: If the shapes of the two arrays do not match in any dimension, the array with size equal to 1 in that dimension is stretched to match the other shape.
• Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

Array with a scalar

The scalar value is stretched to the shape of the other array, and the computation is performed element by element. The advantage of NumPy's broadcasting is that this duplication of values does not actually take place, but it is a useful mental model for thinking about broadcasting.

import numpy as np
a = np.array([0, 1, 2])
a + 5   # array and scalar arithmetic

Output: array([5, 6, 7])

Arrays of different dimensions

# Array broadcasting
import numpy as np
a = np.array([0, 1, 2])   # one-dimensional array
M = np.ones((3, 3))       # 3x3 matrix of all ones
print(M + a)

OUTPUT
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]

Here the one-dimensional array a is stretched, or broadcast, across the rows of M: its shape (3,) is treated as (1, 3) and repeated three times to match the shape of M.

Arrays can also differ in their number of dimensions, as long as the rules above are satisfied.
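The three broadcasting rules above can be applied mechanically to two shapes. The helper broadcast_shape below is hypothetical, written only to illustrate the rules; the sketch cross-checks it against NumPy's own np.broadcast_shapes, which is available in NumPy 1.20 and later.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper applying the three broadcasting rules by hand."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # sizes match, or b is stretched (Rule 2)
        elif da == 1:
            result.append(db)      # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

# Shapes from the examples in this section
print(broadcast_shape((3, 3), (3,)))          # (3, 3): M + a
print(broadcast_shape((3,), (3, 1)))          # (3, 3): both arrays stretched
print(broadcast_shape((256, 256, 3), (3,)))   # (256, 256, 3): image times scale

# Cross-check against NumPy's implementation
for s1, s2 in [((3, 3), (3,)), ((3,), (3, 1)), ((256, 256, 3), (3,))]:
    assert broadcast_shape(s1, s2) == np.broadcast_shapes(s1, s2)

try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    print("error:", err)
```

Only shapes are compared; no data is touched, which mirrors the point that broadcasting does not actually duplicate values.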
As an example of arrays with different numbers of dimensions, a 256 x 256 x 3 array of RGB values can be scaled by a different value in each colour by multiplying the image by a one-dimensional array with 3 values:

Image  (3D array): 256 x 256 x 3
Scale  (1D array):             3
Result (3D array): 256 x 256 x 3

Broadcasting of both arrays

In some cases both arrays are stretched to match a common shape:

[0 1 2] + [[0], [1], [2]] = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]

import numpy as np
a = np.arange(3)                  # shape (3,), treated as a 1x3 array
b = np.arange(3)[:, np.newaxis]   # 3x1 array with 3 elements
print(a + b)

OUTPUT
[[0 1 2]
 [1 2 3]
 [2 3 4]]

Broadcasting in NumPy ufuncs

Consider an array of 10 observations with 3 values each, such as the BP, sugar level and temperature of a patient taken at 10 different times of the day. This can be stored in a 10 x 3 array.

# Broadcasting
X = np.random.random((10, 3))
Xmean = X.mean(0)                      # column-wise mean values
print("Mean of X (col):", Xmean)
X_centered = X - Xmean                 # center the array
X_centered_mean = X_centered.mean(0)   # mean of the centered array
print("Mean of X_centered:", X_centered_mean)

Output (the values differ between runs, since no seed is set)
Mean of X (col): [0.53514715 0.66567217 0.44385899]
Mean of X_centered: [ 2.22044605e-17 -7.77156117e-17 -1.66533454e-17]

The mean of the centered array is zero to within machine precision.

Practical Example of Broadcasting: Vector Quantization

• The vector quantization (VQ) algorithm is used in information theory, classification, and other related areas.
• The basic operation in VQ finds the closest point in a set of points, called codes, to a given point, called the observation.
• Arrays used in this example:
  - sample (the observation): the weight and height of an athlete to be classified.
  - classes (the codes): represent different classes of athletes.
• Finding the closest point requires calculating the distance between the observation and each of the codes.
• The shortest distance provides the best match. In this example, classes[0] is the nearest class, indicating that the athlete is likely a basketball player.

import numpy as np

sample = np.array([107.0, 198.0])
classes = np.array([[102.0, 203.0],
                    [132.0, 193.0],
                    [45.0, 155.0],
                    [57.0, 173.0]])
diff = classes - sample                     # the broadcast happens here
dist = np.sqrt(np.sum(diff**2, axis=-1))    # Euclidean distance to each class
print("Sample belongs to Class:", np.argmin(dist))

OUTPUT
Sample belongs to Class: 0

Limitations of Broadcasting

1. Broadcasting does not work in all cases: it imposes strict rules that must be satisfied for broadcasting to be performed.
2. Arithmetic, including broadcasting, can only be performed when, in each dimension, the sizes of the arrays are equal or one of them is 1.

4.4 COMPARISONS, MASKS, BOOLEAN LOGIC

Masking means extracting, modifying, counting, or otherwise manipulating values in an array based on some criterion. Boolean masking is typically the most efficient way to quantify a sub-collection in a collection. The criterion is represented as a True or False Boolean value.
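The idea of a Boolean mask can be sketched briefly. The array values below are arbitrary (chosen only for illustration); a comparison such as x > 5 is itself a ufunc that produces a Boolean array, which can then be used to count, select, or modify elements.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])   # illustrative values

mask = x > 5                   # element-wise comparison gives a Boolean array
print(mask)

print(np.count_nonzero(mask))  # how many values exceed 5
print(np.sum(mask, axis=1))    # per-row counts (True counts as 1)
print(x[mask])                 # select the values that satisfy the criterion
print(np.any(x > 8), np.all(x < 10))

# Combine criteria with the element-wise operators & and |, using parentheses
print(x[(x > 2) & (x < 6)])

# Modify values in place through the mask
x[mask] = 0
print(x)
```

The parentheses around each comparison are needed because & binds more tightly than > and < in Python.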
