{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Python " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [NumPy](https://numpy.org/)\n", "\n", "Why use NumPy?\n", "\n", "NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. \n", "NumPy also provides a mechanism for specifying the data type of the elements, which allows the code to be optimized even further." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#dir(np)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NumPy Basic Data Types\n", "\n", "#### Type Array\n", "\n", "The array is the central data structure of the NumPy library. It is a grid of values that carries information about the raw data, how to locate an element, and how to interpret an element. The elements can be indexed in various ways and are all of the same type, referred to as the dtype of the array. \n", "\n", "An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension. \n", "\n", "One way we can initialize NumPy arrays is from Python lists, using nested lists for two- or higher-dimensional data. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_numbers = [1,2,3,4]\n", "simple_array = np.array(my_numbers)\n", "simple_array" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numpy.ndarray" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(simple_array)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#dir(simple_array)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4,)\n", "4\n", "1\n" ] } ], "source": [ "print(simple_array.shape)\n", "print(simple_array.size)\n", "print(simple_array.ndim)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4],\n", " [4, 5, 6, 7],\n", " [8, 9, 0, 1]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_other_numbers = [[1,2,3,4],[4,5,6,7],[8,9,0,1]]\n", "other_simple_array = np.array(my_other_numbers)\n", "other_simple_array" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 4)\n", "12\n", "2\n" ] } ], "source": [ "print(other_simple_array.shape)\n", "print(other_simple_array.size)\n", "print(other_simple_array.ndim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Operations between arrays (scalar and vectorial)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "A = np.array([[1,2,3],[4,5,6],[8,9,0]])\n", "B = np.array([[2,1,5],[9,2,1],[8,7,6]])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [8, 9, 0]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2, 1, 5],\n", " [9, 2, 1],\n", " [8, 7, 6]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 3, 4, 5],\n", " [ 6, 7, 8],\n", " [10, 11, 2]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A + 2" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 3, 6, 9],\n", " [12, 15, 18],\n", " [24, 27, 0]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A * 3" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.25, 0.5 , 0.75],\n", " [1. , 1.25, 1.5 ],\n", " [2. , 2.25, 0. 
]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A / 4" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 3, 3, 8],\n", " [13, 7, 7],\n", " [16, 16, 6]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A + B" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2, 2, 15],\n", " [36, 10, 6],\n", " [64, 63, 0]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A * B" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 44, 26, 25],\n", " [101, 56, 61],\n", " [ 97, 26, 49]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(B)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 4, 8],\n", " [2, 5, 9],\n", " [3, 6, 0]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.T\n", "#A.transpose()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [8, 9, 0]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating arrays" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n", " 17, 18, 19])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(20) # ==> np.arange(0,20,1)\n", "a" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2, 3.4,\n", " 3.6, 3.8, 4. , 4.2, 4.4, 4.6, 4.8])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(1,5,0.2)\n", "a" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1. , 1.31034483, 1.62068966, 1.93103448, 2.24137931,\n", " 2.55172414, 2.86206897, 3.17241379, 3.48275862, 3.79310345,\n", " 4.10344828, 4.4137931 , 4.72413793, 5.03448276, 5.34482759,\n", " 5.65517241, 5.96551724, 6.27586207, 6.5862069 , 6.89655172,\n", " 7.20689655, 7.51724138, 7.82758621, 8.13793103, 8.44827586,\n", " 8.75862069, 9.06896552, 9.37931034, 9.68965517, 10. ])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.linspace(1,10,30)\n", "#b = np.linspace(1,2*np.pi,50)\n", "b" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.00000000e+001, 2.59294380e+004, 6.72335754e+007, 1.74332882e+011,\n", " 4.52035366e+014, 1.17210230e+018, 3.03919538e+021, 7.88046282e+024,\n", " 2.04335972e+028, 5.29831691e+031, 1.37382380e+035, 3.56224789e+038,\n", " 9.23670857e+041, 2.39502662e+045, 6.21016942e+048, 1.61026203e+052,\n", " 4.17531894e+055, 1.08263673e+059, 2.80721620e+062, 7.27895384e+065,\n", " 1.88739182e+069, 4.89390092e+072, 1.26896100e+076, 3.29034456e+079,\n", " 8.53167852e+082, 2.21221629e+086, 5.73615251e+089, 1.48735211e+093,\n", " 3.85662042e+096, 1.00000000e+100])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b2 = np.logspace(1,100,30)\n", "b2" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0. 0.]\n", " [0. 0. 0. 0.]\n", " [0. 0. 0. 
0.]]\n" ] } ], "source": [ "a1 = np.zeros((3,4))\n", "print(a1)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.]\n", " [1. 1.]]\n" ] } ], "source": [ "a2 = np.ones((2,2))\n", "print(a2)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1.67013635e-316 0.00000000e+000 0.00000000e+000]\n", " [0.00000000e+000 0.00000000e+000 0.00000000e+000]]\n" ] } ], "source": [ "a3 = np.empty((2,3))\n", "print(a3)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0. 0.]\n", " [0. 1. 0.]\n", " [0. 0. 1.]]\n" ] } ], "source": [ "a4 = np.identity(3)\n", "print(a4)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 0.],\n", " [0., 1., 0.],\n", " [0., 0., 1.]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.eye(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Modifying Dimensions" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = np.arange(10)\n", "c" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2, 3, 4],\n", " [5, 6, 7, 8, 9]])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = c.reshape(2,5)\n", "#d = np.arange(10).reshape(2,5)\n", "d" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(10,)\n", "(2, 5)\n", "2\n", "int64\n" ] } ], "source": [ "print(c.shape)\n", 
"print(d.shape)\n", "print(np.ndim(d))\n", "print(d.dtype.name)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19],\n", " [20, 21, 22, 23, 24],\n", " [25, 26, 27, 28, 29],\n", " [30, 31, 32, 33, 34],\n", " [35, 36, 37, 38, 39],\n", " [40, 41, 42, 43, 44],\n", " [45, 46, 47, 48, 49]],\n", "\n", " [[50, 51, 52, 53, 54],\n", " [55, 56, 57, 58, 59],\n", " [60, 61, 62, 63, 64],\n", " [65, 66, 67, 68, 69],\n", " [70, 71, 72, 73, 74],\n", " [75, 76, 77, 78, 79],\n", " [80, 81, 82, 83, 84],\n", " [85, 86, 87, 88, 89],\n", " [90, 91, 92, 93, 94],\n", " [95, 96, 97, 98, 99]]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2 = np.arange(100).reshape(2,10,5)\n", "d2" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2.ndim" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 10, 5)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Slicing multidimensional arrays" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19],\n", " [20, 21, 22, 23, 24],\n", " [25, 26, 27, 28, 29],\n", " [30, 31, 32, 33, 34],\n", " [35, 36, 37, 38, 39],\n", " [40, 41, 42, 43, 44],\n", " [45, 46, 47, 48, 49]],\n", "\n", " [[50, 51, 52, 53, 54],\n", " [55, 56, 57, 58, 59],\n", " [60, 61, 62, 63, 64],\n", " [65, 66, 67, 68, 69],\n", " [70, 71, 72, 73, 74],\n", " [75, 76, 77, 78, 79],\n", " [80, 81, 82, 83, 
84],\n", " [85, 86, 87, 88, 89],\n", " [90, 91, 92, 93, 94],\n", " [95, 96, 97, 98, 99]]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[21, 22],\n", " [26, 27]]])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2[0:1,4:6,1:3]" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "d2[d2%2==0]\n", "print(np.ndim(d2[d2%2==0]))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "#np.mask_indices?" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,\n", " 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67,\n", " 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2[~(d2%2==0)] # negation of the condition; the parentheses are needed because ~ binds tighter than % and ==" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Stacking and Concatenating Arrays" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11],\n", " [12, 13, 14, 15]])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(16).reshape(4,4)\n", "a" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11],\n", " [12, 13, 14, 15],\n", " [ 0, 1, 2, 3]])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"np.vstack([a,np.arange(4).reshape(1,4)])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 0],\n", " [ 4, 5, 6, 7, 1],\n", " [ 8, 9, 10, 11, 2],\n", " [12, 13, 14, 15, 3]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hstack([a,np.arange(4).reshape(4,1)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use the generic np.stack:" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [], "source": [ "a1 = np.array([1, 2, 3, 4])\n", "a2 = np.array([5, 6, 7, 8])" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4],\n", " [5, 6, 7, 8]])" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.stack((a1,a2), axis=0)" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 5],\n", " [2, 6],\n", " [3, 7],\n", " [4, 8]])" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.stack((a1,a2), axis=1)" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6, 7, 8])" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate((a1,a2), axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting array data:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 1, 5, 3, 7, 4, 6, 8])" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])\n", "arr" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6, 7, 8])" ] 
}, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sort(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Type Matrix" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,2.],[4,3]])\n", "b = np.array([[1,9],[7,5]])" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 2.],\n", " [4., 3.]])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 9],\n", " [7, 5]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "#dir(a)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 18.],\n", " [28., 15.]])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a * b # for arrays, * is the elementwise product" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# Note: NumPy no longer recommends np.matrix; it is shown here only for comparison with arrays\n", "A = np.matrix(a)\n", "B = np.matrix(b)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[1, 9],\n", " [7, 5]])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "#dir(B)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matrix([[-0.0862069 , 0.15517241],\n", " [ 0.12068966, -0.01724138]])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B.I" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "matrix([[15., 19.],\n", " [25., 51.]])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A * B # for np.matrix objects, * is the matrix product" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n", "<class 'numpy.matrix'>\n", "<class 'numpy.ndarray'>\n", "<class 'numpy.matrix'>\n", "<class 'numpy.matrix'>\n" ] } ], "source": [ "print(type(a))\n", "print(type(A))\n", "print(type(a * b))\n", "print(type(A * B))\n", "print(type(a * B))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Datatypes" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int64\n" ] } ], "source": [ "x = np.array([1, 2]) # Let numpy choose the datatype\n", "print(x.dtype) # Prints \"int64\"" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n" ] } ], "source": [ "x = np.array([1.0, 2.0]) # Let numpy choose the datatype\n", "print(x.dtype) # Prints \"float64\"" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n" ] } ], "source": [ "x = np.array([1, 2], dtype=np.float64) # Force a particular datatype\n", "print(x.dtype) # Prints \"float64\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Array Math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inline and vectorized operations:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 2.],\n", " [4., 3.]])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([[1,2.],[4,3]])\n", "a" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2., 4.],\n", " [8., 6.]])" ] }, "execution_count": 59, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "a * 2" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 2.],\n", " [4., 3.]])" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the original array stays the same\n", "a" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 3., 7., 10.])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.cumsum()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11],\n", " [12, 13, 14, 15],\n", " [ 0, 1, 2, 3]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(16).reshape(4,4)\n", "np.vstack([a,np.arange(4).reshape(1,4)])" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 0],\n", " [ 4, 5, 6, 7, 1],\n", " [ 8, 9, 10, 11, 2],\n", " [12, 13, 14, 15, 3]])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hstack([a,np.arange(4).reshape(4,1)])" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 6. 8.]\n", " [10. 12.]]\n", "[[ 6. 8.]\n", " [10. 12.]]\n" ] } ], "source": [ "x = np.array([[1,2],[3,4]], dtype=np.float64)\n", "y = np.array([[5,6],[7,8]], dtype=np.float64)\n", "\n", "# Elementwise sum; both produce the array\n", "# [[ 6.0 8.0]\n", "# [10.0 12.0]]\n", "print(x + y)\n", "print(np.add(x, y))" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-4. -4.]\n", " [-4. -4.]]\n", "[[-4. -4.]\n", " [-4. 
-4.]]\n" ] } ], "source": [ "# Elementwise difference; both produce the array\n", "# [[-4.0 -4.0]\n", "# [-4.0 -4.0]]\n", "print(x - y)\n", "print(np.subtract(x, y))" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 5. 12.]\n", " [21. 32.]]\n", "[[ 5. 12.]\n", " [21. 32.]]\n" ] } ], "source": [ "# Elementwise product; both produce the array\n", "# [[ 5.0 12.0]\n", "# [21.0 32.0]]\n", "print(x * y)\n", "print(np.multiply(x, y))" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.2 0.33333333]\n", " [0.42857143 0.5 ]]\n", "[[0.2 0.33333333]\n", " [0.42857143 0.5 ]]\n" ] } ], "source": [ "# Elementwise division; both produce the array\n", "# [[ 0.2 0.33333333]\n", "# [ 0.42857143 0.5 ]]\n", "print(x / y)\n", "print(np.divide(x, y))" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1.41421356]\n", " [1.73205081 2. ]]\n" ] } ], "source": [ "# Elementwise square root; produces the array\n", "# [[ 1. 1.41421356]\n", "# [ 1.73205081 2. 
]]\n", "print(np.sqrt(x))" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "x = np.array([[1,2],[3,4]])\n", "y = np.array([[5,6],[7,8]])\n", "v = np.array([9,10])\n", "w = np.array([11, 12])" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]]\n", "\n", "[[5 6]\n", " [7 8]]\n", "\n", "[ 9 10]\n", "\n", "[11 12]\n" ] } ], "source": [ "print(x)\n", "print()\n", "print(y)\n", "print()\n", "print(v)\n", "print()\n", "print(w)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "219 \n", "\n", "219\n" ] } ], "source": [ "# Inner product of vectors; both produce 219\n", "print(v.dot(w), '\\n')\n", "print(np.dot(v, w))" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[29 67] \n", "\n", "[29 67]\n" ] } ], "source": [ "# Matrix / vector product; both produce the rank 1 array [29 67]\n", "print(x.dot(v), '\\n')\n", "print(np.dot(x, v))" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[19 22]\n", " [43 50]] \n", "\n", "[[19 22]\n", " [43 50]]\n" ] } ], "source": [ "# Matrix / matrix product; both produce the rank 2 array\n", "print(x.dot(y), '\\n')\n", "print(np.dot(x, y))" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]] \n", "\n", "10 \n", "\n", "[4 6] \n", "\n", "[3 7]\n" ] } ], "source": [ "x = np.array([[1,2],[3,4]])\n", "print(x, '\\n')\n", "print(np.sum(x), '\\n') # Compute sum of all elements; prints \"10\"\n", "print(np.sum(x, axis=0), '\\n') # Compute sum of each column; prints \"[4 6]\"\n", "print(np.sum(x, axis=1)) # Compute sum of each row; prints \"[3 
7]\"" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "x = np.array([[1,2], [3,4]])" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]] \n", "\n", "[[1 3]\n", " [2 4]]\n" ] } ], "source": [ "print(x, '\\n')\n", "print(x.T)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "# Note that taking the transpose of a rank 1 array does nothing:\n", "v = np.array([1,2,3])" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3] \n", "\n", "[1 2 3]\n" ] } ], "source": [ "print(v, '\\n')\n", "print(v.T)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "# We will add the vector v to each row of the matrix x,\n", "# storing the result in the matrix y\n", "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", "v = np.array([1, 0, 1])\n", "y = np.empty_like(x) # Create an empty matrix with the same shape as x" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1 2 3]\n", " [ 4 5 6]\n", " [ 7 8 9]\n", " [10 11 12]] \n", "\n", "[1 0 1] \n", "\n", "[[ 35881264 0 206158430253]\n", " [193273528375 210453397555 214748364884]\n", " [249108103217 231928234032 223338299450]\n", " [197568495672 206158430258 386547056692]]\n" ] } ], "source": [ "print(x, '\\n')\n", "print(v, '\\n')\n", "print(y)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2 2 4]\n", " [ 5 5 7]\n", " [ 8 8 10]\n", " [11 11 13]]\n" ] } ], "source": [ "for i in range(4):\n", " y[i, :] = x[i, :] + v\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works; however when the matrix x is 
very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 0 1]\n", " [1 0 1]\n", " [1 0 1]\n", " [1 0 1]]\n" ] } ], "source": [ "# We will add the vector v to each row of the matrix x,\n", "# storing the result in the matrix y\n", "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", "v = np.array([1, 0, 1])\n", "vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other\n", "print(vv)" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2 2 4]\n", " [ 5 5 7]\n", " [ 8 8 10]\n", " [11 11 13]]\n" ] } ], "source": [ "y = x + vv # Add x and vv elementwise\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2 2 4]\n", " [ 5 5 7]\n", " [ 8 8 10]\n", " [11 11 13]]\n" ] } ], "source": [ "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n", "v = np.array([1, 0, 1])\n", "y = x + v # Add v to each row of x using broadcasting\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The line y = x + v works even though x has shape (4, 3) and v has shape (3,) due to broadcasting; this line works as if v actually had shape (4, 3), where each row was a copy of v, and the sum was performed elementwise." 
] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[1.40726934 1.96702702 1.50401247 1.05340359]\n", " [1.40726934 1.96702702 1.50401247 1.05340359]\n", " [1.40726934 1.96702702 1.50401247 1.05340359]]\n", "\n", " [[1.82399241 1.53184192 1.40375806 1.15097293]\n", " [1.82399241 1.53184192 1.40375806 1.15097293]\n", " [1.82399241 1.53184192 1.40375806 1.15097293]]\n", "\n", " [[1.36346038 1.85220375 1.07174827 1.23426041]\n", " [1.36346038 1.85220375 1.07174827 1.23426041]\n", " [1.36346038 1.85220375 1.07174827 1.23426041]]\n", "\n", " [[1.51218616 1.21828011 1.4603647 1.53051323]\n", " [1.51218616 1.21828011 1.4603647 1.53051323]\n", " [1.51218616 1.21828011 1.4603647 1.53051323]]\n", "\n", " [[1.54070196 1.68114157 1.66453639 1.25187083]\n", " [1.54070196 1.68114157 1.66453639 1.25187083]\n", " [1.54070196 1.68114157 1.66453639 1.25187083]]]\n" ] } ], "source": [ "# Initialize `x` and `y`\n", "x = np.ones((3,4))\n", "y = np.random.random((5,1,4))\n", "\n", "# Add `x` and `y`\n", "print(x + y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though x and y have different shapes, the two can be added together. \n", "That is because they are compatible in all dimensions:\n", "\n", " Array x has dimensions 3 X 4,\n", " Array y has dimensions 5 X 1 X 4\n", "\n", "Broadcasting compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1, so these two arrays are indeed good candidates for broadcasting! \n", "\n", "In the dimension where y has size 1 and x has a size greater than 1 (that is, 3), y behaves as if it were copied along that dimension. Likewise x, which has no third dimension, behaves as if it had shape 1 X 3 X 4 and were copied 5 times along the leading dimension. 
\n", "\n", "Note that the shape of the resulting array is again the maximum size along each dimension of x and y: the shape of the result is (5, 3, 4). \n", "\n", "In short, if you want to make use of broadcasting, you will rely a lot on the shape and dimensions of the arrays with which you’re working. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Useful functions:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [], "source": [ "grades1 = np.array([1.0,3,5.0,7,9,2,4,6])\n", "grades2 = np.array([0.9,3,4.9,7,9,4,4,6])" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([2, 3, 4, 7]),)" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where(grades1 > 4)" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['lower', 'lower', 'bigger', 'bigger', 'bigger', 'lower', 'lower',\n", " 'bigger'], dtype='<U6')" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where(grades1 > 4, 'bigger', 'lower')" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades1.argmin()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades1.argmax()" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 5, 1, 6, 2, 7, 3, 4])" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grades1.argsort()" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3., 4., 6., 7., 9.])" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": 
[ "np.intersect1d(grades1,grades2)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(grades1, grades2, rtol=0.1) # the third positional argument of allclose is the relative tolerance" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(grades1, grades2, rtol=0.5)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }