Machine Learning With Python Cookbook 2e Preview

Uploaded by

Hendra TP
CHAPTER 1

Working with Vectors, Matrices, and Arrays in NumPy

1.0 Introduction
NumPy is a foundational tool of the Python machine learning stack. NumPy allows
for efficient operations on the data structures often used in machine learning: vectors,
matrices, and tensors. While NumPy isn’t the focus of this book, it will show up
frequently in the following chapters. This chapter covers the most common NumPy
operations we’re likely to run into while working on machine learning workflows.

1.1 Creating a Vector


Problem
You need to create a vector.

Solution
Use NumPy to create a one-dimensional array:
# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([1, 2, 3])

# Create a vector as a column
vector_column = np.array([[1],
                          [2],
                          [3]])

Discussion
NumPy’s main data structure is the multidimensional array. A vector is just an array
with a single dimension. To create a vector, we simply create a one-dimensional array.
Just like vectors, these arrays can be represented horizontally (i.e., rows) or vertically
(i.e., columns). Note that the column form in our solution is technically a
two-dimensional array with three rows and one column.
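
As a small sketch of the difference between the two forms, the shapes of the arrays from the solution can be inspected directly. The name as_column is only illustrative and does not come from the recipe:

```python
import numpy as np

# The two vectors from the solution
vector_row = np.array([1, 2, 3])
vector_column = np.array([[1],
                          [2],
                          [3]])

# The row form has one dimension; the column form has two
print(vector_row.shape)     # (3,)
print(vector_column.shape)  # (3, 1)

# reshape(-1, 1) turns the row form into the column form
# (as_column is a hypothetical name used for illustration)
as_column = vector_row.reshape(-1, 1)
print(np.array_equal(as_column, vector_column))  # True
```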

See Also
• Vectors, Math Is Fun
• Euclidean vector, Wikipedia

1.2 Creating a Matrix


Problem
You need to create a matrix.

Solution
Use NumPy to create a two-dimensional array:
# Load library
import numpy as np

# Create a matrix
matrix = np.array([[1, 2],
[1, 2],
[1, 2]])

Discussion
To create a matrix we can use a NumPy two-dimensional array. In our solution, the
matrix contains three rows and two columns (a column of 1s and a column of 2s).
NumPy actually has a dedicated matrix data structure:
# np.mat was removed in NumPy 2.0; np.asmatrix is the equivalent call
matrix_object = np.asmatrix([[1, 2],
                             [1, 2],
                             [1, 2]])
matrix([[1, 2],
        [1, 2],
        [1, 2]])
However, the matrix data structure is not recommended for two reasons. First, arrays
are the de facto standard data structure of NumPy. Second, the vast majority of
NumPy operations return arrays, not matrix objects.
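
To illustrate the second point, here is a minimal sketch showing that an ordinary matrix product on a two-dimensional array returns another array. The name product is an assumption made for this example:

```python
import numpy as np

# The matrix from the solution: three rows, two columns
matrix = np.array([[1, 2],
                   [1, 2],
                   [1, 2]])

print(matrix.shape)  # (3, 2)

# Multiply the (2, 3) transpose by the (3, 2) matrix;
# the result is a plain ndarray, not a matrix object
product = matrix.T @ matrix
print(type(product).__name__)  # ndarray
print(product.shape)           # (2, 2)
```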



See Also
• Matrix, Wikipedia
• Matrix, Wolfram MathWorld

1.3 Creating a Sparse Matrix


Problem
Given data with very few nonzero values, you want to efficiently represent it.

Solution
Create a sparse matrix:
# Load libraries
import numpy as np
from scipy import sparse

# Create a matrix
matrix = np.array([[0, 0],
[0, 1],
[3, 0]])

# Create compressed sparse row (CSR) matrix
matrix_sparse = sparse.csr_matrix(matrix)

Discussion
A frequent situation in machine learning is having a huge amount of data in which
most of the elements are zeros. For example, imagine a matrix where the
columns are every movie on Netflix, the rows are every Netflix user, and the values
are how many times a user has watched that particular movie. This matrix would
have tens of thousands of columns and millions of rows! However, since most users
do not watch most movies, the vast majority of elements would be zero.
A sparse matrix is a matrix in which most elements are 0. Sparse matrices store only
nonzero elements and assume all other values will be zero, leading to significant
computational savings. In our solution, we created a NumPy array with two nonzero
values, then converted it into a sparse matrix. If we view the sparse matrix we can see
that only the nonzero values are stored:
# View sparse matrix
print(matrix_sparse)
(1, 1) 1
(2, 0) 3



There are a number of types of sparse matrices. In the compressed sparse row (CSR)
matrix printed above, (1, 1) and (2, 0) are the (zero-indexed) row and column
indices of the nonzero values 1 and 3, respectively. For example, the element 1 is in
the second row and second column. We can see the advantage of sparse matrices if we
create a much larger matrix with many more zero elements and then compare its
sparse representation with our original sparse matrix:
# Create larger matrix
matrix_large = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                         [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                         [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# Create compressed sparse row (CSR) matrix
matrix_large_sparse = sparse.csr_matrix(matrix_large)

# View original sparse matrix
print(matrix_sparse)
(1, 1) 1
(2, 0) 3
# View larger sparse matrix
print(matrix_large_sparse)
(1, 1) 1
(2, 0) 3
As we can see, despite the fact that we added many more zero elements in the larger
matrix, its sparse representation is exactly the same as our original sparse matrix.
That is, the addition of zero elements did not change the number of values the
sparse matrix stores.
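
One way to check this claim is to look at the arrays a CSR matrix keeps internally. The sketch below rebuilds both matrices and compares their data, indices, and indptr attributes. Building the larger matrix with np.zeros and slicing is a shortcut taken here, not the approach used in the recipe:

```python
import numpy as np
from scipy import sparse

# The original matrix from the solution
matrix = np.array([[0, 0],
                   [0, 1],
                   [3, 0]])

# Same values padded with eight extra all-zero columns
# (a shortcut for writing out the larger matrix by hand)
matrix_large = np.zeros((3, 10), dtype=matrix.dtype)
matrix_large[:, :2] = matrix

matrix_sparse = sparse.csr_matrix(matrix)
matrix_large_sparse = sparse.csr_matrix(matrix_large)

# The shapes differ, but the stored contents do not
for m in (matrix_sparse, matrix_large_sparse):
    print(m.shape, m.nnz, m.data, m.indices, m.indptr)
```

Here data holds the nonzero values, indices holds their column positions, and indptr marks where each row starts within data.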
As mentioned, there are many different types of sparse matrices, such as compressed
sparse column, list of lists, and dictionary of keys. While an explanation of the
different types and their implications is outside the scope of this book, it is worth
noting that while there is no “best” sparse matrix type, there are meaningful
differences among them, and we should be conscious about why we are choosing one
type over another.
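
As a brief sketch, SciPy can convert a CSR matrix into each of the other types named above. The variable names are illustrative only, and the example does not try to show when each type is the better choice:

```python
import numpy as np
from scipy import sparse

matrix = np.array([[0, 0],
                   [0, 1],
                   [3, 0]])

# Start from CSR and convert to the other formats
matrix_csr = sparse.csr_matrix(matrix)
matrix_csc = matrix_csr.tocsc()  # compressed sparse column
matrix_lil = matrix_csr.tolil()  # list of lists
matrix_dok = matrix_csr.todok()  # dictionary of keys

# Each format stores the same two nonzero values
for m in (matrix_csr, matrix_csc, matrix_lil, matrix_dok):
    print(m.format, m.nnz)
```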

See Also
• SciPy documentation: Sparse Matrices
• 101 Ways to Store a Sparse Matrix

1.4 Preallocating NumPy Arrays


Problem
You need to preallocate arrays of a given size with some value.



Solution
NumPy has functions for generating vectors and matrices of any size using 0s, 1s, or
values of your choice:
# Load library
import numpy as np

# Generate a vector of shape (5,) containing all zeros
vector = np.zeros(shape=5)

# View the vector
print(vector)
[0. 0. 0. 0. 0.]

# Generate a matrix of shape (3,3) containing all ones
matrix = np.full(shape=(3,3), fill_value=1)

# View the matrix
print(matrix)
[[1 1 1]
 [1 1 1]
 [1 1 1]]

Discussion
Generating arrays prefilled with data is useful for a number of purposes, such as
making code more performant or using synthetic data to test algorithms. In many
programming languages, preallocating an array of default values (such as 0s) is
considered common practice.
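
As a minimal sketch of the performance use case, the example below preallocates an array and then fills it in place rather than growing a list. The names n, squares, ones, and sevens are assumptions made for this illustration:

```python
import numpy as np

# Preallocate a vector of zeros, then fill it in place
n = 5
squares = np.zeros(shape=n)
for i in range(n):
    squares[i] = i ** 2

print(squares)  # [ 0.  1.  4.  9. 16.]

# np.ones and np.full preallocate with other default values
ones = np.ones(shape=(2, 3))
sevens = np.full(shape=(2, 3), fill_value=7)

# np.zeros and np.ones default to floats, while np.full
# takes its data type from fill_value unless dtype is given
print(squares.dtype, ones.dtype, sevens.dtype)
```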

1.5 Selecting Elements


Problem
You need to select one or more elements in a vector or matrix.

Solution
NumPy arrays make it easy to select elements in vectors or matrices:
# Load library
import numpy as np

# Create row vector
vector = np.array([1, 2, 3, 4, 5, 6])

# Create matrix
matrix = np.array([[1, 2, 3],
