Topic 1 IntroductionToNumpy-2
1 Introduction to NumPy
• NumPy is used for effectively loading, storing, and manipulating in-memory data in Python.
• Datasets can come from a wide range of sources and in a wide range of formats:
– documents
– images
– sound clips
– numerical measurements
• Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays
of numbers.
• For example, images (particularly digital images) can be thought of as simply two-dimensional
arrays of numbers representing pixel brightness across the area.
• Sound clips can be thought of as one-dimensional arrays of intensity versus time.
• Text can be converted in various ways into numerical representations, perhaps binary digits
representing the frequency of certain words or pairs of words.
• For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental
to the process of doing data science.
• This chapter will cover NumPy in detail. NumPy (short for Numerical Python) provides an
efficient interface to store and operate on dense data buffers.
• In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide
much more efficient storage and data operations as the arrays grow larger in size.
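As a rough illustration of the points above, the sketch below builds small stand-in arrays for an image and a sound clip, then doubles every value in a list and in a NumPy array. The names `image`, `sound`, and `values`, and all sizes, are made up for illustration and are not taken from any real dataset.

```python
import numpy as np

# Hypothetical 4x6 grayscale "image": a 2-D array of pixel brightness values
image = np.zeros((4, 6), dtype=np.uint8)
image[1:3, 2:4] = 255            # a bright 2x2 patch in the middle

# Hypothetical "sound clip": a 1-D array of intensity versus time
t = np.linspace(0, 1, 8)         # 8 sample times between 0 and 1
sound = np.sin(2 * np.pi * t)

print(image.ndim, image.shape)
print(sound.ndim, sound.shape)

# Doubling every value: the list needs an explicit loop,
# while the array applies the operation to all elements at once
values = list(range(5))
doubled_list = [2 * v for v in values]
doubled_array = 2 * np.array(values)
print(doubled_list, doubled_array)
```

Both doubling approaches give the same numbers; the array version avoids the Python-level loop, which is where the efficiency gain on large data comes from.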
# Python code
x = 4
x = "four"
/* C code */
int x = 4;
x = "four"; // FAILS
• This sort of flexibility is one piece that makes Python and other dynamically-typed
languages convenient and easy to use.
• Understanding how this works is an important piece of learning to analyze data efficiently
and effectively with Python.
• But Python variables are more than just their values: they also contain extra information about
the type of the value. We’ll explore this more in the sections that follow.
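A minimal sketch of the type information that travels with a value: the same name can refer to objects of different types, and `type()` reports whichever one it currently refers to. The helper names `first_type` and `second_type` are only for illustration.

```python
x = 4
first_type = type(x)     # x currently refers to an int object

x = "four"
second_type = type(x)    # the same name now refers to a str object

print(first_type, second_type)
```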
• A C integer is essentially a label for a position in memory whose bytes encode an integer
value.
• A Python integer is a pointer to a position in memory containing all the Python object
information, including the bytes that contain the integer value.
• This extra information in the Python integer structure is what allows Python to be coded so
freely and dynamically.
• All this additional information in Python types comes at a cost, however, which becomes
especially apparent in structures that combine many of these objects.
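One way to get a feel for this cost is to compare the size of Python integer objects with the bytes a fixed-type array uses per element. This is only a sketch: the figure reported by `sys.getsizeof` depends on the Python build, and the variable names are illustrative.

```python
import sys
import numpy as np

py_int = 1
list_of_ints = list(range(1000))
np_array = np.arange(1000, dtype=np.int64)

# Size in bytes of one Python int object (value plus object overhead)
print(sys.getsizeof(py_int))

# Bytes used per element inside a fixed-type int64 array
print(np_array.itemsize)

# Rough totals: the list's block of pointers plus every int object,
# versus the array's single data buffer
list_total = sys.getsizeof(list_of_ints) + sum(sys.getsizeof(i) for i in list_of_ints)
array_total = np_array.nbytes
print(list_total, array_total)
```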
[1]: L = list(range(10))
L
[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2]: type(L[0])
[2]: int
• At the implementation level, the array essentially contains a single pointer to one contiguous
block of data.
• The Python list, on the other hand, contains a pointer to a block of pointers, each of which
in turn points to a full Python object like the Python integer we saw earlier.
• Again, the advantage of the list is flexibility: because each list element is a full structure
containing both data and type information, the list can be filled with data of any desired
type.
• Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing
and manipulating data.
• Python’s built-in array module provides efficient storage of array-based data; NumPy adds
efficient operations on that data.
• We will explore these operations in later sections; here we’ll demonstrate several ways of
creating a NumPy array.
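The points above can be sketched as follows: first a fixed-type array from Python's built-in `array` module, then several NumPy arrays built from Python lists. The variable names and values are illustrative only.

```python
import array
import numpy as np

# Built-in array module: 'i' is the type code for C signed integers
A = array.array('i', range(10))

# NumPy array from a Python list
ints = np.array([1, 4, 2, 5, 3])

# Mixed ints and floats are upcast to one common type (float here)
mixed = np.array([3.14, 4, 2, 3])

# The dtype keyword sets the element type explicitly
floats = np.array([1, 2, 3, 4], dtype='float32')

# Nested sequences give a multi-dimensional array
grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(A)
print(ints, mixed, floats)
print(grid)
```

Unlike a list, every element of each array here shares one type, which is why the mixed input ends up stored entirely as floats.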
print(a,b)
'''
b=1/a
2.5 Creating Arrays from Scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using routines built
into NumPy. Here are several examples: zeros, ones, full
[1]: import numpy as np
a = np.zeros((3,4))
b = np.ones((4,3))
c = np.full((5,5),fill_value = 4)
print(a)
print(b)
print(c)
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[4 4 4 4 4]
 [4 4 4 4 4]
 [4 4 4 4 4]
 [4 4 4 4 4]
 [4 4 4 4 4]]
[ 0 2 4 6 8 10 12 14 16 18]
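The output line just above is what printing `np.arange(0, 20, 2)` would give; the cell that produced it is not shown, so that attribution is an assumption. A minimal sketch of two routines for building evenly spaced sequences, with illustrative variable names:

```python
import numpy as np

# Linear sequence starting at 0, stopping before 20, stepping by 2
evens = np.arange(0, 20, 2)
print(evens)

# Five values evenly spaced between 0 and 1, endpoints included
spaced = np.linspace(0, 1, 5)
print(spaced)
```

`np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and includes both endpoints by default.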
[7]: # Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3,3))
np.random.normal?
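A short sketch of a few related random-array routines, using the same legacy `np.random` interface as the cell above. The seed value is arbitrary and is there only so that repeated runs give the same numbers; the variable names are illustrative.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatable results

# 3x3 array of uniformly distributed values in [0, 1)
uniform = np.random.random((3, 3))

# 3x3 array of normally distributed values, mean 0 and standard deviation 1
normal = np.random.normal(0, 1, (3, 3))

# 3x3 array of random integers in the interval [0, 10)
integers = np.random.randint(0, 10, (3, 3))

print(uniform)
print(normal)
print(integers)
```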
9.1 NumPy Standard Data Types
• NumPy arrays contain values of a single type, so it is important to have detailed knowledge
of those types and their limitations.
• Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other
related languages.
The standard NumPy data types are listed in the following table. Note that when constructing an
array, they can be specified using a string:
np.zeros(10, dtype='int16')
Or using the associated NumPy object:
np.zeros(10, dtype=np.int16)
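As a small sketch, the two spellings give the same dtype, and `np.iinfo` reports the limits of an integer type. The last lines show one such limitation: adding past the maximum of a fixed-width integer array wraps around rather than growing, as a Python int would. Variable names are illustrative.

```python
import numpy as np

by_string = np.zeros(10, dtype='int16')
by_object = np.zeros(10, dtype=np.int16)
print(by_string.dtype == by_object.dtype)

# Range and storage size of a 16-bit signed integer
info = np.iinfo(np.int16)
print(info.min, info.max)
print(by_string.itemsize)    # bytes per element

# Going past the maximum wraps around in a fixed-width integer array
top = np.array([info.max], dtype=np.int16)
wrapped = top + np.array([1], dtype=np.int16)
print(wrapped)
```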