Data
Analysis
by
Bernd Klein
bodenseo
© 2021 Bernd Klein
All rights reserved. No portion of this book may be reproduced or used in any
manner without written permission from the copyright owner.
www.python-course.eu
Python Course
Data Analysis With
Python by Bernd
Klein
Numpy Tutorial
Numpy Tutorial: Creating Arrays
Data Type Objects, dtype
Numerical Operations on Numpy Arrays
Numpy Arrays: Concatenating, Flattening and Adding Dimensions
Python, Random Numbers and Probability
Weighted Probabilities
Synthetical Test Data With Python
Numpy: Boolean Indexing
Matrix Multiplication, Dot and Cross Product
Reading and Writing Data Files
Overview of Matplotlib
Format Plots
Matplotlib Tutorial
Shading Regions with fill_between()
Matplotlib Tutorial: Spines and Ticks
Matplotlib Tutorial, Adding Legends and Annotations
Matplotlib Tutorial: Subplots
Exercise
Matplotlib Tutorial: Gridspec
GridSpec using SubplotSpec
Matplotlib Tutorial: Histograms and Bar Plots
Matplotlib Tutorial: Contour Plots
Introduction into Pandas
Data Structures
Accessing and Changing values of DataFrames
Pandas: groupby
Reading and Writing Data
Dealing with NaN
Binning in Python and Pandas
Expenses and Income Example
Net Income Method Example
NUMERICAL PROGRAMMING WITH
PYTHON
Numerical Computing defines an area of computer science and mathematics dealing with algorithms for
numerical approximations of problems from mathematical or numerical analysis, in other words: Algorithms
solving problems involving continuous variables. Numerical analysis is used to solve science and engineering
problems.
Data science is an interdisciplinary subject which includes, for example, statistics and computer science,
especially programming and problem solving skills. Data Science includes everything which is necessary to
create and prepare data, to manipulate, filter and cleanse data and to analyse data. Data can be both structured
and unstructured. We could also say Data Science includes all the techniques needed to extract and gain
information and insight from data.
Data Science is an umbrella term which incorporates data analysis, statistics, machine learning and other
related scientific fields in order to understand and analyze data.
Another term occurring quite often in this context is "Big Data". Big Data is surely one of the most frequently
used buzzwords in the software-related marketing world. Marketing managers have found out that using this term
can boost the sales of their products, regardless of whether they are really dealing with big data or not. The
term is often used in fuzzy ways.
Big data is data which is so large and complex that it is hard for data-processing application software to
deal with it. The problems include capturing and collecting data, storing it, searching it, visualizing it,
querying it, and so on. The following concepts are associated with big data:
• volume: the sheer amount of data, whether it will be giga-, tera-, peta- or exabytes
• velocity: the speed of arrival and processing of data
• veracity: uncertainty or imprecision of data
• variety: the many sources and types of data both structured and unstructured
The big question is how useful Python is for these purposes. If we used Python without any special
modules, the language would perform only poorly on the previously mentioned tasks. We will describe the
necessary tools in the following chapter.
NumPy is a module which provides the basic data structures, implementing multi-dimensional arrays and
matrices. Besides that, the module supplies the necessary functionality to create and manipulate these data
structures. SciPy is built on top of NumPy, i.e. it uses the data structures provided by NumPy. It extends the
capabilities of NumPy with further useful functions for minimization, regression, Fourier transformation and
many others.
The principal disadvantage of MATLAB compared to Python is its cost. Python with NumPy, SciPy, Matplotlib
and Pandas is completely free, whereas MATLAB can be very expensive. "Free" means both "free" as in "free
beer" and "free" as in "freedom"! Even though MATLAB has a huge number of additional toolboxes available,
Python has the advantage that it is a more modern and complete programming language. Python is continually
becoming more powerful thanks to a rapidly growing number of specialized modules.
Python in combination with NumPy, SciPy, Matplotlib and Pandas can be used as a complete replacement for
MATLAB.
NUMPY TUTORIAL
INTRODUCTION
NumPy is a module for Python. The name is an acronym for "Numeric Python" or "Numerical Python". It is
pronounced /ˈnʌmpaɪ/ (NUM-py) or, less often, /ˈnʌmpi/ (NUM-pee). It is an extension module for Python,
mostly written in C. Because its mathematical and numerical functions are precompiled, NumPy offers great
execution speed.
SciPy (Scientific Python) is often mentioned in the same breath with NumPy. SciPy needs NumPy, as it is
based on the data structures of NumPy and on its basic creation and manipulation functions. It
extends the capabilities of NumPy with further useful functions for minimization, regression, Fourier
transformation and many others.
Neither NumPy nor SciPy is part of a basic Python installation. They have to be installed after the Python
installation. NumPy has to be installed before installing SciPy.
(Comment: The diagram of the image on the right side is the graphical visualisation of a matrix with 14 rows
and 20 columns. It's a so-called Hinton diagram. The size of a square within this diagram corresponds to the
size of the value of the depicted matrix. The colour determines, if the value is positive or negative. In our
example: the colour red denotes negative values and the colour green denotes positive values.)
NumPy is based on two earlier Python modules dealing with arrays. One of these is Numeric. Like NumPy,
Numeric is a Python module for high-performance numeric computing, but it is obsolete nowadays. Another
predecessor of NumPy is Numarray, which is a complete rewrite of Numeric but is deprecated as well. NumPy
is a merger of those two, i.e. it is built on the code of Numeric and the features of Numarray.
When we say "Core Python", we mean Python without any special modules, i.e. especially without NumPy.
Before we can use NumPy, we have to import it:
import numpy
But you will hardly ever see this. Numpy is usually renamed to np:
import numpy as np
Our first simple Numpy example deals with temperatures. Given is a list with values, e.g. temperatures in
Celsius:
cvalues = [20.1, 20.8, 21.9, 22.5, 22.7, 22.3, 21.8, 21.2, 20.9, 20.1]
C = np.array(cvalues)
print(C)
[20.1 20.8 21.9 22.5 22.7 22.3 21.8 21.2 20.9 20.1]
Let's assume we want to convert the values into degrees Fahrenheit. This is very easy to accomplish with a
NumPy array, because arithmetic with scalars is applied to every element of the array:
print(C * 9 / 5 + 32)
[68.18 69.44 71.42 72.5 72.86 72.14 71.24 70.16 69.62 68.18]
The array C has not been changed by this expression:
print(C)
[20.1 20.8 21.9 22.5 22.7 22.3 21.8 21.2 20.9 20.1]
Compared to this, the solution for our Python list looks awkward:
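A minimal sketch of what the list-based version might look like. The name fvalues is our own choice for illustration:

```python
cvalues = [20.1, 20.8, 21.9, 22.5, 22.7, 22.3, 21.8, 21.2, 20.9, 20.1]

# A plain list does not support element-wise arithmetic,
# so we have to loop over the values explicitly.
fvalues = [x * 9 / 5 + 32 for x in cvalues]
print(fvalues)
```

Unlike the array expression, the conversion formula has to be wrapped in a loop or a list comprehension.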
So far, we referred to C as an array. The internal type is "ndarray" or to be even more precise "C is an instance
of the class numpy.ndarray":
type(C)
Output: numpy.ndarray
In the following, we will use the terms "array" and "ndarray" in most cases synonymously.
If you use the Jupyter notebook, you might be well advised to include the following line of code to prevent an
external window from popping up and to have your diagram included in the notebook:
%matplotlib inline
The code to generate a plot for our values looks like this:
import matplotlib.pyplot as plt
plt.plot(C)
plt.show()
The function plot uses the values of the array C for the values of the ordinate, i.e. the y-axis. The indices of the
array C are taken as values for the abscissa, i.e. the x-axis.
To calculate the memory consumption of the list from the above picture, we will use the function getsizeof
from the module sys.
The size of a Python list consists of the general list information, the size needed for the references to the
elements and the size of all the elements of the list. If we apply sys.getsizeof to a list, we get only the size
without the size of the elements. In the previous example, we made the assumption that all the integer
elements of our list have the same size. Of course, this is not valid in general, because memory consumption
will be higher for larger integers.
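The measurement described above can be sketched as follows. The list [24, 12, 57] is an assumed example, and the exact numbers printed depend on the Python version and the platform:

```python
from sys import getsizeof as size

lst = [24, 12, 57]   # assumed example list

size_of_list_object = size(lst)                # list header plus references
size_of_elements = sum(size(x) for x in lst)   # the integer objects themselves
total_list_size = size_of_list_object + size_of_elements

print("Size without the size of the elements:", size_of_list_object)
print("Size of all the elements:", size_of_elements)
print("Total size of list, including elements:", total_list_size)

# Larger integers need more memory than small ones.
print(size(1), size(10 ** 100))
```

Summing the size of every element separately avoids the assumption that all integers have the same size.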
We will check now, how the memory usage changes, if we add another integer element to the list. We also
look at an empty list:
from sys import getsizeof as size
lst = []
print("Empty list size: ", size(lst))
Size without the size of the elements: 104
Size of all the elements: 112
Total size of list, including elements: 216
Empty list size: 72
We can conclude from this that for every new element, we need another eight bytes for the reference to the
new object. The new integer object itself consumes 28 bytes. The size of a list "lst" without the size of the
elements can be calculated with:
64 + 8 * len(lst)
To get the complete size of an arbitrary list of integers, we have to add the sum of all the sizes of the integers.
We will examine now the memory consumption of a numpy.array. To this purpose, we will have a look at the
implementation in the following picture:
We will create the numpy array of the previous diagram and calculate the memory usage:
We get the memory usage for the general array information by creating an empty array:
e = np.array([])
print(size(e))
96
We can see that the difference between the empty array "e" and the array "a" with three integers is 24
bytes. This means that an arbitrary integer array of length "n" in NumPy needs
96 + n * 8 bytes
whereas a list of integers needs, as we have seen before,
64 + 8 * len(lst) + len(lst) * 28 bytes
This is a minimum estimate, as Python integers can use more than 28 bytes.
When we define a Numpy array, numpy automatically chooses a fixed integer size. In our example "int64".
We can determine the size of the integers, when we define an array. Needless to say, this changes the memory
requirement:
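A short sketch of how the integer size affects the memory requirement. It uses the array attributes itemsize and nbytes, which count only the element data and not the general array information. The values are an assumed example:

```python
import numpy as np

a = np.array([24, 12, 57], dtype=np.int64)
print(a.itemsize, a.nbytes)   # 8 bytes per element, 24 bytes of data

b = np.array([24, 12, 57], dtype=np.int8)
print(b.itemsize, b.nbytes)   # 1 byte per element, 3 bytes of data
```

The dtype is given explicitly here, because the default integer size can differ between platforms.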
import time
import numpy as np
size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1

t1 = pure_python_version()
t2 = numpy_version()
print(t1, t2)
print("Numpy is in this example " + str(t1/t2) + " times faster!")
0.0010614395141601562 5.2928924560546875e-05
Numpy is in this example 20.054054054054053 times faster!
An easier and, above all, better way to measure the times is to use the timeit module. We will use the Timer
class in the following script.
The constructor of a Timer object takes a statement to be timed, an additional statement used for setup, and a
timer function. Both statements default to 'pass'.
The statements may contain newlines, as long as they don't contain multi-line string literals.
A Timer object has a timeit method. timeit is called with a parameter number:
timeit(number=1000000)
The main statement will be executed "number" times. This executes the setup statement once, and then returns
the time it takes to execute the main statement a "number" of times. It returns the time in seconds.
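import numpy as np
from timeit import Timer
size_of_vec = 1000
X_list = range(size_of_vec)
Y_list = range(size_of_vec)
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

def pure_python_version():
    Z = [X_list[i] + Y_list[i] for i in range(len(X_list))]

def numpy_version():
    Z = X + Y

# Assumed definitions: the Timer objects are used below, but their creation is not shown in the text.
timer_obj1 = Timer("pure_python_version()", "from __main__ import pure_python_version")
timer_obj2 = Timer("numpy_version()", "from __main__ import numpy_version")

for i in range(3):
    t1 = timer_obj1.timeit(10)
    t2 = timer_obj2.timeit(10)
    print("time for pure Python version: ", t1)
    print("time for Numpy version: ", t2)
    print(f"Numpy was {t1 / t2:7.2f} times faster!")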
time for pure Python version: 0.0021230499987723306
time for Numpy version: 0.0004346180066931993
Numpy was 4.88 times faster!
time for pure Python version: 0.003020321993972175
time for Numpy version: 0.00014882600225973874
Numpy was 20.29 times faster!
time for pure Python version: 0.002028984992648475
time for Numpy version: 0.0002098319964716211
Numpy was 9.67 times faster!
The repeat() method is a convenience to call timeit() multiple times and return a list of results:
print(timer_obj1.repeat(repeat=3, number=10))
print(timer_obj2.repeat(repeat=3, number=10))
[0.0030275019962573424, 0.002999588003149256, 0.0022120869980426505]
[6.104000203777105e-05, 0.0001641790004214272, 1.904800592456013e-05]
NUMPY TUTORIAL: CREATING ARRAYS
ARANGE
The syntax of arange:
arange([start,] stop[, step], dtype=None)
arange returns evenly spaced values within a given interval. The values are generated within the half-open
interval '[start, stop)'. If the function is used with integers, it is nearly equivalent to the Python built-in function
range, but arange returns an ndarray rather than a lazy range object. If the 'start' parameter is not given,
it will be set to 0. The end of the interval is determined by the parameter 'stop'. Usually, the interval will not
include this value, except in some cases where 'step' is not an integer and floating point round-off affects the
length of the output ndarray. The spacing between two adjacent values of the output array is set with the optional
parameter 'step'. The default value for 'step' is 1. If the parameter 'step' is given as a positional argument, the
'start' parameter has to be given as well. The type of the output array can be specified with the parameter 'dtype'.
If it is not given, the type will be automatically inferred from the other input arguments.
import numpy as np
a = np.arange(1, 10)
print(a)
x = range(1, 10)
print(x) # x is a range object, not a list
print(list(x))
Be careful, if you use a float value for the step parameter, as you can see in the following example:
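A small sketch of the effect. The values 1.0, 1.3 and 0.1 are our own choice and not necessarily the ones used originally. On typical platforms the result contains one element more than one might expect, because the length is calculated in floating point arithmetic:

```python
import math
import numpy as np

start, stop, step = 1.0, 1.3, 0.1
a = np.arange(start, stop, step)
print(a)

# The number of elements is ceil((stop - start) / step), evaluated in
# floating point, so round-off can add an unexpected element.
print(len(a), math.ceil((stop - start) / step))
```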
The help of arange has to say the following for the stop parameter: "End of interval. The interval does
not include this value, except in some cases where step is not an integer and floating point round-off
affects the length of out." This is what happened in our example.
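The following usage of arange is a bit offbeat: why should we use float values if we want integers as the
result? Anyway, the result might be confusing. With an integer dtype, arange determines the number of elements
from the float arguments, but the step actually used to fill the array is dtype(start + step) - dtype(start)
and not the given step: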
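The following sketch uses an example from the NumPy documentation to show the effect of an integer dtype, followed by one possible alternative with linspace. The choice of 12 samples is ours:

```python
import numpy as np

# The length is computed from the float arguments: ceil((3 - (-3)) / 0.5) = 12.
# The step used to fill the array is int(-3 + 0.5) - int(-3) = 1, not 0.5.
a = np.arange(-3, 3, 0.5, dtype=int)
print(a)

# linspace takes the number of samples instead of a step size.
b = np.linspace(-3, 3, 12, endpoint=False)
print(b)
```

The array a runs from -3 to 8, which is well beyond the stop value 3.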
At first sight, this result seems to defy all logical explanation. A look at help also helps here: "When using
a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace
for these cases." Using linspace is not an easy workaround in some situations, because the number of values
has to be known.
LINSPACE
The syntax of linspace:
linspace(start, stop, num=50, endpoint=True, retstep=False)
linspace returns an ndarray consisting of 'num' equally spaced samples in the closed interval [start, stop] or the
half-open interval [start, stop). Whether a closed or a half-open interval is returned depends on whether
'endpoint' is True or False. The parameter 'start' defines the start value of the sequence which will be created.
'stop' will be the end value of the sequence, unless 'endpoint' is set to False. In the latter case, the resulting
sequence will consist of all but the last of 'num + 1' evenly spaced samples. This means that 'stop' is excluded.
Note that the step size changes when 'endpoint' is False. The number of samples to be generated can be set
with 'num', which defaults to 50. 'endpoint' is True by default, so 'stop' is the last sample of the sequence
unless we set 'endpoint' to False.
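A brief sketch of these parameters. The interval from 1 to 10 and the number of samples are arbitrary choices for illustration:

```python
import numpy as np

# 7 samples in the closed interval [1, 10]
closed = np.linspace(1, 10, 7)
print(closed)

# 7 samples in the half-open interval [1, 10): 'stop' is excluded
# and the spacing between the samples becomes smaller
half_open = np.linspace(1, 10, 7, endpoint=False)
print(half_open)

# 50 samples if 'num' is not given
default_num = np.linspace(1, 10)
print(len(default_num))
```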
import numpy as np
We haven't discussed one interesting parameter so far. If the optional parameter 'retstep' is set, the function
will also return the value of the spacing between adjacent values. So, the function will return a tuple
('samples', 'step'):
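A minimal sketch with 'retstep'. The interval and the number of samples are again arbitrary choices:

```python
import numpy as np

samples, spacing = np.linspace(1, 10, 10, retstep=True)
print(spacing)

samples_open, spacing_open = np.linspace(1, 10, 10,
                                         endpoint=False, retstep=True)
print(spacing_open)
```

The second call returns a smaller spacing, because the same number of samples has to fit into the half-open interval.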
import numpy as np
x = np.array(42)
print("x: ", x)
print("The type of x: ", type(x))
print("The dimension of x:", np.ndim(x))
x: 42
The type of x: <class 'numpy.ndarray'>
The dimension of x: 0
ONE-DIMENSIONAL ARRAYS
We have already encountered a 1-dimensional array - better known to some as a vector - in our initial example.
What we have not mentioned so far, but what you may have assumed, is the fact that NumPy arrays are
containers of items of the same type, e.g. only integers. The homogeneous type of the array can be determined
with the attribute "dtype", as we can learn from the following example:
F = np.array([1, 1, 2, 3, 5, 8, 13, 21])
V = np.array([3.4, 6.9, 99.8, 12.8])
print("F: ", F)
print("V: ", V)
print("Type of F: ", F.dtype)
print("Type of V: ", V.dtype)
print("Dimension of F: ", np.ndim(F))
print("Dimension of V: ", np.ndim(V))
F: [ 1 1 2 3 5 8 13 21]
V: [ 3.4 6.9 99.8 12.8]
Type of F: int64
Type of V: float64
Dimension of F: 1
Dimension of V: 1
[[211 212]
[221 222]]
[[311 312]
[321 322]]]
3
SHAPE OF AN ARRAY
The function "shape" returns the shape of an array. The shape is a tuple of integers. These numbers denote
the lengths of the corresponding array dimensions. In other words: The "shape" of an array is a tuple with the
number of elements per axis (dimension). In our example, the shape is equal to (6, 3), i.e. we have 6 rows
and 3 columns.
x = np.array([[67, 63, 87],
              [77, 69, 59],
              [85, 87, 99],
              [79, 72, 71],
              [63, 89, 93],
              [68, 92, 78]])
print(np.shape(x))
(6, 3)
print(x.shape)
(6, 3)
The shape of an array tells us also something about the order in which the indices
are processed, i.e. first rows, then columns and after that the further dimensions.
x.shape = (3, 6)
print(x)
[[67 63 87 77 69 59]
[85 87 99 79 72 71]
[63 89 93 68 92 78]]
x.shape = (2, 9)
print(x)
[[67 63 87 77 69 59 85 87 99]
[79 72 71 63 89 93 68 92 78]]
You might have guessed by now that the new shape must correspond to the number of elements of the array,
i.e. the total size of the new array must be the same as the old one. NumPy will raise an exception if this is
not the case.
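A minimal sketch of such a failing shape change. It uses the method reshape instead of an assignment to the shape attribute, and the array is an arbitrary example with 18 elements:

```python
import numpy as np

x = np.arange(18).reshape(6, 3)   # 18 elements

try:
    x.reshape(4, 4)               # 4 * 4 = 16 != 18, so this cannot work
    reshaped = True
except ValueError as err:
    reshaped = False
    print("ValueError:", err)
```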
x = np.array(11)
print(np.shape(x))
()
print(B.shape)
(4, 2, 3)
Single indexing behaves the way, you will most probably expect it:
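A short sketch with a one-dimensional array. The array values are taken from the earlier example F:

```python
import numpy as np

F = np.array([1, 1, 2, 3, 5, 8, 13, 21])

print(F[0])    # first element
print(F[-1])   # last element, negative indices count from the end
```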
Indexing multidimensional arrays:
print(A[1][0])
1.1
We accessed an element in the second row, i.e. the row with the index 1, and the first column (index 0). We
accessed it the same way we would have done with an element of a nested Python list.
You have to be aware of the fact that this way of accessing multi-dimensional arrays can be highly inefficient.
The reason is that we create an intermediate array A[1] from which we access the element with the index 0. So it
behaves similarly to this:
tmp = A[1]
print(tmp)
print(tmp[0])
[ 1.1 -7.8 -0.7]
1.1
There is another way to access elements of multi-dimensional arrays in Numpy: We use only one pair of
square brackets and all the indices are separated by commas:
print(A[1, 0])
1.1
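The two notations can be compared in a small sketch. Only the second row of A is taken from the output shown above; the other two rows are made-up values for illustration:

```python
import numpy as np

# Only the second row is known from the text, the other rows are assumed.
A = np.array([[3.4, 8.7, 9.9],
              [1.1, -7.8, -0.7],
              [4.1, 12.3, 4.8]])

row = A[1]                        # intermediate array object for the second row
print(np.shares_memory(A, row))   # it is a view on the data of A

# Both notations select the same element.
print(A[1][0] == A[1, 0])
```

The comma notation avoids creating the intermediate array object.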
We assume that you are familiar with the slicing of lists and tuples. The syntax is the same in NumPy for
one-dimensional arrays, but it can be applied to multiple dimensions as well.
A[start:stop:step]
We illustrate the operating principle of "slicing" with some examples. We start with the easiest case, i.e. the
slicing of a one-dimensional array:
S = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(S[2:5])
print(S[:4])
print(S[6:])
print(S[:])
[2 3 4]
[0 1 2 3]
[6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
We will illustrate the multidimensional slicing in the following examples. The ranges for each dimension are
separated by commas:
A = np.array([
[11, 12, 13, 14, 15],
[21, 22, 23, 24, 25],
[31, 32, 33, 34, 35],
[41, 42, 43, 44, 45],
[51, 52, 53, 54, 55]])
print(A[:3, 2:])
[[13 14 15]
[23 24 25]
[33 34 35]]
print(A[3:, :])
[[41 42 43 44 45]
[51 52 53 54 55]]
print(A[:, 4:])
[[15]
[25]
[35]
[45]
[55]]
The following two examples use the third parameter "step". The reshape function is used to construct the two-
dimensional array. We will explain reshape in the following subchapter:
X = np.arange(28).reshape(4, 7)
print(X)
[[ 0 1 2 3 4 5 6]
[ 7 8 9 10 11 12 13]
[14 15 16 17 18 19 20]
[21 22 23 24 25 26 27]]
print(X[::2, ::3])
[[ 0 3 6]
[14 17 20]]
print(X[::, ::3])
[[ 0 3 6]
[ 7 10 13]
[14 17 20]
[21 24 27]]
If the number of objects in the selection tuple is less than the dimension N, then : is assumed
for any subsequent dimensions:
A = np.array(
[ [ [45, 12, 4], [45, 13, 5], [46, 12, 6] ],
[ [46, 14, 4], [45, 14, 5], [46, 11, 5] ],
[ [47, 13, 2], [48, 15, 5], [52, 15, 1] ] ])
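A sketch of how this rule works for the three-dimensional array A. The slice ranges 1:3 and 0:2 are chosen for illustration:

```python
import numpy as np

A = np.array(
    [[[45, 12, 4], [45, 13, 5], [46, 12, 6]],
     [[46, 14, 4], [45, 14, 5], [46, 11, 5]],
     [[47, 13, 2], [48, 15, 5], [52, 15, 1]]])

# Only two slices for a three-dimensional array:
# the third axis is taken completely.
part = A[1:3, 0:2]
print(part)
print(part.shape)

# The same selection with the third axis written out.
print(np.array_equal(part, A[1:3, 0:2, :]))
```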
Attention: Whereas slicing lists and tuples creates new objects, a slicing operation on an array creates a
view on the original array. So we just get another way to access the array, or rather a part of the array.
It follows that if we modify a view, the original array will be modified as well.
A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
S = A[2:6]
S[0] = 22
S[1] = 23
print(A)
[ 0 1 22 23 4 5 6 7 8 9]
Doing the same thing with lists, we can see that we get a copy:
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lst2 = lst[2:6]
lst2[0] = 22
lst2[1] = 23
print(lst)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
If you want to check, if two array names share the same memory block, you can use the function
np.may_share_memory.
np.may_share_memory(A, B)
To determine if two arrays A and B can share memory the memory-bounds of A and B are computed. The
function returns True, if they overlap and False otherwise. The function may give false positives, i.e. if it
returns True it just means that the arrays may be the same.
np.may_share_memory(A, S)
Output: True
The following code shows a case, in which the use of may_share_memory is quite useful:
A = np.arange(12)
B = A.reshape(3, 4)
A[0] = 42
print(B)
[[42 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
We can see that A and B share the memory in some way. The array attribute "data" is an object pointer to the
start of an array's data.
But we saw that if we change an element of one array the other one is changed as well. This fact is reflected
by may_share_memory:
np.may_share_memory(A, B)
Output: True
The result above is a "false positive" example for may_share_memory in the sense that somebody may think
that the arrays are the same, which is not the case.
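A sketch of a false positive in the stricter sense, i.e. two arrays whose memory bounds overlap although they have no element in common. It uses np.shares_memory, a NumPy function not mentioned in the text, which performs an exact check that can be slow for large arrays. The array names are our own:

```python
import numpy as np

a = np.arange(10)
evens = a[::2]     # elements with the indices 0, 2, 4, ...
odds = a[1::2]     # elements with the indices 1, 3, 5, ...

# The memory bounds of both views overlap ...
print(np.may_share_memory(evens, odds))
# ... but no element is actually shared.
print(np.shares_memory(evens, odds))
```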
import numpy as np
E = np.ones((2,3))
print(E)
F = np.ones((3,4),dtype=int)
print(F)
[[1. 1. 1.]
[1. 1. 1.]]
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
What we have said about the function ones() is valid for the function zeros() analogously, as we can see in the
following example:
Z = np.zeros((2,4))
print(Z)
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
There is another interesting way to create an array of ones or zeros, if it has to have the same shape as
another existing array 'a'. NumPy supplies the functions ones_like(a) and zeros_like(a) for this purpose.
x = np.array([2,5,18,14,4])
E = np.ones_like(x)
print(E)
Z = np.zeros_like(x)
print(Z)
[1 1 1 1 1]
[0 0 0 0 0]
There is also a way of creating an array with the empty function. It creates and returns a reference to a new
array of given shape and type, without initializing the entries. Sometimes the entries are zeros, but you
shouldn't be misled. Usually, they are arbitrary values.
np.empty((2, 4))
Output: array([[0., 0., 0., 0.],
[0., 0., 0., 0.]])
COPYING ARRAYS
NUMPY.COPY()
copy(obj, order='K')
Parameter Meaning
order The possible values are {'C', 'F', 'A', 'K'}. This parameter controls the memory layout of the copy. 'C' means C-order, 'F' means Fortran-order, 'A' means 'F' if the object 'obj' is Fortran contiguous, 'C' otherwise. 'K' means match the layout of 'obj' as closely as possible.
import numpy as np
x = np.array([[42,22,12],[44,53,66]], order='F')
y = x.copy()
x[0,0] = 1001
print(x)
print(y)
[[1001 22 12]
[ 44 53 66]]
[[42 22 12]
[44 53 66]]
print(x.flags['C_CONTIGUOUS'])
print(y.flags['C_CONTIGUOUS'])
False
True
NDARRAY.COPY()
There is also an ndarray method 'copy', which can be directly applied to an array. It is similar to the above
function, but the default value for the order argument is different.
a.copy(order='C')
Parameter Meaning
order The same as with numpy.copy, but 'C' is the default value for order.
import numpy as np
x = np.array([[42,22,12],[44,53,66]], order='F')
y = x.copy()
x[0,0] = 1001
print(x)
print(y)
print(x.flags['C_CONTIGUOUS'])
print(y.flags['C_CONTIGUOUS'])
[[1001 22 12]
[ 44 53 66]]
[[42 22 12]
[44 53 66]]
False
True
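The effect of the different defaults can be made visible by applying both variants to the same Fortran-ordered array. This is a sketch; the names y_func and y_meth are our own:

```python
import numpy as np

x = np.array([[42, 22, 12], [44, 53, 66]], order='F')

y_func = np.copy(x)   # order='K' by default: keeps the Fortran layout
y_meth = x.copy()     # order='C' by default: converts to C layout

print(y_func.flags['C_CONTIGUOUS'], y_func.flags['F_CONTIGUOUS'])
print(y_meth.flags['C_CONTIGUOUS'], y_meth.flags['F_CONTIGUOUS'])
```

Both copies contain the same values; only the memory layout differs.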
IDENTITY ARRAY
In linear algebra, the identity matrix, or unit matrix, of size n is the n × n square matrix with ones on the main
diagonal and zeros elsewhere.
There are two ways to create identity arrays in NumPy:
• identity
• eye
identity(n, dtype=None)
The parameters:
Parameter Meaning
n An integer number defining the number of rows and columns of the output, i.e. 'n' x 'n'
dtype An optional argument, defining the data-type of the output. The default is 'float'
The output of identity is an 'n' x 'n' array with its main diagonal set to one, and all other elements are 0.
import numpy as np
np.identity(4)
Output: array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
The function eye returns a 2-D array with ones on the diagonal and zeros elsewhere.
eye(N, M=None, k=0, dtype=float)
Parameter Meaning
N An integer number defining the number of rows of the output array.
M An optional integer for setting the number of columns in the output. If it is None, it defaults to 'N'.
k Defining the position of the diagonal. The default is 0. 0 refers to the main diagonal. A positive value refers to an upper diagonal, and a negative value to a lower diagonal.
eye returns an ndarray of shape (N,M). All elements of this array are equal to zero, except for the 'k'-th
diagonal, whose values are equal to one.
import numpy as np
np.eye(5, 8, k=1, dtype=int)
Output: array([[0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0]])
The principle of operation of the parameter 'k' of the eye function is illustrated in the following diagram:
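The effect of the diagonal parameter can also be sketched in code. The size 4 and the chosen offsets are arbitrary:

```python
import numpy as np

main = np.eye(4, dtype=int)          # k=0: main diagonal
upper = np.eye(4, k=1, dtype=int)    # k=1: first upper diagonal
lower = np.eye(4, k=-2, dtype=int)   # k=-2: second lower diagonal

print(main)
print(upper)
print(lower)
```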
EXERCISES:
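1) Create an arbitrary one dimensional array called "v".
2) Create a new array which consists of the odd indices of previously created array "v".
3) Create a new array from "v" with the elements in reverse order.
4) What will be the output of the following code:
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]
b[0] = 200
print(a[1])
5) Create a two dimensional array called "m".
6) Create a new array from m, in which the elements of each row are in reverse order.
7) Create a new array from m, in which the rows are in reverse order.
8) Create an array from m, where columns and rows are in reverse order.
9) Cut off the first and last row and the first and last column.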
SOLUTIONS TO THE EXERCISES:
1)
import numpy as np
a = np.array([3,8,12,18,7,11,30])
2) odd_elements = a[1::2]
3) reverse_order = a[::-1]
4) The output will be 200, because slices are views in numpy and not copies.
5) m = np.array([ [11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])
6) m[::,::-1]
7) m[::-1]
8) m[::-1,::-1]
9) m[1:-1,1:-1]
DATA TYPE OBJECTS, DTYPE
DTYPE
The data type object 'dtype' is an instance of the numpy.dtype class. It can be created with numpy.dtype.
Country Population Density Area Population
Before we start with a complex data structure like the previous data, we want to introduce dtype in a very
simple example. We define an int16 data type and call this type i16. (We have to admit, that this is not a nice
name, but we use it only here!). The elements of the list 'lst' are turned into i16 types to create the two-
dimensional array A.
import numpy as np
i16 = np.dtype(np.int16)
print(i16)
lst = [[3, 8, 9], [1, -7, 0], [4, 12, 4]]  # assumed values, chosen to match the output below
A = np.array(lst, dtype=i16)
print(A)
int16
[[ 3 8 9]
[ 1 -7 0]
[ 4 12 4]]
We introduced a new name for a basic data type in the previous example. This has nothing to do with the
structured arrays, which we mentioned in the introduction of this chapter of our dtype tutorial.
STRUCTURED ARRAYS
ndarrays are homogeneous data objects, i.e. all elements of an array have to be of the same data type. The data
type dtype, on the other hand, allows us to define separate data types for each column.
Now we will take the first step towards implementing the table with European countries and the information
on population, area and population density. We create a structured array with the 'density' column. The data
type is defined as np.dtype([('density', np.int32)]) . We assign this data type to the variable 'dt'
for the sake of convenience. We use this data type in the ndarray definition, in which we use the first three
densities.
import numpy as np
dt = np.dtype([('density', np.int32)])
x = np.array([(393,), (337,), (256,)], dtype=dt)
print(x)
print(repr(x))  # the internal representation
We can access the content of the density column by indexing x with the key 'density'. It looks like accessing a
dictionary in Python:
print(x['density'])
[393 337 256]
You may wonder why we have used 'np.int32' in our definition while the internal representation shows '<i4'.
In the dtype definition, we can use the type directly (e.g. np.int32) or we can use a string (e.g. 'i4'). So, we
could have defined our dtype like this as well:
dt = np.dtype([('density', 'i4')])
x = np.array([(393,), (337,), (256,)],
dtype=dt)
print(x)
[(393,) (337,) (256,)]
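The equivalence of the two spellings can be checked directly. The following is a minimal sketch; the names dt_type and dt_str are introduced only for this comparison.

```python
import numpy as np

# Two ways of writing the same record layout
dt_type = np.dtype([('density', np.int32)])
dt_str = np.dtype([('density', 'i4')])

x = np.array([(393,), (337,), (256,)], dtype=dt_str)

print(dt_type == dt_str)     # both describe the same structured type
print(x.dtype.names)         # field names of the structured array
print(x['density'].dtype)    # the column itself is a plain int32 array
```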
The 'i' means integer and the 4 means 4 bytes. What about the less-than sign in front of i4 in the result? We
could have written '<i4' in our definition as well. We can prefix a type with the '<' and '>' sign. '<' means that
the encoding will be little-endian and '>' means that the encoding will be big-endian. No prefix means that we
get the native byte ordering. We demonstrate this in the following by defining a double-precision floating-
point number in various orderings:
# little-endian ordering
dt = np.dtype('<d')
print(dt.name, dt.byteorder, dt.itemsize)
# big-endian ordering
dt = np.dtype('>d')
print(dt.name, dt.byteorder, dt.itemsize)
The equal character '=' stands for 'native byte ordering', which is determined by the hardware platform. In our
case this means 'little-endian', because the Linux computer we use has a little-endian processor.
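To see what the byte order prefix changes in memory, one can look at the raw bytes of a small integer. This is a sketch for illustration only; it uses a 4-byte integer instead of the double from the example above because the byte pattern is easier to read.

```python
import sys
import numpy as np

value = np.array([1], dtype='<i4')     # little-endian 4-byte integer
little = value.tobytes()
big = value.astype('>i4').tobytes()    # same number, big-endian layout

print(little)   # least significant byte comes first
print(big)      # most significant byte comes first

# The native order of the current machine, and how NumPy reports it
print(sys.byteorder, np.dtype('=i4').byteorder)
```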
Another thing in our density array might be confusing. We defined the array with a list containing one-tuples.
So you may ask yourself whether tuples and lists can be used interchangeably. They cannot. The
tuples are used to define the records - in our case consisting solely of a density - and the list is the 'container'
for the records. In other words, the lists are recursed upon: the tuples define the atomic elements of the
structure and the lists the dimensions.
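The difference between tuples and lists shows up clearly with a record type that has two fields. The dtype and the values below are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical two-field record type, used only for this illustration
dt = np.dtype([('a', 'i4'), ('b', 'i4')])

# Tuples are records: two records with two fields each
records = np.array([(1, 2), (3, 4)], dtype=dt)

# Lists are dimensions: a 2x2 array in which every scalar fills all fields
grid = np.array([[1, 2], [3, 4]], dtype=dt)

print(records.shape, records[0])
print(grid.shape, grid[0, 1])
```

With tuples we get a one-dimensional array of two records. With nested lists we get a two-dimensional array, and each number is copied into both fields of its record, which is rarely what was intended.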
Now we will add the country name, the area and the population number to our data type:
('Ireland', 65, 70280, 4581269),
('Sweden', 20, 449964, 9515744),
('Finland', 16, 338424, 5410233),
('Norway', 13, 385252, 5033675)],
dtype=dt)
print(population_table[:4])
[(b'Netherlands', 393, 41526, 16928800)
(b'Belgium', 337, 30510, 11007020)
(b'United Kingdom', 256, 243610, 62262000)
(b'Germany', 233, 357021, 81799600)]
print(population_table['density'])
print(population_table['country'])
print(population_table['area'][2:5])
[393 337 256 233 205 192 177 173 111 97 81 65 20 16 13]
[b'Netherlands' b'Belgium' b'United Kingdom' b'Germany' b'Liechtenstein'
 b'Italy' b'Switzerland' b'Luxembourg' b'France' b'Austria' b'Greece'
 b'Ireland' b'Sweden' b'Finland' b'Norway']
[243610 357021 160]
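The definition of 'dt' and the first rows of the table are not shown in full above. The following self-contained sketch assumes the field widths 'S20' (byte string of up to 20 characters) and 'i4'; the field names are those used in the print calls, and the rows are taken from the outputs shown in this chapter.

```python
import numpy as np

# Assumed field types: 'S20' for the name, 4-byte integers for the rest
dt = np.dtype([('country', 'S20'),
               ('density', 'i4'),
               ('area', 'i4'),
               ('population', 'i4')])

population_table = np.array([
    ('Netherlands', 393, 41526, 16928800),
    ('Belgium', 337, 30510, 11007020),
    ('United Kingdom', 256, 243610, 62262000),
    ('Germany', 233, 357021, 81799600),
    ('Liechtenstein', 205, 160, 32842),
    ('Italy', 192, 301230, 59715625),
    ('Switzerland', 177, 41290, 7301994),
    ('Luxembourg', 173, 2586, 512000),
    ('France', 111, 547030, 63601002),
    ('Austria', 97, 83858, 8169929),
    ('Greece', 81, 131940, 11606813),
    ('Ireland', 65, 70280, 4581269),
    ('Sweden', 20, 449964, 9515744),
    ('Finland', 16, 338424, 5410233),
    ('Norway', 13, 385252, 5033675)],
    dtype=dt)

print(population_table[:4])
print(population_table['density'])
print(population_table['area'][2:5])
```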
('Italy', 192, 301230, 59715625),
('Switzerland', 177, 41290, 7301994),
('Luxembourg', 173, 2586, 512000),
('France', 111, 547030, 63601002),
('Austria', 97, 83858, 8169929),
('Greece', 81, 131940, 11606813),
('Ireland', 65, 70280, 4581269),
('Sweden', 20, 449964, 9515744),
('Finland', 16, 338424, 5410233),
('Norway', 13, 385252, 5033675)],
dtype=dt)
print(population_table[:4])
[('Netherlands', 393, 41526, 16928800) ('Belgium', 337, 30510, 11007020)
 ('United Kingdom', 256, 243610, 62262000)
 ('Germany', 233, 357021, 81799600)]
np.savetxt("population_table.csv",
population_table,
fmt="%s;%d;%d;%d",
delimiter=";")
It is highly probable that you will need to read in the previously written file at a later date. This can be
achieved with the function genfromtxt.
x = np.genfromtxt("population_table.csv",
dtype=dt,
delimiter=";")
print(x)
[('Netherlands', 393, 41526, 16928800)
 ('Belgium', 337, 30510, 11007020)
 ('United Kingdom', 256, 243610, 62262000)
 ('Germany', 233, 357021, 81799600)
 ('Liechtenstein', 205, 160, 32842)
 ('Italy', 192, 301230, 59715625)
 ('Switzerland', 177, 41290, 7301994)
 ('Luxembourg', 173, 2586, 512000)
 ('France', 111, 547030, 63601002)
 ('Austria', 97, 83858, 8169929)
 ('Greece', 81, 131940, 11606813)
 ('Ireland', 65, 70280, 4581269)
 ('Sweden', 20, 449964, 9515744)
 ('Finland', 16, 338424, 5410233)
 ('Norway', 13, 385252, 5033675)]
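Writing and reading back can be tried out in one go. The sketch below makes a few assumptions: it uses only the first three rows, a Unicode field 'U20' for the country name, a temporary directory instead of the working directory, and an explicit encoding so that the behaviour does not depend on the NumPy version.

```python
import os
import tempfile
import numpy as np

# Assumed field types; 'U20' stores the name as a Unicode string
dt = np.dtype([('country', 'U20'), ('density', 'i4'),
               ('area', 'i4'), ('population', 'i4')])

table = np.array([('Netherlands', 393, 41526, 16928800),
                  ('Belgium', 337, 30510, 11007020),
                  ('United Kingdom', 256, 243610, 62262000)],
                 dtype=dt)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "population_table.csv")
    # One format specifier per field; the separator is part of the format
    np.savetxt(path, table, fmt="%s;%d;%d;%d")
    loaded = np.genfromtxt(path, dtype=dt, delimiter=";", encoding="utf-8")

print(loaded['country'])
print(loaded['population'])
```

Because the format string already contains the semicolons, savetxt does not need a separate delimiter argument here.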
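There is also a function "loadtxt", but it can be more difficult to use, because in older NumPy versions it
returns the strings as binary strings! To overcome this problem, we can use loadtxt with a converter function
for the first column. The converter below decodes binary strings and leaves ordinary strings, as passed by
newer NumPy versions, unchanged.
x = np.loadtxt("population_table.csv",
dtype=dt,
converters={0: lambda s: s.decode('utf-8') if isinstance(s, bytes) else s},
delimiter=";")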
print(x)
[('Netherlands', 393, 41526, 16928800)
 ('Belgium', 337, 30510, 11007020)
 ('United Kingdom', 256, 243610, 62262000)
 ('Germany', 233, 357021, 81799600)
 ('Liechtenstein', 205, 160, 32842)
 ('Italy', 192, 301230, 59715625)
 ('Switzerland', 177, 41290, 7301994)
 ('Luxembourg', 173, 2586, 512000)
 ('France', 111, 547030, 63601002)
 ('Austria', 97, 83858, 8169929)
 ('Greece', 81, 131940, 11606813)
 ('Ireland', 65, 70280, 4581269)
 ('Sweden', 20, 449964, 9515744)
 ('Finland', 16, 338424, 5410233)
 ('Norway', 13, 385252, 5033675)]
EXERCISES:
Before you go on, you may take time to do some exercises to deepen your understanding of the material
covered so far.
1. Exercise:
Define a structured array with two columns. The first column contains the product ID, which can
be defined as an int32. The second column shall contain the price for the product. How can you
print out the column with the product IDs, the first row and the price for the third article of this
structured array?
2. Exercise:
Figure out a data type definition for time records with entries for hours, minutes and seconds.
SOLUTIONS:
Solution to the first exercise:
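import numpy as np
# The definition of 'stock' is missing from the text as given. The name 'types'
# and the float64 price are assumptions; the values are taken from the output below.
types = np.dtype([("productID", np.int32), ("price", np.float64)])
stock = np.array([(34765, 603.76), (45765, 439.93),
(99661, 344.19), (12129, 129.39)],
dtype=types)
# index 1 is the second row; use stock[0] for the first row
print(stock[1])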
print(stock["productID"])
print(stock[2]["price"])
print(stock)
(45765, 439.93)
[34765 45765 99661 12129]
344.19
[(34765, 603.76) (45765, 439.93) (99661, 344.19) (12129, 129.39)]
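The solution to the second exercise is not shown above. One possible data type for time records is sketched below; the field names 'h', 'min' and 'sec', the integer width and the sample records are illustrative choices, not the only valid answer.

```python
import numpy as np

# Field names and the 4-byte integer type are illustrative choices
time_type = np.dtype([('h', 'i4'), ('min', 'i4'), ('sec', 'i4')])

# Hypothetical sample records
times = np.array([(11, 38, 5), (14, 56, 0), (3, 9, 1)], dtype=time_type)

print(times)
print(times['h'])         # all hours
print(times[0]['sec'])    # seconds of the first record

# Column-wise arithmetic: total number of seconds per record
total_seconds = times['h'] * 3600 + times['min'] * 60 + times['sec']
print(total_seconds)
```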