3. Basic Python Packages for Data Analytics
Mai Cao Lan - Faculty of Geology & Petroleum Engineering, HCMUT 121
Chapter Outline
Introduction
NumPy
Pandas
Matplotlib
10/26/2024 Dr. Mai Cao Lan - Faculty of Geology & Petroleum Engineering, HCMUT 122
Introduction
Feature | NumPy | Pandas | Matplotlib
Primary Purpose | Efficient numerical computations and array operations | Data manipulation and analysis | Data visualization
Data Structures | Multi-dimensional arrays | Series and DataFrames | Not a data structure library
Introduction
Use Case | Purpose | Suitable package
8 | Clean and merge production data from multiple CSV files. | ???
NumPy package
NumPy (short for Numerical Python) is a Python package for numerical computing.
Open source.
Installing and importing NumPy
NumPy can be installed by using a package manager like pip or
conda.
import numpy as np
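As a minimal sketch of the step above: once NumPy has been installed from a terminal (typically with `pip install numpy` or `conda install numpy`), the conventional import can be checked as follows. The version printed depends on the local installation.

```python
# Install first from a terminal (not from inside Python), for example:
#   pip install numpy
#   conda install numpy
import numpy as np

# Confirm that the package is importable and see which version is installed
print(np.__version__)
```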
Creating and Manipulating Arrays in NumPy
Creating arrays: array(), zeros(), ones(), arange(), linspace()
Command | Purpose | Syntax | Example
numpy.array() | Creates an array from a list or tuple. | numpy.array(object) | numpy.array([1, 2, 3])
numpy.zeros() | Creates an array filled with zeros. | numpy.zeros(shape) | numpy.zeros((2, 3))
numpy.ones() | Creates an array filled with ones. | numpy.ones(shape) | numpy.ones((3, 2))
numpy.arange() | Creates an array with evenly spaced values. | numpy.arange(start, stop, step) | numpy.arange(0, 10, 2)
numpy.linspace() | Creates an array with a specified number of values. | numpy.linspace(start, stop, num) | numpy.linspace(0, 1, 5)
ndim | Gets the number of dimensions of an array. | array.ndim | my_array.ndim
shape | Gets the shape (dimensions) of an array. | array.shape | zeros_array.shape
size | Gets the total number of elements in an array. | array.size | zeros_array.size
dtype | Gets the data type of the array elements. | array.dtype | my_array.dtype
reshape() | Changes the shape of an array. | array.reshape(new_shape) | my_array.reshape((2, 2))
flatten() | Flattens an array into one dimension. | array.flatten() | reshaped_array.flatten()
Returns a flattened array, may return
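The commands in the table can be tried together in one short session. The following is a minimal sketch; the variable names follow the table's examples where possible, and `evens` and `grid` are names chosen here for illustration.

```python
import numpy as np

# Creating arrays
my_array = np.array([[1, 2], [3, 4]])   # from a nested list
zeros_array = np.zeros((2, 3))          # 2 rows, 3 columns filled with 0.0
ones_array = np.ones((3, 2))            # 3 rows, 2 columns filled with 1.0
evens = np.arange(0, 10, 2)             # start, stop (excluded), step
grid = np.linspace(0, 1, 5)             # 5 evenly spaced values, both ends included

# Inspecting arrays
print(my_array.ndim)       # 2
print(zeros_array.shape)   # (2, 3)
print(zeros_array.size)    # 6
print(zeros_array.dtype)   # float64

# Reshaping and flattening
reshaped_array = evens[:4].reshape((2, 2))   # rows: [0, 2] and [4, 6]
flat = reshaped_array.flatten()              # back to one dimension: 0, 2, 4, 6
print(evens, grid, flat)
```

Note that `arange` excludes the stop value while `linspace` includes it by default, which is a common source of off-by-one surprises.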
Array Operations
Element-wise operations: addition, subtraction, multiplication,
division element-by-element
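A minimal sketch of element-wise arithmetic on two arrays of the same shape follows. The arrays `a` and `b` are illustrative values, not data from the course.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Each operator acts on matching positions of the two arrays
total = a + b        # 11, 22, 33, 44
difference = b - a   # 9, 18, 27, 36
product = a * b      # 10, 40, 90, 160
ratio = b / a        # 10, 10, 10, 10

print(total, difference, product, ratio)
```

Here `*` is element-by-element multiplication, not matrix multiplication.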
Indexing and Slicing
Basic indexing and slicing (array[start:stop:step]).
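The `array[start:stop:step]` pattern can be illustrated with a short sketch on a one-dimensional array and a small matrix. The names `arr` and `matrix` are chosen here for illustration.

```python
import numpy as np

arr = np.arange(10)            # values 0 through 9

first = arr[0]                 # 0
last = arr[-1]                 # 9 (negative indices count from the end)
middle = arr[2:7]              # 2, 3, 4, 5, 6 (stop index is excluded)
every_other = arr[::2]         # 0, 2, 4, 6, 8
reversed_arr = arr[::-1]       # 9 down to 0

# Two-dimensional arrays take one index or slice per axis
matrix = np.arange(12).reshape((3, 4))   # rows: 0-3, 4-7, 8-11
element = matrix[1, 2]         # row 1, column 2 -> 6
second_row = matrix[1, :]      # 4, 5, 6, 7
last_col = matrix[:, -1]       # 3, 7, 11

print(middle, every_other, second_row, last_col)
```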
Linear Algebra with NumPy
numpy.transpose
numpy.linalg.inv
numpy.linalg.det
numpy.linalg.eig
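The four functions listed above can be sketched on a small matrix. The matrix `M` is an illustrative choice: it is triangular, so its determinant and eigenvalues are easy to check by hand.

```python
import numpy as np

M = np.array([[2.0, 0.0],
              [1.0, 3.0]])

M_T = np.transpose(M)          # rows and columns swapped
M_inv = np.linalg.inv(M)       # inverse matrix (M must be non-singular)
det_M = np.linalg.det(M)       # 2*3 - 0*1 = 6
eigenvalues, eigenvectors = np.linalg.eig(M)   # eigenvalues 2 and 3, in no guaranteed order

# Sanity check: a matrix times its inverse should give the identity matrix
print(np.allclose(M @ M_inv, np.eye(2)))
print(det_M, eigenvalues)
```

The eigenvectors are returned as the columns of `eigenvectors`, one column per eigenvalue.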
Linear Algebra with NumPy
Quizzes:
1. Create two matrices A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]].
Compute their matrix multiplication.
Exercises with NumPy
Exercises:
Pandas Data Structure: DataFrame
Pandas Series & DataFrame
Series vs. DataFrame
Pandas Series
>>>
???
Pandas Series Accessing
There are two ways to access the elements of a Series:
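The slide's list of the two ways did not survive extraction. The sketch below assumes they are access by integer position and access by index label, which are the two standard approaches in pandas. The Series `rates` and its well names are hypothetical example data.

```python
import pandas as pd

# Hypothetical example data: gas rates indexed by well name
rates = pd.Series([120.5, 98.0, 143.2], index=['W1', 'W2', 'W3'])

# 1) By integer position
by_position = rates.iloc[1]          # 98.0
first_two = rates.iloc[0:2]          # positions 0 and 1 (stop excluded)

# 2) By index label
by_label = rates['W3']               # 143.2, same as rates.loc['W3']
w1_to_w2 = rates.loc['W1':'W2']      # label slices include both ends

print(by_position, by_label)
print(first_two)
print(w1_to_w2)
```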
Using Series.apply(function) in Pandas
Syntax:
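The syntax shown on the slide did not survive extraction. In its basic form the call is `Series.apply(func)`, which calls `func` on every element and returns a new Series. A minimal sketch follows; the pressure values and the helper `psi_to_bar` are hypothetical.

```python
import pandas as pd

# Hypothetical example data: pressures in psi
pressures_psi = pd.Series([1000.0, 2000.0, 3000.0])

def psi_to_bar(p):
    """Convert a single pressure value from psi to bar."""
    return p * 0.0689476

# Apply a named function to every element
pressures_bar = pressures_psi.apply(psi_to_bar)

# Apply an anonymous (lambda) function to every element
labels = pressures_psi.apply(lambda p: 'high' if p >= 2000 else 'low')

print(pressures_bar)
print(labels)
```

For simple arithmetic such as this unit conversion, `pressures_psi * 0.0689476` gives the same result and is usually faster; `apply` is most useful when the per-element logic cannot be written as a vectorized expression.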
Pandas DataFrame
Creating a DataFrame:
import pandas as pd
data = {
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [25, 30, 28, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print(df)
>>>
???
Practical Exercise
The PVT fluid data from a gas well in the Anaconda Gas Field is given below:
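$$q_g = \frac{k h \left[ m(p_r) - m(p_{wf}) \right]}{1424\, T \left[ \ln\left( \frac{0.472\, r_e}{r_w} \right) + s \right]}$$

$$m(p) = \int_0^{p} \frac{2p}{\mu Z}\, dp$$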
10/26/2024 Dr. Mai Cao Lan, Dept. of Drilling & Production Engineering, GEOPET, HCMUT 144
Practical Exercise (cont’d)
Trapezoidal Method
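As a rough sketch of the trapezoidal method applied to the pseudo-pressure integral of 2p/(μZ) from 0 to p: the exercise's PVT table did not survive extraction, so the pressures, viscosity and Z-factor below are placeholder values, not the Anaconda field data. Viscosity and Z are held constant here only so that the result can be checked against the exact value p²/(μZ).

```python
import numpy as np

# PLACEHOLDER values for illustration only (not the Anaconda field data)
p = np.linspace(0.0, 4000.0, 9)   # pressure points, psia
mu = np.full_like(p, 0.02)        # gas viscosity, cp (assumed constant)
Z = np.full_like(p, 0.9)          # gas deviation factor (assumed constant)

# Integrand of the pseudo-pressure integral at each pressure point
integrand = 2.0 * p / (mu * Z)

# Area of each trapezoid between consecutive pressure points
areas = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(p)

# Cumulative sum gives m(p) at every tabulated pressure, with m(0) = 0
m_p = np.concatenate(([0.0], np.cumsum(areas)))

print(m_p[-1])                       # m(p) at the highest pressure
print(p[-1] ** 2 / (0.02 * 0.9))     # exact value for constant mu and Z
```

With real PVT data, `mu` and `Z` would be arrays read from the table, one value per pressure point, and the same three lines (integrand, areas, cumulative sum) would apply unchanged.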
Practical Exercise (cont’d)
Matplotlib Package
Basic Plotting
import numpy as np
import matplotlib.pyplot as plt
# prepare data
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
# plot the data
plt.plot(x, y)
# set axis labels and figure title
plt.xlabel('x')
plt.ylabel('y')
plt.title('Basic Plotting')
# set grid on
plt.grid()
# show the plot
plt.show()
Matplotlib – Color, Marker & Line Style
import numpy as np
import matplotlib.pyplot as plt
# prepare data
x = np.arange(10)
y = x
# plot the data with customized plot properties
plt.plot(x, y,
         color='m',
         linestyle='--',
         linewidth=1.5,
         marker='^',
         markersize=5)
# set grid on
plt.grid()
# show the plot
plt.show()
Matplotlib – Multiline Plots
Multiline Plots
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(10)
plt.plot(x, -x**2,'o--')
plt.plot(x, -x**3,'^--')
plt.plot(x, -2*x, '>--', x, -2**x, '*--')
plt.legend(['-x**2', '-x**3', '-2*x', '-2**x'], loc='lower left')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Multiline Plotting')
plt.grid()
plt.show()
Matplotlib Package
Practice
+ Function 1: y1 = sin(x)
+ Function 2: y2 = cos(x)
+ Function 3: 3 = 2
- Create a line plot for each function using a different color, marker
and line style. Add title and labels of the x-axis and y-axis for each
plot, including a legend to distinguish between the lines.