Day 3.numpy - Complete - Guide
Table of Contents
1. Numpy Array Basics
1.1 Creating Numpy Arrays
2. Array Inspection
2.1 Array Dimension and Shapes
2.2 Array Indexing and Slicing
3. Array Operations
3.1 Element-wise Operations
3.2 Append and Delete
3.3 Aggregation Functions and ufuncs
4. Working with Numpy Arrays
4.1 Combining Arrays
4.2 Splitting Arrays
4.3 Alias vs. View vs. Copy of Arrays
4.4 Sorting Numpy Arrays
5. Numpy for Data Cleaning
5.1 Identify Missing Values
5.2 Removing rows or columns with Missing Values
6. Numpy for Statistical Analysis
6.1 Data Transformation
6.2 Random Sampling and Generation
7. Numpy for Linear Algebra
7.1 Complex Matrix Operations
7.2 Solve Linear Equations
8. Advanced Numpy Techniques
8.1 Masked Arrays
8.2 Structured Arrays
Conclusion
[[5, 6],
[7, 8]]])
2-Array Inspection
First element - 10
Third element - 30
Last element - 50
Out[14]: 2
In [15]: # Slicing the array to create a new array
sliced_array = arr[1:4] # Elements at indices 1 to 3 (the stop index 4 is exclusive): [20, 30, 40]
sliced_array
3-Array Operations
# Addition
result_add = arr1 + arr2 # [5, 7, 9]
result_add
# Aggregation functions
mean_value = np.mean(arr) # Mean: 3.0
print(mean_value)
3.0
# converting 1D array to 2D
arr_2d = arr_1d.reshape(2, 3)
arr_2d
Understanding how to use -1: You can use -1 as a placeholder for any one dimension of the new shape, and NumPy
will automatically calculate the size of that dimension from the total number of elements.
# To flatten the image to 1D explicitly, you would have to pass 116352 (303 * 384).
# If you would rather not calculate that yourself and let NumPy deal with it,
# you can just pass -1, and it will work out 116352 for you.
reshaped_image = image.reshape(-1)
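The -1 placeholder works in any single position of the new shape, not only for flattening. Below is a minimal sketch, assuming the image is a 2D array of shape (303, 384); the zero-filled `image` here is a hypothetical stand-in, not the image used in the notebook.

```python
import numpy as np

# Hypothetical stand-in for the image: a 303 x 384 array of zeros.
image = np.zeros((303, 384))

flat = image.reshape(-1)        # NumPy infers 303 * 384 = 116352 elements
rows = image.reshape(-1, 384)   # NumPy infers 303 rows
cols = image.reshape(101, -1)   # NumPy infers 116352 / 101 = 1152 columns

print(flat.shape)   # (116352,)
print(rows.shape)   # (303, 384)
print(cols.shape)   # (101, 1152)
```

Only one dimension can be -1 at a time, and the remaining dimensions must divide the total element count exactly; otherwise reshape raises a ValueError.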
Alias: An alias is another variable name that points to the same underlying NumPy array object. Both names share
the same data in memory, so changes made through the alias also affect the original array.
View: The .view() method creates a new array object that looks at the same data as the original array but does
not share the same identity. It provides a way to view the data differently or with different data types, but it still
operates on the same underlying data.
Copy: A copy is a completely independent duplicate of a NumPy array. It has its own data in memory, and
changes made to the copy will not affect the original array, and vice versa.
In [39]: original_arr = np.array([1, 2, 3])
# you can observe that it will also change the original array
original_arr
[99 2 3]
copy_arr[0] = 100
[1 2 3]
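The three cases can be compared side by side. This is a minimal sketch under the definitions above; the names `alias_arr` and `view_arr` are chosen here for illustration and may differ from the ones used in the notebook cells.

```python
import numpy as np

original_arr = np.array([1, 2, 3])

alias_arr = original_arr         # alias: a second name for the same object
view_arr = original_arr.view()   # view: a new object that shares the data
copy_arr = original_arr.copy()   # copy: a new object with its own data

alias_arr[0] = 99    # visible through original_arr and view_arr
copy_arr[1] = 100    # affects only copy_arr

print(original_arr.tolist())   # [99, 2, 3]
print(view_arr.tolist())       # [99, 2, 3]
print(copy_arr.tolist())       # [1, 100, 3]

print(alias_arr is original_arr)                  # True: same identity
print(view_arr is original_arr)                   # False: different object
print(np.shares_memory(view_arr, original_arr))   # True: same data
print(np.shares_memory(copy_arr, original_arr))   # False: independent data
```

Basic slicing such as `original_arr[1:]` also returns a view, so writing into a slice modifies the original array unless you call `.copy()` first.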
Ascending sort [1 2 3 4 5]
Descending sort [5 4 3 2 1]
Sorted Indices [1 3 0 4 2]
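The outputs above can be reproduced with np.sort and np.argsort. In this sketch the input array [3, 1, 5, 2, 4] is an assumption: it is inferred from the sorted indices shown above and is not taken from the notebook cell itself.

```python
import numpy as np

# Assumed input, inferred from the sorted indices [1 3 0 4 2] shown above.
arr = np.array([3, 1, 5, 2, 4])

ascending = np.sort(arr)          # returns a sorted copy; arr is unchanged
descending = np.sort(arr)[::-1]   # reverse the ascending result
sorted_indices = np.argsort(arr)  # indices that would sort arr

print("Ascending sort", ascending)
print("Descending sort", descending)
print("Sorted Indices", sorted_indices)
```

Indexing the array with the argsort result, `arr[sorted_indices]`, gives the same values as `np.sort(arr)`.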
5-Numpy for Data Cleaning
NumPy provides functions to check for missing values in a numeric array, represented as NaN (Not a Number).
We can use np.isnan to get a boolean matrix that is True at every position holding a missing value. Passing that
matrix to np.any with axis=1 returns a 1D array that is True for each row containing at least one missing value.
Finally, we invert it with ~ (not) and use the result to index the original matrix, which keeps only the rows without missing values.
[[1. 2. 3.]
[7. 8. 9.]]
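The steps described above can be sketched as follows. The input matrix is an assumption: a 3 x 3 array with one NaN in the middle row, chosen so that the row-wise result matches the output shown above.

```python
import numpy as np

# Assumed input: the middle row contains a NaN.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

nan_mask = np.isnan(data)                  # True where a value is missing

rows_with_nan = np.any(nan_mask, axis=1)   # one flag per row
clean_rows = data[~rows_with_nan]          # keep rows without NaN

cols_with_nan = np.any(nan_mask, axis=0)   # one flag per column
clean_cols = data[:, ~cols_with_nan]       # keep columns without NaN

print(clean_rows)
print(clean_cols)
```

Switching the axis from 1 to 0 is all it takes to drop columns instead of rows; which one is appropriate depends on whether losing an observation or losing a feature costs less.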
Data transformation involves modifying data to meet specific requirements or assumptions. NumPy has no dedicated
functions for these transformations, but we can build them from its existing array operations.
In [45]: # Data Centering
data = np.array([10, 20, 30, 40, 50])
mean = np.mean(data)
centered_data = data - mean
print('Centered data = ',centered_data)
# Standardization
std_dev = np.std(data)
standardized_data = (data - mean) / std_dev
print("standardized data = ",standardized_data)
# Log Transformation
log_transformed_data = np.log(data)
print("log_transformed_data = ",log_transformed_data)
Sampling
Simple Random Sampling: Select a random sample of a specified size from a dataset. When sampling without
replacement, a selected item is not returned to the population, so it cannot be picked again.
Bootstrap Sampling: Bootstrap sampling draws samples with replacement to create multiple resampled datasets. This is
often used for estimating the variability of a statistic.
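Both sampling schemes can be sketched with the choice method of a NumPy random generator. The population, the seed, and the number of bootstrap resamples below are illustrative assumptions, and np.random.default_rng is used here in place of the module-level np.random functions that appear elsewhere in this guide.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded only to make runs repeatable
population = np.arange(1, 11)         # hypothetical dataset: 1..10

# Simple random sampling without replacement: no value appears twice.
simple_sample = rng.choice(population, size=5, replace=False)

# Bootstrap sampling: resample with replacement many times and
# record the statistic of interest (here, the mean) for each resample.
n_resamples = 1000
bootstrap_means = np.array([
    rng.choice(population, size=population.size, replace=True).mean()
    for _ in range(n_resamples)
])

print(simple_sample)
print(bootstrap_means.std())   # rough estimate of the mean's variability
```

The spread of `bootstrap_means` approximates the standard error of the mean, which is the "variability" that bootstrap sampling is typically used to estimate.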
In [48]: np.random.randint(0,100)
Out[48]: 99
[4 2 5 9 5]
In [51]: # Generates 5 random values following a Poisson distribution with a rate of 2.5
rate = 2.5
poisson_values = np.random.poisson(rate, 5)
print(poisson_values)
[4 1 3 8 2]
[[4 3 3]
[1 0 9]
[4 4 2]
[2 3 5]
[3 3 4]]
We have already seen how to create vectors and matrices, and the matrix operations we can perform with NumPy. Now
let's look at some more complex matrix operations.
[[-2. 1. ]
[ 1.5 -0.5]]
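The output above is consistent with inverting the matrix [[1, 2], [3, 4]]. The sketch below assumes that input (it is inferred from the output, not taken from the notebook cell) and shows a few functions from np.linalg.

```python
import numpy as np

# Assumed input, inferred from the inverse shown above.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

A_inv = np.linalg.inv(A)                    # matrix inverse
det_A = np.linalg.det(A)                    # determinant, approximately -2
eigenvalues, eigenvectors = np.linalg.eig(A)

print(A_inv)
print(det_A)

# Sanity checks: A times its inverse is the identity matrix, and each
# eigenpair satisfies A v = lambda v (up to floating-point error).
print(np.allclose(A @ A_inv, np.eye(2)))                      # True
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))  # True
```

A determinant close to zero signals a matrix that is singular or nearly so, in which case np.linalg.inv either raises LinAlgError or returns numerically unreliable values.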
You can also solve systems of linear equations with NumPy, using
np.linalg.solve().
# Solve Ax = b for x
x = np.linalg.solve(A, b)
print(x)
[-4.5 5. ]
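As a self-contained illustration, here is a small system solved end to end. The coefficients below are hypothetical and are not the A and b used in the cell above; they encode the equations 2x + y = 3 and x + 3y = 5.

```python
import numpy as np

# Hypothetical system:
#   2x +  y = 3
#    x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # solves A @ x = b
print(x)                    # approximately [0.8, 1.4]

# Verify the solution by substituting it back into the system.
print(np.allclose(A @ x, b))   # True
```

np.linalg.solve requires a square, non-singular A. It is generally preferable to computing `np.linalg.inv(A) @ b`, since it avoids forming the inverse explicitly.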
Masked arrays in NumPy allow you to work with data where certain elements are invalid or missing. A mask is a
Boolean array that indicates which elements should be considered valid and which should be masked (invalid or
missing).
Masked arrays enable you to perform operations on valid data while ignoring the masked elements.
In [59]: import numpy.ma as ma
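A minimal sketch of the idea, assuming a small dataset in which -999 is a hypothetical sentinel marking invalid readings:

```python
import numpy as np
import numpy.ma as ma

# Hypothetical data: -999 marks an invalid reading.
data = np.array([1.0, 2.0, -999.0, 4.0, 5.0])

# mask=True means "ignore this element".
masked_data = ma.masked_array(data, mask=(data == -999.0))

print(masked_data.count())       # 4 valid elements
print(masked_data.mean())        # 3.0, computed from 1, 2, 4, 5 only
print(masked_data.filled(0.0))   # plain ndarray with masked slots set to 0
```

The original values stay in `masked_data.data`; the mask only controls which elements take part in computations, so it can be changed later without losing information.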
Structured arrays allow you to work with heterogeneous data, similar to a table with named columns. Each field
of a structured array can have its own data type. Define the layout with np.dtype, giving each column's
name and data type as a tuple, then pass that dtype when you create the array.
b'Alice'
[30 25]
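A sketch that produces output of the same form as shown above. The field names, the byte-string type "S10", and the second record ("Bob") are assumptions made for illustration; only the values b'Alice' and [30 25] come from the output above.

```python
import numpy as np

# Each field is a (name, type) tuple: a 10-byte string and a 32-bit integer.
person_dtype = np.dtype([("name", "S10"), ("age", "i4")])

# One tuple per record; "Bob" is a hypothetical second entry.
people = np.array([(b"Alice", 30), (b"Bob", 25)], dtype=person_dtype)

print(people[0]["name"])   # b'Alice'
print(people["age"])       # [30 25]
```

Indexing by position returns a whole record, while indexing by field name returns a regular array for that column. Using "U10" instead of "S10" would store Unicode strings, so names would print without the b prefix.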
Conclusion
In this NumPy guide, we've covered essential aspects and some advanced techniques for data science and numerical
computing. NumPy is a large library, and what we have seen here covers only the basics. There is a lot more
it can do, so explore further to unlock its full potential and elevate your data-driven solutions.