Practice NumPy Array
List methods
list.append(x)
Add an item to the end of the list. Equivalent to a[len(a):] = [x].
list.extend(iterable)
Extend the list by appending all the items from the iterable. Equivalent to a[len(a):] =
iterable.
list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before
which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is
equivalent to a.append(x).
list.remove(x)
Remove the first item from the list whose value is equal to x. It raises a ValueError if there is
no such item.
list.pop([i])
Remove the item at the given position in the list, and return it. If no index is specified,
a.pop() removes and returns the last item in the list. (The square brackets around the i in the
method signature denote that the parameter is optional, not that you should type square
brackets at that position. You will see this notation frequently in the Python Library
Reference.)
list.clear()
Remove all items from the list. Equivalent to del a[:].
list.count(x)
Return the number of times x appears in the list.
list.reverse()
Reverse the elements of the list in place.
list.copy()
Return a shallow copy of the list. Equivalent to a[:].
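The list methods above can be tried out in a short session. This is a minimal sketch; the list contents are illustrative and not taken from the text.

```python
# Illustrative values only
a = [3, 1, 2]

a.append(4)          # a is now [3, 1, 2, 4]
a.extend([5, 1])     # a is now [3, 1, 2, 4, 5, 1]
a.insert(0, 9)       # a is now [9, 3, 1, 2, 4, 5, 1]
a.remove(1)          # removes only the first 1 -> [9, 3, 2, 4, 5, 1]
last = a.pop()       # returns 1; a is now [9, 3, 2, 4, 5]
first = a.pop(0)     # returns 9; a is now [3, 2, 4, 5]
count_of_two = a.count(2)   # 1
a.reverse()          # a is now [5, 4, 2, 3]

b = a.copy()         # shallow copy: a new list object with the same items
b.append(0)          # does not affect a
print(a)  # [5, 4, 2, 3]
print(b)  # [5, 4, 2, 3, 0]
```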
Tuple methods
tuple.count()
Returns the number of times a specified value occurs in a tuple.
tuple.index()
Searches the tuple for a specified value and returns the position of where it was found.
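A small sketch of the two tuple methods, using an illustrative tuple:

```python
t = (1, 2, 2, 3, 2)

print(t.count(2))   # 3
print(t.index(2))   # 1 (position of the first occurrence)
print(t.index(3))   # 3
```

Note that index() raises a ValueError when the value is not in the tuple.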
Set methods
set.add()
Adds an element to the set.
set.clear()
Removes all the elements from the set.
set.copy()
Returns a copy of the set.
set.difference()
Returns a set containing the difference between two or more sets.
set.difference_update()
Removes the items in this set that are also included in another, specified set.
set.discard()
Removes the specified element if it is present; does nothing otherwise.
set.intersection()
Returns a set that is the intersection of two or more sets.
set.intersection_update()
Removes the items in this set that are not present in other, specified set(s).
set.isdisjoint()
Returns True if the two sets have no elements in common.
set.issubset()
Returns whether another set contains this set or not.
set.issuperset()
Returns whether this set contains another set or not.
set.pop()
Removes and returns an arbitrary element from the set. Raises a KeyError if the set is empty.
set.remove()
Removes the specified element. Raises a KeyError if the element is not present.
set.symmetric_difference()
Returns a set with the symmetric differences of two sets.
set.symmetric_difference_update()
Updates this set in place, keeping only the elements found in either set but not in both.
set.union()
Returns a set containing the union of this set and the others.
set.update()
Update the set with another set, or any other iterable.
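The following sketch exercises several of the set methods above. The two sets are illustrative; the comments show the resulting contents, not a guaranteed print order.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a.union(b))                  # {1, 2, 3, 4, 5}
print(a.intersection(b))           # {3, 4}
print(a.difference(b))             # {1, 2}
print(a.symmetric_difference(b))   # {1, 2, 5}
print({1, 2}.issubset(a))          # True
print(a.isdisjoint({7, 8}))        # True: no shared elements

a.discard(99)        # no error, 99 is simply not there
try:
    a.remove(99)     # remove() raises KeyError for a missing element
except KeyError:
    print("99 is not in the set")

a.difference_update(b)   # in place: a now holds only 1 and 2
print(a)
```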
Dictionary methods
dict.clear()
Removes all the elements from the dictionary.
dict.copy()
Returns a copy of the dictionary.
dict.fromkeys()
Returns a dictionary with the specified keys and value.
dict.get()
Returns the value of the specified key, or a default (None unless specified) if the key is missing.
dict.items()
Returns a view object containing a (key, value) tuple for each key-value pair.
dict.keys()
Returns a view object containing the dictionary’s keys.
dict.pop()
Removes the element with the specified key and returns its value.
dict.popitem()
Removes and returns the last inserted key-value pair.
dict.setdefault()
Returns the value of the specified key. If the key does not exist: insert the key, with the
specified value.
dict.update()
Updates the dictionary with the specified key-value pairs.
dict.values()
Returns a view object of all the values in the dictionary.
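A minimal sketch of the dictionary methods above, with illustrative keys and values. In Python 3, keys(), values(), and items() return views that reflect later changes to the dictionary, which the example tries to make visible.

```python
d = dict.fromkeys(["a", "b"], 0)   # {'a': 0, 'b': 0}
d.update({"b": 2, "c": 3})         # {'a': 0, 'b': 2, 'c': 3}

print(d.get("a"))          # 0
print(d.get("z"))          # None (no KeyError for a missing key)
print(d.get("z", -1))      # -1 (explicit default)

d.setdefault("d", 4)       # inserts 'd': 4 because 'd' was missing
d.setdefault("a", 100)     # leaves 'a' unchanged because it exists

keys = d.keys()            # a view, not a list
removed = d.pop("b")       # returns 2 and deletes the key
print(list(keys))          # ['a', 'c', 'd'] (the view reflects the deletion)

last_item = d.popitem()    # ('d', 4): the most recently inserted pair
print(d)                   # {'a': 0, 'c': 3}
```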
Introduction to NumPy
NumPy is a fundamental package for scientific computing in Python. It provides a high-performance
multidimensional array object and tools for working with these arrays. If you are new to NumPy, first,
ensure you have it installed with pip install numpy.
The statement "NumPy arrays are faster than lists when the operation can be vectorized" refers to
the performance benefits of using NumPy arrays over Python lists in scenarios where operations can
be applied to entire arrays (or large chunks of data) at once, rather than iterating through individual
elements.
Key Concepts:
1. NumPy Arrays vs. Python Lists:
o NumPy arrays are part of the NumPy library, which is designed for numerical
computation in Python. They are more efficient in terms of both memory and speed
compared to Python's built-in lists.
o A list in Python is a general-purpose data structure that can store different data
types, while a NumPy array is a specialized array structure optimized for storing
large amounts of homogeneous data (usually numbers) and performing
mathematical operations on it.
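2. Vectorization:
o Vectorization means applying an operation to an entire array at once instead of
looping over individual elements in Python.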
o NumPy is highly optimized for these vectorized operations and can execute them in
compiled code (typically implemented in C), avoiding the overhead of Python’s
interpreter and loops.
Example to Understand the Difference:
If you want to add two lists element-wise, you'd typically write a loop to iterate through the
elements:
# Python lists
list1 = [1, 2, 3]
list2 = [4, 5, 6]
# Adding element-wise
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
This approach is slow for large datasets because it requires looping over each element manually in
Python, which adds a lot of overhead.
With NumPy, you can achieve the same result in a single line of code without explicitly writing a loop.
The operation is vectorized:
import numpy as np
# NumPy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
# Vectorized addition
result = array1 + array2
Here, NumPy directly performs the addition on the entire array at once, using highly optimized C-
based operations.
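One way to check the speed difference on your own machine is to time both approaches. The sketch below is only an illustration: the array size, the helper names add_lists and add_arrays, and the repeat count are assumptions, and the measured times will vary by system.

```python
import timeit
import numpy as np

n = 100_000
list1 = list(range(n))
list2 = list(range(n))
array1 = np.arange(n)
array2 = np.arange(n)

def add_lists():
    # Element-wise addition in pure Python
    return [x + y for x, y in zip(list1, list2)]

def add_arrays():
    # Vectorized element-wise addition
    return array1 + array2

# Both approaches compute the same values
assert add_lists() == add_arrays().tolist()

loop_time = timeit.timeit(add_lists, number=10)
vec_time = timeit.timeit(add_arrays, number=10)
print(f"list loop:  {loop_time:.4f} s")
print(f"vectorized: {vec_time:.4f} s")
```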
1. Efficient Memory Layout: NumPy arrays store data in contiguous blocks of memory, which
makes it more efficient for the CPU to process compared to Python lists, which are pointers
to individual objects scattered in memory.
2. Low-Level Optimizations: Element-wise operations run in compiled C loops that can use
SIMD instructions and the CPU cache effectively. For linear algebra, NumPy delegates to
low-level libraries such as BLAS and LAPACK (written in C or Fortran), which may also use
multiple threads.
3. No Type Checking: Unlike Python lists, where each element's type must be checked during
operations (since a list can store mixed types), NumPy arrays are homogeneous, which
means that the type is known beforehand and doesn't need to be checked during every
operation.
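The memory layout and single-type points above can be inspected directly on an array. In this sketch the dtype is set explicitly (an assumption made so the sizes are predictable on every platform).

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int64)   # dtype fixed explicitly for a predictable size
print(arr.dtype)                  # int64: one type for the whole array
print(arr.itemsize)               # 8 bytes per element
print(arr.nbytes)                 # 24 bytes of element data in total
print(arr.flags["C_CONTIGUOUS"])  # True: elements sit in one contiguous block

mixed = [1, "two", 3.0]           # a list can hold objects of different types
print([type(x).__name__ for x in mixed])  # ['int', 'str', 'float']
```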
Vectorization only works when the operation can be applied uniformly across all elements, such as in
mathematical operations like addition, subtraction, multiplication, etc. If an operation involves
complex control flow (like if conditions) or varying logic for different elements, vectorization may not
be possible, and the performance advantage might diminish.
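Simple element-wise conditions are one case where vectorization is often still possible. The sketch below, with illustrative values, replaces negative numbers with zero first in a Python loop and then with np.where; more involved per-element logic may not translate this cleanly.

```python
import numpy as np

values = np.array([-2, -1, 0, 1, 2])

# Loop version: an if statement evaluated per element in Python
clipped_loop = []
for v in values:
    if v < 0:
        clipped_loop.append(0)
    else:
        clipped_loop.append(int(v))

# Vectorized version: np.where picks from two alternatives using a boolean mask
clipped_vec = np.where(values < 0, 0, values)

print(clipped_loop)   # [0, 0, 0, 1, 2]
print(clipped_vec)    # [0 0 0 1 2]
```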
Here is a quick reference table to summarize the key points of comparison between NumPy arrays
and Python lists:
In contrast, NumPy arrays are best suited for numerical data and scenarios where performance,
particularly with large datasets or arrays, is paramount.
A Visual Intro to NumPy and Data Representation – Jay Alammar – Visualizing machine learning one
concept at a time.
Numpy Array vs Python List: What's the Difference? - Sling Academy
The most basic object in NumPy is the ndarray, which stands for ‘n-dimensional array’. You can create
a NumPy array using the numpy.array() function.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output:
[1 2 3 4 5]
You can also create arrays of zeros or ones using np.zeros() and np.ones() respectively.
zero_array = np.zeros(5)
print(zero_array)
one_array = np.ones(5)
print(one_array)
Array Operations
Basic mathematical operations are element-wise and can be applied directly on the arrays.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise addition
print(a + b)
# Element-wise subtraction
print(a - b)
# Element-wise multiplication
print(a * b)
# Element-wise division
print(a / b)
NumPy also supports array broadcasting, which allows arrays of different but compatible shapes to be
combined in element-wise operations.
NumPy allows you to index and slice arrays in ways similar to Python lists.
# Indexing
print(arr[0])
print(arr[-1])
# Slicing
print(arr[1:4])
print(arr[:3])
print(arr[2:])
For multidimensional arrays, indexing is done with a tuple of integers. Slices allow selection of sub-
arrays:
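# Multidimensional array (illustrative 3x3 values)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[1, 1])   # element at row 1, column 1
print(matrix[:, 1])   # second column
print(matrix[1, :])   # second row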
Reshaping Arrays
NumPy arrays can be reshaped using the np.reshape() function, which returns an array with the
same data but a new shape (a view of the original when possible, otherwise a copy).
# Reshape a 1D array to a 2x3 array
flat = np.array([1, 2, 3, 4, 5, 6])
reshaped = np.reshape(flat, (2, 3))
print(reshaped)
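Reshaping usually does not copy the data. The sketch below, using an illustrative array, checks this with np.shares_memory and also shows -1 as a placeholder for a dimension NumPy should infer.

```python
import numpy as np

original = np.arange(6)              # [0 1 2 3 4 5]
reshaped = original.reshape(2, 3)    # same data viewed as 2 rows x 3 columns

print(np.shares_memory(original, reshaped))  # True: no data was copied here
reshaped[0, 0] = 99                  # writing through the view...
print(original[0])                   # 99 ...changes the original too

# -1 asks NumPy to infer that dimension from the array size
inferred = original.reshape(3, -1)
print(inferred.shape)                # (3, 2)
```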
Advanced Operations
Advanced operations include functions for linear algebra, statistics, and more complex array
operations. Let’s go through some examples:
Linear Algebra:
# Dot product
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))
# Matrix multiplication (illustrative 2x2 matrices)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.matmul(A, B))
Statistics:
print(arr.min())
print(arr.max())
print(arr.mean())
print(arr.std())
For more complex array operations, exploring NumPy’s universal functions, or ufuncs, can be very
helpful. They apply element-wise operations to arrays and are highly optimized.
# Exponentiation
exp = a**2
print(exp) # Output: [1 4 9]
# Modulo operation
mod = b % a
print(mod) # Output: [0 1 0]
NumPy also includes a comprehensive set of mathematical functions that can perform operations on
arrays, such as np.sqrt() for square roots, np.log() for natural logarithms, and many more.
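A short sketch of the functions mentioned above, applied element-wise to an illustrative array:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(x)                       # element-wise square root
logs = np.log(np.array([1.0, np.e]))     # element-wise natural logarithm
plus_one = np.add(x, 1)                  # the ufunc behind x + 1

print(roots)      # [1. 2. 3.]
print(logs)       # [0. 1.]
print(plus_one)
```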
Broadcasting
Broadcasting describes how arrays of different shapes can be treated during arithmetic operations.
The smaller array is “broadcast” across the larger array so that they have compatible shapes.
Broadcasting rules apply to all element-wise operations, not just arithmetic ones.
An example of broadcasting:
e = np.array([1, 2, 3])
f = e * 2
print(f) # Output: [2 4 6]
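Broadcasting also applies between arrays, not only between an array and a scalar. NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A minimal sketch with illustrative values:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The row is added to every row of the matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is added to every column of the matrix
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]
```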
When the shapes of the arrays don’t align for broadcasting, NumPy will raise a ValueError. Here’s an
example where shapes do not match:
try:
    a = np.array([1, 2, 3])
    b = np.array([1, 2])
    a + b
except ValueError as e:
    print(e)
NumPy operations are primarily performed on ‘ndarrays’, its core data structure. Let’s create an
array:
data = np.array([1, 2, 3, 4, 5])
print(data)
Output: [1 2 3 4 5]
Descriptive Statistics
Mean
mean_value = np.mean(data)
print(mean_value)
Output: 3.0
Median
median_value = np.median(data)
print(median_value)
Output: 3.0
Variance
variance_value = np.var(data)
print(variance_value)
Output: 2.0
Standard Deviation
Standard deviation is the square root of the variance, indicating the amount of variation or
dispersion in a set of values.
std_dev_value = np.std(data)
print(std_dev_value)
Output: 1.4142135623730951
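The relationship between variance and standard deviation can be checked directly. The sketch also shows the ddof parameter, which switches np.var from the population formula (divide by n, the default) to the sample formula (divide by n - 1).

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Standard deviation is the square root of the variance
print(np.isclose(np.sqrt(np.var(data)), np.std(data)))  # True

print(np.var(data))           # 2.0 (population variance, divides by n)
print(np.var(data, ddof=1))   # 2.5 (sample variance, divides by n - 1)
```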
For example, here’s how you can generate a set of random numbers from a normal distribution:
normal_distribution = np.random.normal(loc=0, scale=1, size=1000)  # illustrative parameters
print('Mean:', np.mean(normal_distribution))
Correlation Coefficients
Another important statistical tool is the correlation coefficient, which measures the association
between variables.
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])  # illustrative values
correlation = np.corrcoef(x, y)
print(correlation)
Most datasets in the real world are multidimensional, and NumPy is perfectly equipped to handle
them.
Multi-dimensional Mean
Here’s how to calculate the mean across different axes of a multi-dimensional dataset:
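A minimal sketch with an illustrative 2x3 array (the values are assumptions, not from the original): axis=0 averages down the columns and axis=1 averages across the rows.

```python
import numpy as np

data_2d = np.array([[1, 2, 3],
                    [4, 5, 6]])

print(np.mean(data_2d))           # 3.5  (all elements)
print(np.mean(data_2d, axis=0))   # [2.5 3.5 4.5]  (one mean per column)
print(np.mean(data_2d, axis=1))   # [2. 5.]  (one mean per row)
```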
Pandas
Pandas is one of the most widely used libraries in Python for data analysis and manipulation. It
provides data structures and functions designed to work with structured data in a flexible and
efficient way.
1. Data Structures: Pandas offers two primary data structures: Series (for one-dimensional
data) and DataFrame (for two-dimensional tabular data). These structures make it easy to
manage and manipulate data, similar to tables in SQL databases or data frames in R.
2. Data Cleaning: Pandas simplifies data cleaning with powerful tools for handling missing data,
filtering rows and columns, and applying functions across datasets. This is especially useful
for preparing data for analysis or machine learning.
3. Data Wrangling and Transformation: Pandas allows you to merge, join, and concatenate
datasets; pivot tables; and transform data efficiently. It has built-in functions for grouping
data (groupby), aggregating, and reshaping datasets.
4. Data Analysis and Exploration: Pandas includes functions for quick data analysis, such as
descriptive statistics (mean, median, mode, etc.), correlation, and indexing and slicing data
for exploration.
5. Data Visualization: While Pandas is not a visualization library itself, it integrates well with
libraries like Matplotlib and Seaborn. Pandas also has built-in functions for creating simple
plots directly from a DataFrame.
6. I/O Capabilities: Pandas can read from and write to various file formats like CSV, Excel, SQL,
and JSON, making it easier to move data in and out of your code environment.
7. Performance: Pandas is optimized for performance and handles large datasets more
efficiently than Python’s built-in data structures. It also supports operations that can be
vectorized, reducing the need for loops, which are generally slower in Python.
In short, Pandas is essential in Python for data science, finance, engineering, and other fields where
data manipulation and analysis are critical.
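A few of the features listed above can be seen in a short sketch. The data, column names, and city names here are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names and values are illustrative only
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome"],
    "temp": [21.0, np.nan, 25.0, 27.0],
})

s = pd.Series([1, 2, 3], name="numbers")   # one-dimensional Series

# Data cleaning: fill the missing value with the column mean
df["temp"] = df["temp"].fillna(df["temp"].mean())

# Grouping and aggregation: mean temperature per city
by_city = df.groupby("city")["temp"].mean()

print(df)
print(by_city)
print(df["temp"].describe())               # descriptive statistics
```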
NumPy and Pandas are both foundational libraries in Python, especially for data science and analysis,
and they are closely related:
1. Data Structures:
o NumPy provides the ndarray object, which is a fast, efficient, n-dimensional array
commonly used for numerical operations.
o Pandas builds on NumPy by offering more flexible, user-friendly data structures like
Series (1D) and DataFrame (2D). Under the hood, Pandas often relies on NumPy
arrays for efficient computation.
2. Performance:
o Both libraries are optimized for performance, but NumPy is more memory-efficient
for handling large, homogeneous (single data type) arrays.
o Pandas, while slightly less efficient for certain operations, offers more flexible
indexing and data manipulation tools suited for tabular data.
3. Data Types:
o NumPy arrays are generally homogeneous, meaning all elements are of the same
data type (e.g., all integers or all floats).
o Pandas DataFrames, on the other hand, can hold different data types (e.g., integers,
floats, strings) across different columns, making it more flexible for real-world data
that might include mixed types.
4. Functionality:
o Pandas is geared more toward data manipulation and analysis. It offers higher-level
functions for filtering, aggregating, pivoting, and time series handling that go beyond
basic mathematical operations.
5. Interoperability:
o Pandas DataFrames are largely backed by NumPy arrays, so you can convert a
DataFrame to a NumPy array with .to_numpy() (the older .values attribute also
works) and build a DataFrame from an array with pd.DataFrame().
o Many operations in Pandas directly leverage NumPy functions, making it easy to
combine these two libraries. For instance, you can use NumPy’s mathematical
functions directly on Pandas objects.
6. Use Cases:
o NumPy is the go-to library when working with multi-dimensional arrays or matrices,
such as in mathematical or deep learning applications.
o Pandas is more suited for working with structured data in a table format, especially
for data wrangling, cleaning, and quick analysis.
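The interoperability point above can be sketched as a round trip between the two libraries. The column names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [16.0, 25.0, 36.0]})

arr = df.to_numpy()                   # DataFrame -> NumPy array
print(type(arr).__name__, arr.shape)  # ndarray (3, 2)

df_back = pd.DataFrame(arr, columns=["x", "y"])   # NumPy array -> DataFrame
print(df_back.equals(df))             # True

roots = np.sqrt(df)                   # NumPy function applied to a DataFrame
print(type(roots).__name__)           # DataFrame: row and column labels are kept
print(roots["y"].tolist())            # [4.0, 5.0, 6.0]
```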
In summary, Pandas is built on top of NumPy and extends its capabilities to work efficiently with
structured, real-world data, combining the speed of NumPy with added functionality for data
manipulation.