Practice Numpyarray

Uploaded by

krithikb87
Copyright
© © All Rights Reserved

Basic Python Data Structures – Lists, tuples, sets, dictionaries

List methods

• list.append(x)
Add an item to the end of the list. Equivalent to a[len(a):] = [x].

• list.extend(iterable)
Extend the list by appending all the items from the iterable. Equivalent to a[len(a):] =
iterable.

• list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before
which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is
equivalent to a.append(x).

• list.remove(x)
Remove the first item from the list whose value is equal to x. It raises a ValueError if there is
no such item.

• list.pop([i])
Remove the item at the given position in the list, and return it. If no index is specified,
a.pop() removes and returns the last item in the list. (The square brackets around the i in the
method signature denote that the parameter is optional, not that you should type square
brackets at that position. You will see this notation frequently in the Python Library
Reference.)

• list.clear()
Remove all items from the list. Equivalent to del a[:].

• list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is equal to x. Raises a
ValueError if there is no such item. The optional arguments start and end are interpreted as
in the slice notation and are used to limit the search to a particular subsequence of the list.
The returned index is computed relative to the beginning of the full sequence rather than
the start argument.

• list.count(x)
Return the number of times x appears in the list.

• list.sort(*, key=None, reverse=False)
Sort the items of the list in place (the arguments can be used for sort customization, see
sorted() for their explanation).

• list.reverse()
Reverse the elements of the list in place.

• list.copy()
Return a shallow copy of the list. Equivalent to a[:].
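As a small illustration of the list methods described above, the following sketch applies them in sequence to one list. The data and the variable names (a, b, last, position, ones) are made up for this example.

```python
# Illustrative data only; the list name `a` follows the notation used above.
a = [3, 1, 2]

a.append(4)            # a is now [3, 1, 2, 4]
a.extend([1, 5])       # a is now [3, 1, 2, 4, 1, 5]
a.insert(0, 9)         # a is now [9, 3, 1, 2, 4, 1, 5]
a.remove(1)            # removes only the first 1 -> [9, 3, 2, 4, 1, 5]
last = a.pop()         # removes and returns 5 -> [9, 3, 2, 4, 1]
position = a.index(4)  # zero-based index of the value 4 -> 3
ones = a.count(1)      # the value 1 now appears once -> 1

b = a.copy()           # shallow copy; later changes to a do not affect b
a.sort()               # in place -> [1, 2, 3, 4, 9]
a.reverse()            # in place -> [9, 4, 3, 2, 1]

print(a, b)
```

Note that sort() and reverse() modify the list in place and return None, which is why the copy b still holds the earlier order.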

Tuple methods

• tuple.count()
Returns the number of times a specified value occurs in the tuple.

• tuple.index()
Searches the tuple for a specified value and returns the index of its first occurrence. Raises a
ValueError if the value is not found.

Set methods

• set.add()
Adds an element to the set. Has no effect if the element is already present.

• set.clear()
Removes all the elements from the set.

• set.copy()
Returns a shallow copy of the set.

• set.difference()
Returns a new set containing the elements of this set that are not in the other set(s).

• set.difference_update()
Removes the items in this set that are also included in the other, specified set(s).

• set.discard()
Removes the specified item if it is present. Does nothing if it is absent.

• set.intersection()
Returns a new set containing the elements common to this set and the other set(s).

• set.intersection_update()
Removes the items in this set that are not present in the other, specified set(s).

• set.isdisjoint()
Returns True if the two sets have no elements in common.

• set.issubset()
Returns True if another set contains this set.

• set.issuperset()
Returns True if this set contains another set.

• set.pop()
Removes and returns an arbitrary element from the set. Raises a KeyError if the set is empty.

• set.remove()
Removes the specified element. Raises a KeyError if the element is not in the set.

• set.symmetric_difference()
Returns a new set with the elements that are in exactly one of the two sets.

• set.symmetric_difference_update()
Updates this set so that it keeps only the elements found in exactly one of this set and another.

• set.union()
Returns a new set containing all elements from this set and the other set(s).

• set.update()
Updates the set with the elements of another set, or any other iterable.
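The set methods above can be sketched on two small sets. The set names and values are illustrative, and the operators |, &, - and ^ are used here as shorthand for union(), intersection(), difference() and symmetric_difference().

```python
# Two illustrative sets
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

union = evens | small            # same as evens.union(small)
common = evens & small           # same as evens.intersection(small)
only_evens = evens - small       # same as evens.difference(small)
either_not_both = evens ^ small  # same as evens.symmetric_difference(small)

# discard() ignores a missing element, remove() raises KeyError
small.discard(99)
try:
    small.remove(99)
    remove_failed = False
except KeyError:
    remove_failed = True

print(union, common, only_evens, either_not_both, remove_failed)
```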

Dictionary methods

• dict.clear()
Removes all the elements from the dictionary.

• dict.copy()
Returns a shallow copy of the dictionary.

• dict.fromkeys()
Returns a new dictionary with the specified keys, each mapped to the same specified value
(None by default).

• dict.get()
Returns the value of the specified key, or a default (None unless another default is given) if
the key does not exist.

• dict.items()
Returns a view object containing a (key, value) tuple for each key-value pair.

• dict.keys()
Returns a view object containing the dictionary’s keys.

• dict.pop()
Removes the element with the specified key and returns its value. Raises a KeyError if the key
does not exist and no default is given.

• dict.popitem()
Removes and returns the last inserted key-value pair.

• dict.setdefault()
Returns the value of the specified key. If the key does not exist: insert the key, with the
specified value.

• dict.update()
Updates the dictionary with the specified key-value pairs.

• dict.values()
Returns a view object containing all the values in the dictionary.
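A short sketch of the dictionary methods above, using a made-up stock dictionary. It also shows that keys() returns a live view rather than a list copy, so later changes to the dictionary are visible through it.

```python
# Illustrative dictionary; keys and values are made up
stock = {"apples": 3, "pears": 0}

missing = stock.get("kiwis")       # key absent -> None, no error raised
fallback = stock.get("kiwis", 0)   # key absent -> the supplied default 0
stock.setdefault("plums", 10)      # inserts "plums" because it was absent
stock.update({"pears": 7})         # overwrites the existing value for "pears"

keys_view = stock.keys()           # a live view, not a copy
stock["figs"] = 2                  # the view now also contains "figs"

key, value = stock.popitem()       # removes the last inserted pair ("figs", 2)
removed = stock.pop("apples")      # removes "apples" and returns 3

print(stock, list(keys_view))
```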

Introduction to NumPy

NumPy is a fundamental package for scientific computing in Python. It provides a high-performance
multidimensional array object and tools for working with these arrays. If you are new to NumPy, first
make sure it is installed, for example with pip install numpy.

The statement "NumPy arrays are faster than lists when the operation can be vectorized" refers to
the performance benefits of using NumPy arrays over Python lists in scenarios where operations can
be applied to entire arrays (or large chunks of data) at once, rather than iterating through individual
elements.

Key Concepts:

1. NumPy Arrays vs. Python Lists:

o NumPy arrays are part of the NumPy library, which is designed for numerical
computation in Python. They are more efficient in terms of both memory and speed
compared to Python's built-in lists.

o A list in Python is a general-purpose data structure that can store different data
types, while a NumPy array is a specialized array structure optimized for storing
large amounts of homogeneous data (usually numbers) and performing
mathematical operations on it.

2. Vectorization:

o Vectorization refers to performing operations on entire arrays (or large segments)
without the need for explicit loops. Instead of processing each element one at a
time, a vectorized operation applies the operation in bulk, leveraging low-level
optimizations.

o NumPy is highly optimized for these vectorized operations and can execute them in
compiled code (typically implemented in C), avoiding the overhead of Python’s
interpreter and loops.

Example to Understand the Difference:

Using Python Lists (non-vectorized):

If you want to add two lists element-wise, you'd typically write a loop to iterate through the
elements:

# Python lists
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# Adding element-wise
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])

print(result)  # Output: [5, 7, 9]

This approach is slow for large datasets because it requires looping over each element manually in
Python, which adds a lot of overhead.

Using NumPy Arrays (vectorized):

With NumPy, you can achieve the same result in a single line of code without explicitly writing a loop.
The operation is vectorized:

import numpy as np

# NumPy arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Vectorized addition
result = arr1 + arr2

print(result)  # Output: [5 7 9]

Here, NumPy directly performs the addition on the entire array at once, using highly optimized C-
based operations.
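To see the difference on a larger input, the two approaches can be timed with the standard timeit module. This is only a rough sketch: the array size, the repeat count, and the helper names add_lists and add_arrays are arbitrary choices for this example, and the measured times depend on the machine, so no expected output is shown.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size, large enough for the gap to be visible
list1 = list(range(n))
list2 = list(range(n))
arr1 = np.arange(n)
arr2 = np.arange(n)


def add_lists():
    # Element-wise addition with a Python-level loop
    return [x + y for x, y in zip(list1, list2)]


def add_arrays():
    # Element-wise addition as one vectorized operation
    return arr1 + arr2


loop_time = timeit.timeit(add_lists, number=5)
vector_time = timeit.timeit(add_arrays, number=5)

print(f"list loop:  {loop_time:.4f} s")
print(f"vectorized: {vector_time:.4f} s")
```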

Why NumPy Arrays Are Faster:

1. Efficient Memory Layout: NumPy arrays store their elements in a contiguous block of memory,
which the CPU can process more efficiently than a Python list, which holds pointers
to individual objects scattered in memory.

2. Low-Level Optimizations: Element-wise operations run as compiled C loops that can take
advantage of the CPU cache and SIMD instructions, and linear algebra routines are delegated to
libraries such as BLAS and LAPACK (written in C or Fortran), which may also use several threads.

3. No Per-Element Type Checking: In a Python list, each element's type must be checked during
operations (since a list can store mixed types). NumPy arrays are homogeneous, which
means that the type is known beforehand and doesn't need to be checked for every
element.
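The memory-layout point can be checked roughly with the sketch below. It assumes a 64-bit integer dtype, chosen explicitly so the arithmetic is predictable, and the list figure is only an estimate that adds the list's pointer storage to the size of each integer object.

```python
import sys

import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)  # dtype fixed explicitly for this sketch

# One contiguous buffer: 1000 elements * 8 bytes each
array_bytes = numbers_array.nbytes

# Rough estimate for the list: the pointer table plus every int object it refers to
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)

print("array bytes:", array_bytes)
print("list bytes (estimate):", list_bytes)
print("contiguous:", numbers_array.flags["C_CONTIGUOUS"])
```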

When the Operation Can Be Vectorized:

Vectorization works when the same operation can be applied uniformly across all elements, as in
arithmetic such as addition, subtraction, and multiplication. Simple element-wise conditions can
often still be expressed with functions such as np.where, but if an operation involves complex
control flow or logic that varies from element to element, vectorization may not be possible, and
the performance advantage diminishes.
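As a hedged illustration that a simple element-wise condition need not rule out vectorization, the sketch below writes the same rule twice: once as a Python loop with an if statement and once with np.where. The rule itself (replace negatives with 0, multiply the rest by 10) is made up for this example.

```python
import numpy as np

values = np.array([-2, -1, 0, 1, 2])

# Loop version: one Python-level decision per element
clipped_loop = []
for v in values:
    if v < 0:
        clipped_loop.append(0)
    else:
        clipped_loop.append(v * 10)

# Vectorized version: np.where(condition, value_if_true, value_if_false)
clipped_vec = np.where(values < 0, 0, values * 10)

print(clipped_vec.tolist())  # [0, 0, 0, 10, 20]
```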

Here is a quick reference table to summarize the key points of comparison between NumPy arrays
and Python lists:

Feature             Python List              NumPy Array
Memory Efficiency   Lower                    Higher
Performance         Slower for large data    Faster for numerical ops
Functionality       General and flexible     Numerical and optimized
Type Homogeneity    Heterogeneous            Homogeneous
Size Mutability     Mutable                  Fixed size

Python lists remain a good fit for small, mixed-type, or frequently resized collections, while
NumPy arrays are best suited for numerical data and scenarios where performance, particularly
with large datasets, is paramount.
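Two rows of the comparison, type homogeneity and fixed size, can be observed directly. The sketch below uses made-up values and shows that mixing an integer with a float produces a single float dtype, and that np.append returns a new array instead of growing the existing one.

```python
import numpy as np

mixed_list = [1, 2.5, "three"]   # a list keeps each element's own type
upcast = np.array([1, 2.5])      # the integer 1 is stored as a float

grown = np.append(upcast, 4.0)   # a new, longer array; upcast is unchanged

print(upcast.dtype)              # float64
print(upcast.shape, grown.shape)
```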

A Visual Intro to NumPy and Data Representation – Jay Alammar – Visualizing machine learning one
concept at a time.
Numpy Array vs Python List: What's the Difference? - Sling Academy

Creating NumPy Arrays

The most basic object in NumPy is the ndarray, which stands for ‘n-dimensional array’. You can create
a NumPy array using the numpy.array() function.

import numpy as np

# Creating a simple NumPy array

arr = np.array([1, 2, 3, 4, 5])

print(arr)

Output:

[1 2 3 4 5]

You can also create arrays of zeros or ones using np.zeros() and np.ones() respectively.

# Create an array of zeros

zero_array = np.zeros(5)

print(zero_array)

# Create an array of ones

one_array = np.ones(5)

print(one_array)
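The same creation functions also accept a shape tuple and a dtype, and NumPy has helpers for evenly spaced values. The following is a small sketch with arbitrary shapes and ranges; the variable names are illustrative.

```python
import numpy as np

grid = np.zeros((2, 3))              # 2 rows, 3 columns, float64 by default
counts = np.ones(4, dtype=int)       # integer ones instead of floats
steps = np.arange(0, 10, 2)          # start, stop (exclusive), step
points = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values, both ends included

print(grid.shape, grid.dtype)
print(counts)
print(steps)
print(points)
```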

Array Operations

Basic mathematical operations are element-wise and can be applied directly on the arrays.

# Create two arrays

a = np.array([1, 2, 3])

b = np.array([4, 5, 6])

# Element-wise addition

print(a + b)

# Element-wise subtraction

print(a - b)

# Element-wise multiplication

print(a * b)

# Element-wise division

print(a / b)

NumPy also supports array broadcasting, which lets arrays of different but compatible shapes be
combined in element-wise operations.

Indexing and Slicing

NumPy allows you to index and slice arrays in ways similar to Python lists.

# Indexing

arr = np.array([1, 2, 3, 4, 5])

print(arr[0])

print(arr[-1])

# Slicing

print(arr[1:4])

print(arr[:3])

print(arr[2:])

For multidimensional arrays, indexing is done with a tuple of integers. Slices allow selection of sub-
arrays:

# Multidimensional array

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(matrix[1, 1])

print(matrix[:, 1])

print(matrix[1, :])
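One difference from Python lists is worth a sketch: a basic slice of a NumPy array is a view onto the same data, not a copy, and arrays can also be indexed with a boolean mask. The values and variable names below are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

window = arr[1:4]            # a view: shares memory with arr
window[0] = 99               # also changes arr[1]

independent = arr[1:4].copy()
independent[0] = -1          # arr is not affected by changes to a copy

large = arr[arr > 35]        # boolean mask selects matching elements

print(arr.tolist())          # [10, 99, 30, 40, 50]
print(large.tolist())        # [99, 40, 50]
```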

Reshaping Arrays

NumPy arrays can be reshaped using the np.reshape() function or the equivalent reshape() method,
which returns an array with the same data but a new shape (a view of the original data where
possible, rather than a copy).

# Reshape a 1D array to a 2x3 array

original = np.array([1, 2, 3, 4, 5, 6])

reshaped = original.reshape((2, 3))

print(reshaped)
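Two details of reshaping are sketched below under the assumption of a simple contiguous 1D array: passing -1 lets NumPy infer one dimension, and the reshaped array here shares memory with the original. A shape whose size does not match the number of elements raises a ValueError.

```python
import numpy as np

original = np.arange(1, 7)            # [1 2 3 4 5 6]

reshaped = original.reshape(2, -1)    # -1 is inferred as 3
reshaped[0, 0] = 100                  # writes through to original[0]

try:
    original.reshape(4, 2)            # 8 slots cannot hold 6 elements
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshaped.shape)                 # (2, 3)
print(original.tolist())              # [100, 2, 3, 4, 5, 6]
```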

Advanced Operations

Advanced operations include functions for linear algebra, statistics, and more complex array
operations. Let’s go through some examples:

Linear Algebra:

# Dot product

a = np.array([1, 2, 3])

b = np.array([4, 5, 6])

print(np.dot(a, b))

# Matrix multiplication

A = np.array([[1, 2], [3, 4]])

B = np.array([[5, 6], [7, 8]])

print(np.matmul(A, B))
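For reference, the sketch below repeats the same arrays and contrasts matrix multiplication with element-wise multiplication. The @ operator is used as shorthand for np.matmul on these 2D arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

dot = np.dot(a, b)       # 1*4 + 2*5 + 3*6 = 32
product = A @ B          # matrix product, same as np.matmul(A, B)
elementwise = A * B      # multiplies matching entries, not a matrix product

print(dot)                    # 32
print(product.tolist())       # [[19, 22], [43, 50]]
print(elementwise.tolist())   # [[5, 12], [21, 32]]
```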

Statistics:

# Minimum and maximum

arr = np.array([1, 2, 3, 4, 5])

print(arr.min())

print(arr.max())

# Mean and standard deviation

print(arr.mean())

print(arr.std())

For more complex array operations, exploring NumPy’s universal functions, or ufuncs, can be very
helpful. They apply element-wise operations to arrays and are highly optimized.

More Complex Operations


NumPy also supports other more complex arithmetic operations. For example, we can perform
exponentiation and modulo operations:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Exponentiation
exp = a**2
print(exp) # Output: [1 4 9]

# Modulo operation
mod = b % a
print(mod) # Output: [0 1 0]

NumPy also includes a comprehensive set of mathematical functions that can perform operations on
arrays, such as np.sqrt() for square roots, np.log() for natural logarithms, and many more.
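A brief sketch of these mathematical functions (ufuncs) on a small made-up array follows. It also uses the reduce method that binary ufuncs such as np.add provide, which here gives the same result as summing the array.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(values)          # element-wise square root
logs = np.log(np.exp(values))    # exp followed by log returns the input
total = np.add.reduce(values)    # same result as values.sum()

print(roots.tolist())            # [1.0, 2.0, 3.0]
print(total)                     # 14.0
```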

Broadcasting

Broadcasting describes how arrays of different shapes can be treated during arithmetic operations.
The smaller array is “broadcast” across the larger array so that they have compatible shapes.
Broadcasting rules apply to all element-wise operations, not just arithmetic ones.

An example of broadcasting:

# Broadcasting an array with a scalar

e = np.array([1, 2, 3])

f = e * 2

print(f) # Output: [2 4 6]

Here, the scalar value 2 is broadcast across the array e.
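Broadcasting also applies between arrays, not only between an array and a scalar. The sketch below, with made-up values, adds a row of shape (3,) and a column of shape (2, 1) to a matrix of shape (2, 3).

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
column = np.array([[100], [200]])       # shape (2, 1)

plus_row = matrix + row         # row is repeated for every row of matrix
plus_column = matrix + column   # column is repeated for every column of matrix

print(plus_row.tolist())        # [[11, 22, 33], [14, 25, 36]]
print(plus_column.tolist())     # [[101, 102, 103], [204, 205, 206]]
```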

Handling Different Shapes

When the shapes of the arrays don’t align for broadcasting, NumPy will raise a ValueError. Here’s an
example where shapes do not match:

# Adding arrays of shapes (3,) and (2,) raises a ValueError
a = np.array([1, 2, 3])
b = np.array([1, 2])

try:
    a + b
except ValueError as err:
    print(err)  # operands could not be broadcast together with shapes (3,) (2,)
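If the intent was to combine every element of one array with every element of the other, one possible fix is to add an axis so the shapes become compatible. This is only a sketch of that one interpretation; whether it is the right fix depends on what the mismatched arrays were meant to represent.

```python
import numpy as np

a = np.array([1, 2, 3])   # shape (3,)
b = np.array([1, 2])      # shape (2,)

# a[:, np.newaxis] has shape (3, 1), which broadcasts with (2,) to (3, 2)
pairwise = a[:, np.newaxis] + b

print(pairwise.shape)       # (3, 2)
print(pairwise.tolist())    # [[2, 3], [3, 4], [4, 5]]
```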


Basics of NumPy Arrays

NumPy operations are primarily performed on ‘ndarrays’, its core data structure. Let’s create an
array:

data = np.array([1, 2, 3, 4, 5])

print(data)

Output: [1 2 3 4 5]

Descriptive Statistics

Now let’s discuss some fundamental statistical operations.

Mean

The mean, or average, is a measure of the central tendency of a dataset.

mean_value = np.mean(data)

print(mean_value)

Output: 3.0

Median

Median gives the middle value of the dataset.

median_value = np.median(data)

print(median_value)

Output: 3.0

Variance

Variance measures the spread of the data around the mean.

variance_value = np.var(data)

print(variance_value)

Output: 2.0

Standard Deviation

Standard deviation is the square root of the variance, indicating the amount of variation or
dispersion in a set of values.

std_dev_value = np.std(data)

print(std_dev_value)

Output: 1.4142135623730951
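By default np.var and np.std divide by the number of elements (the population formulas). When the data is a sample, the ddof=1 argument divides by n - 1 instead. A short sketch on the same data:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

population_var = np.var(data)        # sum of squared deviations (10) / 5
sample_var = np.var(data, ddof=1)    # sum of squared deviations (10) / 4
sample_std = np.std(data, ddof=1)    # square root of the sample variance

print(population_var)   # 2.0
print(sample_var)       # 2.5
print(sample_std)
```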

Random Numbers and Distributions


NumPy can also generate random numbers and draw random samples from various distributions, which
is often useful in statistical analysis.

Generating Random Numbers

For example, here’s how you can generate a set of random numbers from a normal distribution:

normal_distribution = np.random.normal(0, 1, size=1000)

Descriptive Statistics on Distributions

Let’s calculate the mean and standard deviation of these numbers:

print('Mean:', np.mean(normal_distribution))

print('Standard deviation:', np.std(normal_distribution))
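Because the samples are random, the printed mean and standard deviation change on every run. A sketch of one way to make results repeatable uses a seeded generator from np.random.default_rng; the seed value 42 is an arbitrary choice for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeded generator; 42 is arbitrary
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# A second generator with the same seed reproduces the same numbers
repeat = np.random.default_rng(seed=42).normal(loc=0.0, scale=1.0, size=1000)

print("Mean:", samples.mean())
print("Standard deviation:", samples.std())
print("Reproducible:", np.array_equal(samples, repeat))
```

With 1000 samples the mean should land close to 0 and the standard deviation close to 1, though not exactly on them.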

Correlation Coefficients

Another important statistical tool is the correlation coefficient, which measures the association
between variables.

x = np.array([1, 2, 3, 4, 5, 6])

y = np.array([2, 3.5, 3, 4.5, 6, 5.5])

correlation = np.corrcoef(x, y)

print(correlation)
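np.corrcoef returns a 2x2 matrix rather than a single number: the diagonal holds each variable's correlation with itself, and the off-diagonal entries hold the correlation between x and y. The sketch below extracts that value and checks it against the Pearson formula computed by hand; the helper names dx, dy and r_manual are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 3.5, 3, 4.5, 6, 5.5])

matrix = np.corrcoef(x, y)
r = matrix[0, 1]                 # correlation between x and y

# Pearson formula written out: covariance over the product of spreads
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r, r_manual)
```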

Working with Multidimensional Data

Most datasets in the real world are multidimensional, and NumPy is perfectly equipped to handle
them.

Multi-dimensional Mean

Here’s how to calculate the mean across different axes of a multi-dimensional dataset:

multi_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print('Mean of entire dataset:', np.mean(multi_data))

print('Mean of each column:', np.mean(multi_data, axis=0))

print('Mean of each row:', np.mean(multi_data, axis=1))
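A common follow-up is to subtract each row's mean from that row. The sketch below uses keepdims=True so the row means keep shape (3, 1) and broadcast against the (3, 3) data; the name centered is illustrative.

```python
import numpy as np

multi_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_means = multi_data.mean(axis=0)                  # one value per column
row_means = multi_data.mean(axis=1, keepdims=True)   # shape (3, 1)

centered = multi_data - row_means                    # every row becomes [-1, 0, 1]

print(col_means.tolist())    # [4.0, 5.0, 6.0]
print(row_means.tolist())    # [[2.0], [5.0], [8.0]]
print(centered.tolist())
```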


Exploring data using Pandas — Geo-Python site documentation

Pandas
Pandas is one of the most widely used libraries in Python for data analysis and manipulation. It
provides data structures and functions designed to work with structured data in a flexible and
efficient way.

Here are the main reasons why Pandas is popular in Python:

1. Data Structures: Pandas offers two primary data structures: Series (for one-dimensional
data) and DataFrame (for two-dimensional tabular data). These structures make it easy to
manage and manipulate data, similar to tables in SQL databases or data frames in R.

2. Data Cleaning: Pandas simplifies data cleaning with powerful tools for handling missing data,
filtering rows and columns, and applying functions across datasets. This is especially useful
for preparing data for analysis or machine learning.

3. Data Wrangling and Transformation: Pandas allows you to merge, join, and concatenate
datasets; pivot tables; and transform data efficiently. It has built-in functions for grouping
data (groupby), aggregating, and reshaping datasets.

4. Data Analysis and Exploration: Pandas includes functions for quick data analysis, such as
descriptive statistics (mean, median, mode, etc.), correlation, and indexing and slicing data
for exploration.

5. Data Visualization: While Pandas is not a visualization library itself, it integrates well with
libraries like Matplotlib and Seaborn. Pandas also has built-in functions for creating simple
plots directly from a DataFrame.

6. I/O Capabilities: Pandas can read from and write to various file formats like CSV, Excel, SQL,
and JSON, making it easier to move data in and out of your code environment.

7. Performance: Pandas is optimized for performance and handles large datasets more
efficiently than Python’s built-in data structures. It also supports operations that can be
vectorized, reducing the need for loops, which are generally slower in Python.

In short, Pandas is essential in Python for data science, finance, engineering, and other fields where
data manipulation and analysis are critical.
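A few of the points above (the DataFrame structure, handling missing data, and groupby) can be sketched on a tiny made-up table. The column names and values are hypothetical, and filling the missing value with the column mean is just one possible cleaning choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing temperature
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Bergen"],
    "temp_c": [4.0, np.nan, 7.0, 9.0],
})

# Data cleaning: replace the missing value with the column mean
cleaned = df.fillna({"temp_c": df["temp_c"].mean()})

# Grouping and aggregating: mean temperature per city
by_city = cleaned.groupby("city")["temp_c"].mean()

print(cleaned)
print(by_city)
```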

How NumPy Relates to Pandas

NumPy and Pandas are both foundational libraries in Python, especially for data science and analysis,
and they are closely related:

1. Data Structures:

o NumPy provides the ndarray object, which is a fast, efficient, n-dimensional array
commonly used for numerical operations.

o Pandas builds on NumPy by offering more flexible, user-friendly data structures like
Series (1D) and DataFrame (2D). Under the hood, Pandas often relies on NumPy
arrays for efficient computation.

2. Efficiency and Speed:

o Both libraries are optimized for performance, but NumPy is more memory-efficient
for handling large, homogeneous (single data type) arrays.

o Pandas, while slightly less efficient for certain operations, offers more flexible
indexing and data manipulation tools suited for tabular data.

3. Data Types:

o NumPy arrays are generally homogeneous, meaning all elements are of the same
data type (e.g., all integers or all floats).

o Pandas DataFrames, on the other hand, can hold different data types (e.g., integers,
floats, strings) across different columns, making it more flexible for real-world data
that might include mixed types.

4. Functionality:

o NumPy is often used for mathematical computations and scientific operations. It
includes functions for linear algebra, Fourier transforms, and random number
generation.

o Pandas is geared more toward data manipulation and analysis. It offers higher-level
functions for filtering, aggregating, pivoting, and time series handling that go beyond
basic mathematical operations.

5. Interoperability:

o Pandas objects typically store their data in NumPy arrays, meaning you can
easily convert a Pandas DataFrame to a NumPy array using .to_numpy() (preferred over
the older .values attribute) and build a DataFrame from a NumPy array.

o Many operations in Pandas directly leverage NumPy functions, making it easy to
combine these two libraries. For instance, you can use NumPy’s mathematical
functions directly on Pandas objects.

6. Use Cases:

o NumPy is the go-to library when working with multi-dimensional arrays or matrices,
such as in mathematical or deep learning applications.

o Pandas is more suited for working with structured data in a table format, especially
for data wrangling, cleaning, and quick analysis.

In summary, Pandas is built on top of NumPy and extends its capabilities to work efficiently with
structured, real-world data, combining the speed of NumPy with added functionality for data
manipulation.
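The interoperability described above can be sketched as a round trip between the two libraries. The column names and values are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

as_array = df.to_numpy()        # plain ndarray of shape (3, 2)
roots = np.sqrt(df)             # NumPy function on a DataFrame keeps the labels
back = pd.DataFrame(as_array, columns=df.columns)   # array back to a DataFrame

print(type(as_array).__name__, as_array.shape)
print(roots)
print(back.equals(df))
```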
