UNIT 3: BASICS OF NUMPY
NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific
computing. It provides support for arrays (multi-dimensional, homogeneous data structures)
and a wide range of mathematical functions to perform vectorized computations efficiently. This
guide will cover some of the basics of working with NumPy arrays and performing vectorized
computations.
Installing NumPy
Before using NumPy, you need to make sure it's installed. You can install it using pip:
pip install numpy
Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication,
and division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]
2. Indexing and Slicing:
You can access individual elements and slices of NumPy arrays using indexing and slicing:
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
3. Array Shape and Reshaping:
You can check and change the shape of NumPy arrays:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
4. Aggregation Functions:
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
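The same aggregation functions also accept an axis argument on multi-dimensional arrays. A minimal sketch, using a made-up 2x3 array that is not part of the original slides:

```python
import numpy as np

# Illustrative 2x3 array (values chosen only for this sketch)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = np.sum(grid)               # aggregate over every element
col_means = np.mean(grid, axis=0)  # one mean per column
row_max = np.max(grid, axis=1)     # one maximum per row

print(total, col_means, row_max)
```

Here axis=0 collapses the rows (giving one value per column) and axis=1 collapses the columns (giving one value per row).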
VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences
of data without the need for explicit loops. This approach leverages highly optimized, low-level
code to achieve faster and more efficient computations. The primary library for vectorized
computation in Python is NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays
or lists. For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]
Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's
how you can achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
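To see that the two styles compute the same thing on a larger input, the sketch below applies one formula both ways. The input range and the formula (2v + 1) are arbitrary choices for illustration.

```python
import numpy as np

values = list(range(1000))  # illustrative input

# Loop-based version: one Python-level operation per element
looped = []
for v in values:
    looped.append(v * 2 + 1)

# Vectorized version: the arithmetic runs inside NumPy's compiled loops
arr = np.array(values)
vectorized = arr * 2 + 1

print(np.array_equal(looped, vectorized))
```

The vectorized form is usually much faster on large arrays, although the exact speedup depends on the array size and the machine.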
INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures:
the DataFrame and the Series. These data structures are designed to handle structured data, making it easier
to work with datasets in a tabular format.
DataFrame:
• A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type (e.g., integers, floats,
strings, or even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is a column.
• DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data
cleaning, exploration, and transformation.
Here's a basic example of how to create a DataFrame using Pandas:
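A minimal sketch, with made-up names and ages as placeholder data:

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column
people = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
}
people_df = pd.DataFrame(people)

print(people_df)
print(people_df.dtypes)  # each column keeps its own data type
```

Each list in the dictionary must have the same length, since every column shares the same row index.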
Series:
• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
• Homogeneous Data: Like a NumPy array, and unlike a Python list, a Pandas Series stores all of its values
under a single data type (dtype). For example, a Series created from integer values has an integer dtype such as
int64. A Series built from values of mixed types falls back to the general-purpose object dtype.
• Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or
names for each data point in the Series. By default, Series have a numeric index starting from 0, but you can
specify custom labels if needed.
• Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, but it does not
have columns like a DataFrame.
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
0 10
1 20
2 30
3 40
4 50
dtype: int64
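Custom index labels and the dtype behavior can be sketched as follows. The labels and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical labels and values
scores = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(scores["b"])            # label-based access
print(scores.index.tolist())  # the custom labels
print(scores.shape)           # one-dimensional: (3,)

# Mixing types is allowed; the values are then stored with dtype object
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```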
Some common tasks you can perform with Pandas:
• Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL
databases, and more.
• Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
• Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
• Data Aggregation: Perform group by operations, calculate statistics, and aggregate data based on specific
criteria.
• Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and
Seaborn to create informative plots and charts.
DATAFRAME
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure
provided by the Pandas library. It is a fundamental data structure for data manipulation and analysis in
Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few
common methods:
From a dictionary:
data = {'Column1': [value1, value2, ...],
'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)
• From a list of lists:
data = [[value1, value2],
[value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
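A runnable sketch of these three selections, assuming a small DataFrame that reuses the placeholder column names with made-up values:

```python
import pandas as pd

# Hypothetical DataFrame using the placeholder column names
df = pd.DataFrame({"Column1": [3, 8, 6], "Column2": [1, 2, 3]})

one_col = df["Column1"]                # a Series
two_cols = df[["Column1", "Column2"]]  # a DataFrame
filtered = df[df["Column1"] > 5]       # rows where the condition is True

print(filtered)
```

Note that selecting with a single label returns a Series, while selecting with a list of labels returns a DataFrame, even if the list has one entry.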
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For
example:
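df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
# DataFrame.append() was removed in pandas 2.0; use pd.concat() to append a row
new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
df = pd.concat([df, new_row], ignore_index=True) # Append a new row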
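A self-contained sketch of these modifications with concrete placeholder values. It appends the row with pd.concat(), because DataFrame.append() is no longer available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({"Column1": [1, 2], "Column2": [10, 20]})  # placeholder data

df["NewColumn"] = df["Column1"] * 2  # add a column derived from another
df.at[0, "Column1"] = 100            # update one value by row label and column

# Append a row by concatenating a one-row DataFrame
new_row = pd.DataFrame([{"Column1": 3, "Column2": 30, "NewColumn": 6}])
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```

Passing ignore_index=True renumbers the rows 0, 1, 2 instead of keeping the index of the appended piece.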
6. Data Analysis:
Pandas provides various functions for data analysis, such
as describe(), groupby(), agg(), and more.
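A brief sketch of describe(), groupby(), and agg() working together. The sales table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Amount": [100, 150, 200, 50],
})

summary = sales["Amount"].describe()  # count, mean, std, min, quartiles, max
per_region = sales.groupby("Region")["Amount"].agg(["sum", "mean"])

print(summary)
print(per_region)
```

groupby() splits the rows by region, and agg() then computes each listed statistic once per group.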
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data
structures.
It provides the labels or names for the rows or columns of your data. You can use indexing,
selection, and filtering techniques with these indexes to access specific data points or subsets of
your data. Here's how you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-
based indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name
• Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
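The difference between the two accessors is easiest to see on a DataFrame whose row labels are not integers. A minimal sketch, with hypothetical labels and column names:

```python
import pandas as pd

# Hypothetical DataFrame with string row labels
df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=["apple", "banana", "cherry"],
)

row_by_label = df.loc["banana"]          # Series for the row labelled 'banana'
cell_by_label = df.loc["banana", "qty"]  # single value, looked up by labels
first_row = df.iloc[0]                   # first row, looked up by position
cell_by_pos = df.iloc[0, 1]              # row 0, column 1

print(cell_by_label, cell_by_pos)
```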
2. Selection:
You can use various methods to select specific data based on conditions or criteria.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions:
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
3. Filtering:
Filtering allows you to create a boolean mask based on a condition and then apply that mask to your
DataFrame to select rows meeting the condition.
Create a boolean mask:
condition = df['Column'] > 5
Apply the mask to the DataFrame:
filtered_df = df[condition]
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)
5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the
.reset_index() method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
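Setting, using, and resetting a multi-level index can be sketched as below. The column names and values are hypothetical, and the example uses the non-inplace form, which returns a new DataFrame.

```python
import pandas as pd

# Hypothetical data where two columns together identify a row
df = pd.DataFrame({
    "Year": [2023, 2023, 2024],
    "City": ["Oslo", "Rome", "Oslo"],
    "Sales": [10, 20, 30],
})

indexed = df.set_index(["Year", "City"])      # new DataFrame with a two-level index
value = indexed.loc[(2023, "Rome"), "Sales"]  # look up one row with a tuple of labels
restored = indexed.reset_index()              # back to the default integer index

print(value)
```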
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of
the objects involved in the operation, so each result value is computed from entries that share the same label.
Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two
Series or DataFrames, Pandas aligns the data based on their labels (index or column names). The operation is
performed on matching labels, and the result's index is the union of the labels from both operands.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because each of those labels
appears in only one of series1 and series2.
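The aligned result can be inspected directly. The sketch below also shows the add() method with fill_value, one way to treat a missing label as 0 instead of producing NaN.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
series2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

aligned = series1 + series2                  # NaN where a label is missing on either side
filled = series1.add(series2, fill_value=0)  # treat a missing label as 0 instead

print(aligned)
print(filled)
```

Because NaN is a floating-point value, the result has a float dtype even though both inputs hold integers.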
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both
for rows (based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
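In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Columns 'A' and 'C' are entirely NaN because
each exists in only one of df1 and df2, and column 'B' has a value only in row 'Y', the one row label both share.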
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or
columns with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows that contain any NaN (pass axis=1 to drop columns instead)
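Putting the DataFrame example and the missing-data methods together gives the following sketch. The how="all" variant is an addition not shown in the text above.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["X", "Y"])
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]}, index=["Y", "Z"])

result = df1 + df2
# Only row 'Y', column 'B' exists in both inputs: 4 + 5 = 9
print(result)

result_filled = result.fillna(0)                 # every NaN becomes 0
rows_dropped = result.dropna()                   # drops rows containing any NaN
cols_dropped = result.dropna(axis=1, how="all")  # drops columns that are entirely NaN
```

With this particular data every row contains at least one NaN, so the default dropna() leaves an empty DataFrame; dropping only the all-NaN columns keeps column 'B'.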
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to
match the shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work
with datasets of different shapes without needing to manually align them. It ensures that operations are
performed in a way that maintains the integrity and structure of your data.
ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy arrays have no labels, so NumPy does not align data the way Pandas does. Arithmetic between arrays is
positional, and NumPy is primarily focused on numerical computations with homogeneous arrays (arrays of the
same data type). Here's how arithmetic works in NumPy:
Shape-Based Broadcasting:
NumPy arrays perform element-wise operations. If you perform an operation between two NumPy arrays of
different shapes, NumPy tries to broadcast them to a common shape by stretching any dimension of length 1 to
match the other array. Arrays with incompatible shapes are not padded or truncated; the operation raises an error.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([10])
result = arr1 + arr2
In this example, NumPy broadcasts arr2 (shape (1,)) to match the shape of arr1 (shape (3,)), resulting in
[11, 12, 13]. Replacing arr2 with np.array([4, 5]) would raise a ValueError, because shapes (3,) and (2,)
cannot be broadcast together.
Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them is 1, they are
compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
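The rules above can be sketched with one compatible and one incompatible pair of shapes. The array values are illustrative.

```python
import numpy as np

row = np.array([1, 2, 3])     # shape (3,)
col = np.array([[10], [20]])  # shape (2, 1)

# (3,) is padded to (1, 3); (1, 3) and (2, 1) are compatible and give (2, 3)
table = row + col
print(table)

# Shapes (3,) and (2,) are incompatible: the sizes differ and neither is 1
try:
    np.array([1, 2, 3]) + np.array([4, 5])
    failed = False
except ValueError as exc:
    failed = True
    print(exc)
```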
Handling Missing Data:
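NumPy does not introduce missing values through label alignment the way Pandas does. If you perform operations
between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on whether
broadcasting is possible. Floating-point arrays can still contain np.nan, and functions such as np.nanmean()
skip those values.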
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting
array is the result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2
In this case, result will be [4, 10, 18].
WHAT IS VECTORIZATION?
• Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays.
• Using vectorized functions can reduce the running time of code considerably.
• Common vector operations include the dot product, also known as the scalar product because it produces a
single number; the outer product, which for two vectors of length n produces an n x n matrix; and
element-wise multiplication, which multiplies the elements at the same index and leaves the shape
unchanged.
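The three vector operations named above can be sketched with NumPy as follows. The two vectors are illustrative.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

dot = np.dot(u, v)      # single number: 1*4 + 2*5 + 3*6
outer = np.outer(u, v)  # 3x3 matrix with entries u[i] * v[j]
elementwise = u * v     # same shape as the inputs

print(dot)
print(outer)
print(elementwise)
```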
APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques,
including built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used
for mapping operations. Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire
arrays or elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be
applied element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.
HOW TO CREATE YOUR OWN UFUNC
To create your own ufunc (universal function), define a function as you would any normal Python function,
then convert it into a NumPy ufunc with the frompyfunc() function.
• ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements in Python.
• They also provide broadcasting and additional methods like reduce, accumulate etc. that are very helpful for
computation.
• ufuncs also take additional optional arguments.
The frompyfunc() function takes the following arguments:
1. function - the name of the function.
2. inputs - the number of input arguments (arrays).
3. outputs - the number of output arrays.
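A minimal sketch of the three arguments in use. The function name my_add is a hypothetical choice for illustration.

```python
import numpy as np

def my_add(x, y):
    # Ordinary Python function that works on two scalars
    return x + y

# function, number of inputs (2), number of outputs (1)
add_ufunc = np.frompyfunc(my_add, 2, 1)

result = add_ufunc([1, 2, 3, 4], [5, 6, 7, 8])
print(result)           # elements are Python objects (dtype=object)
print(type(add_ufunc))  # a NumPy ufunc
```

A ufunc built this way still calls the Python function once per element and returns an object array, so it offers the ufunc interface (broadcasting, reduce, and so on) rather than the speed of NumPy's built-in ufuncs.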
np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional
array. This is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
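The axis argument decides which 1-D slices the function receives. The sketch below uses a hypothetical helper, value_range, and compares the result with NumPy's built-in axis-aware reductions.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

def value_range(values):
    # Difference between the largest and smallest value in a 1-D slice
    return values.max() - values.min()

row_ranges = np.apply_along_axis(value_range, axis=1, arr=arr)  # one result per row
col_ranges = np.apply_along_axis(value_range, axis=0, arr=arr)  # one result per column

# For simple reductions, the built-in axis argument gives the same kind of result
row_sums = arr.sum(axis=1)

print(row_ranges, col_ranges, row_sums)
```

When a built-in reduction such as sum, mean, or max already covers the computation, passing axis to it directly is usually simpler and faster than np.apply_along_axis().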
np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be
applied element-wise to NumPy arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr)
This approach is useful when you have a custom function that you want to apply to an array.
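np.vectorize() is most helpful for scalar functions that contain branching logic. The sketch below uses a hypothetical function, clip_negative, and compares it with an equivalent array expression.

```python
import numpy as np

def clip_negative(x):
    # Scalar function with a branch: negative inputs become 0
    return x if x > 0 else 0

vec_clip = np.vectorize(clip_negative)

values = np.array([-2, -1, 0, 1, 2])
clipped = vec_clip(values)

# The same result expressed with a built-in vectorized function
same = np.where(values > 0, values, 0)

print(clipped, same)
```

np.vectorize() is provided mainly for convenience: it still calls the Python function once per element, so a built-in expression such as np.where() is normally faster when one exists.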
Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr)
This approach is similar to applying a function element-wise but can be used for more complex
mapping operations.
These methods allow you to apply functions and perform mapping operations efficiently on NumPy
arrays, making it a powerful library for numerical and scientific computing tasks.
SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in
Python through libraries like NumPy and Pandas. These operations help organize data in a desired order or
rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr)
• np.sort() returns the sorted array, whereas np.argsort() returns an array of the corresponding indices.
[Figure: a sorting algorithm transforming the unsorted array [10, 6, 8, 2, 5, 4, 9, 1] into the sorted
array [1, 2, 4, 5, 6, 8, 9, 10].]
np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the
original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
sorted_arr = arr[indices]
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s)
to sort by and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but applying np.argsort() twice gives the rank of each element.
Tied values receive distinct consecutive ranks rather than a shared rank.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr) # Indices that would sort the array
ranked_arr = np.argsort(indices) + 1 # Rank of each element; add 1 to start ranking from 1 instead of 0
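The sketch below shows how the double-argsort ranks treat tied values, and one possible alternative that gives equal values the same rank. The use of kind="stable" and of np.unique() are choices made for this illustration, not part of the original example.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

# Double argsort: position of each element in sorted order, starting at 1.
# kind="stable" makes tied values keep their original left-to-right order.
ordinal_ranks = np.argsort(np.argsort(arr, kind="stable")) + 1
print(ordinal_ranks)

# "Dense" ranks: equal values share a rank (one possible way to handle ties)
unique_vals, inverse = np.unique(arr, return_inverse=True)
dense_ranks = inverse + 1
print(dense_ranks)
```

In the ordinal ranks the two 1s receive ranks 1 and 2, whereas in the dense ranks both receive rank 1.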
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle
ties (e.g., assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
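The effect of the tie-handling method can be sketched on the same ages. The comparison with method="min" is an addition for illustration.

```python
import pandas as pd

ages = pd.Series([25, 30, 22, 30], index=["Alice", "Bob", "Charlie", "David"])

avg_rank = ages.rank(ascending=False, method="average")  # ties share the mean of their ranks
min_rank = ages.rank(ascending=False, method="min")      # ties share the lowest rank

print(avg_rank)
print(min_rank)
```

Bob and David are tied for the two top positions, so they each get rank 1.5 with method="average" and rank 1 with method="min".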
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
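By default np.std() and np.var() compute the population versions (dividing by n). A short sketch of the ddof argument, which switches to the sample versions (dividing by n - 1), using the same data:

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

population_var = np.var(data)      # divides by n
sample_var = np.var(data, ddof=1)  # divides by n - 1
sample_std = np.std(data, ddof=1)  # square root of the sample variance

print(population_var, sample_var, sample_std)
```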
2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
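np.percentile() also accepts a list of percentiles, which makes it easy to derive the interquartile range. A minimal sketch with the same data:

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

q1, median, q3 = np.percentile(data, [25, 50, 75])  # several percentiles in one call
iqr = q3 - q1                                       # interquartile range

print(q1, median, q3, iqr)
```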
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().
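data1 = np.array([1, 2, 3, 4, 5]) # Illustrative values
data2 = np.array([2, 3, 4, 5, 6]) # Illustrative values
correlation_matrix = np.corrcoef(data1, data2) # 2x2 matrix; element [0, 1] is the correlation
covariance_matrix = np.cov(data1, data2) # 2x2 matrix; element [0, 1] is the covariance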
CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov()
functions, respectively. These functions are useful for analyzing relationships and dependencies between
variables. Here's how to use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables.
It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear
correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive
relationship (both variables increase or decrease together), while negative values indicate an inverse
relationship (one variable increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y.
Both np.corrcoef() and np.cov() also accept a single 2D array in which each row is one variable, allowing you to
compute correlations and covariances for multiple variables simultaneously. For example, if you have a dataset
with multiple columns, you can compute the correlation matrix or covariance matrix for all pairs of variables.
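A sketch with three variables stacked into one 2D array. The third variable, z, is a hypothetical addition chosen to show a negative relationship.

```python
import numpy as np

# Three illustrative variables, five observations each
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])  # y = x + 1, rises exactly with x
z = np.array([5, 4, 3, 2, 1])  # z falls as x rises

data = np.vstack([x, y, z])    # each row is one variable (the default rowvar=True)

corr = np.corrcoef(data)       # 3x3 correlation matrix
cov = np.cov(data)             # 3x3 covariance matrix (sample covariance, ddof=1)

print(corr)
print(cov)
```

Element [i, j] of each matrix relates variable i to variable j, and the diagonal of the covariance matrix holds each variable's sample variance.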

More Related Content

PPTX
Unit 3_Numpy_VP.pptx
PPTX
Unit 3_Numpy_VP.pptx
PPTX
Lecture 9.pptx
PPTX
introduction to data structures in pandas
PPTX
Python-for-Data-Analysis.pptx
PPTX
Data Visualization_pandas in hadoop.pptx
PPTX
Pythonggggg. Ghhhjj-for-Data-Analysis.pptx
PPTX
python for data anal gh i o fytysis creation.pptx
Unit 3_Numpy_VP.pptx
Unit 3_Numpy_VP.pptx
Lecture 9.pptx
introduction to data structures in pandas
Python-for-Data-Analysis.pptx
Data Visualization_pandas in hadoop.pptx
Pythonggggg. Ghhhjj-for-Data-Analysis.pptx
python for data anal gh i o fytysis creation.pptx

Similar to Unit 3_Numpy_Vsp.pptx (20)

PPTX
Numpy in python, Array operations using numpy and so on
PPTX
dvdxsfdxfdfdfdffddvfbgbesseesesgesesseseggesges
PPTX
pandasppt with informative topics coverage.pptx
PPTX
PANDAS IN PYTHON (Series and DataFrame)
PPTX
Chapter 5-Numpy-Pandas.pptx python programming
PPTX
Pandas yayyyyyyyyyyyyyyyyyin Python.pptx
PPTX
Python for data analysis
PPTX
Introduction to a Python Libraries and python frameworks
PPTX
PPT on Data Science Using Python
PPT
SASasasASSSasSSSSSasasaSASsasASASasasASs
PPTX
Introducing Pandas Objects.pptx
PPTX
interenship.pptx
PPTX
Pandas Dataframe reading data Kirti final.pptx
PDF
Python for Data Analysis.pdf
PPTX
Python-for-Data-Analysis.pptx
PPTX
Python-for-Data-Analysis.pptx
PPTX
More on Pandas.pptx
PDF
Python-for-Data-Analysis.pdf
PPTX
2. Data Preprocessing with Numpy and Pandas.pptx
PDF
Lecture on Python Pandas for Decision Making
Numpy in python, Array operations using numpy and so on
dvdxsfdxfdfdfdffddvfbgbesseesesgesesseseggesges
pandasppt with informative topics coverage.pptx
PANDAS IN PYTHON (Series and DataFrame)
Chapter 5-Numpy-Pandas.pptx python programming
Pandas yayyyyyyyyyyyyyyyyyin Python.pptx
Python for data analysis
Introduction to a Python Libraries and python frameworks
PPT on Data Science Using Python
SASasasASSSasSSSSSasasaSASsasASASasasASs
Introducing Pandas Objects.pptx
interenship.pptx
Pandas Dataframe reading data Kirti final.pptx
Python for Data Analysis.pdf
Python-for-Data-Analysis.pptx
Python-for-Data-Analysis.pptx
More on Pandas.pptx
Python-for-Data-Analysis.pdf
2. Data Preprocessing with Numpy and Pandas.pptx
Lecture on Python Pandas for Decision Making

More from prakashvs7 (15)

PPTX
Python lambda.pptx
PPTX
Unit 4_Working with Graphs _python (2).pptx
PPTX
unit 5_Real time Data Analysis vsp.pptx
PPTX
unit 4-1.pptx
PPT
unit 3.ppt
PDF
final Unit 1-1.pdf
DOCX
PCCF-UNIT 2-1 new.docx
PPTX
AI UNIT-4 Final (2).pptx
PPTX
AI UNIT-3 FINAL (1).pptx
PPTX
AI-UNIT 1 FINAL PPT (2).pptx
PPTX
DS-UNIT 3 FINAL.pptx
PPTX
DS - Unit 2 FINAL (2).pptx
PPTX
DS-UNIT 1 FINAL (2).pptx
PPT
Php unit i
PPTX
The process
Python lambda.pptx
Unit 4_Working with Graphs _python (2).pptx
unit 5_Real time Data Analysis vsp.pptx
unit 4-1.pptx
unit 3.ppt
final Unit 1-1.pdf
PCCF-UNIT 2-1 new.docx
AI UNIT-4 Final (2).pptx
AI UNIT-3 FINAL (1).pptx
AI-UNIT 1 FINAL PPT (2).pptx
DS-UNIT 3 FINAL.pptx
DS - Unit 2 FINAL (2).pptx
DS-UNIT 1 FINAL (2).pptx
Php unit i
The process

Recently uploaded (20)

PPTX
IMMUNIZATION PROGRAMME pptx
PDF
LDMMIA Reiki Yoga Workshop 15 MidTerm Review
PDF
LDMMIA Reiki Yoga S2 L3 Vod Sample Preview
PDF
Cell Biology Basics: Cell Theory, Structure, Types, and Organelles | BS Level...
PPTX
Cardiovascular Pharmacology for pharmacy students.pptx
PPTX
Introduction and Scope of Bichemistry.pptx
PDF
Piense y hagase Rico - Napoleon Hill Ccesa007.pdf
PDF
Module 3: Health Systems Tutorial Slides S2 2025
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PPTX
Onica Farming 24rsclub profitable farm business
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
COMPUTERS AS DATA ANALYSIS IN PRECLINICAL DEVELOPMENT.pptx
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
Types of Literary Text: Poetry and Prose
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
How to Manage Loyalty Points in Odoo 18 Sales
PDF
Open folder Downloads.pdf yes yes ges yes
DOCX
UPPER GASTRO INTESTINAL DISORDER.docx
PPTX
How to Manage Bill Control Policy in Odoo 18
IMMUNIZATION PROGRAMME pptx
LDMMIA Reiki Yoga Workshop 15 MidTerm Review
LDMMIA Reiki Yoga S2 L3 Vod Sample Preview
Cell Biology Basics: Cell Theory, Structure, Types, and Organelles | BS Level...
Cardiovascular Pharmacology for pharmacy students.pptx
Introduction and Scope of Bichemistry.pptx
Piense y hagase Rico - Napoleon Hill Ccesa007.pdf
Module 3: Health Systems Tutorial Slides S2 2025
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
Onica Farming 24rsclub profitable farm business
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
COMPUTERS AS DATA ANALYSIS IN PRECLINICAL DEVELOPMENT.pptx
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
Types of Literary Text: Poetry and Prose
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
How to Manage Loyalty Points in Odoo 18 Sales
Open folder Downloads.pdf yes yes ges yes
UPPER GASTRO INTESTINAL DISORDER.docx
How to Manage Bill Control Policy in Odoo 18

Unit 3_Numpy_Vsp.pptx

  • 1. UNIT 3: BASICS OF NUMPY 1
  • 2. NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific computing. It provides support for arrays (multi-dimensional, homogeneous data structures) and a wide range of mathematical functions to perform vectorized computations efficiently. This guide will cover some of the basics of working with NumPy arrays and performing vectorized computations. Installing NumPy Before using NumPy, you need to make sure it's installed. You can install it using pip: pip install numpy 2
  • 3. Importing NumPy To use NumPy in your Python code, you should import it: import numpy as np By convention, it's common to import NumPy as np for brevity. Creating NumPy Arrays You can create NumPy arrays using various methods: 1. From Python Lists: arr = np.array([1, 2, 3, 4, 5]) 2. Using NumPy Functions: zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements ones_arr = np.ones(3) # Creates an array of ones with 3 elements rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1 3
  • 4. 3. Using NumPy's Range Function: range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8] 4
  • 5. BASIC ARRAY OPERATIONS Once you have NumPy arrays, you can perform various operations on them: 1. Element-wise Operations: NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication, and division: a = np.array([1, 2, 3]) b = np.array([4, 5, 6]) c = a + b # Element-wise addition: [5, 7, 9] d = a * b # Element-wise multiplication: [4, 10, 18] 5
  • 6. 2. Indexing and Slicing: You can access individual elements and slices of NumPy arrays using indexing and slicing: arr = np.array([0, 1, 2, 3, 4, 5]) element = arr[2] # Access element at index 2 (value: 2) sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4]) 3. Array Shape and Reshaping: You can check and change the shape of NumPy arrays: arr = np.array([[1, 2, 3], [4, 5, 6]]) shape = arr.shape # Get the shape (2, 3) reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2) 4. Aggregation Functions: NumPy provides functions to compute statistics on arrays: arr = np.array([1, 2, 3, 4, 5]) mean = np.mean(arr) # Calculate the mean (average) max_val = np.max(arr) # Find the maximum value min_val = np.min(arr) # Find the minimum value 6
  • 7. VECTORIZED COMPUTATION Vectorized computation in Python refers to performing operations on entire arrays or sequences of data without the need for explicit loops. This approach leverages highly optimized, low-level code to achieve faster and more efficient computations. The primary library for vectorized computation in Python is NumPy. Traditional Loop-Based Computation In traditional Python programming, you might use explicit loops to perform operations on arrays or lists. For example: # Using loops to add two lists element-wise list1 = [1, 2, 3] list2 = [4, 5, 6] result = [] for i in range(len(list1)): result.append(list1[i] + list2[i]) # Result: [5, 7, 9] 7
  • 8. Vectorized Computation with NumPy NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's how how you can achieve the same result using NumPy: import numpy as np # Using NumPy for element-wise addition arr1 = np.array([1, 2, 3]) arr2 = np.array([4, 5, 6]) result = arr1 + arr2 # Result: array([5, 7, 9]) 8
  • 9. INTRODUCTION TO PANDAS DATA STRUCTURES Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the DataFrame and the Series. These data structures are designed to handle structured data, making it easier to work with datasets in a tabular format. DataFrame: • A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table. • It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or even custom data types). • You can think of a DataFrame as a collection of Series objects, where each Series is a column. • DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning, exploration, and transformation.
  • 10. Here's a basic example of how to create a DataFrame using Pandas: 10
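The slide's own example is not preserved in this transcript, so the following is a stand-in sketch rather than the original code. The column names and values are made up for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "Score": [88.5, 92.0, 79.5],
}
df = pd.DataFrame(data)

print(df)         # tabular view with a default integer index
print(df.dtypes)  # each column keeps its own data type
```

Note that the three columns end up with three different dtypes, which is what distinguishes a DataFrame from a single homogeneous array.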
  • 11. Series: • A Series is a one-dimensional labeled array that can hold data of any data type. • It is like a column in a DataFrame or a single variable in statistics. • Series objects are commonly used for time series data, as well as other one-dimensional data. Key characteristics of a Pandas Series: • Single dtype: Like a NumPy array (and unlike a Python list), a Pandas Series stores its values under one dtype. For example, a Series created from integer values has an integer dtype such as int64. If you mix types, such as numbers and strings, Pandas falls back to the general-purpose object dtype rather than raising an error. • Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or names for each data point in the Series. By default, Series have a numeric index starting from 0, but you can specify custom labels if needed. • Size and Shape: A Series has a size (the number of elements) and shape (1-dimensional) but does not have columns or rows like a DataFrame.
  • 12. import pandas as pd # Create a Series from a list data = [10, 20, 30, 40, 50] series = pd.Series(data) # Display the Series print(series) 0 10 1 20 2 30 3 40 4 50 dtype: int64 12
  • 13. Some common tasks you can perform with Pandas: • Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases, and more. • Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and transforming data types. • Data Selection: Easily select specific rows and columns of interest using various indexing techniques. • Data Aggregation: Perform group by operations, calculate statistics, and aggregate data based on specific criteria. • Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and Seaborn to create informative plots and charts.
  • 14. A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure provided by the popular library called Pandas. It is a fundamental data structure for data manipulation and analysis in Python. Here's how you can work with DataFrames in Python using Pandas: 1. Import Pandas: First, you need to import the Pandas library. import pandas as pd 2. Creating a DataFrame: You can create a DataFrame in several ways. Here are a few common methods: From a dictionary: data = {'Column1': [value1, value2, ...], 'Column2': [value1, value2, ...]} df = pd.DataFrame(data) DataFrame 14
  • 15. • From a list of lists: data = [[value1, value2], [value3, value4]] df = pd.DataFrame(data, columns=['Column1', 'Column2']) • From a CSV file: df = pd.read_csv('file.csv') 3. Viewing Data: You can use various methods to view and explore your DataFrame: df.head(): Displays the first few rows of the DataFrame. df.tail(): Displays the last few rows of the DataFrame. df.shape: Returns the number of rows and columns. df.columns: Returns the column names. df.info(): Provides information about the DataFrame, including data types and non-null counts. 15
  • 16. 4. Selecting Data: You can select specific columns or rows from a DataFrame using indexing or filtering. For example: df['Column1'] # Select a specific column df[['Column1', 'Column2']] # Select multiple columns df[df['Column1'] > 5] # Filter rows based on a condition 5. Modifying Data: You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For example: df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column df.at[index, 'Column1'] = new_value # Update a specific value df = pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)
  • 17. 6. Data Analysis: Pandas provides various functions for data analysis, such as describe(), groupby(), agg(), and more. 7. Saving Data: You can save the DataFrame to a CSV file or other formats: df.to_csv('output.csv', index=False) 17
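The analysis functions named above can be sketched on a small table. The `sales` DataFrame and its column names are hypothetical and only serve to show how `describe()`, `groupby()`, and `agg()` fit together.

```python
import pandas as pd

# Hypothetical data for illustration
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South"],
    "Units": [10, 20, 30, 40],
})

# describe() returns count, mean, std, min, quartiles, and max
summary = sales["Units"].describe()

# groupby() splits the rows by Region; agg() computes several statistics per group
by_region = sales.groupby("Region")["Units"].agg(["sum", "mean"])

print(summary)
print(by_region)
```

The result of `agg()` is itself a DataFrame indexed by group label, so it can be saved with `to_csv()` like any other.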
  • 18. INDEX OBJECTS-INDEXING, SELECTION, AND FILTERING In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It provides the labels or names for the rows or columns of your data. You can use indexing, selection, and filtering techniques with these indexes to access specific data points or subsets of your data. Here's how you can work with index objects in Pandas: 1. Indexing: Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-based indexing and .iloc[] for integer-based indexing. • Label-based indexing: df.loc['label'] # Access a specific row by its label df.loc['label', 'column_name'] # Access a specific element by label and column name
  • 19. • Integer-based indexing: df.iloc[0] # Access the first row df.iloc[0, 1] # Access an element by row and column index 2. Selection: You can use various methods to select specific data based on conditions or criteria. • Select rows based on a condition: df[df['Column'] > 5] # Select rows where 'Column' is greater than 5 • Select rows by multiple conditions: df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
  • 20. 20 3. Filtering: Filtering allows you to create a boolean mask based on a condition and then apply that mask to your DataFrame to select rows meeting the condition. Create a boolean mask: condition = df['Column'] > 5 Apply the mask to the DataFrame: filtered_df = df[condition] 4. Setting a New Index: You can set a specific column as the index of your DataFrame using the .set_index() method. df.set_index('Column_Name', inplace=True)
  • 21. 21 5. Resetting the Index: If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index() method. df.reset_index(inplace=True) 6. Multi-level Indexing: You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data structures. df.set_index(['Index1', 'Index2'], inplace=True) Index objects in Pandas are versatile and powerful for working with data because they enable you to access and manipulate your data in various ways, whether it's for data retrieval, filtering, or restructuring.
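The index operations described above can be tried end to end on a small table. This is a sketch with made-up data; the column names `City`, `Year`, and `Sales` are assumptions. It uses the non-inplace form of `set_index()` so that each step keeps its own variable.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Pune", "Delhi", "Pune", "Delhi"],
    "Year": [2022, 2022, 2023, 2023],
    "Sales": [100, 150, 120, 170],
})

# Single-level index: rows are now labeled by city
by_city = df.set_index("City")
pune_rows = by_city.loc["Pune"]   # label-based: every row labeled 'Pune'
first_row = by_city.iloc[0]       # position-based: the first row

# Multi-level index: each row is identified by a (City, Year) pair
multi = df.set_index(["City", "Year"]).sort_index()
delhi_2023 = multi.loc[("Delhi", 2023), "Sales"]

# reset_index() turns the index levels back into ordinary columns
restored = multi.reset_index()
```

Sorting a multi-level index after creating it is optional here, but it keeps label lookups predictable on larger tables.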
  • 22. ARITHMETIC AND DATA ALIGNMENT IN PANDAS Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects involved in the operation, which ensures that the result of the operation maintains data integrity and is aligned correctly. Here are some key aspects of arithmetic and data alignment in Pandas: 1. Automatic Alignment: When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or DataFrames, Pandas aligns the data based on their labels (index or column names). It aligns the data based on common labels and performs the operation only on matching labels. series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C']) series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D']) result = series1 + series2 In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match between series1 and series2.
  • 23. 2. Missing Data (NaN): When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values. 3. DataFrame Alignment: The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows (based on the index) and columns (based on column names). df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y']) df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z']) result = df1 + df2 In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Columns 'A' and 'C' are entirely NaN because each exists in only one of the DataFrames, and column 'B' has a value only in row 'Y' (4 + 5 = 9), the one row label the two share. 4. Handling Missing Data: You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns with missing data. result_filled = result.fillna(0) # Replace NaN with 0 result_dropped = result.dropna() # Drop every row that contains a NaN (use axis=1 to drop columns instead) For this particular result every row contains at least one NaN, so dropna() returns an empty DataFrame.
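When NaN for unmatched labels is not the desired outcome, one alternative is the method form of the operator, `Series.add()`, with its `fill_value` parameter. This call is not mentioned in the slides; the sketch below reuses the two Series from the alignment example to contrast the two behaviors.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
series2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

# Operator form: labels found in only one Series become NaN
aligned = series1 + series2

# Method form: a label missing from one side is treated as 0 before adding
filled = series1.add(series2, fill_value=0)

print(aligned)
print(filled)
```

The difference from calling `fillna(0)` afterwards is that `fill_value` keeps the value from the side that does have the label instead of replacing the whole result with 0.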
  • 24. 24 5. Alignment with Broadcasting: Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to match the shape of the Series. series = pd.Series([1, 2, 3]) scalar = 2 result = series * scalar In this example, result will be a Series with values [2, 4, 6]. Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work with datasets of different shapes without needing to manually align them. It ensures that operations are performed in a way that maintains the integrity and structure of your data.
  • 25. ARITHMETIC AND DATA ALIGNMENT IN NUMPY NumPy, like Pandas, performs element-wise arithmetic on arrays. However, unlike Pandas, NumPy is primarily focused on numerical computations with homogeneous arrays (arrays of the same data type), and it matches elements by position and shape rather than by label. Here's how arithmetic and shape matching work in NumPy: Shape-Based Broadcasting: NumPy arrays perform element-wise operations. If you perform an operation between two NumPy arrays of different but compatible shapes, NumPy will broadcast the smaller array to match the shape of the larger one. If the shapes are not compatible, the operation fails. import numpy as np arr1 = np.array([1, 2, 3]) arr2 = np.array([4, 5]) result = arr1 + arr2 # Raises ValueError In this example, the shapes (3,) and (2,) are not compatible, so NumPy raises "ValueError: operands could not be broadcast together" instead of producing a result. Broadcasting would succeed if arr2 had a single element, for example np.array([4]), which gives [5, 6, 7].
  • 26. Broadcasting Rules: NumPy follows specific rules when broadcasting arrays: If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side. Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them is 1, they are compatible. If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error. Handling Missing Data: NumPy does not align by label, so it never inserts NaN to mark unmatched positions the way Pandas does (np.nan exists as an ordinary floating-point value, but a shape mismatch does not produce it). If you perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on whether broadcasting is possible. Element-Wise Operations: NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the result of applying the operation to the corresponding elements in the input arrays. arr1 = np.array([1, 2, 3]) arr2 = np.array([4, 5, 6]) result = arr1 * arr2 In this case, result will be [4, 10, 18].
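The broadcasting rules can be checked with one compatible pair of shapes and one incompatible pair. This is a small sketch; the names `column`, `row`, and `mismatch_error` are illustrative.

```python
import numpy as np

column = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])             # shape (3,), padded on the left to (1, 3)

# (3, 1) and (1, 3): every dimension pair is equal or contains a 1 -> result is (3, 3)
table = column + row

# (3,) and (2,): the trailing dimensions are 3 and 2, neither is 1 -> error
try:
    np.array([1, 2, 3]) + np.array([4, 5])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(table)
print(mismatch_error)
```

A length-3 array and a length-2 array are never stretched to fit each other; only dimensions of size 1 are repeated.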
  • 27. WHAT IS VECTORIZATION? • Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays. • Using vectorized functions can substantially reduce the running time of code. • Common operations on vectors include the dot product (also known as the scalar product, because it produces a single number), the outer product (which produces a matrix of dimension length x length for two vectors of equal length), and element-wise multiplication (which multiplies the elements at the same index and leaves the shape unchanged).
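The three vector operations listed above correspond to `np.dot()`, `np.outer()`, and the `*` operator. A minimal sketch with two illustrative vectors `u` and `v`:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

dot = np.dot(u, v)      # 1*4 + 2*5 + 3*6, a single number
outer = np.outer(u, v)  # 3x3 matrix whose entry [i, j] is u[i] * v[j]
elementwise = u * v     # same shape as the inputs

print(dot)
print(outer)
print(elementwise)
```

None of the three needs an explicit Python loop, which is the point of vectorization.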
  • 28. 28 APPLYING FUNCTIONS AND MAPPING In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques, including vectorized functions, np.apply_along_axis(), and the np.vectorize() function. Additionally, you can use the np.vectorize() function for mapping operations. Here's an overview of these approaches: Vectorized Functions: NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays or elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be applied element-wise to arrays. import numpy as np arr = np.array([1, 2, 3, 4]) # Applying a function element-wise result = np.square(arr) # Square each element In this example, the np.square() function is applied element-wise to the arr array.
  • 30. HOW TO CREATE YOUR OWN UFUNC To create your own ufunc (universal function), you define a function as you would any normal Python function, then convert it to a NumPy ufunc with the np.frompyfunc() function. • ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements in Python. • They also provide broadcasting and additional methods like reduce, accumulate, etc. that are very helpful for computation. • ufuncs also take additional keyword arguments, such as where, dtype, and out. The frompyfunc() function takes the following arguments: 1. function - the Python function to convert. 2. inputs - the number of input arguments (arrays). 3. outputs - the number of output arrays.
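The three arguments of `np.frompyfunc()` can be seen in a short sketch. The function `add_then_double` is a hypothetical example, not one from the slides. One detail worth knowing is that a ufunc built this way returns an array of dtype `object`, so a conversion step is usually needed afterwards.

```python
import numpy as np

def add_then_double(x, y):
    # Ordinary Python function working on two scalars
    return (x + y) * 2

# 2 inputs, 1 output
double_sum = np.frompyfunc(add_then_double, 2, 1)

result = double_sum(np.array([1, 2, 3]), np.array([4, 5, 6]))
as_ints = result.astype(int)  # convert from object dtype to a numeric dtype

print(type(double_sum))
print(result.dtype, as_ints)
```

Because the wrapped function still runs once per element in Python, this is mainly a convenience for broadcasting custom logic rather than a way to match the speed of built-in ufuncs.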
  • 32. np.apply_along_axis(): You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array. This is useful when you want to apply a function to each row or column of a 2D array. import numpy as np arr = np.array([[1, 2, 3], [4, 5, 6]]) # Apply a function along the rows (axis=1) def sum_of_row(row): return np.sum(row) result = np.apply_along_axis(sum_of_row, axis=1, arr=arr) In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array, [6, 15].
  • 33. 33 np.vectorize(): The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be applied element-wise to NumPy arrays. import numpy as np arr = np.array([1, 2, 3, 4]) # Define a Python function def my_function(x): return x * 2 # Create a vectorized version of the function vectorized_func = np.vectorize(my_function) # Apply the vectorized function to the array result = vectorized_func(arr) This approach is useful when you have a custom function that you want to apply to an array.
  • 34. Mapping with np.vectorize(): You can use np.vectorize() to map a function to each element of an array. import numpy as np arr = np.array([1, 2, 3, 4]) # Define a Python function def my_function(x): return x * 2 # Create a vectorized version of the function vectorized_func = np.vectorize(my_function) # Map the function to each element result = vectorized_func(arr) This approach is similar to applying a function element-wise but can be used for more complex mapping operations. Note that np.vectorize() is provided mainly for convenience: it still calls the Python function once per element, so a built-in ufunc such as np.square() is faster when one exists. Together, these methods let you apply functions and perform mapping operations on NumPy arrays without writing explicit loops, making NumPy a powerful library for numerical and scientific computing tasks.
  • 35. 35 SORTING AND RANKING Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python through libraries like NumPy and Pandas. These operations help organize data in a desired order or rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries: Sorting in NumPy: In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions. np.sort(): This function returns a new sorted array without modifying the original array. import numpy as np arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) sorted_arr = np.sort(arr)
  • 36. • np.sort() returns the sorted array, whereas np.argsort() returns an array of the corresponding indices. [Figure: sorting transforms the unsorted array [10, 6, 8, 2, 5, 4, 9, 1] into the sorted array [1, 2, 4, 5, 6, 8, 9, 10].]
  • 37. 37 np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original array. import numpy as np arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) indices = np.argsort(arr) sorted_arr = arr[indices] Sorting in Pandas: In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by and the sorting order. import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 35]} df = pd.DataFrame(data) # Sort by 'Age' column in ascending order sorted_df = df.sort_values(by='Age', ascending=True)
  • 38. Ranking in NumPy: NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the rank of each element. Tied values receive distinct consecutive ranks rather than a shared rank. import numpy as np arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) indices = np.argsort(arr) ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0 Ranking in Pandas: In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g., assigning the average rank to tied values). import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 30]} df = pd.DataFrame(data) # Rank by 'Age' column in descending order and assign average rank to tied values df['Rank'] = df['Age'].rank(ascending=False, method='average')
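How ties are handled is the main practical difference between the ranking approaches. The sketch below compares three `method` options of `Series.rank()` with the double-argsort technique on the same ages; the `kind="stable"` argument is an addition here so that tied values keep their original order.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 30, 22, 30])  # the two 30s are tied

average_rank = ages.rank(method="average")  # tied values share the mean of their ranks
min_rank = ages.rank(method="min")          # tied values share the lowest rank
first_rank = ages.rank(method="first")      # ties broken by order of appearance

# Double argsort in NumPy: ties are broken by position, like method="first"
order = np.argsort(ages.to_numpy(), kind="stable")
ordinal = np.argsort(order) + 1

print(average_rank.tolist())
print(min_rank.tolist())
print(first_rank.tolist())
print(ordinal.tolist())
```

If shared ranks for ties are needed, the Pandas `rank()` method is the simpler choice of the two.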
  • 39. 39 SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS 1. Summary Statistics: NumPy provides functions to compute summary statistics directly on arrays. import numpy as np data = np.array([25, 30, 22, 35, 28]) mean = np.mean(data) median = np.median(data) std_dev = np.std(data) variance = np.var(data)
  • 40. 40 2. Percentiles and Quartiles: You can compute specific percentiles and quartiles using the np.percentile() function. percentile_25 = np.percentile(data, 25) percentile_75 = np.percentile(data, 75) 3. Correlation and Covariance: You can compute correlation and covariance between arrays using np.corrcoef() and np.cov(). correlation_matrix = np.corrcoef(data1, data2) covariance_matrix = np.cov(data1, data2)
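The snippets above use `data1` and `data2` without defining them. A self-contained sketch follows, reusing the `data` array from the summary statistics slide; the values of `data1` and `data2` are made up for illustration.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

# Several percentiles can be requested in one call
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # interquartile range

# Hypothetical paired measurements
data1 = np.array([1.0, 2.0, 3.0, 4.0])
data2 = np.array([2.0, 4.1, 5.9, 8.2])

correlation_matrix = np.corrcoef(data1, data2)  # 2x2 matrix
covariance_matrix = np.cov(data1, data2)        # 2x2 matrix
correlation = correlation_matrix[0, 1]
covariance = covariance_matrix[0, 1]

print(q1, q3, iqr)
print(correlation, covariance)
```

Both functions return a matrix even for two inputs: the diagonal describes each variable on its own and the off-diagonal entries describe the pair.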
  • 41. 41 CORRELATION AND COVARIANCE In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions, respectively. These functions are useful for analyzing relationships and dependencies between variables. Here's how to use them: Computing Correlation Coefficient (Correlation): The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation. import numpy as np # Create two arrays representing variables x = np.array([1, 2, 3, 4, 5]) y = np.array([2, 3, 4, 5, 6])
  • 42. 42 # Compute the correlation coefficient between x and y correlation_matrix = np.corrcoef(x, y) # The correlation coefficient is in the (0, 1) element of the matrix correlation_coefficient = correlation_matrix[0, 1] In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
  • 43. 43 Computing Covariance: Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship (both variables increase or decrease together), while negative values indicate an inverse relationship (one variable increases as the other decreases). import numpy as np # Create two arrays representing variables x = np.array([1, 2, 3, 4, 5]) y = np.array([2, 3, 4, 5, 6]) # Compute the covariance between x and y covariance_matrix = np.cov(x, y) # The covariance is in the (0, 1) element of the matrix covariance = covariance_matrix[0, 1] In this example, covariance will contain the covariance between x and y. Both np.corrcoef() and np.cov() can accept multiple arrays as input, allowing you to compute correlations and covariances for multiple variables simultaneously. For example, if you have a dataset with multiple columns, you can compute the correlation matrix or covariance matrix for all pairs of variables.
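For more than two variables, one common layout is a 2D array with one variable per row, which is what `np.corrcoef()` and `np.cov()` expect by default. The sketch below adds a hypothetical third variable `z` to the `x` and `y` used above.

```python
import numpy as np

variables = np.array([
    [1, 2, 3, 4, 5],  # x
    [2, 3, 4, 5, 6],  # y = x + 1
    [5, 4, 3, 2, 1],  # z = 6 - x (hypothetical third variable)
])

# Each row is treated as one variable, each column as one observation
corr = np.corrcoef(variables)  # 3x3 matrix of pairwise correlations
cov = np.cov(variables)        # 3x3 matrix; normalizes by N - 1 by default

print(corr)
print(cov)
```

Entry `[i, j]` of either matrix relates variable `i` to variable `j`, so both matrices are symmetric. If the data has variables in columns instead, pass `rowvar=False`.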