NumPy and Pandas Tutorial

Uploaded by

omvati343

NumPy and Pandas for Data Analysis AI ML Training

NumPy Tutorial
Introduction

NumPy (Numerical Python) is a library for the Python programming language, adding support
for large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.

Installation

To install NumPy, use the following command:

pip install numpy

Basic Operations

Importing NumPy

import numpy as np

Creating Arrays

# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)

# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)

# Create an array with zeros

zeros_array = np.zeros((3, 4))
print(zeros_array)

# Create an array with ones

ones_array = np.ones((2, 3))
print(ones_array)

# Create an identity matrix

identity_matrix = np.eye(3)
print(identity_matrix)

# Create an array with a range of values

range_array = np.arange(10, 20, 2)
print(range_array)

# Create an array with evenly spaced values

linspace_array = np.linspace(0, 1, 5)
print(linspace_array)
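After creating an array it often helps to check what was actually built. The sketch below, which reuses the 2D array from above and adds a hypothetical float_array for illustration, prints the standard ndarray attributes ndim, shape, size, and dtype.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(array_2d.ndim)   # number of dimensions
print(array_2d.shape)  # (rows, columns)
print(array_2d.size)   # total number of elements
print(array_2d.dtype)  # element type (the default integer width depends on the platform)

# Request a specific element type at creation time
float_array = np.array([1, 2, 3], dtype=np.float64)
print(float_array.dtype)
```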

Array Operations

# Arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

LinkedIn: www.linkedin.com/in/nidhi-grover-raheja-904211138

print(a + b) # Addition
print(a - b) # Subtraction
print(a * b) # Element-wise multiplication
print(a / b) # Element-wise division

# Matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print(np.dot(matrix_a, matrix_b))

# Broadcasting
array_broadcast = np.array([1, 2, 3])
print(array_broadcast + 1) # Adds 1 to each element

# Statistical operations
print(np.mean(a)) # Mean
print(np.median(a)) # Median
print(np.std(a)) # Standard deviation (population form, ddof=0, by default)
print(np.sum(a)) # Sum
print(np.min(a)) # Minimum
print(np.max(a)) # Maximum
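Broadcasting also works between arrays of different shapes, and the statistical functions accept an axis argument. The following is a minimal sketch with hypothetical arrays named matrix and row, not part of the original handout.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)

# The 1D row is stretched across both rows of the matrix
broadcast_sum = matrix + row
print(broadcast_sum)

# axis=0 reduces down the rows (one result per column),
# axis=1 reduces across the columns (one result per row)
column_means = matrix.mean(axis=0)
row_sums = matrix.sum(axis=1)
print(column_means)
print(row_sums)
```

Broadcasting succeeds here because the trailing dimensions match (3 and 3). Adding a shape (2,) array to the same matrix would raise a ValueError.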

Indexing and Slicing

array = np.array([1, 2, 3, 4, 5, 6])

# Indexing
print(array[0]) # First element
print(array[-1]) # Last element

# Slicing
print(array[1:4]) # Elements from index 1 to 3
print(array[:3]) # First three elements
print(array[3:]) # Elements from index 3 to end
print(array[::2]) # Every second element
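The same indexing and slicing rules extend to 2D arrays, with one index or slice per axis separated by a comma. The sketch below uses a hypothetical 3x3 array named grid and also shows selection with a boolean mask.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = grid[1, 2]       # row 1, column 2
first_column = grid[:, 0]  # every row, column 0
top_left = grid[:2, :2]    # first two rows and first two columns

# A boolean mask keeps only the elements where the condition is True
evens = grid[grid % 2 == 0]

print(element)
print(first_column)
print(top_left)
print(evens)
```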

Reshaping Arrays

array = np.arange(1, 10)

reshaped_array = array.reshape((3, 3))
print(reshaped_array)

# Flattening arrays
flattened_array = reshaped_array.flatten()
print(flattened_array)
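Two details about reshaping are worth a short illustration: passing -1 lets NumPy infer one dimension, and flatten() always returns a copy while ravel() returns a view when it can. The variable names below are hypothetical.

```python
import numpy as np

array = np.arange(1, 13)

# -1 asks NumPy to infer that dimension from the array's size
reshaped = array.reshape((3, -1))
print(reshaped.shape)  # (3, 4)

# flatten() copies, so editing the result leaves the source untouched
flat_copy = reshaped.flatten()
flat_copy[0] = 100
print(reshaped[0, 0])  # still 1

# ravel() returns a view here because the data is contiguous,
# so editing the result also changes the source
flat_view = reshaped.ravel()
flat_view[0] = 100
print(reshaped[0, 0])  # now 100
```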

Pandas Tutorial
Introduction

Pandas is a library providing high-performance, easy-to-use data structures and data analysis
tools for the Python programming language.

Installation

To install Pandas, use the following command:

pip install pandas

Basic Operations

Importing Pandas

import pandas as pd

Creating DataFrames

# Create a DataFrame from a dictionary

data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
print(df)

# Create a DataFrame from a CSV file

df_from_csv = pd.read_csv('path_to_csv_file.csv')
print(df_from_csv)

Viewing Data

# Display the first few rows

print(df.head())

# Display the last few rows

print(df.tail())

# Display the data types of columns

print(df.dtypes)

# Display the shape of the DataFrame

print(df.shape)

# Display summary statistics

print(df.describe())

Selecting Data

# Select a single column

print(df['Name'])

# Select multiple columns

print(df[['Name', 'City']])

# Select rows by index

print(df.iloc[0]) # First row
print(df.iloc[0:2]) # First two rows

# Select rows by label

print(df.loc[0]) # First row
print(df.loc[0:2]) # First three rows (inclusive)

# Conditional selection
print(df[df['Age'] > 30])
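With the default integer index, iloc and loc look almost interchangeable. The difference is easier to see with string labels, as in this sketch built on a hypothetical df_labeled DataFrame.

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Age': [28, 24, 35, 32]},
    index=['john', 'anna', 'peter', 'linda'],
)

# iloc slices by position and excludes the end position
by_position = df_labeled.iloc[0:2]

# loc slices by label and includes the end label
by_label = df_labeled.loc['john':'peter']

# loc can combine a row condition with a column selection
older_ages = df_labeled.loc[df_labeled['Age'] > 30, 'Age']

print(by_position)
print(by_label)
print(older_ages)
```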

Adding and Dropping Columns

# Add a new column

df['Country'] = ['USA', 'France', 'Germany', 'UK']
print(df)

# Drop a column
df = df.drop('Country', axis=1)
print(df)

Handling Missing Data

# Create a DataFrame with missing values

data_with_nan = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, None, 35, 32],
'City': ['New York', 'Paris', None, 'London']
}
df_nan = pd.DataFrame(data_with_nan)
print(df_nan)

# Drop rows with missing values

df_dropped_nan = df_nan.dropna()
print(df_dropped_nan)

# Fill missing values

df_filled_nan = df_nan.fillna({'Age': df_nan['Age'].mean(), 'City': 'Unknown'})
print(df_filled_nan)
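Before dropping or filling, it is usually worth counting where the gaps are. The sketch below rebuilds the same df_nan data so that it runs on its own. The names missing_per_column, rows_with_missing, and dropped_age_only are illustrative.

```python
import pandas as pd

df_nan = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London'],
})

# Count missing values in each column
missing_per_column = df_nan.isna().sum()
print(missing_per_column)

# Show only the rows that contain at least one missing value
rows_with_missing = df_nan[df_nan.isna().any(axis=1)]
print(rows_with_missing)

# Drop rows only when a specific column is missing
dropped_age_only = df_nan.dropna(subset=['Age'])
print(dropped_age_only)
```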

Grouping and Aggregating Data

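# Group by a column and calculate the mean of the numeric columns
# (numeric_only=True skips text columns such as 'Name', which pandas 2.0+ refuses to average)

print(df.groupby('City').mean(numeric_only=True))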

# Group by multiple columns and calculate sum

print(df.groupby(['City', 'Name']).sum())
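Every city in the DataFrame above appears only once, so the groups each hold a single row. The sketch below uses a hypothetical df_people with repeated cities and shows named aggregation through agg(), which computes several statistics per group in one call.

```python
import pandas as pd

# Hypothetical data with repeated cities so that groups have several rows
df_people = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Maria'],
    'Age': [28, 24, 35, 32, 30],
    'City': ['Paris', 'Paris', 'Berlin', 'Berlin', 'Paris'],
})

# Each keyword names an output column: (input column, aggregation)
summary = df_people.groupby('City').agg(
    mean_age=('Age', 'mean'),
    oldest=('Age', 'max'),
    people=('Name', 'count'),
)
print(summary)
```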

Merging DataFrames

# Create two DataFrames

df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['Peter', 'Linda'], 'City': ['Berlin',
'London']})

# Concatenate DataFrames
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat)

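# Merge DataFrames on the shared 'Name' column
# (df1 and df2 have no names in common, so this inner merge returns an empty DataFrame)
df_merge = pd.merge(df1, df2, on='Name', how='inner')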
print(df_merge)
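A merge is easier to follow when the two frames share some keys. The sketch below uses hypothetical frames named left and right with partly overlapping names, and the indicator=True option of pd.merge to show where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})
right = pd.DataFrame({'Name': ['Anna', 'Peter', 'Linda'],
                      'City': ['Paris', 'Berlin', 'London']})

# Inner merge keeps only the names present in both frames
inner = pd.merge(left, right, on='Name', how='inner')
print(inner)

# Outer merge keeps every name; the _merge column records the source
outer = pd.merge(left, right, on='Name', how='outer', indicator=True)
print(outer)
```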

Exporting Data

# Export DataFrame to CSV

df.to_csv('output.csv', index=False)

# Export DataFrame to Excel (requires an Excel writer engine such as openpyxl to be installed)

df.to_excel('output.xlsx', index=False)

Advanced Pandas Tutorial

Handling Time Series Data

Pandas has built-in support for time series data, including date ranges, datetime indexes, and resampling.

Creating Time Series Data

# Create a date range

date_range = pd.date_range(start='2023-01-01', periods=10, freq='D')
print(date_range)

# Create a DataFrame with time series data

time_series_data = {
'Date': date_range,
'Value': np.random.randn(10)
}
df_time_series = pd.DataFrame(time_series_data)
df_time_series.set_index('Date', inplace=True)
print(df_time_series)

Resampling Time Series Data

# Resample to weekly frequency and calculate the mean

df_resampled = df_time_series.resample('W').mean()
print(df_resampled)

# Resample to monthly frequency and calculate the sum
# ('M' labels each bin by its month end; pandas 2.2+ deprecates it in favor of 'ME')

df_resampled_monthly = df_time_series.resample('M').sum()
print(df_resampled_monthly)
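Random values make it hard to see how resampling bins the rows. The sketch below swaps in fixed values 1 to 10 for the same ten days (an assumption made purely for illustration) so the weekly means can be checked by hand. By default, weekly bins end on Sunday, and 2023-01-01 is a Sunday.

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
series = pd.Series(range(1, 11), index=dates, dtype='float64')

# Bins end on 2023-01-01, 2023-01-08 and 2023-01-15:
# the first holds day 1 only, the second days 2 to 8, the third days 9 and 10
weekly_mean = series.resample('W').mean()
print(weekly_mean)
```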

Working with Categorical Data

# Create a DataFrame with categorical data
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Gender': ['Male', 'Female', 'Male', 'Female']
}
df_categorical = pd.DataFrame(data)

# Convert a column to categorical type

df_categorical['Gender'] = df_categorical['Gender'].astype('category')
print(df_categorical)

# Get the categories and codes

print(df_categorical['Gender'].cat.categories)

print(df_categorical['Gender'].cat.codes)
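Categories can also carry an order, which matters for sorting and comparisons. The sketch below uses a hypothetical sizes Series and pd.CategoricalDtype with ordered=True.

```python
import pandas as pd

sizes = pd.Series(['medium', 'small', 'large', 'small'])

# Declare the categories explicitly, from smallest to largest
size_type = pd.CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
sizes = sizes.astype(size_type)

# Codes follow the declared order rather than alphabetical order
print(sizes.cat.codes.tolist())

# Sorting and comparisons respect the declared order
print(sizes.sort_values().tolist())
print((sizes > 'small').tolist())
```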

Pivot Tables
# Create a DataFrame
data = {
'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
'Sales': [150, 200, 130, 210, 170, 220]
}
df_sales = pd.DataFrame(data)

# Create a pivot table

pivot_table = df_sales.pivot_table(values='Sales', index='Name',
                                   columns='Month', aggfunc='sum')
print(pivot_table)
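A pivot table with aggfunc='sum' gives the same numbers as a groupby followed by unstack. The sketch below rebuilds df_sales to show this. Because the month columns come out in alphabetical order, it also reorders them by hand. The names pivot, grouped, and ordered are illustrative.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Sales': [150, 200, 130, 210, 170, 220],
})

pivot = df_sales.pivot_table(values='Sales', index='Name',
                             columns='Month', aggfunc='sum')

# The same table built from groupby and unstack
grouped = df_sales.groupby(['Name', 'Month'])['Sales'].sum().unstack('Month')

# Columns are sorted alphabetically (Feb, Jan, Mar); select them in calendar order
ordered = pivot[['Jan', 'Feb', 'Mar']]
print(ordered)
```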

Handling Large Datasets

# Read a large CSV file in chunks
chunk_size = 1000
chunks = pd.read_csv('large_dataset.csv', chunksize=chunk_size)

# Process each chunk

for chunk in chunks:
    # Perform operations on the chunk
    print(chunk.shape)
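Chunked reading is mainly useful when results are accumulated across chunks, so the full file never has to fit in memory. In the sketch below a small in-memory CSV (a hypothetical stand-in for large_dataset.csv, with an assumed column named value) is summed four rows at a time.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 11))

total = 0
row_count = 0
for chunk in pd.read_csv(StringIO(csv_text), chunksize=4):
    # Each chunk is an ordinary DataFrame with at most 4 rows
    total += chunk['value'].sum()
    row_count += len(chunk)

print(total, row_count)
```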

Applying Functions

Using apply()

# Create a DataFrame
data = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
df = pd.DataFrame(data)

# Define a function
def add_one(x):
    return x + 1

# Apply the function to each element
# (pandas 2.1+ renames DataFrame.applymap to DataFrame.map and deprecates the old name)

print(df.applymap(add_one))

# Apply the function to each column

print(df.apply(lambda x: x + 1))

# Apply the function to each row

print(df.apply(lambda x: x + 1, axis=1))
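Adding 1 gives the same result for either axis, so the examples above do not show what the axis argument changes. The sketch below uses functions whose result depends on the axis, plus Series.map for a single column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Default axis=0: the function receives each column as a Series
column_totals = df.apply(lambda column: column.sum())
print(column_totals)

# axis=1: the function receives each row as a Series
row_totals = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(row_totals)

# Series.map applies a function to each value of one column
doubled_a = df['A'].map(lambda value: value * 2)
print(doubled_a)
```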

Joining DataFrames
# Create two DataFrames
df1 = pd.DataFrame({
'key': ['A', 'B', 'C', 'D'],

'value': [1, 2, 3, 4]
})
df2 = pd.DataFrame({
'key': ['B', 'D', 'E', 'F'],
'value': [5, 6, 7, 8]
})

# Inner join
inner_joined = pd.merge(df1, df2, on='key', how='inner')
print(inner_joined)

# Left join
left_joined = pd.merge(df1, df2, on='key', how='left')
print(left_joined)

# Right join
right_joined = pd.merge(df1, df2, on='key', how='right')
print(right_joined)

# Outer join
outer_joined = pd.merge(df1, df2, on='key', how='outer')
print(outer_joined)
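Both frames above have a column named value, so pandas renames the two copies in the merged result. The sketch below shows the default suffixes and how the suffixes parameter of pd.merge replaces them. The suffix strings chosen here are arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Default: overlapping column names get _x and _y appended
default_names = pd.merge(df1, df2, on='key', how='inner')
print(default_names.columns.tolist())

# Custom suffixes make the origin of each column explicit
named = pd.merge(df1, df2, on='key', how='inner', suffixes=('_left', '_right'))
print(named)
```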

Window Functions
# Create a DataFrame with time series data
data = {
'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
'Value': np.random.randn(10)
}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Calculate rolling mean

rolling_mean = df['Value'].rolling(window=3).mean()
print(rolling_mean)

# Calculate expanding sum

expanding_sum = df['Value'].expanding().sum()
print(expanding_sum)

# Calculate exponentially weighted mean

ewm_mean = df['Value'].ewm(span=3).mean()
print(ewm_mean)
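With random input it is hard to verify what the window functions return. The sketch below uses a fixed, hypothetical Series so the results can be checked by hand, and shows the min_periods option of rolling().

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# The first two results are NaN because a full window of 3 is not yet available
rolling_mean = values.rolling(window=3).mean()
print(rolling_mean)

# min_periods=1 returns a result as soon as one value is in the window
rolling_partial = values.rolling(window=3, min_periods=1).mean()
print(rolling_partial)

# expanding() grows the window from the first row to the current row
expanding_sum = values.expanding().sum()
print(expanding_sum)
```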

Handling JSON Data

# Create a JSON string
json_str = '''
[
{"Name": "John", "Age": 28, "City": "New York"},
{"Name": "Anna", "Age": 24, "City": "Paris"},
{"Name": "Peter", "Age": 35, "City": "Berlin"}
]
'''

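# Read JSON string into DataFrame
# (wrapped in StringIO because pandas 2.1+ deprecates passing literal JSON strings)
from io import StringIO

df_json = pd.read_json(StringIO(json_str))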
print(df_json)

# Export DataFrame to JSON (lines=True writes one record per line, the JSON Lines layout)

df_json.to_json('output.json', orient='records', lines=True)
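A file written with lines=True must also be read back with lines=True. The sketch below is a round trip that stays in memory: to_json returns a string when no path is given, and StringIO stands in for the file. The two-row DataFrame is a shortened, hypothetical version of the data above.

```python
from io import StringIO

import pandas as pd

df_json = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# With no path argument, to_json returns the JSON text instead of writing a file
json_lines = df_json.to_json(orient='records', lines=True)
print(json_lines)

# Read the JSON Lines text back into a DataFrame
restored = pd.read_json(StringIO(json_lines), lines=True)
print(restored)
```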

Advanced Indexing with MultiIndex

# Create a MultiIndex DataFrame
arrays = [
['A', 'A', 'B', 'B'],
['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df_multi = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
print(df_multi)

# Accessing data in MultiIndex DataFrame

print(df_multi.loc['A'])
print(df_multi.loc[('A', 'one')])
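Selecting with loc works from the outer level inward. To select on the inner level, or to leave the MultiIndex altogether, xs, reset_index, and unstack are the usual tools. The sketch below rebuilds df_multi so that it runs on its own. The names ones, flat, and wide are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'),
)
df_multi = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

# Cross-section: every row whose 'second' level equals 'one'
ones = df_multi.xs('one', level='second')
print(ones)

# Turn the index levels back into ordinary columns
flat = df_multi.reset_index()
print(flat)

# Move the 'second' level into the columns
wide = df_multi['value'].unstack('second')
print(wide)
```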

Combining DataFrames with concat and append

# Create DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})

# Concatenate DataFrames
concatenated = pd.concat([df1, df2], ignore_index=True)
print(concatenated)

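# Append DataFrames
# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat is the replacement)
appended = pd.concat([df1, df2], ignore_index=True)
print(appended)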

Performance Tips
# Use vectorized operations instead of loops
data = pd.DataFrame({
'A': range(1000000),
'B': range(1000000)
})

# Inefficient way: Using loops

data['C'] = [x + y for x, y in zip(data['A'], data['B'])]

# Efficient way: Using vectorized operations

data['C'] = data['A'] + data['B']
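The difference between the two approaches can be measured directly. The sketch below times both on a smaller frame of 100,000 rows (an arbitrary size chosen to keep the run short) using time.perf_counter. The actual timings depend on the machine, so none are shown.

```python
import time

import pandas as pd

data = pd.DataFrame({'A': range(100_000), 'B': range(100_000)})

# Element-by-element addition in Python
start = time.perf_counter()
loop_result = pd.Series([x + y for x, y in zip(data['A'], data['B'])])
loop_seconds = time.perf_counter() - start

# The same addition as a single vectorized operation
start = time.perf_counter()
vector_result = data['A'] + data['B']
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```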
