PPS - Unit 5 (Imp Topics)

NumPy is a core library for scientific computing in Python, centered around the ndarray object, which holds homogeneous data in multi-dimensional arrays. It provides efficient array creation, slicing, and manipulation, while Pandas offers data structures like Series and DataFrame for data analysis, including handling missing values. Understanding the differences between NumPy and Pandas is crucial for effective data manipulation and analysis in Python.

Uploaded by

mk7023

NumPy ndarray

NumPy (Numerical Python) is a fundamental library for scientific computing in Python.


The heart of NumPy is the ndarray (N-dimensional array) object, which is a fast, flexible
container for large datasets of homogeneous (same type) data [1]. Here are the key
features and explanations for beginners:

• Homogeneous Data: All elements in an ndarray must be of the same data type (e.g., all integers or all floats) [1].

• Shape: The shape of an ndarray is a tuple indicating the size along each dimension (e.g., a 2x3 array has shape (2, 3)) [1].

• dtype: This attribute tells you the type of data stored (e.g., int64, float64) [1].

• ndim: This attribute tells you the number of dimensions (axes) the array has [1].

Creating and Inspecting an ndarray

import numpy as np # Import the numpy library

data1 = [6, 7.5, 8, 0, 1] # Define a Python list


arr1 = np.array(data1) # Convert the list to a NumPy array (ndarray)
print(arr1) # Output: [6. 7.5 8. 0. 1.]
print(arr1.dtype) # Output: float64 (data type of elements)
print(arr1.shape) # Output: (5,) (1D array with 5 elements)
print(arr1.ndim) # Output: 1 (one-dimensional)

• Comment: Here, np.array() converts a Python list into an ndarray. The dtype, shape, and ndim attributes help you understand the structure of your data [1].
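Because an ndarray is homogeneous, NumPy has to settle on one dtype for all elements. The following is a minimal sketch of how that plays out when types are mixed, and of converting between types with astype. The variable names are illustrative and are not part of the original notes.

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)             # float64

# A dtype can also be requested explicitly at creation time
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)        # float32

# astype returns a new array; converting to int drops the fractional part
truncated = mixed.astype(np.int64)
print(truncated.tolist())      # [1, 2, 3]
```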

Multidimensional ndarrays

data2 = [[1,2,3,4], [5,6,7,8]] # List of lists for 2D array


arr2 = np.array(data2)
print(arr2)
# Output:
# [[1 2 3 4]
# [5 6 7 8]]
print(arr2.dtype) # int64 (the platform's default integer; may be int32 on some systems)
print(arr2.shape) # (2, 4) -> 2 rows, 4 columns
print(arr2.ndim) # 2 (two-dimensional)
print(arr2.shape[0]) # 2 (number of rows)
print(arr2.shape[1]) # 4 (number of columns)

• Comment: This creates a 2D array (matrix). The shape (2, 4) means 2 rows and 4 columns [1].

Array Creation Functions

NumPy provides functions like arange, zeros, ones, and reshape to create arrays efficiently:

arr = np.arange(10) # Array from 0 to 9


arr2d = arr.reshape(2, 5) # Reshape to 2 rows, 5 columns
zeros = np.zeros((3, 4)) # 3x4 array of zeros
ones = np.ones((2, 2)) # 2x2 array of ones

• Comment: These functions help you quickly generate arrays for computation [1].
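To see what these creation functions actually return, the short sketch below prints the results. It repeats the calls from the notes; the only addition is the dtype argument on np.ones, included here as an illustration.

```python
import numpy as np

arr = np.arange(10)                      # 1D array holding 0..9
arr2d = arr.reshape(2, 5)                # same data arranged as 2 rows x 5 columns
zeros = np.zeros((3, 4))                 # zeros are floats by default
ones = np.ones((2, 2), dtype=np.int64)   # dtype can be set explicitly

print(arr2d.tolist())              # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(zeros.shape, zeros.dtype)    # (3, 4) float64
print(ones.sum())                  # 4
```

Note that reshape needs the new shape to hold exactly the same number of elements (here 2 x 5 = 10).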

Slicing Arrays in NumPy

Slicing is a way to extract sub-parts of an array, similar to slicing lists in Python but
extended to multiple dimensions[1].

Basic Slicing Syntax

• array[start:end] extracts elements from index start to end-1.

• array[start:end:step] adds a step size.

Example: 1D Slicing

arr = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
print(arr[2:7:2]) # Output: [2 4 6]

• Comment: Slices from index 2 to 6 (since end is exclusive), taking every 2nd element [1].

Example: 2D Slicing

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


print(arr2d[:, 2]) # Output: [3 6 9] (last column)
print(arr2d[1, 1:]) # Output: [5 6] (last two elements of middle row)

• Comment: : means "all rows" or "all columns" depending on position [1].
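Row and column slices can also be combined to pick out a rectangular block. The sketch below reuses the 3x3 array from the example; the variable names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Rows 0-1 and columns 1-2: a 2x2 block from the top-right corner
top_right = arr2d[:2, 1:]
print(top_right.tolist())          # [[2, 3], [5, 6]]

# A single integer index removes that dimension
first_row = arr2d[0]
print(first_row.shape)             # (3,)

# Negative index picks the last row; a step of -1 reverses it
last_row_reversed = arr2d[-1, ::-1]
print(last_row_reversed.tolist())  # [9, 8, 7]
```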


Slicing Modifies the Original Array

Basic slicing in NumPy returns a view, not a copy, so modifying the slice changes the original array. Call .copy() on the slice when you need an independent array.

arr = np.arange(10)
arr_slice = arr[5:8]
arr_slice[:] = 100
print(arr) # The original array is changed at indices 5, 6, 7

• Comment: This is different from Python lists, where a slice is a new list. Avoiding the copy makes slicing very efficient for large data [1].
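The difference between a view and a copy can be checked directly. This is a minimal sketch using np.shares_memory to confirm which arrays share data with the original; the names view and copy are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # basic slice: shares memory with arr
copy = arr[5:8].copy()     # explicit copy: independent data

copy[:] = -1               # does not affect arr
view[:] = 100              # writes through to arr

print(arr.tolist())                  # [0, 1, 2, 3, 4, 100, 100, 100, 8, 9]
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
```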

Dealing with Rows and Columns in Pandas

Pandas is a powerful library for data manipulation and analysis. Its main data structures
are Series (1D) and DataFrame (2D)[1].

Creating a DataFrame

import pandas as pd

data = {
'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age': [27, 24, 22, 32],
'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification': ['Msc', 'MA', 'MCA', 'Phd']
}
df = pd.DataFrame(data)
print(df)

• Comment: DataFrames are like tables with rows and columns [1].

Selecting Columns

print(df['Name']) # Selects the 'Name' column as a Series


print(df[['Name', 'Qualification']]) # Selects multiple columns as a DataFrame

• Comment: Use single brackets with one column name to get a Series, and double brackets (a list of names) to get a DataFrame [1].

Selecting Rows

• By Index:
print(df.loc[0]) # Select row by index label
print(df.iloc[1]) # Select row by integer position

• By Condition:

print(df[df['Age'] > 25]) # Select rows where Age > 25
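With the default index, labels and positions are both 0, 1, 2, ..., so loc and iloc can look interchangeable. The sketch below assumes a DataFrame with string labels (a hypothetical index, not from the notes) to make the difference visible, and also shows how two conditions can be combined.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jai", "Princi", "Gaurav", "Anuj"], "Age": [27, 24, 22, 32]},
    index=["a", "b", "c", "d"],   # hypothetical string labels
)

print(df.loc["b", "Name"])   # Princi (selected by label)
print(df.iloc[1, 0])         # Princi (selected by position)

# Combine conditions with & (and) or | (or); each condition needs parentheses
older = df[(df["Age"] > 25) & (df["Name"] != "Anuj")]
print(older["Name"].tolist())   # ['Jai']
```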

Adding and Deleting Columns

• Add:

df['Salary'] = [50000, 60000, 70000, 80000] # Add new column

• Delete:

df = df.drop('Salary', axis=1) # Remove column

Renaming Columns

df = df.rename(columns={'Qualification': 'Degree'})
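The add, rename, and delete steps can be chained on a small table. This is a sketch on a reduced, two-row version of the data; the AgeNextYear and FirstName column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

# A new column can be computed from an existing one
df["AgeNextYear"] = df["Age"] + 1

# rename and drop return new DataFrames, so the result is assigned back
df = df.rename(columns={"Name": "FirstName"})
df = df.drop(columns=["Age"])    # same effect as drop('Age', axis=1)

print(df.columns.tolist())       # ['FirstName', 'AgeNextYear']
print(df["AgeNextYear"].tolist())  # [28, 25]
```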

Working with Missing Data

Real-world datasets often have missing values. Pandas provides tools to handle them[1].

Checking for Missing Values

import numpy as np
import pandas as pd

# Named scores rather than dict, to avoid shadowing Python's built-in dict type
scores = {
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98]
}
df = pd.DataFrame(scores)
print(df.isnull()) # True where value is missing
print(df.notnull()) # True where value is present

Filling Missing Values

df_filled = df.fillna(0) # Replace NaN with 0


Dropping Missing Values

df_dropped = df.dropna() # Remove rows with any NaN

• Comment: Handling missing data is crucial for accurate analysis [1].
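Filling with 0 is not always appropriate, since it can distort averages. As one possible alternative, the sketch below counts the missing values per column and fills each gap with that column's mean. It uses a two-column subset of the scores data above.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, 45, 56, np.nan],
})

# Number of missing values in each column
print(scores.isnull().sum().tolist())    # [1, 1]

# fillna accepts a Series of per-column fill values; mean() skips NaN
filled = scores.fillna(scores.mean())
print(filled["First Score"].tolist())    # [100.0, 90.0, 95.0, 95.0]

# dropna keeps only the rows with no missing values
complete = scores.dropna()
print(len(complete))                     # 2
```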

Applying Functions to DataFrames

Pandas allows you to apply functions to rows or columns using apply()[1].

Syntax

DataFrame.apply(func, axis=0)

• func: The function to apply.

• axis=0: Apply function to each column (default).

• axis=1: Apply function to each row.

Example

import numpy as np # Needed for np.sum below
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Sum of each column
print(df.apply(np.sum, axis=0)) # A: 6, B: 15

# Sum of each row
print(df.apply(np.sum, axis=1)) # 5, 7, 9

• Comment: You can use built-in functions or custom functions with apply() [1].
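A custom function passed to apply() receives one column (axis=0) or one row (axis=1) at a time as a Series. The lambdas in this sketch are illustrative examples, not functions from the notes.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# axis=0: the function receives each column; here, max minus min
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)
print(col_range.tolist())     # [2, 2]

# axis=1: the function receives each row; here, A times B
row_product = df.apply(lambda row: row["A"] * row["B"], axis=1)
print(row_product.tolist())   # [4, 10, 18]
```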

Comparison between NumPy and Pandas

Feature | NumPy | Pandas
Data Structure | ndarray (N-dimensional array) | Series (1D), DataFrame (2D)
Data Type | Homogeneous (all elements same type) | Heterogeneous (different types per column)
Speed | Very fast for numerical computations | Slightly slower due to more features
Indexing | Integer-based, slices | Label-based and integer-based
Use-case | Numerical calculations, linear algebra | Data analysis, tabular data, statistics
Memory Usage | Lower (no metadata) | Higher (stores labels, more metadata)
Operations | Vectorized, element-wise | Powerful group-by, merge, pivot, etc.
File I/O | Limited (text/binary) | Extensive (CSV, Excel, SQL, JSON, etc.)

• Summary: Use NumPy for efficient numerical computations and Pandas for data analysis with labeled, tabular data [1].
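The two libraries are often used together: a numeric column can be pulled out of a DataFrame as an ndarray for fast computation, and an ndarray can be wrapped in a Series to give it labels. The sketch below is one way to move between them; the column names and labels are made up.

```python
import numpy as np
import pandas as pd

# A DataFrame can hold a text column and a numeric column side by side
df = pd.DataFrame({"name": ["x", "y"], "value": [1.5, 2.5]})

# to_numpy() gives the numeric column as a plain, homogeneous ndarray
values = df["value"].to_numpy()
print(type(values).__name__, values.dtype)   # ndarray float64
print((values * 2).tolist())                 # [3.0, 5.0]

# Wrapping an ndarray in a Series adds label-based indexing
labelled = pd.Series(values, index=["first", "second"])
print(labelled["second"])                    # 2.5
```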

Other Python Libraries

Python's ecosystem is rich with libraries for various purposes [1]:

• TensorFlow: Deep learning and high-level computations, especially for neural networks.

• Matplotlib: Data visualization (plots, graphs, charts).

• Pandas: Data analysis and manipulation (tabular data).

• NumPy: Numerical computing (arrays, matrices).

• SciPy: Scientific and technical computing, built on NumPy.

• Scrapy: Web scraping and data extraction from websites.

• Scikit-learn: Machine learning (classification, regression, clustering).

• PyGame: Game development.

• PyTorch: Deep learning, tensor computations with GPU support.

• PyBrain: Reinforcement learning and neural networks, beginner-friendly.


These libraries make Python powerful for data science, machine learning, visualization,
and more[1].

Summary Table: Key Concepts and Keywords

Concept | Key Features/Keywords | Example Code/Explanation
NumPy ndarray | Homogeneous, shape, dtype, ndim, vectorization | np.array([1])
Slicing arrays in NumPy | start:end:step, views, multidimensional slicing | arr[1:5], arr2d[:,2]
Rows/Columns in Pandas | DataFrame, Series, loc, iloc, column selection | df['Name'], df.loc
Working with Missing Data | isnull(), notnull(), fillna(), dropna(), NaN | df.isnull(), df.fillna(0)
Applying Functions | apply(), axis, custom functions, np.sum, np.mean | df.apply(np.sum, axis=0)
NumPy vs Pandas | Speed, structure, use-case, memory, operations | See comparison table above
Other Python Libraries | TensorFlow, Matplotlib, SciPy, Scikit-learn, PyTorch | Used for ML, plotting, computation, etc.

In summary, NumPy and Pandas are essential for efficient data manipulation and
analysis in Python. NumPy's ndarray is optimized for numerical operations, while Pandas'
DataFrame and Series provide powerful tools for handling labeled, tabular data, including
missing values and function application. Understanding these basics, along with the
differences between the libraries and their integration with other Python tools, is
foundational for anyone starting in data science or scientific computing [1].

1. UNIT-5.pptx
