PPS - Unit 5 (Imp Topics)
Homogeneous Data: All elements in an ndarray must be of the same data type
(e.g., all integers or all floats)[1].
Shape: The shape of an ndarray is a tuple indicating the size along each dimension
(e.g., a 2x3 array has shape (2,3))[1].
dtype: This attribute tells you the type of data stored (e.g., int64, float64) [1].
ndim: This attribute tells you the number of dimensions (axes) the array has [1].
Comment: Here, np.array() converts a Python list into an ndarray. The dtype, shape,
and ndim attributes help you understand the structure of your data [1].
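A minimal sketch of creating an ndarray from a list and inspecting these attributes. The list values are illustrative and not taken from the slides.

```python
import numpy as np

# Convert a Python list into a 1D ndarray (values are illustrative)
arr = np.array([1, 2, 3, 4, 5])

print(arr)        # [1 2 3 4 5]
print(arr.dtype)  # an integer type; the exact width (e.g. int64) depends on the platform
print(arr.shape)  # (5,)
print(arr.ndim)   # 1
```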
Multidimensional ndarrays
Comment: This creates a 2D array (matrix). The shape (2,4) means 2 rows and 4
columns[1].
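A sketch of an array with that shape, built from a nested list. The values are illustrative and not from the slides.

```python
import numpy as np

# Two inner lists of four values each become a 2-row, 4-column array
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

print(matrix.shape)  # (2, 4)
print(matrix.ndim)   # 2
```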
NumPy provides functions like arange, zeros, ones, and reshape to create arrays efficiently:
Comment: These functions help you quickly generate arrays for computation [1].
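One possible illustration of the four functions named above. The arguments are chosen only for demonstration.

```python
import numpy as np

a = np.arange(0, 10, 2)          # evenly spaced values: [0 2 4 6 8]
z = np.zeros((2, 3))             # 2x3 array filled with 0.0
o = np.ones(4)                   # [1. 1. 1. 1.]
r = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]

print(a, z, o, r, sep="\n")
```

Note that reshape requires the new shape to hold exactly the same number of elements as the original array.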
Slicing is a way to extract sub-parts of an array, similar to slicing lists in Python but
extended to multiple dimensions[1].
Example: 1D Slicing
arr = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
print(arr[2:7:2]) # Output: [2 4 6]
Comment: Slices from index 2 to 6 (since end is exclusive), taking every 2nd
element[1].
Example: 2D Slicing
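A minimal sketch of 2D slicing, assuming a 3x4 array built with arange and reshape (the array is illustrative). Each axis gets its own slice, separated by a comma.

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

sub = arr2d[0:2, 1:3]   # rows 0-1, columns 1-2
print(sub)              # [[1 2]
                        #  [5 6]]
print(arr2d[:, 0])      # first column: [0 4 8]
print(arr2d[1, :])      # second row:   [4 5 6 7]
```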
Slicing in NumPy returns a view, not a copy. Modifying the slice changes the original
array.
arr = np.arange(10)
arr_slice = arr[5:8]
arr_slice[:] = 100
print(arr) # The original array is changed at indices 5, 6, 7
Comment: This differs from Python lists, where a slice is a new copy. Avoiding the copy makes slicing efficient on large arrays [1].
Pandas is a powerful library for data manipulation and analysis. Its main data structures
are Series (1D) and DataFrame (2D)[1].
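For comparison with the DataFrame that follows, a short sketch of a Series. The index labels and values are illustrative.

```python
import pandas as pd

# A Series is a 1D array of values with an index label for each value
ages = pd.Series([27, 24, 22], index=['Jai', 'Princi', 'Gaurav'])

print(ages['Princi'])  # 24
print(ages.ndim)       # 1
```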
Creating a DataFrame
import pandas as pd
data = {
'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age': [27, 24, 22, 32],
'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification': ['Msc', 'MA', 'MCA', 'Phd']
}
df = pd.DataFrame(data)
print(df)
Comment: DataFrames are like tables with rows and columns [1].
Selecting Columns
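A minimal sketch of column selection, assuming a DataFrame like the one created above (rebuilt here with fewer columns so the block runs on its own).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
})

names = df['Name']             # one column name -> Series
subset = df[['Name', 'Age']]   # list of column names -> DataFrame

print(names)
print(subset)
```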
Selecting Rows
By Index:
print(df.loc[0]) # Select row by index label
print(df.iloc[1]) # Select row by integer position
By Condition:
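A sketch of selecting rows with a boolean condition. The thresholds are illustrative, and the DataFrame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
})

# A comparison on a column yields a boolean Series used as a row filter
older = df[df['Age'] > 25]

# Combine conditions with & (and) or | (or); each condition needs parentheses
not_delhi = df[(df['Age'] > 22) & (df['Address'] != 'Delhi')]

print(older)
print(not_delhi)
```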
Add:
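A minimal sketch of adding columns by assignment. The column names 'Experience' and 'AgeIn5Years' and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
})

# New column from a list; its length must match the number of rows
df['Experience'] = [3, 1, 0, 8]

# New column computed from an existing one
df['AgeIn5Years'] = df['Age'] + 5

print(df)
```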
Delete:
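A minimal sketch of deleting a column or a row with drop(). Which column and row to remove is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
})

# drop() returns a new DataFrame; the original df is left unchanged
without_address = df.drop(columns=['Address'])
without_first_row = df.drop(index=0)

print(without_address)
print(without_first_row)
```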
Renaming Columns
df = df.rename(columns={'Qualification': 'Degree'})
Real-world datasets often have missing values. Pandas provides tools to handle them[1].
import numpy as np
# Named 'scores' rather than 'dict' to avoid shadowing Python's built-in dict
scores = {
'First Score': [100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score': [np.nan, 40, 80, 98]
}
df = pd.DataFrame(scores)
print(df.isnull()) # True where value is missing
print(df.notnull()) # True where value is present
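Beyond detecting missing values, two common ways to handle them are filling and dropping. The sketch below uses fillna() and dropna() on the same score data. The fill value 0 is an illustrative choice, not a recommendation from the slides.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98],
})

missing_per_column = scores.isnull().sum()  # one missing value in each column
filled = scores.fillna(0)                   # replace every NaN with 0
dropped = scores.dropna()                   # keep only rows with no NaN (row 1 here)

print(missing_per_column)
print(filled)
print(dropped)
```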
Syntax
DataFrame.apply(func, axis=0)
Example
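import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
})
print(df.apply(sum)) # Built-in function applied to each column: A -> 6, B -> 15
print(df.apply(lambda col: col * 2)) # Custom function: doubles every value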
Comment: You can use built-in functions or custom functions with apply()[1].
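To illustrate the axis argument from the syntax line, a small sketch: axis=0 passes each column to the function, while axis=1 passes each row. The lambda functions are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 (default): the function receives one column at a time
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives one row at a time
row_total = df.apply(lambda row: row['A'] + row['B'], axis=1)

print(col_range)  # A: 2, B: 2
print(row_total)  # 5, 7, 9
```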
Summary: Use NumPy for efficient numerical computations and Pandas for data
analysis with labeled, tabular data[1].
Other Python Libraries: TensorFlow, Matplotlib, SciPy, Scikit-learn, PyTorch. Used for ML, plotting, computation, etc.
In summary, NumPy and Pandas are essential for efficient data manipulation and
analysis in Python. NumPy's ndarray is optimized for numerical operations, while Pandas'
DataFrame and Series provide powerful tools for handling labeled, tabular data, including
missing values and function application. Understanding these basics, along with the
differences between the libraries and their integration with other Python tools, is
foundational for anyone starting in data science or scientific computing [1].
1. UNIT-5.pptx