NUMPY LEARNING PLAN
1. Introduction to NumPy
• What is NumPy?
o Learn about NumPy, its importance in scientific computing and data analysis.
• Installation of NumPy
o Install NumPy using pip install numpy.
2. NumPy Basics
• Understanding Arrays
o What are NumPy arrays (ndarrays) and their advantages over Python lists.
• Creating Arrays
o array(), zeros(), ones(), empty(), full(), arange(), linspace(), eye(), and the np.random functions (e.g., np.random.rand()).
• Array Data Types
o Specifying data types (dtype) while creating arrays.
• Array Attributes
o shape, dtype, ndim, size, and itemsize.
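A minimal sketch of the creation routines and attributes listed above. The array values and variable names are illustrative assumptions, not part of the plan.

```python
import numpy as np

# Creation routines named in the outline
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)  # nested list, explicit dtype
zeros = np.zeros((2, 3))              # 2x3 array of 0.0 (float64 by default)
sevens = np.full((2, 2), 7)           # 2x2 array filled with 7
steps = np.arange(0, 10, 2)           # [0 2 4 6 8], stop value excluded
points = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced values, endpoints included
identity = np.eye(3)                  # 3x3 identity matrix

# Array attributes
print(a.shape)     # (2, 3)
print(a.dtype)     # int32
print(a.ndim)      # 2
print(a.size)      # 6
print(a.itemsize)  # 4 (bytes per int32 element)
```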
3. Array Operations
• Basic Operations
o Arithmetic operations (addition, subtraction, multiplication, division, modulus).
• Universal Functions (ufuncs)
o np.add(), np.subtract(), np.multiply(), np.divide(), trigonometric functions.
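As a rough illustration, the operators and the ufuncs above can be compared side by side. The sample arrays are assumed for the example.

```python
import numpy as np

x = np.array([10, 20, 30])
y = np.array([3, 4, 5])

# Operators apply element-wise and map onto the matching ufuncs
total = x + y        # equivalent to np.add(x, y)
product = x * y      # equivalent to np.multiply(x, y)
ratio = x / y        # true division, result is float
remainder = x % y    # [1 0 0]

# Trigonometric ufuncs work on whole arrays (angles in radians)
sines = np.sin(np.array([0.0, np.pi / 2]))
```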
4. Indexing and Slicing
• Array Indexing
o Accessing elements using positive and negative indices.
• Slicing Arrays
o Understanding how to slice arrays (arr[start:end:step]).
• Fancy Indexing and Boolean Masking
o Accessing arrays using lists or boolean masks.
• Modifying Arrays
o How to modify specific elements or sections of arrays.
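The indexing styles above might look like this in practice. The array and the conditions are assumptions chosen to keep the results easy to check.

```python
import numpy as np

arr = np.arange(10)            # [0 1 2 3 4 5 6 7 8 9]

first = arr[0]                 # positive index -> 0
last = arr[-1]                 # negative index counts from the end -> 9
evens = arr[0:10:2].copy()     # slice start:end:step -> [0 2 4 6 8]
picked = arr[[1, 3, 5]]        # fancy indexing with a list -> [1 3 5]
big = arr[arr > 6]             # boolean mask -> [7 8 9]

# Modify a section in place: set every odd value to -1
arr[arr % 2 == 1] = -1
```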
5. Reshaping and Resizing Arrays
• Reshape Arrays
o Use reshape() and flatten() to change array shapes.
• Concatenating and Splitting Arrays
o concatenate(), vstack(), hstack(), split(), and hsplit().
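A short sketch of the reshaping, joining, and splitting functions above, using a small assumed array so the resulting shapes are easy to follow.

```python
import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]
m = a.reshape(2, 3)              # [[0 1 2], [3 4 5]]
flat = m.flatten()               # 1D copy of m

v = np.vstack([m, m])                    # stack rows    -> shape (4, 3)
h = np.hstack([m, m])                    # stack columns -> shape (2, 6)
c = np.concatenate([m, m], axis=0)       # same result as vstack here

left, right = np.hsplit(h, 2)    # two arrays of shape (2, 3)
parts = np.split(a, 3)           # three arrays of length 2
```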
6. Statistical and Mathematical Functions
• Basic Statistics
o mean(), median(), sum(), min(), max(), std(), var().
• Aggregation Functions
o Cumulative sum (cumsum()), cumulative product (cumprod()).
• Random Sampling
o Random number generation with np.random (e.g., rand(), randn(), seed()).
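The functions above can be tried on a small dataset. The numbers are assumed for illustration and picked so the results come out round.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(data.mean())       # 5.0
print(np.median(data))   # 4.5
print(data.var())        # 4.0 (population variance, ddof=0 by default)
print(data.std())        # 2.0

running_sum = np.cumsum([1, 2, 3, 4])       # [ 1  3  6 10]
running_prod = np.cumprod([1, 2, 3, 4])     # [ 1  2  6 24]

# Seeding makes the legacy np.random functions reproducible
np.random.seed(0)
uniform = np.random.rand(3)    # 3 samples from [0, 1)
normal = np.random.randn(3)    # 3 samples from the standard normal
```

Note that std() and var() in NumPy default to the population formulas; pass ddof=1 for the sample versions.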
7. Working with Matrices
• Matrix Operations
o Creating matrices, dot() and the @ operator for matrix multiplication, transpose (.T), determinant (np.linalg.det()), inverse (np.linalg.inv()).
• Solving Linear Equations
o Use np.linalg for linear algebra operations like solve().
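A minimal sketch of the matrix operations above, solving the assumed system 3x + y = 9 and x + 2y = 8.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

product = A @ A.T              # matrix multiplication, same as np.dot(A, A.T)
det = np.linalg.det(A)         # 3*2 - 1*1 = 5
A_inv = np.linalg.inv(A)       # inverse exists because det != 0

# Solve A @ x = b directly; usually preferred over computing the inverse
x = np.linalg.solve(A, b)      # x = 2, y = 3
```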
8. Advanced Array Manipulation
• Stacking and Unstacking
o np.stack(), np.hstack(), np.vstack().
• Array Transpose and Swapping Axes
o transpose(), swapaxes().
• Sorting and Searching
o Sorting arrays (sort()), searching (where(), argmax(), argmin()).
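The manipulation functions above, sketched on small assumed arrays.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.stack([a, b])            # new leading axis  -> shape (2, 3)
cols = np.stack([a, b], axis=1)    # new trailing axis -> shape (3, 2)

t = np.zeros((2, 3, 4))
reordered = t.transpose(2, 0, 1)   # axes reordered -> shape (4, 2, 3)
swapped = np.swapaxes(t, 0, 2)     # axes 0 and 2 exchanged -> shape (4, 3, 2)

v = np.array([30, 10, 50, 20])
ordered = np.sort(v)               # [10 20 30 50], returns a sorted copy
idx_max = np.argmax(v)             # 2 (position of 50)
idx_min = np.argmin(v)             # 1 (position of 10)
above = np.where(v > 15)[0]        # indices where the condition holds -> [0 2 3]
```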
PANDAS LEARNING PLAN
1. Introduction to pandas
• What is pandas?
o Overview of pandas and its role in data manipulation and analysis.
• Installation of pandas
o Install pandas using pip install pandas.
2. pandas Basics
• Understanding Data Structures
o Series (1D) and DataFrame (2D).
• Creating Data Structures
o Create Series and DataFrames from lists, dictionaries, and NumPy arrays.
• Attributes of DataFrames
o Learn about shape, index, columns, dtypes.
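A brief sketch of building the two data structures and reading their attributes. The column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Series: 1D labeled data
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame from a dictionary of columns
people = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})

# DataFrame from a NumPy array, with column names supplied
grid = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

print(people.shape)            # (2, 2)
print(list(people.columns))    # ['name', 'age']
print(list(people.index))      # [0, 1]
print(people.dtypes)           # per-column data types
```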
3. DataFrame Operations
• Indexing and Selecting Data
o Access rows/columns using loc[], iloc[], and direct indexing.
• Column and Row Manipulation
o Adding, renaming, and dropping columns and rows.
• Basic Operations on DataFrames
o Arithmetic operations between columns and with scalars.
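The selection and manipulation steps above could look roughly like this. The DataFrame contents are assumed for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "price"]    # label-based selection  -> 20.0
by_position = df.iloc[0, 1]        # position-based selection -> 1
one_column = df["price"]           # direct column indexing

# Column arithmetic creates a new column
df["total"] = df["price"] * df["qty"]             # 10.0, 40.0, 90.0

df = df.rename(columns={"qty": "quantity"})       # rename a column
df = df.drop(index="c")                           # drop a row by label
df = df.drop(columns="price")                     # drop a column
```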
4. Handling Missing Data
• Identifying Missing Data
o Use isnull(), notnull(), and sum() to detect missing values.
• Handling Missing Values
o Drop missing values with dropna().
o Fill missing values with specific values using fillna(), or forward-fill and backward-fill with ffill() and bfill().
• Imputation Techniques
o Imputation with mean, median, mode.
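A minimal sketch of detecting, dropping, filling, and imputing missing values, on an assumed two-column DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

missing_per_column = df.isnull().sum()   # a: 1, b: 2

dropped = df.dropna()                    # keeps only rows with no missing values
zero_filled = df.fillna(0)               # fill with a specific value
forward_filled = df.ffill()              # carry the last valid value forward
mean_imputed = df.fillna(df.mean())      # impute each column with its own mean
```

Forward-fill leaves leading gaps untouched, so column "b" still has missing values in its first two rows.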
5. Data Cleaning Techniques
• Removing Duplicates
o Use drop_duplicates() to remove duplicate rows.
• String Operations
o Apply string functions (str.strip(), str.lower(), str.contains()).
• Data Type Conversion
o Convert data types using astype().
• Replacing Values
o Replace values using replace() for cleaning inconsistent or erroneous data.
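The cleaning techniques above can be chained on a small messy table. The city names and population values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": [" Paris", "paris ", "LONDON", "LONDON"],
    "pop": ["2", "2", "9", "9"],      # numbers stored as strings
})

# Normalize text: trim whitespace, lowercase
df["city"] = df["city"].str.strip().str.lower()

# Convert the string column to integers
df["pop"] = df["pop"].astype(int)

# Rows are now exact duplicates in pairs; keep the first of each
df = df.drop_duplicates()

# Replace inconsistent spellings with canonical ones
df["city"] = df["city"].replace({"paris": "Paris", "london": "London"})

# Boolean Series: which city names contain "on"
has_on = df["city"].str.contains("on")
```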
6. Data Transformation
• Renaming Columns
o Use rename() to modify column names.
• Mapping Values
o Map specific values in columns using map() and replace().
• Handling Categorical Data
o Convert categorical variables into numerical format using get_dummies() or pd.Categorical().
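A short sketch of the transformation steps above. The "size" column and its ordering S < M < L are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"Size": ["S", "M", "L", "M"]})

# Rename a column
df = df.rename(columns={"Size": "size"})

# Map specific values to numbers
df["size_rank"] = df["size"].map({"S": 0, "M": 1, "L": 2})

# One-hot encoding: one indicator column per category
dummies = pd.get_dummies(df["size"], prefix="size")

# Ordered categorical with integer codes
cat = pd.Categorical(df["size"], categories=["S", "M", "L"], ordered=True)
codes = cat.codes    # [0 1 2 1]
```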
7. Data Merging and Joining
• Concatenation
o Concatenate data using concat() (vertical and horizontal).
• Merging and Joining DataFrames
o Use merge() to join DataFrames on keys.
• Handling Duplicates in Merging
o Learn strategies for handling overlapping or duplicate data during merges.
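Concatenation and merging, sketched on two assumed tables that share an "id" key and an overlapping "score" column. The suffixes and indicator arguments are one possible way to handle the overlap.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "score": [90, 80, 70]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [85, 75, 65]})

# Vertical concatenation: rows appended, index rebuilt
stacked = pd.concat([left, right], ignore_index=True)    # 6 rows

# Horizontal concatenation: columns placed side by side, aligned on index
side_by_side = pd.concat([left, right], axis=1)          # 3 rows, 4 columns

# Inner join on the key; suffixes disambiguate the overlapping column
inner = pd.merge(left, right, on="id", suffixes=("_left", "_right"))

# Outer join; indicator adds a _merge column showing each row's origin
outer = pd.merge(left, right, on="id", how="outer", indicator=True)
```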
8. Sorting and Filtering Data
• Sorting Data
o Sort DataFrames by values or index using sort_values() and sort_index().
• Filtering Data
o Apply conditions to filter data (e.g., df[df['column'] > value]).
• Querying Data
o Use query() to filter rows with a condition written as a string expression.
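Sorting, boolean filtering, and query() on an assumed score table. The threshold variable is hypothetical and only shows how query() can reference a Python variable with @.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [70, 95, 85, 60]},
    index=[3, 1, 2, 0],
)

by_score = df.sort_values("score", ascending=False)   # highest score first
by_index = df.sort_index()                            # index order 0, 1, 2, 3

# Boolean condition
high = df[df["score"] > 80]

# Same filter written as a query string
threshold = 80
high_q = df.query("score > @threshold")
```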
9. Aggregation and Grouping
• Grouping Data
o Group data using groupby() and apply aggregation functions (sum(), mean(), count()).
• Pivot Tables
o Create pivot tables with pivot_table().
• Multi-level Indexing
o Work with hierarchical indexing and perform grouped operations.
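Grouping, pivoting, and hierarchical indexing, sketched on assumed sales data.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["x", "y", "x", "y"],
    "sales": [10, 20, 30, 40],
})

# One aggregation per group
totals = df.groupby("region")["sales"].sum()          # N: 30, S: 70

# Several aggregations at once
summary = df.groupby("region")["sales"].agg(["sum", "mean", "count"])

# Pivot table: regions as rows, products as columns
pivot = df.pivot_table(index="region", columns="product",
                       values="sales", aggfunc="sum")

# Grouping by two keys yields a hierarchical (multi-level) index
multi = df.groupby(["region", "product"])["sales"].sum()
south_x = multi.loc[("S", "x")]                       # 30
```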
10. Time Series Data
• Working with Dates and Times
o Parse dates with pd.to_datetime().
• Resampling and Frequency Conversion
o Resample time-series data (e.g., resample(), asfreq()).
• Shifting Data
o Shift data forward or backward with shift().
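A minimal time-series sketch using four assumed daily observations.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"])
s = pd.Series([1, 2, 3, 4], index=dates)

# Resample into 2-day bins and aggregate each bin
two_day_sum = s.resample("2D").sum()     # 3, 7

# asfreq only selects values at the new frequency, without aggregating
every_other = s.asfreq("2D")             # 1, 3

# Shift values forward by one period; the first entry becomes NaN
lagged = s.shift(1)
```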
12. Data Preprocessing for Machine Learning
• Feature Scaling
o Apply normalization or standardization (Min-Max Scaling, Z-Score Scaling) using pandas or sklearn.preprocessing.
• Encoding Categorical Variables
o Handle categorical data with one-hot encoding (get_dummies()) and label encoding.
• Handling Outliers
o Detect and remove outliers using z-scores, IQR method, or domain knowledge.
• Splitting Data for Training and Testing
o Split data using train_test_split() from scikit-learn for model evaluation.
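A pandas-only sketch of scaling, IQR outlier removal, and a train/test split. The data are assumed, and the split uses DataFrame.sample() as a simple stand-in for scikit-learn's train_test_split(), so that the example needs no extra dependency.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0]})

# Min-max scaling to the range [0, 1]
minmax = (df["x"] - df["x"].min()) / (df["x"].max() - df["x"].min())

# Z-score scaling (pandas std() uses the sample formula, ddof=1)
zscore = (df["x"] - df["x"].mean()) / df["x"].std()

# IQR method: keep values within 1.5 * IQR of the quartiles
q1 = df["x"].quantile(0.25)     # 2.0
q3 = df["x"].quantile(0.75)     # 4.0
iqr = q3 - q1                   # 2.0
cleaned = df[df["x"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]   # drops 100.0

# Stand-in for train_test_split(): random 75% for training, the rest for testing
train = cleaned.sample(frac=0.75, random_state=0)
test = cleaned.drop(train.index)
```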
13. File I/O with pandas
• Reading Data from Files
o Load data from CSV, Excel, JSON, and SQL databases using read_csv(), read_excel(), read_json(), read_sql().
• Writing Data to Files
o Save DataFrames to CSV, Excel, or JSON files, or to a SQL table, using to_csv(), to_excel(), to_json(), to_sql().
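A round-trip sketch for CSV and JSON. In-memory buffers are used here in place of real file paths, as an assumption to keep the example self-contained; a filename string works the same way.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

# Write CSV to a text buffer (index=False omits the row labels), then read it back
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
from_csv = pd.read_csv(csv_buffer)

# Write JSON as a list of records, then read it back
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))
```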