Module 5.pptx - 20250608 - 201231 - 0000
Future Trends:
Edge ML, Explainable AI, Federated Learning, Quantum ML.
AI Ethics & Industry-Specific ML (Healthcare, Finance,
Retail).
Preparation of Dataset
Steps to prepare data before training and deploying a machine learning model:
1. Data collection: Collect the data that you will use to train your
model. This could be from a variety of sources such as databases,
CSV (comma-separated values) files, or APIs (Application
Programming Interfaces).
2. Data cleaning: Check for any missing, duplicate or inconsistent
data and clean it. This may include removing any irrelevant
columns, filling in missing values, and formatting data correctly.
3. Data exploration: Explore the data to gain insights into its
distribution, relationships between features, and any outliers. Use
visualization tools to help identify patterns, anomalies and trends.
4. Data preprocessing: Prepare the data for use in the model by
normalizing or scaling the data, and transforming it into a format
that the model can understand.
5. Data splitting: Divide the data into training, validation, and testing
sets. The training set is used to train the model, the validation set is
used to tune the model (for example, its hyperparameters), and the
testing set is used to evaluate the model’s final performance.
6. Data augmentation: This step is optional, but it can help to
improve the model’s performance by creating new examples from the
existing data. This can include techniques such as rotating, flipping,
or cropping images.
7. Data annotation: This step is also optional in general, but it is
important when working with image, video or audio data. Annotation is
the process of labeling the data, for example with bounding boxes,
polygons, or points that indicate the location of objects in the data.
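As an illustration of step 6, the flipping, rotating, and cropping operations can be sketched with plain NumPy. This is a minimal sketch only: the 4x4 array is a hypothetical stand-in for a real image, and real pipelines usually rely on a dedicated image library.

```python
import numpy as np

# Hypothetical 4x4 grayscale "image"; real data would be loaded from files.
image = np.arange(16).reshape(4, 4)

flipped = np.fliplr(image)   # horizontal flip (mirror left-right)
rotated = np.rot90(image)    # rotate 90 degrees counter-clockwise
cropped = image[1:3, 1:3]    # central 2x2 crop

# Each variant is a new training example derived from the same source image.
augmented = [flipped, rotated, cropped]
for variant in augmented:
    print(variant.shape)
```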
Data preprocessing:
1. Getting dataset
2. Importing libraries
3. Import dataset
4. Finding missing values
5. Encoding categorical data
6. Split data into training and testing sets
7. Feature scaling
1. Getting Dataset:
Finding and selecting a dataset relevant to the problem you want to
solve. Datasets can be obtained from public sources (Kaggle, UCI,
etc.) or private data collections.
2. Importing Libraries:
pandas – for data manipulation
numpy – for numerical operations
matplotlib & seaborn – for visualization
sklearn – for machine learning and preprocessing
3. Import Dataset:
Loading the dataset into a data frame using pandas (pd.read_csv(),
pd.read_excel(), etc.). This allows us to inspect the data structure
(columns, data types, sample values).
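A minimal sketch of loading and inspecting a dataset follows. The CSV text and its column names (age, gender, salary) are hypothetical and held in memory so the example runs on its own; with a real file the call would be pd.read_csv with the file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """age,gender,salary
25,Male,50000
32,Female,
41,Female,72000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # sample values
print(df.dtypes)    # data type of each column
print(df.shape)     # (rows, columns)
```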
4. Finding Missing Values:
Checking for missing or null values in the dataset using
df.isnull().sum(). Handling them using:
Removal – If a row has too many missing values
Imputation – Replacing with mean, median, mode, or using
interpolation
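The two handling strategies above can be sketched as follows. The data frame and its columns are hypothetical, and the threshold of 2 non-missing values for removal is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps.
df = pd.DataFrame({
    "age": [25, np.nan, 41, 38],
    "salary": [50000, 64000, np.nan, 58000],
    "city": ["Pune", None, "Delhi", "Pune"],
})

# Count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Removal: keep only rows that have at least 2 non-missing values.
df_removed = df.dropna(thresh=2)

# Imputation: median or mean for numeric columns, mode for categorical ones.
df_imputed = df.copy()
df_imputed["age"] = df_imputed["age"].fillna(df_imputed["age"].median())
df_imputed["salary"] = df_imputed["salary"].fillna(df_imputed["salary"].mean())
df_imputed["city"] = df_imputed["city"].fillna(df_imputed["city"].mode()[0])
print(df_imputed)
```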
5. Encoding Categorical Data:
Converting categorical variables into numerical format so ML models
can process them:
Label Encoding – Assigns numeric values
(e.g., "Male" → 0, "Female" → 1)
One-Hot Encoding – Creates binary columns for each category
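Both encodings can be sketched on a small, hypothetical data frame. Note that the numeric codes depend on the encoder: scikit-learn's LabelEncoder assigns them in sorted order, so in this sketch "Female" becomes 0 and "Male" becomes 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical data.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "city": ["Pune", "Delhi", "Mumbai", "Pune"],
})

# Label encoding: one integer per category (sorted order: Female -> 0, Male -> 1).
encoder = LabelEncoder()
df["gender_code"] = encoder.fit_transform(df["gender"])

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

print(df)
print(one_hot)
```

One-hot encoding is usually the safer choice for categories with no natural order, since integer codes can suggest an ordering that does not exist.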
6. Split Data into Training & Testing Sets:
Dividing the dataset into training and testing sets using train_test_split()
from sklearn.model_selection (e.g., 80% training, 20% testing).
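A minimal sketch of the 80/20 split, using hypothetical synthetic data. The second call, which carves a validation set out of the training portion, is an optional extra; random_state and stratify are illustrative choices, not requirements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 3 features, balanced binary labels.
X = np.arange(300).reshape(100, 3)
y = np.array([0, 1] * 50)

# 80% training, 20% testing; stratify keeps class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Optional: hold out 25% of the training portion (20 samples) for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42, stratify=y_train
)

print(X_train.shape, X_val.shape, X_test.shape)
```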
7. Feature Scaling:
Standardizing or normalizing numerical features to bring them to the
same scale:
Standardization (StandardScaler) – Rescales data with mean = 0 and
std = 1
Normalization (MinMaxScaler) – Scales values between 0 and 1.
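Both scalers can be sketched on a tiny, hypothetical array. The scaler is fitted on the training data only and then reused on the test data, so test values may fall outside the 0 to 1 range after MinMaxScaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

# Standardization: each training column ends up with mean 0 and std 1.
std_scaler = StandardScaler()
X_train_std = std_scaler.fit_transform(X_train)  # fit on training data only
X_test_std = std_scaler.transform(X_test)        # reuse training statistics

# Normalization: each training column is mapped onto the range [0, 1].
minmax_scaler = MinMaxScaler()
X_train_mm = minmax_scaler.fit_transform(X_train)
X_test_mm = minmax_scaler.transform(X_test)

print(X_train_std)
print(X_train_mm)
print(X_test_mm)
```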
These preprocessing steps ensure that the dataset is clean and ready for
Testing of ANN
Testing of Artificial Neural Network (ANN):
Testing an Artificial Neural Network (ANN) involves
evaluating its performance on unseen data to measure
accuracy, generalization, and robustness.
The testing phase ensures that the trained model performs
well on new data.
Steps for ANN Testing:
1. Load Trained Model: If the model was trained and saved earlier,
load it from disk.
2. Prepare Test Data: Ensure the test dataset is preprocessed the
same way as the training data.
3. Make Predictions: Feed the test dataset into the trained ANN
model.
4. Evaluate Performance: Use metrics like accuracy, precision,
recall, F1-score, and loss to assess model performance.
Key Metrics for Testing ANN:
1. Accuracy: The fraction of all predictions that are correct.
2. Precision & Recall: Important for imbalanced datasets, where
accuracy alone can be misleading.
3. Loss Function: Measures prediction error on the test data.
4. Confusion Matrix: Shows correct and incorrect predictions per class.
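The classification metrics listed above can be computed with scikit-learn. This is a minimal sketch on hypothetical binary labels, not on the output of a real network.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predicted labels for 8 test samples.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(cm)
print(accuracy, precision, recall, f1)
```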
Python code for Testing of ANN:
import numpy as np
from tensorflow.keras.models import load_model
from sklearn.metrics import accuracy_score, classification_report
# Load the trained ANN model
model = load_model("ann_model.h5")  # Replace with your model path
# Load test data (assuming X_test and y_test are already prepared)
# X_test: Features for testing
# y_test: True labels for testing
# Make predictions
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # Convert probabilities to class labels
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_classes)
print(f"ANN Test Accuracy: {accuracy:.4f}")
# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred_classes))
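The np.argmax step in the code above assumes a multi-class model whose output has one column per class. For a binary model with a single sigmoid output, a threshold is the usual alternative. The sketch below uses hypothetical arrays in place of model.predict(X_test), and the 0.5 threshold is an assumed default.

```python
import numpy as np

# Hypothetical predictions standing in for model.predict(X_test).
softmax_out = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1]])         # shape (n_samples, n_classes)
sigmoid_out = np.array([[0.92], [0.13], [0.5]])   # shape (n_samples, 1)

# Multi-class: pick the most probable class in each row.
multi_classes = np.argmax(softmax_out, axis=1)

# Binary: threshold the single probability.
# argmax over a single column would return 0 for every sample.
binary_classes = (sigmoid_out > 0.5).astype(int).ravel()

print(multi_classes)
print(binary_classes)
```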
Decision Tree Classifier using python
When to Use Decision Trees:
Non-linear Relationships: When the dataset contains complex,
non-linear relationships between features.
High Interpretability: When interpretability is important, such as
in medical diagnoses.
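The interpretability point can be illustrated by printing the rules a fitted tree has learned. This is a minimal sketch on the Iris dataset; max_depth=2 and random_state=0 are assumptions chosen to keep the printed tree small and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the rule set short enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else style rules.
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
```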
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
results = []