
MODULE-5

Python Machine Learning


Essential Python Libraries – A road map for building machine learning systems
Preparation of Dataset
Testing of ANN
Decision Tree and Naïve Bayes Classifier using Python
Essential Python Libraries – A road map for building machine learning systems
ML Roadmap includes essential resources, practical strategies, and
real-world projects to build a strong foundation in machine
learning.
What is Machine Learning?
Machine Learning (ML) is a subset of AI that enables systems to
learn from data and make predictions without explicit
programming.
Types of ML:
Supervised Learning – Uses labeled data (e.g., regression,
classification).
Unsupervised Learning – Finds patterns in unlabeled data (e.g.,
clustering, anomaly detection).
Reinforcement Learning – Agents learn via rewards and penalties.
Semi-Supervised Learning – Mix of labeled and unlabeled data.
Prerequisites:
Mathematics & Statistics – Linear algebra, calculus, probability.
Programming Skills – Python (NumPy, pandas, Scikit-learn), R,
SQL.
Data Handling – Collection, cleaning, exploratory data analysis
(EDA), and feature engineering.
ML Roadmap
1. Beginner Level:
Learn supervised (regression, classification) & unsupervised (clustering,
PCA) techniques.
Work with real-world datasets.
2. Intermediate Level:
Model selection & evaluation (cross-validation, hyperparameter tuning).
Handling imbalanced data & using performance metrics (precision, recall, F1-score, ROC-AUC: Area Under the Receiver Operating Characteristic curve), as illustrated in the sketch below.
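A minimal sketch of these metrics with scikit-learn, using hypothetical binary labels and predicted probabilities purely for illustration:

from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Hypothetical ground truth, hard predictions, and probability scores
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("ROC-AUC:", roc_auc_score(y_true, y_score))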
ML Roadmap CONTD…
3. Advanced Level:
Deep Learning – CNNs (image processing), RNNs (sequential data).
NLP – Text processing, embeddings (Word2Vec, BERT).
Computer Vision – Image classification, object detection,
segmentation.
Projects:
Beginner: Housing price prediction, digit classification.
Intermediate: Sentiment analysis, recommendation systems.
Advanced: Self-driving AI, real-time translation, GANs (Generative Adversarial Networks).

Future Trends:
Edge ML, Explainable AI, Federated Learning, Quantum ML.
AI Ethics & Industry-Specific ML (Healthcare, Finance,
Retail).
Preparation of Dataset
Steps to prepare data before deploying a machine learning model:
1. Data collection: Collect the data that you will use to train your model. This could come from a variety of sources such as databases, CSV (comma-separated values) files, or APIs (Application Programming Interfaces).
2. Data cleaning: Check for any missing, duplicate or inconsistent
data and clean it. This may include removing any irrelevant
columns, filling in missing values, and formatting data correctly.
3. Data exploration: Explore the data to gain insights into its
distribution, relationships between features, and any outliers. Use
visualization tools to help identify patterns, anomalies and trends.
4. Data preprocessing: Prepare the data for use in the model by
normalizing or scaling the data, and transforming it into a format
that the model can understand.
5. Data splitting: Divide the data into training, validation, and testing sets. The training set is used to train the model, the validation set is used to fine-tune it, and the testing set is used to evaluate the model's performance.
6. Data augmentation: This step is optional, but it can help to
improve the model’s performance by creating new examples from the
existing data. This can include techniques such as rotating, flipping,
or cropping images.
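As a hedged illustration of step 6, a minimal augmentation sketch using Keras' ImageDataGenerator; the random array below is a stand-in for a real image dataset:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical batch of 8 RGB images (32x32); replace with real image data
images = np.random.rand(8, 32, 32, 3)

# Randomly rotate, flip, and shift images to create new training examples
datagen = ImageDataGenerator(rotation_range=20,
                             horizontal_flip=True,
                             width_shift_range=0.1)

# Draw one augmented batch derived from the originals
augmented = next(datagen.flow(images, batch_size=8, shuffle=False))
print(augmented.shape)  # (8, 32, 32, 3)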
7. Data annotation: This step is also optional, but it is important when working with image, video, or audio data. Annotation is the process of labeling the data, for example with bounding boxes, polygons, or points, to indicate the location of objects in the data.
Data preprocessing:
1. Getting dataset
2. Importing libraries
3. Import dataset
4. Finding missing values
5. Encoding categorical data
6. Split data in training and testing set
7. Feature scaling
1. Getting Dataset:
Finding and selecting a dataset relevant to the problem you want to
solve. Datasets can be obtained from public sources (Kaggle, UCI,
etc.) or private data collections.
2. Importing Libraries:
pandas – for data manipulation
numpy – for numerical operations
matplotlib & seaborn – for visualization
sklearn – for machine learning and preprocessing
3. Import Dataset:
Loading the dataset into a data frame using pandas (pd.read_csv(),
pd.read_excel(), etc.). This allows us to inspect the data structure
(columns, data types, sample values).
4. Finding Missing Values:
Checking for missing or null values in the dataset using
df.isnull().sum(). Handling them using:
Removal – If a row has too many missing values
Imputation – Replacing with mean, median, mode, or using
interpolation
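A minimal sketch of this check and a mean imputation, assuming a small hypothetical DataFrame:

import pandas as pd
import numpy as np

# Hypothetical DataFrame with one missing value
df = pd.DataFrame({'age': [25, np.nan, 31], 'city': ['Pune', 'Delhi', 'Pune']})

print(df.isnull().sum())                        # count missing values per column
df['age'] = df['age'].fillna(df['age'].mean())  # impute with the column mean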
5. Encoding Categorical Data:
Converting categorical variables into numerical format so ML models
can process them:
Label Encoding – Assigns numeric values
(e.g., "Male" → 0, "Female" → 1)
One-Hot Encoding – Creates binary columns for each category
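A minimal sketch of both encodings, using a hypothetical 'gender' column:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'gender': ['Male', 'Female', 'Male']})

# Label Encoding: each category becomes an integer
df['gender_label'] = LabelEncoder().fit_transform(df['gender'])

# One-Hot Encoding: one binary column per category
df_onehot = pd.get_dummies(df['gender'], prefix='gender')
print(df_onehot)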
6. Split Data into Training & Testing Sets:
Dividing the dataset into training and testing sets using train_test_split()
from sklearn.model_selection (e.g., 80% training, 20% testing).
7. Feature Scaling:
Standardizing or normalizing numerical features to bring them to the
same scale:
Standardization (StandardScaler) – Rescales data with mean = 0 and
std = 1
Normalization (MinMaxScaler) – Scales values between 0 and 1.
These preprocessing steps ensure that the dataset is clean and ready for model training; a combined sketch of steps 6 and 7 follows.
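A minimal sketch of splitting and scaling, assuming a hypothetical feature matrix X and labels y:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 4 features, binary labels
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply to both sets,
# so no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)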
Testing of ANN
Testing of Artificial Neural Network (ANN):
Testing an Artificial Neural Network (ANN) involves
evaluating its performance on unseen data to measure
accuracy, generalization, and robustness.
The testing phase ensures that the trained model performs
well on new data.
Steps for ANN Testing:
1. Load Trained Model: If using a pre-trained model, load it from
disk.
2. Prepare Test Data: Ensure the test dataset is preprocessed the
same way as the training data.
3. Make Predictions: Feed the test dataset into the trained ANN
model.
4. Evaluate Performance: Use metrics like accuracy, precision,
recall, F1-score, and loss to assess model performance.
Key Metrics for Testing ANN:
1. Accuracy: Measures overall correctness.
2. Precision & Recall: Important for imbalanced datasets.
3. Loss Function: Measures prediction error.
4. Confusion Matrix: Visualizes classification results.
Python code for Testing of ANN:
import numpy as np
from tensorflow.keras.models import load_model
from sklearn.metrics import accuracy_score, classification_report
# Load the trained ANN model
model = load_model("ann_model.h5")  # Replace with your model path

# Load test data (assuming X_test and y_test are already prepared)
# X_test: Features for testing
# y_test: True labels for testing
Python code for Testing of ANN: CONTD…..
# Make predictions
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # Convert probabilities to class labels

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred_classes)
print(f"ANN Test Accuracy: {accuracy:.4f}")

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred_classes))
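For the confusion matrix listed among the key metrics, a minimal scikit-learn sketch; the toy labels below are stand-ins for y_test and y_pred_classes from the code above:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical true and predicted class labels
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
y_hat = [0, 1, 2, 0, 0, 2, 1, 1]

# Build and visualize the confusion matrix
cm = confusion_matrix(y_true, y_hat)
ConfusionMatrixDisplay(cm).plot()
plt.show()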
Decision Tree Classifier using Python
When to Use Decision Trees:
Non-linear Relationships: When the dataset contains complex,
non-linear relationships between features.
High Interpretability: When interpretability is important, such as
in medical diagnoses.

When to Use Naive Bayes:
Text Classification: Widely used in spam filtering, sentiment analysis, and document classification due to its ability to handle high-dimensional data.
Simple and Fast Solutions: Naive Bayes is a good choice when you need a quick solution that works well with small datasets.
Decision Tree Classifier:
1. Install Required Libraries:
pip install numpy pandas scikit-learn matplotlib
2. Import Libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report
3. Load Dataset (Example: Iris Dataset):
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
4. Train the Decision Tree Model:
# Create Decision Tree classifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)

# Train the model
clf.fit(X_train, y_train)
5. Make Predictions:
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print(classification_report(y_test, y_pred))
6. Visualize the Decision Tree:
plt.figure(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names,
class_names=iris.target_names, filled=True)
plt.show()
Naïve Bayes Classifier using Python
Types of Naïve Bayes Classifiers:
1. Gaussian NB → Used for continuous data (e.g., normal
distribution like Iris dataset).
2. Multinomial NB → Used for discrete counts, like text
classification.
3. Bernoulli NB → Used for binary features, e.g., spam
detection (presence/absence of words).
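For contrast with the text example below, a minimal GaussianNB sketch on the Iris dataset (chosen here as an illustrative assumption, since its features are continuous):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Iris has continuous features, suiting the Gaussian variant
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gnb = GaussianNB().fit(X_train, y_train)
print(f'GaussianNB accuracy: {accuracy_score(y_test, gnb.predict(X_test)):.2f}')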
Naïve Bayes Classifier for Text Classification:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample text data
texts = ["I love programming", "Python is amazing", "I hate bugs", "Debugging is fun", "I dislike errors"]
labels = [1, 1, 0, 1, 0]  # 1 = Positive, 0 = Negative

# Convert text to feature vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
Naïve Bayes Classifier for Text Classification: CONTD…
# Train a Naïve Bayes classifier
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = nb_classifier.predict(X_test)

# Evaluate
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
Develop a Python framework that trains all three models (ANN, DT, NB) on the same dataset and automatically generates a performance comparison report
Key Features of This Framework:
Loads any CSV dataset
Preprocesses the data
Splits into training/testing
Trains all 3 models
Generates accuracy, precision, recall, F1-score
Displays a comparison report in table format

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from tabulate import tabulate

def load_and_prepare_data(filepath):
    data = pd.read_csv(filepath)

    # Separate features and label
    X = data.iloc[:, :-1]
    y = data.iloc[:, -1]

    # Encode categorical labels if necessary
    if y.dtype == 'object':
        le = LabelEncoder()
        y = le.fit_transform(y)

    # Handle categorical features if any
    for col in X.select_dtypes(include=['object']).columns:
        X[col] = LabelEncoder().fit_transform(X[col])

    # Feature scaling
    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    return train_test_split(X, y, test_size=0.3, random_state=42)

def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    return {
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred, average='weighted', zero_division=0),
        'Recall': recall_score(y_test, y_pred, average='weighted', zero_division=0),
        'F1-Score': f1_score(y_test, y_pred, average='weighted', zero_division=0)
    }

def train_and_compare_models(X_train, X_test, y_train, y_test):
    models = {
        'ANN': MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
        'Decision Tree': DecisionTreeClassifier(random_state=42),
        'Naive Bayes': GaussianNB()
    }

    results = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        metrics = evaluate_model(model, X_test, y_test)
        results.append([name] + list(metrics.values()))

    headers = ["Model", "Accuracy", "Precision", "Recall", "F1-Score"]
    print("\nPerformance Comparison Report:")
    print(tabulate(results, headers=headers, tablefmt="grid"))

# --- Main Execution ---
if __name__ == "__main__":
    filepath = "your_dataset.csv"  # Replace with your dataset path
    X_train, X_test, y_train, y_test = load_and_prepare_data(filepath)
    train_and_compare_models(X_train, X_test, y_train, y_test)

Performance Comparison Report (sample output):

Model           Accuracy   Precision   Recall   F1-Score
ANN             0.91       0.91        0.91     0.91
Decision Tree   0.88       0.88        0.88     0.88
Naïve Bayes     0.84       0.83        0.84     0.83
Thank You
