
Experiment-2.2

Aim: To understand supervised learning and use it to train and develop classifier models. (CO2, CO4)

Tools/Platforms Used:

Google Colaboratory

Theory:

Supervised Learning
Supervised learning is a type of machine learning where the model learns from labeled data. In this approach, the dataset
provided to the model contains input features (independent variables) and corresponding target labels (dependent variable). The
model learns the relationship between the inputs and the outputs to make predictions on new, unseen data.
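
To make this concrete, here is a minimal sketch of the idea; it uses scikit-learn with made-up feature values and labels purely for illustration (the experiment itself uses PyCaret):

# Minimal supervised-learning sketch (illustrative only; scikit-learn assumed installed)
from sklearn.linear_model import LogisticRegression

X = [[2, 55], [9, 80], [7, 71], [1, 40]]   # features, e.g. hours studied and previous score (made up)
y = [0, 1, 1, 0]                           # labels, e.g. 0 = fail, 1 = pass (made up)

model = LogisticRegression()
model.fit(X, y)                       # learn the relationship between inputs and outputs
print(model.predict([[5, 65]]))       # predict the label for a new, unseen example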

Key Concepts in Supervised Learning:

1. Features and Labels:
Features (X): Independent variables that act as input to the model.
Labels (Y): Dependent variables, i.e. the output the model needs to predict.

2. Training and Testing:
Training Data: The subset of the dataset used to train the model.
Testing Data: The subset used to evaluate the model's performance.
The dataset is typically split into 70–80% training data and 20–30% testing data (a minimal splitting sketch follows this list).

3. Objective:
The goal is to minimize the error between the predicted and actual outputs and to generalize well to unseen data.
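
For point 2 above, here is a minimal splitting sketch; it assumes scikit-learn's train_test_split and its bundled iris dataset, and uses an 80/20 split purely as an example:

# Minimal sketch of an 80/20 train/test split (scikit-learn assumed installed)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)           # X: features, y: labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80% training data, 20% testing data
print(X_train.shape, X_test.shape)          # (120, 4) (30, 4)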

Classification in Supervised Learning:
Classification is a supervised learning task where the output variable is categorical. Examples include:

Binary Classification: Predicting one of two categories (e.g., spam or not spam).
Multi-class Classification: Predicting one of multiple categories (e.g., types of fruits).
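
A short illustrative sketch of the difference (the spam and fruit labels below are made up; NumPy assumed available): the task type is determined by how many distinct labels the target contains.

# Binary vs. multi-class targets (illustrative, made-up labels)
import numpy as np

binary_target = np.array(["spam", "not spam", "spam", "not spam"])
multiclass_target = np.array(["apple", "banana", "cherry", "apple"])

print(np.unique(binary_target))       # 2 distinct labels  -> binary classification
print(np.unique(multiclass_target))   # 3 distinct labels  -> multi-class classification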

Steps to Train and Develop Classifier Models:

1. Data Preprocessing:
Clean the data (handle missing values and outliers).
Encode categorical variables.
Normalize or standardize numerical features.

2. Feature Selection and Engineering:
Select relevant features to improve model performance.
Create new features from existing ones if necessary.

3. Model Selection:
Choose an appropriate classification algorithm, such as:
Logistic Regression
Decision Trees
Random Forest
Support Vector Machines (SVM)
Neural Networks

4. Training:
Fit the selected algorithm to the training dataset using fit().

5. Evaluation:
Use metrics like accuracy, precision, recall, F1-score, and ROC-AUC to assess the model's performance.

6. Hyperparameter Tuning:
Optimize the model's performance by adjusting hyperparameters using techniques like Grid Search or Random Search (an end-to-end sketch covering these steps follows this list).
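
The following sketch ties the six steps together using scikit-learn (used here purely for illustration; the experiment itself uses PyCaret, which automates most of these steps inside setup() and create_model()):

# End-to-end classifier sketch covering the steps above (scikit-learn assumed installed)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Steps 1-2: load data; this toy dataset needs no missing-value handling or encoding
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 3-4: model selection and training; standardize features, then fit a Random Forest
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", RandomForestClassifier(random_state=42))])

# Step 6: hyperparameter tuning with Grid Search (wrapped around training and cross-validation)
param_grid = {"clf__n_estimators": [100, 200], "clf__max_depth": [None, 5]}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

# Step 5: evaluation on the held-out test data
print(grid.best_params_)
print(classification_report(y_test, grid.predict(X_test)))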

Common Use Cases of Classification Models:

Email spam detection.
Fraud detection in financial systems.
Disease diagnosis in healthcare.
Sentiment analysis of customer reviews.

The coding example will help you understand how to implement these concepts practically.
# Import required libraries
from pycaret.datasets import get_data      # To load example datasets
from pycaret.classification import *       # To perform classification tasks using PyCaret

# Load the list of available datasets
dataSets = get_data('index')
# Fetches the index of all available datasets in PyCaret;
# use this to explore and select an appropriate dataset for analysis

# Load the diabetes dataset
diabetesDataSet = get_data("diabetes")
# Loads the "diabetes" dataset, which is a binary classification problem;
# the target column, "Class variable", has two classes (binary values)
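
As an optional sanity check before training (not part of the original walkthrough), the class balance of the target column can be inspected; this assumes the diabetesDataSet DataFrame loaded above:

# Optional: inspect the dataset size and the distribution of the target classes
print(diabetesDataSet.shape)                              # number of rows and columns
print(diabetesDataSet["Class variable"].value_counts())   # counts of the two target classes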

# Set up the classification environment
s = setup(data=diabetesDataSet, target='Class variable')

# Initializes the PyCaret classification environment

# Specifies the dataset and the target column to be used for training

# Create a Random Forest model
rfModel = create_model('rf')
# Trains a Random Forest classifier model using the default hyperparameters
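
A hedged aside, not part of the original walkthrough: rather than fixing the algorithm in advance, PyCaret's compare_models can cross-validate several classifiers and return the best-performing one; a minimal sketch, assuming the setup() cell above has already been run:

# Optional: cross-validate the available classifiers and keep the best-performing one
# (assumes setup() has already been executed in this session)
bestModel = compare_models()
print(bestModel)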

# Plot the confusion matrix
plot_model(rfModel, plot='confusion_matrix')
# Visualizes the confusion matrix for the Random Forest model to evaluate its performance

# Plot the default visualization
plot_model(rfModel)
# With no plot argument, plot_model shows the ROC (AUC) curve by default;
# other plots, such as the Precision-Recall curve, can be requested via the plot parameter (e.g. plot='pr')
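
As a side note (not part of the original walkthrough), PyCaret also provides evaluate_model, which in a notebook renders an interactive widget for browsing the available plots; a minimal sketch, assuming the cells above have been run:

# Optional: browse all available evaluation plots interactively (notebook only)
# Assumes setup() and create_model() have already been executed in this session
evaluate_model(rfModel)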

# Save the trained Random Forest model to a file
sm = save_model(rfModel, 'rfModelFile')
# Saves the trained model to a file named 'rfModelFile.pkl' for future use

# Plot feature importance
plot_model(rfModel, plot='feature')
# Visualizes the importance of features in making predictions with the Random Forest model

# Prepare a new dataset for predictions

newDataSet = get_data("diabetes").iloc[:10]
# Loads a fresh copy of the diabetes dataset and selects the first 10 rows for testing

# Make predictions on the new dataset
newPredictions = predict_model(rfModel, data=newDataSet)
# Uses the trained Random Forest model to predict the class labels for the new data

# Display the predictions
newPredictions
# Outputs the predictions, including the class labels and probabilities for the new dataset
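
To complete the workflow, the model saved earlier can be reloaded in a later session with load_model; a minimal sketch (the file name matches the save_model call above, and PyCaret adds the .pkl extension automatically):

# Reload the saved model in a new session and reuse it for predictions
from pycaret.classification import load_model, predict_model
from pycaret.datasets import get_data

loadedModel = load_model('rfModelFile')    # reads rfModelFile.pkl from the working directory
reloadedPredictions = predict_model(loadedModel, data=get_data("diabetes").iloc[:10])
print(reloadedPredictions.head())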

Additional Resources

1. OpenClassrooms Tutorial: https://openclassrooms.com/en/courses/6389626-train-a-supervised-machine-learning-model/6405911-build-and-evaluate-a-classification-model
2. DataCamp Tutorial: https://www.datacamp.com/blog/classification-machine-learning
3. GeeksforGeeks: https://www.geeksforgeeks.org/basic-concept-classification-data-mining/

Video Links

1. Machine Learning in Python: Building a Classification Model

2. Random Forest Algorithm Explained with Python

3. Machine Learning Algorithms

4. PyCaret Tutorial: Splitting Data into Training and Testing Sets

TEXT BOOKS/REFERENCE BOOKS

TEXT BOOKS

T1: Data Science from Scratch, Joel Grus, Shroff Publishers/O'Reilly Media, 2019.
https://drive.google.com/file/d/1qv89LVaEshX9hcmSS9KDMsvBP-UYC78h/view?usp=sharing

T2: Artificial Intelligence: A Modern Approach, 3rd Edition, Stuart Russell and Peter Norvig, Pearson, 2010.
https://drive.google.com/file/d/1G-s5fsBh5rLMdWmIYvyeI2zclcDCAA_D/view?usp=sharing

T3: Machine Learning, Tom Mitchell, McGraw Hill, 2017.
https://drive.google.com/file/d/1IBgLq2GvyEXURAPfSDm-Eep94X0vYXDb/view?usp=sharing

REFERENCE BOOKS

RB1: Data Analysis with Open Source Tools, Philipp Janert, Shroff Publishers/O'Reilly Media.
https://drive.google.com/file/d/1SVtjE5XEih7_aU433_cAJKiDF41-KuzU/view?usp=sharing

RB2: Introduction to Machine Learning with Python, Andreas C. Müller & Sarah Guido, O'Reilly Media.
https://www.nrigroupindia.com/e-book/Introduction%20to%20Machine%20Learning%20with%20Python%20(%20PDFDrive.com%20)-min.pdf

RB3: Artificial Intelligence & Machine Learning, Lecture Notes, Ms. Anitha Patibandla, Dr. B. Jyothi, Ms. K. Bhavana.
https://mrcet.com/downloads/digital_notes/ECE/III%20Year/AI%20&%20ML%20DIGITAL%20NOTES.pdf
