Python 3

The document outlines Experiment-2.3, which focuses on building a classification model using various machine learning algorithms in Google Colaboratory. It covers key concepts such as training/testing data, features, evaluation metrics, and preprocessing, along with popular algorithms like Logistic Regression, Decision Trees, and Random Forest. Additionally, it details the steps for setting up a machine learning pipeline using Pycaret, including data normalization, feature selection, and outlier removal.

Experiment-2.3

Build a classification model by using different machine learning algorithms. CO2, CO4

Tools/ Platforms Used: Google Colaboratory

Theory:
Classification is a supervised machine learning task where the goal is to predict the category or class of a given data point based
on input features. Various machine learning algorithms can be used to build classification models, each with its strengths and
weaknesses. Below is a brief overview of key concepts and algorithms used in classification:

Key Concepts in Classification


1. Training and Testing Data: The dataset is split into training and testing sets to evaluate the model's performance on unseen data.

2. Features and Target Variable: Features are the input variables (independent variables), and the target variable (dependent variable) is the class to be predicted.

3. Evaluation Metrics: Accuracy, precision, recall, F1-score, the confusion matrix, ROC-AUC, etc., are used to assess model performance.

4. Preprocessing: Data must be cleaned, normalized, and encoded (for categorical features) before training.
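
The evaluation metrics listed above can be made concrete with a small sketch. The example below is only an illustration in plain Python: the function names (confusion_counts, binary_metrics) and the label lists are made up for this handout, and it covers the binary case only. It counts the four cells of the confusion matrix and derives accuracy, precision, recall, and F1-score from them.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print(metrics)
```

With these made-up labels there are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so all four metrics come out to 0.75.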

Popular Machine Learning Algorithms for Classification


1. Logistic Regression: Suitable for binary classification tasks; uses a logistic function to model the probability of a class.

https://lms.cuchd.in/mod/page/view.php?id=1883201 05/03/25, 9:59 PM

2. Decision Tree: Splits data based on feature values; provides interpretable results but can overfit on small datasets.

3. Random Forest: An ensemble of decision trees; reduces overfitting and improves generalization.

4. Support Vector Machine (SVM): Finds the optimal hyperplane to separate classes; works well with small datasets and
high-dimensional spaces.

5. K-Nearest Neighbors (KNN): Classifies based on the majority vote of nearest neighbors; simple but computationally
expensive for large datasets.

6. Naive Bayes: Based on Bayes' Theorem; works well with categorical data and text classification.

7. Gradient Boosting (e.g., XGBoost, LightGBM): Builds an ensemble of weak learners (usually decision trees) and optimizes
the model iteratively.
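
To make one of the algorithms above concrete, here is a minimal sketch of K-Nearest Neighbors in plain Python. It is an illustration only: the function name knn_predict, the toy points, and the choice of Euclidean distance with k = 3 are assumptions made for this example, not part of the experiment. Every prediction computes the distance to every training point, which is why KNN becomes expensive on large datasets.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean distance assumed).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data invented for illustration: two well-separated clusters.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.8, 1.7)))  # near the first cluster
print(knn_predict(train_points, train_labels, (6.2, 6.4)))  # near the second cluster
```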

Figure: Flow diagram of the gradient boosting machine learning method.

Creating a machine learning pipeline using PyCaret involves the following steps:

Step 1: Getting the data


a) Getting the list of datasets available in PyCaret

!pip install pycaret &> /dev/null

from pycaret.datasets import get_data

dataSets = get_data('index')  # shows the list of available datasets (56 at the time of writing)

b) Loading the diabetes dataset

diabetesDataSet = get_data("diabetes")

# This is a binary classification dataset: the "Class variable" column takes two values.

Step 3: Setting up the PyCaret Environment for all classification algorithms

from pycaret.classification import *

model_setup = setup(data=diabetesDataSet, target='Class variable')

Running the above displays a list of 60 parameters along with their default values, which may be changed if required.

The setup() function initializes the environment in PyCaret and creates the transformation pipeline that prepares the data for
modeling and deployment. setup() must be called before executing any other function in PyCaret. It takes two mandatory
parameters: a pandas DataFrame and the name of the target column. Most of the configuration is done
automatically, but some parameters can be set manually. For example:

· The default train/test split ratio is 70:30, but it can be changed with "train_size".

· K-fold cross-validation uses 10 folds by default.
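
The two defaults above can be pictured with a small plain-Python sketch. It is an illustration only and does not reproduce how PyCaret splits data internally: the helper names (train_test_split_indices, k_fold_indices), the 100-row dataset, and the fixed seed are all assumptions made for this example. It shows what a 70:30 split and 10-fold cross-validation mean in terms of row indices.

```python
import random


def train_test_split_indices(n_rows, train_size=0.7, seed=42):
    """Shuffle row indices and cut them into a training part and a test part."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_rows * train_size))
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, n_folds=10):
    """Yield (train, validation) index lists, one pair per fold."""
    # Deal the indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical dataset of 100 rows: 70 go to training and 30 to testing.
train_idx, test_idx = train_test_split_indices(100, train_size=0.7)
print(len(train_idx), len(test_idx))

# Each of the 10 folds takes one turn as the validation set.
for fold_no, (fit_idx, val_idx) in enumerate(k_fold_indices(train_idx, n_folds=10), start=1):
    print(fold_no, len(fit_idx), len(val_idx))
```

Cross-validation is applied to the training part only, so the 30 test rows stay unseen until the final evaluation.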

Step 4: Building and comparing the Model Performance

best_model = compare_models()

# The above command lists the machine learning algorithms available in the PyCaret library in descending order of
# their performance, as shown in the next figure.

It can be observed from the above figure that the Logistic Regression model has the best performance.

Step 4: Building and comparing the Model Performance using Data Normalization

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is
to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the
ranges of values or losing information.

## Commonly used techniques: clipping, log scaling, z-score, minmax, maxabs, robust

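# silent=True skips the interactive data-type confirmation (accepted by PyCaret 2.x; the argument was removed in PyCaret 3.x)
model_setup = setup(data=diabetesDataSet, target='Class variable', normalize=True, normalize_method='zscore',
                    silent=True)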

best_model = compare_models()
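
As a rough illustration of what two of the techniques named above do to a single numeric column, the sketch below implements z-score and min-max scaling in plain Python. The function names and the sample values are made up for this example, and the use of the population standard deviation is an assumption; PyCaret performs the equivalent transformation inside its own pipeline.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale values to mean 0 and standard deviation 1 (population std assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Made-up column values for illustration only.
column = [85, 90, 100, 110, 115]

z_scaled = zscore(column)
mm_scaled = minmax(column)
print([round(z, 3) for z in z_scaled])
print([round(m, 3) for m in mm_scaled])
```

Both methods keep the ordering and relative spacing of the values; only the scale changes, which is what allows columns measured in different units to be compared.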

Step 5: Building and comparing the Model Performance using Feature Selection
Feature selection is one of the core concepts in machine learning, because the features used to train a model strongly
influence the performance it can achieve. The goal of feature selection is to find the best set of features for building useful
models of the studied phenomena. The feature_selection_threshold parameter is the threshold used for feature selection
(including newly created polynomial features); a higher value results in a larger feature space. It is recommended to run
multiple trials with different values of feature_selection_threshold.

model_setup = setup(data=diabetesDataSet, target='Class variable', feature_selection=True,
                    feature_selection_threshold=0.6, silent=True)

best_model = compare_models()
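
The idea of keeping only informative features can be sketched with a much simpler stand-in than the method PyCaret uses. The example below ranks features by the absolute Pearson correlation with the target and keeps those above a cutoff. Everything in it is an assumption made for illustration: the function names, the min_abs_corr cutoff (which is not the same thing as feature_selection_threshold), and the tiny made-up dataset.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def select_features(columns, target, min_abs_corr=0.5):
    """Return feature names whose |correlation| with the target meets the cutoff, best first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= min_abs_corr]


# Made-up data: "glucose" tracks the target, "noise" does not.
columns = {
    "glucose": [85, 90, 100, 140, 150, 160],
    "noise": [3, 1, 2, 1, 3, 2],
}
target = [0, 0, 0, 1, 1, 1]

selected = select_features(columns, target, min_abs_corr=0.5)
print(selected)
```

Raising or lowering the cutoff changes how many features survive, which is why trying several values is worthwhile.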

Step 6: Building and comparing the Model Performance using Outlier Removal
Sometimes a dataset contains extreme values that fall outside the expected range and are unlike the rest of the
data. These are called outliers, and model performance can often be improved by
understanding and even removing them. The default value is outliers_threshold = 0.05.

model_setup = setup(data=diabetesDataSet, target='Class variable', remove_outliers=True, outliers_threshold=0.05,
                    silent=True)

best_model = compare_models()
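
To give a feel for what a threshold of 0.05 means, the sketch below drops the 5% of values that lie farthest from the mean. This is a deliberately simple stand-in and not the detection method PyCaret applies; the function name drop_most_extreme and the sample readings are made up, and the sketch assumes the threshold is read as the fraction of rows to discard.

```python
from statistics import mean, pstdev


def drop_most_extreme(values, threshold=0.05):
    """Drop the `threshold` fraction of values that lie farthest from the mean."""
    mu, sigma = mean(values), pstdev(values)
    n_drop = int(round(len(values) * threshold))
    if sigma == 0 or n_drop == 0:
        return list(values)
    # Rank positions by distance from the mean, measured in standard deviations.
    ranked = sorted(
        range(len(values)),
        key=lambda i: abs(values[i] - mu) / sigma,
        reverse=True,
    )
    to_drop = set(ranked[:n_drop])
    return [v for i, v in enumerate(values) if i not in to_drop]


# 20 made-up readings with one obviously extreme value (500).
readings = [98, 102, 100, 97, 103, 101, 99, 100, 104, 96,
            100, 102, 98, 101, 99, 103, 97, 100, 500, 101]

cleaned = drop_most_extreme(readings, threshold=0.05)
print(len(readings), len(cleaned), max(cleaned))
```

With 20 readings and a threshold of 0.05, exactly one value is removed, and it is the extreme reading of 500.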

Similarly, transformation, PCA, or any combination of these options can be enabled in setup().

Additional Resources
1. OpenClassrooms Tutorial: https://openclassrooms.com/en/courses/6389626-train-a-supervised-machine-learning-model/6405911-build-and-evaluate-a-classification-model
2. DataCamp Tutorial: https://www.datacamp.com/blog/classification-machine-learning
3. GeeksforGeeks: https://www.geeksforgeeks.org/basic-concept-classification-data-mining/

Video Links
1. Classification In Machine Learning

2. Top 6 Machine Learning Algorithms for Beginners | Classification

3. How to build a classification model by using different machine learning algorithms

4. PyCaret Tutorial: Creating Model for the Classification Task

TEXT BOOKS/REFERENCE BOOKS
TEXT BOOKS

T1: Joel Grus, Data Science from Scratch, Shroff Publishers/O'Reilly Media, 2019.

https://drive.google.com/file/d/1qv89LVaEshX9hcmSS9KDMsvBP-UYC78h/view?usp=sharing

T2: Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition, Pearson, 2010.

https://drive.google.com/file/d/1G-s5fsBh5rLMdWmIYvyeI2zclcDCAA_D/view?usp=sharing

T3: Tom Mitchell, Machine Learning, McGraw Hill, 2017.

https://drive.google.com/file/d/1IBgLq2GvyEXURAPfSDm-Eep94X0vYXDb/view?usp=sharing

REFERENCE BOOKS

RB1: Philipp Janert, Data Analysis with Open-Source Tools, Shroff Publishers/O'Reilly Media.

https://drive.google.com/file/d/1SVtjE5XEih7_aU433_cAJKiDF41-KuzU/view?usp=sharing

RB2: Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python, O'Reilly Media.

https://www.nrigroupindia.com/e-book/Introduction%20to%20Machine%20Learning%20with%20Python%20(%20PDFDrive.com%20)-min.pdf

RB3: Ms. Anitha Patibandla, Dr. B. Jyothi, Ms. K. Bhavana, Artificial Intelligence & Machine Learning, lecture notes.

https://mrcet.com/downloads/digital_notes/ECE/III%20Year/AI%20&%20ML%20DIGITAL%20NOTES.pdf

Last modified: Monday, 6 January 2025, 11:30 AM
