Python 3
Experiment-2.3
Build a classification model by using different machine learning algorithms. CO2, CO4
Theory:
Classification is a supervised machine learning task where the goal is to predict the category or class of a given data point based
on input features. Various machine learning algorithms can be used to build classification models, each with its strengths and
weaknesses. Below is a brief overview of key concepts and algorithms used in classification:
1. Training and Testing Data:
The dataset is split into training and testing sets so that the model's performance can be evaluated on unseen data.
2. Features and Target Variable:
Features are the input variables (independent variables), and the target variable (dependent variable) is the class to
be predicted.
3. Evaluation Metrics:
Accuracy, precision, recall, F1-score, confusion matrix, ROC-AUC, etc., are used to assess model performance.
4. Preprocessing:
Data must be cleaned, normalized, and encoded (for categorical features) before training.
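The concepts above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (split_rows, confusion_counts, classification_metrics) and the label lists are hypothetical and are not part of PyCaret, which performs the split and computes these metrics internally.

```python
import random


def split_rows(rows, train_size=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_size))
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: ten row ids, and made-up true/predicted labels.
train_rows, test_rows = split_rows(range(10), train_size=0.7)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

The fixed seed keeps the split reproducible, which matters when different algorithms are compared on the same test set.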
https://lms.cuchd.in/mod/page/view.php?id=1883201 05/03/25, 9 59 PM
2. Decision Tree: Splits data based on feature values; provides interpretable results but can overfit on small datasets.
3. Random Forest: An ensemble of decision trees; reduces overfitting and improves generalization.
4. Support Vector Machine (SVM): Finds the optimal hyperplane to separate classes; works well with small datasets and
high-dimensional spaces.
5. K-Nearest Neighbors (KNN): Classifies based on the majority vote of nearest neighbors; simple but computationally
expensive for large datasets.
6. Naive Bayes: Based on Bayes' Theorem; works well with categorical data and text classification.
7. Gradient Boosting (e.g., XGBoost, LightGBM): Builds an ensemble of weak learners (usually decision trees) and optimizes
the model iteratively.
Figure: Flow diagram of the gradient boosting machine learning method.
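As an illustration of one algorithm from the list above, the majority-vote idea behind K-Nearest Neighbors can be sketched in plain Python. This is a minimal sketch, not a production implementation: the function name knn_predict and the sample points are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest points."""
    # Every prediction measures the distance to every training point,
    # which is why KNN becomes expensive on large datasets.
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature training data with two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
           (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

label_near_a = knn_predict(train_X, train_y, (1.2, 1.4))
label_near_b = knn_predict(train_X, train_y, (6.8, 7.0))
```

Because the vote depends on raw distances, features on very different scales should usually be normalized before KNN is applied.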
Creating a machine learning pipeline with PyCaret involves the following steps:
# This is a binary classification dataset: the "Class variable" column takes two (binary) values.
Step 3: Setting up the PyCaret environment for all classification algorithms
from pycaret.classification import *
The setup() output lists 60 parameters along with their default values, which can be changed if required.
The setup() function initializes the PyCaret environment and creates the transformation pipeline that prepares the data for
modeling and deployment. setup() must be called before any other PyCaret function. It takes two mandatory
parameters: a pandas DataFrame and the name of the target column. Most of the configuration is done
automatically, but some parameters can be set manually. For example:
· The default train/test split ratio is 70:30, but it can be changed with the "train_size" parameter.
# The compare_models() command lists the machine learning algorithms available in the PyCaret library in descending
order of performance, as shown in the next figure.
It can be observed from the above figure that the Logistic Regression model has the best performance.
Step 4: Building and comparing the Model Performance using Data Normalization
Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is
to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the
ranges of values or losing information.
## Commonly used techniques: clipping, log scaling, z-score, minmax, maxabs, robust
best_model = compare_models()
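Two of the normalization techniques named above, z-score and minmax, can be sketched in plain Python to show what "a common scale" means. This is an illustration only: the function names and the sample column are hypothetical, and PyCaret applies its own implementation inside setup().

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no spread
    return [(v - mu) / sigma for v in values]


def minmax(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column maps to a single point
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric column (mean 5, population standard deviation 2).
column = [2, 4, 4, 4, 5, 5, 7, 9]
z_scaled = zscore(column)    # first value is (2 - 5) / 2 = -1.5
mm_scaled = minmax(column)   # smallest value maps to 0.0, largest to 1.0
```

Both transforms preserve the ordering of the values; they differ in that minmax is bounded by the observed extremes, so it is more sensitive to outliers than z-score.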
Step 5: Building and comparing the Model Performance using Feature Selection
Feature selection is one of the core concepts in machine learning, because the features used to train a model
strongly influence the performance it can achieve. The goal of feature selection is to find the set of features
that allows useful models of the studied phenomenon to be built. The feature_selection_threshold parameter sets the
threshold used for feature selection (including newly created polynomial features). A higher value results in a
larger feature space. It is recommended to run multiple trials with different values of
feature_selection_threshold.
model_setup = setup(data=diabetesDataSet, target='Class variable', feature_selection=True,
                    feature_selection_threshold=0.6, silent=True)
best_model = compare_models()
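To give a feel for how a threshold can control the size of the feature space, here is a small stand-in sketch in plain Python. It is explicitly not PyCaret's internal algorithm: it assumes features are scored by their absolute Pearson correlation with the target and kept, best first, until their share of the total score reaches the threshold. The function names, the scoring rule, and the sample columns are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column cannot correlate with anything
    return cov / (var_x * var_y) ** 0.5


def select_features(columns, target, threshold=0.6):
    """Keep the best-scoring features until they cover `threshold` of the total score."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    total = sum(scores.values())
    if total == 0:
        return []
    kept, covered = [], 0.0
    for name in sorted(scores, key=scores.get, reverse=True):
        kept.append(name)
        covered += scores[name] / total
        if covered >= threshold:
            break
    return kept


# Hypothetical columns and binary target.
columns = {
    "glucose": [80, 90, 85, 150, 160, 170],
    "bmi": [22, 30, 25, 28, 33, 27],
    "noise": [5, 1, 4, 2, 5, 1],
}
target = [0, 0, 0, 1, 1, 1]

selected = select_features(columns, target, threshold=0.6)
```

In this sketch, as in the description above, raising the threshold keeps more features, so trying several values shows how the selected set changes.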
Step 6: Building and comparing the Model Performance using Outlier Removal
A dataset can contain extreme values that fall outside the expected range and are unlike the rest of the
data. These values are called outliers, and model performance can often be improved by
understanding and, where appropriate, removing them. The default value of outliers_threshold is 0.05.
best_model = compare_models()
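The effect of a threshold such as 0.05 can be illustrated with a simple stand-in written in plain Python. This is not the detection method PyCaret uses internally: the sketch assumes each row is scored by its largest absolute z-score across columns and that the given fraction of highest-scoring rows is dropped. The function name and the sample rows are hypothetical.

```python
from statistics import mean, pstdev


def remove_outliers(rows, fraction=0.05):
    """Drop the `fraction` of rows with the largest absolute z-score in any column."""
    if not rows:
        return []
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]

    def score(row):
        return max(abs(v - mu) / sigma for v, mu, sigma in zip(row, mus, sigmas))

    n_drop = round(len(rows) * fraction)
    ranked = sorted(range(len(rows)), key=lambda i: score(rows[i]), reverse=True)
    dropped = set(ranked[:n_drop])
    return [row for i, row in enumerate(rows) if i not in dropped]


# Hypothetical data: 19 ordinary rows plus one extreme row.
rows = [(float(i), float(i % 5)) for i in range(19)] + [(500.0, 2.0)]
cleaned = remove_outliers(rows, fraction=0.05)  # 5% of 20 rows = 1 row removed
```

Note that a fraction-based rule always removes some rows, even from clean data, so the threshold should be chosen with the dataset in mind.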
Similarly, transformation, PCA, or any combination of these options can be used in the setup.
Additional Resources
1. OpenClassrooms Tutorial: https://openclassrooms.com/en/courses/6389626-train-a-supervised-machine-learning-model/6405911-build-and-evaluate-a-classification-model
2. Datacamp Tutorial: https://www.datacamp.com/blog/classification-machine-learning
3. GeeksforGeeks: https://www.geeksforgeeks.org/basic-concept-classification-data-mining/
Video Links
1. Classification In Machine Learning