Lab Manual
Program: BS(AI/CS/SE)
Lab No. 01
Google Colab:
Google Colab, short for Google Colaboratory, is a popular cloud-based platform that allows you
to write and execute Python code in a web-based interactive environment. It's a great tool for data
analysis, machine learning, and collaborative coding projects. In this introduction, we'll cover the
basics of Google Colab, Python libraries, and working with data.
Working with Data:
In data analysis and machine learning, working with data is fundamental. Here's a simplified
overview of the data workflow (a short end-to-end sketch follows the list):
Data Collection: Acquire data from various sources, such as files, databases, APIs, or web
scraping.
Data Preprocessing: Clean and prepare the data by handling missing values, scaling,
encoding categorical variables, and more.
Exploratory Data Analysis (EDA): Use libraries like Pandas and visualization tools to
understand the data's characteristics, distributions, and relationships.
Data Modeling: Build, train, and evaluate machine learning models using libraries like
Scikit-Learn, TensorFlow, or PyTorch.
Model Evaluation: Assess the model's performance using metrics like accuracy, precision,
recall, or custom evaluation criteria.
Deployment: If the model is satisfactory, deploy it to production for real-world use.
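A minimal end-to-end sketch of this workflow, using scikit-learn's built-in Iris dataset and a
logistic regression model purely as illustrative stand-ins for a real data source and model:

# Data collection: load a ready-made dataset (stand-in for a real source)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Data preprocessing: scale the features
X = StandardScaler().fit_transform(X)

# Split the data so the model can be evaluated on unseen samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Data modeling: build and train a simple classifier
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Model evaluation: accuracy on the held-out test set
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))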
Lab Task
Write the details of the following Python libraries (a quick import-and-version check is sketched after the list):
1. Numpy
2. Pandas
3. Matplotlib
4. Scikit-Learn
5. Seaborn
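As a quick sanity check before writing about each library, you can import all five in a Colab
notebook and print their versions; they come pre-installed in Colab:

# Import the five libraries and print their versions
import numpy as np
import pandas as pd
import matplotlib
import sklearn
import seaborn as sns

for name, module in [("NumPy", np), ("Pandas", pd), ("Matplotlib", matplotlib),
                     ("Scikit-Learn", sklearn), ("Seaborn", sns)]:
    print(name, module.__version__)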
Lab No. 02
Supervised Learning:
Supervised learning is learning from labeled data: the value or result that we want to predict is
present in the training data. The value we want to predict is known as the Target, Dependent
Variable, or Response Variable. All the other columns in the dataset are known as Features,
Predictor Variables, or Independent Variables.
Supervised Learning is classified into two categories:
Classification: Here our target variable consists of categories.
Regression: Here our target variable is continuous, and we usually try to find the line or
curve that best fits the data.
As we have understood, to carry out supervised learning we need labeled data. How can we get
labeled data? There are various ways:
Historical labeled Data
Experiment to get data: We can perform experiments to generate labeled data like A/B
Testing.
Crowd-sourcing
Now it’s time to understand algorithms that can be used to solve supervised machine learning
problems. In this lab, we will be using the popular scikit-learn package.
Lab Task:
Implement the K-Nearest Neighbors algorithm for the Iris dataset. The implementation should
involve the following steps (a sketch follows the list):
1. Import the k-nearest neighbors algorithm from the scikit-learn package.
2. Create feature and target variables.
3. Split data into training and test data.
4. Generate a k-NN model using a neighbors value.
5. Train (fit) the model on the training data.
6. Predict on the test data.
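A minimal sketch of these steps; the choice of n_neighbors=5 and the 70/30 split are illustrative
assumptions, not requirements:

# Step 1: import the k-NN classifier from scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 2: create feature and target variables
X, y = load_iris(return_X_y=True)

# Step 3: split data into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: generate a k-NN model using a neighbors value
knn = KNeighborsClassifier(n_neighbors=5)

# Step 5: train (fit) the model on the training data
knn.fit(X_train, y_train)

# Step 6: predict on the unseen test data
print(knn.predict(X_test))
print("Test accuracy:", knn.score(X_test, y_test))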
Lab No. 03
Naive Bayes Classification:
Naive Bayes is one of the most straightforward and fastest classification algorithms, and it is
suitable for large chunks of data. The Naive Bayes classifier is used successfully in various
applications such as spam filtering, text classification, sentiment analysis, and recommender
systems. It uses Bayes' theorem of probability for the prediction of an unknown class.
Classification Workflow
Whenever you perform classification, the first step is to understand the problem and identify
potential features and the label. Features are those characteristics or attributes which affect the results
of the label. For example, in the case of a loan distribution, bank managers identify the customer’s
occupation, income, age, location, previous loan history, transaction history, and credit score.
These characteristics are known as features that help the model classify customers.
Classification has two phases: a learning phase and an evaluation phase. In the learning phase,
the classifier trains its model on a given dataset, and in the evaluation phase, it tests the classifier's
performance. Performance is evaluated on the basis of various parameters such as accuracy, error,
precision, and recall.
What is Naive Bayes Classifier?
Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one of the
simplest supervised learning algorithms. The Naive Bayes classifier is a fast, accurate, and reliable
algorithm, with high accuracy and speed on large datasets.
The Naive Bayes classifier assumes that the effect of a particular feature in a class is independent of
other features. For example, whether a loan applicant is desirable or not depends on his/her income,
previous loan and transaction history, age, and location. Even if these features are interdependent,
they are still considered independently. This assumption simplifies computation, and that's why it
is considered naive. This assumption is called class conditional independence.
Bayes' theorem relates the following quantities:
P(h|D) = P(D|h) P(h) / P(D)
P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior
probability of h.
P(D): the probability of the data (regardless of the hypothesis). This is known as the evidence, or
the marginal probability of the data.
P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.
P(D|h): the probability of data D given that the hypothesis h was true. This is known as the
likelihood.
How Does the Naive Bayes Classifier Work?
Let’s understand the working of Naive Bayes through an example. Given a record of weather
conditions and whether sports were played, you need to classify whether players will play or not
based on the weather condition. The Naive Bayes classifier calculates the probability of an event
in the following steps:
Step 1: Calculate the prior probability for the given class labels.
Step 2: Find the likelihood probability of each attribute for each class.
Step 3: Put these values into Bayes' formula and calculate the posterior probability.
Step 4: Assign the input to the class with the higher posterior probability.
To simplify the prior and likelihood calculations, you can use two kinds of tables: a frequency
table and likelihood tables. Both of these tables will help you to calculate the probabilities in
Bayes' formula. The frequency table contains the occurrence of labels for all features. There are
two likelihood tables: Likelihood Table 1 shows the prior probabilities of the labels, and
Likelihood Table 2 shows the likelihood of each feature value given each label.
Now suppose you want to calculate the probability of playing when the weather is overcast.
Probability of playing:
P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P (Overcast) .....................(1)
Calculate Prior Probabilities:
P(Overcast) = 4/14 = 0.29
P(Yes)= 9/14 = 0.64
Calculate the Likelihood:
P(Overcast | Yes) = 4/9 = 0.44
Put the prior and likelihood probabilities into equation (1):
P(Yes | Overcast) = 0.44 * 0.64 / 0.29 = 0.98 (higher)
Similarly, you can calculate the probability of not playing:
Probability of not playing:
P(No | Overcast) = P(Overcast | No) P(No) / P (Overcast) .....................(2)
Calculate Prior Probabilities:
P(Overcast) = 4/14 = 0.29
P(No) = 5/14 = 0.36
Calculate the Likelihood:
P(Overcast | No) = 0/5 = 0
Put the prior and likelihood probabilities into equation (2):
P(No | Overcast) = 0 * 0.36 / 0.29 = 0
The probability of the 'Yes' class is higher, so you can conclude that if the weather is overcast,
players will play the sport.
Lab Task:
Generate synthetic data using scikit-learn, then train and evaluate the Gaussian Naive Bayes
algorithm. Use the following outline for the task (a sketch follows the list):
Create a dataset with six features, three classes, and 800 samples using the
`make_classification` function.
Use matplotlib.pyplot’s `scatter` function to visualize the dataset.
Split the dataset into training and testing sets for model evaluation.
Build a generic Gaussian Naive Bayes model and train it on the training dataset.
Predict the values for the test dataset and use them to calculate accuracy and F1 score.
Visualize the confusion matrix.
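One possible sketch of this task; the make_classification settings beyond six features, three
classes, and 800 samples (n_informative, n_redundant, random_state) and the 75/25 split are my
assumptions:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, ConfusionMatrixDisplay

# Create a dataset with six features, three classes, and 800 samples
X, y = make_classification(n_samples=800, n_features=6, n_informative=4,
                           n_redundant=2, n_classes=3, random_state=42)

# Visualize the first two features, colored by class
plt.scatter(X[:, 0], X[:, 1], c=y, marker="*")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Build and train a Gaussian Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Predict the test set and report accuracy and (weighted) F1 score
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred, average="weighted"))

# Visualize the confusion matrix
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()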
Lab No. 04
Introduction
SVM is a powerful supervised algorithm that works best on smaller but complex datasets. Support
Vector Machine, abbreviated as SVM, can be used for both regression and classification tasks, but
generally it works best in classification problems. SVMs were very famous around the time they
were created, during the 1990s, and they remain a go-to method for a high-performing algorithm
with a little tuning.
It is a supervised machine learning algorithm in which we try to find a hyperplane that best
separates the two classes. Note: don’t get confused between SVM and logistic regression. Both
algorithms try to find the best hyperplane, but the main difference is that logistic regression is a
probabilistic approach, whereas SVM is based on statistical approaches.
1. Linear SVM
When the data is perfectly linearly separable, we can use Linear SVM. Perfectly linearly separable
means that the data points can be classified into 2 classes by using a single straight line (if 2D).
2. Non-Linear SVM
When the data is not linearly separable, we can use Non-Linear SVM. This means that when the
data points cannot be separated into 2 classes by using a straight line (if 2D), we use advanced
techniques like kernel tricks to classify them. In most real-world applications we do not find
linearly separable data points; hence, we use the kernel trick to solve them.
Important Terms
Now let’s define two main terms which will be repeated again and again in this lab:
Support Vectors: These are the points that are closest to the hyperplane. A separating line
will be defined with the help of these data points.
Margin: It is the distance between the hyperplane and the observations closest to the
hyperplane (support vectors). In SVM, a large margin is considered a good margin. There are
two types of margins: hard margin and soft margin.
Lab Task:
Implement the Support Vector Machine (SVM) algorithm for the Iris dataset. The implementation
should involve the following steps (a sketch follows the list):
1. Import the SVM algorithm from the scikit-learn package.
2. Create feature and target variables.
3. Split data into training and test data.
4. Generate an SVM model.
5. Train (fit) the model on the training data.
6. Predict on the test data.
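A minimal sketch of these steps; the linear kernel, C value, and the 70/30 split are illustrative
assumptions:

# Step 1: import the SVM classifier from scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Step 2: create feature and target variables
X, y = load_iris(return_X_y=True)

# Step 3: split data into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: generate an SVM model (a linear kernel is one possible choice)
svm = SVC(kernel="linear", C=1.0)

# Step 5: train (fit) the model on the training data
svm.fit(X_train, y_train)

# Step 6: predict on the unseen test data
print(svm.predict(X_test))
print("Test accuracy:", svm.score(X_test, y_test))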
Lab No. 05
K-means
K-means is an unsupervised learning method for clustering data points. The algorithm iteratively
divides data points into K clusters by minimizing the variance in each cluster. Here, we will show
you how to estimate the best value for K using the elbow method, then use K-means clustering to
group the data points into clusters.
Lab Task:
Implement the k-Means algorithm for any dataset. The implementation should involve the following
steps (a sketch follows the list):
1. Import the k-Means algorithm from the scikit-learn package.
2. Make clusters.
3. Implement the k-Means algorithm.
4. Visualize the final outcome.
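A minimal sketch using a small made-up 2-D dataset; the data values, the K range for the elbow
plot, and the final choice of K = 2 are illustrative assumptions:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# Elbow method: plot inertia (within-cluster variance) for K = 1..10
inertias = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

plt.plot(range(1, 11), inertias, marker="o")
plt.title("Elbow method")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")
plt.show()

# Fit the final model with the K suggested by the elbow (here K = 2)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(data)

# Visualize the final clusters
plt.scatter(x, y, c=kmeans.labels_)
plt.show()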
Lab No. 06
K-Medoids
K-Medoids is an unsupervised learning method for clustering data points. Like K-means, the
algorithm iteratively divides data points into K clusters, but instead of minimizing the variance
around a cluster mean, it minimizes the total distance between each point and its cluster's medoid,
which is always an actual data point. Here, we will implement K-Medoids from scratch and use it
to group the data points into clusters.
Lab Task:
Implement the k-Medoids algorithm for any dataset. The implementation should involve the following
steps (a sketch follows the list):
1. Import all the relevant libraries.
2. Make a k-Medoids class.
3. For a set of random numbers, implement the k-Medoids algorithm.
4. Visualize the final outcome.
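scikit-learn itself does not ship a k-medoids estimator, so the sketch below implements a small
k-medoids class from scratch with NumPy; the alternating update scheme, the random
initialization, and all parameter defaults are my assumptions:

import numpy as np
import matplotlib.pyplot as plt

class KMedoids:
    # Unlike k-means, each cluster center (medoid) is an actual data point
    def __init__(self, n_clusters=3, max_iter=100, random_state=0):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.random_state = random_state

    def fit(self, X):
        rng = np.random.default_rng(self.random_state)
        # Pairwise Euclidean distances between all points
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # Start from k distinct random points as the initial medoids
        medoids = rng.choice(len(X), self.n_clusters, replace=False)
        for _ in range(self.max_iter):
            labels = dist[:, medoids].argmin(axis=1)
            new_medoids = medoids.copy()
            for k in range(self.n_clusters):
                members = np.flatnonzero(labels == k)
                # The new medoid is the member with the smallest total
                # distance to the rest of its cluster
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[costs.argmin()]
            if np.array_equal(new_medoids, medoids):
                break  # medoids stopped moving: converged
            medoids = new_medoids
        self.medoid_indices_ = medoids
        self.labels_ = dist[:, medoids].argmin(axis=1)
        return self

# Cluster a set of random numbers and visualize the final outcome
X = np.random.default_rng(42).random((60, 2))
model = KMedoids(n_clusters=3).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=model.labels_)
plt.scatter(X[model.medoid_indices_, 0], X[model.medoid_indices_, 1],
            c="red", marker="x", s=100)
plt.show()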
Lab No. 07
Introduction
In data mining and statistics, hierarchical clustering analysis is a method of cluster analysis that
seeks to build a hierarchy of clusters, i.e., a tree-type structure based on the hierarchy.
In machine learning, clustering is the unsupervised learning technique that groups the data based
on the similarity between sets of data. There are different types of clustering algorithms in
machine learning:
Connectivity-based clustering: This type of clustering algorithm builds the cluster based on the
connectivity between the data points. Example: Hierarchical clustering.
Centroid-based clustering: This type of clustering algorithm forms clusters around the centroids
of the data points. Example: K-Means clustering, K-Mode clustering.
Distribution-based clustering: This type of clustering algorithm is modeled using statistical
distributions. It assumes that the data points in a cluster are generated from a particular probability
distribution, and the algorithm aims to estimate the parameters of the distribution to group similar
data points into clusters. Example: Gaussian Mixture Models (GMM).
Density-based clustering: This type of clustering algorithm groups together data points that are in
high-density regions and separates points in low-density regions. The basic idea is that it identifies
regions in the data space that have a high density of data points and groups those points together
into clusters. Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
Lab Task:
Implement the Agglomerative Hierarchical Clustering algorithm for any dataset. The implementation
should involve the following steps (a sketch follows the list):
1. Import all the relevant libraries.
2. Generate a random dataset.
3. Decide the number of clusters.
4. Deploy the agglomerative hierarchical clustering algorithm.
5. Print the class labels.
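A minimal sketch of this task on a random 2-D dataset; the dataset, the choice of three clusters,
and the ward linkage are illustrative assumptions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Generate a random dataset
X = np.random.default_rng(7).random((50, 2))

# Decide the number of clusters and deploy agglomerative clustering
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Print the class labels and visualize the result
print(labels)
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.show()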
Lab No. 08
Regular Expression
Lab Task:
Write a program that looks for lines of the form “New Revision: 39772” in the file mbox.txt (a
sketch follows). The text file is uploaded on MS Teams along with the task.
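A minimal sketch, assuming mbox.txt sits in the working directory after downloading it from MS
Teams:

import re

with open("mbox.txt") as handle:
    for line in handle:
        line = line.rstrip()
        # Match lines such as "New Revision: 39772"
        if re.search(r"^New Revision: \d+", line):
            print(line)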
Lab No. 09
Text Processing:
Whenever we have textual data, we need to apply several pre-processing steps to the data to transform
words into numerical features that work with machine learning algorithms. The pre-processing steps for a
problem depend mainly on the domain and the problem itself, hence, we don’t need to apply all steps to
every problem. We will be using the NLTK (Natural Language Toolkit) library here.
Convert text to lowercase:

def text_lowercase(text):
    return text.lower()

input_str = "Hey, did you know that the summer break is coming? Amazing right !! It's only 5 more days !!"
text_lowercase(input_str)
Example:
Input: “Hey, did you know that the summer break is coming? Amazing right!! It’s only 5 more days!!”
Output: “hey, did you know that the summer break is coming? amazing right!! it’s only 5 more days!!”
Remove numbers:
We can either remove numbers or convert the numbers into their textual representations. We can use
regular expressions to remove the numbers.
# Remove numbers
import re

def remove_numbers(text):
    result = re.sub(r'\d+', '', text)
    return result

input_str = "There are 3 balls in this bag, and 12 in the other one."
remove_numbers(input_str)
Lab Task:
Convert the numbers into words:
Input: “There are 3 balls in this bag, and 12 in the other one.”
Output: ‘There are three balls in this bag, and twelve in the other one.’
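A possible sketch of this task using the third-party inflect package (installed with pip install
inflect); the package choice is an assumption, and any number-to-words approach would do:

import inflect

engine = inflect.engine()

def numbers_to_words(text):
    words = []
    for token in text.split():
        # Convert purely numeric tokens such as "3" to "three"
        if token.isdigit():
            words.append(engine.number_to_words(token))
        else:
            words.append(token)
    return " ".join(words)

input_str = "There are 3 balls in this bag, and 12 in the other one."
print(numbers_to_words(input_str))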