
Artificial Intelligence

Lab Manual
Program: BS(AI/CS/SE)

Prepared by: Syed Azeem Inam

Table of Contents

1. Introduction to Google Colab, Python Libraries & Data
2. K-Nearest Neighbor Algorithm & Its Implementation
3. Naïve Bayes Algorithm & Its Implementation
4. Support Vector Machine & Its Implementation
5. K-Means Algorithm & Its Implementation
6. K-Medoids Algorithm & Its Implementation
7. Agglomerative Hierarchical Clustering & Its Implementation
8. RegEx & Its Implementation
9. Text Processing & Its Implementation

Lab No. 01
Google Colab:
Google Colab, short for Google Colaboratory, is a popular cloud-based platform that allows you
to write and execute Python code in a web-based interactive environment. It's a great tool for data
analysis, machine learning, and collaborative coding projects. In this introduction, we'll cover the
basics of Google Colab, Python libraries, and working with data.

Google Colab offers several advantages:


 Free Cloud Computing: Colab provides free access to a GPU (Graphics Processing Unit)
and TPU (Tensor Processing Unit), which can significantly speed up computations,
especially for machine learning tasks.
 No Setup Required: You don't need to install Python, libraries, or configure your
environment. Everything is pre-configured and ready to use.
 Collaboration: You can easily share your Colab notebooks with others, making it a great
tool for collaborative work.
 Integration with Google Drive: You can save your Colab notebooks directly to Google
Drive, making it easy to organize and share your work.
Python Libraries
Python is a versatile programming language, and its strength in data analysis and machine learning
comes from a vast ecosystem of libraries. Here are some key libraries you'll often use:
 NumPy: For numerical operations, handling arrays, and performing mathematical
operations efficiently.
 Pandas: For data manipulation and analysis, including data cleaning, exploration, and
transformation.
 Matplotlib and Seaborn: For data visualization, creating plots and charts to better
understand your data.
 Scikit-Learn: A machine learning library that provides tools for classification, regression,
clustering, and more.
 TensorFlow and PyTorch: Deep learning libraries for building and training neural
networks.
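
As a quick, minimal sketch of how the first three of these libraries fit together (the array values and column names below are made up for illustration):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: efficient numerical arrays and vectorized math
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
print(heights_cm.mean())  # 165.0

# Pandas: tabular data manipulation and summary statistics
df = pd.DataFrame({"height_cm": heights_cm, "weight_kg": [50, 60, 70, 80]})
print(df.describe())

# Matplotlib: quick visualization
plt.scatter(df["height_cm"], df["weight_kg"])
plt.xlabel("height_cm")
plt.ylabel("weight_kg")
plt.show()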

Working with Data:
In data analysis and machine learning, working with data is fundamental. Here's a simplified
overview of the data workflow:
 Data Collection: Acquire data from various sources, such as files, databases, APIs, or web
scraping.
 Data Preprocessing: Clean and prepare the data by handling missing values, scaling,
encoding categorical variables, and more.
 Exploratory Data Analysis (EDA): Use libraries like Pandas and visualization tools to
understand the data's characteristics, distributions, and relationships.
 Data Modeling: Build, train, and evaluate machine learning models using libraries like
Scikit-Learn, TensorFlow, or PyTorch.
 Model Evaluation: Assess the model's performance using metrics like accuracy, precision,
recall, or custom evaluation criteria.
 Deployment: If the model is satisfactory, deploy it to production for real-world use.
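
A minimal sketch of this workflow using scikit-learn's built-in Iris dataset (the dataset, classifier, and 80/20 split here are illustrative choices, not prescribed by the lab):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data collection: load a ready-made dataset
X, y = load_iris(return_X_y=True)

# Split into training and test sets for honest evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Data modeling: fit a simple classifier
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Model evaluation
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))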

Lab Task
Write the details of the following Python libraries:
1. NumPy
2. Pandas
3. Matplotlib
4. Scikit-Learn
5. Seaborn

Lab No. 02
Supervised Learning:
Supervised learning is learning from data where the value we want to predict is already present in the training data (labeled data); this column is known as the target, dependent variable, or response variable.
All the other columns in the dataset are known as features, predictor variables, or independent variables.
Supervised Learning is classified into two categories:
 Classification: Here our target variable consists of the categories.
 Regression: Here our target variable is continuous and we usually try to find the line or curve of best fit.
As we have seen, supervised learning requires labeled data. How can we get labeled data? There are various ways:
 Historical labeled Data
 Experiment to get data: We can perform experiments to generate labeled data like A/B
Testing.
 Crowd-sourcing
Now it's time to understand algorithms that can be used to solve supervised machine learning problems. In this lab, we will be using the popular scikit-learn package.

k-Nearest Neighbor Algorithm:


This algorithm is used to solve classification problems. The k-nearest neighbor (k-NN) algorithm effectively creates an imaginary boundary to classify the data. When new data points come in, the algorithm assigns each one to a class according to its nearest neighbors relative to that boundary. Therefore, a larger k value means smoother curves of separation, resulting in less complex models, whereas a smaller k value tends to overfit the data, resulting in more complex models.
Note: It’s very important to have the right k-value when analyzing the dataset to avoid overfitting
and underfitting of the dataset.
Using the k-nearest neighbor algorithm we fit the historical data (or train the model) and predict
the future.
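
A minimal sketch of these steps on the Iris dataset (k = 5 and the 80/20 split are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Create feature and target variables
X, y = load_iris(return_X_y=True)

# Split data into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Generate a k-NN model using a neighbors value (k = 5 here)
knn = KNeighborsClassifier(n_neighbors=5)

# Train (fit) the model on the training data
knn.fit(X_train, y_train)

# Predict unseen data and check accuracy
print(knn.predict(X_test[:5]))
print("Test accuracy:", knn.score(X_test, y_test))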

Lab Task:
Implement K-Nearest Neighbor algorithm for the Iris dataset. The implementation should
involve the following steps:
1. The k-nearest neighbor algorithm is imported from the scikit-learn package.
2. Create feature and target variables.
3. Split data into training and test data.
4. Generate a k-NN model using a chosen neighbors value (k).
5. Train or fit the data into the model.
6. Predict the future.

Lab No. 03
Naive Bayes Classification:
Naive Bayes is one of the most straightforward and fastest classification algorithms, and it is suitable for large volumes of data. The Naive Bayes classifier is successfully used in various applications such as spam filtering, text classification, sentiment analysis, and recommender systems. It uses Bayes' theorem of probability to predict the class of unknown examples.

Classification Workflow
Whenever you perform classification, the first step is to understand the problem and identify potential features and the label. Features are those characteristics or attributes which affect the results
of the label. For example, in the case of a loan distribution, bank managers identify the customer’s
occupation, income, age, location, previous loan history, transaction history, and credit score.
These characteristics are known as features that help the model classify customers.

Classification has two phases: a learning phase and an evaluation phase. In the learning phase,
the classifier trains its model on a given dataset, and in the evaluation phase, it tests the classifier's
performance. Performance is evaluated on the basis of various parameters such as accuracy, error,
precision, and recall.

What is Naive Bayes Classifier?
Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one of the simplest supervised learning algorithms, and it is fast, accurate, and reliable, with good accuracy and speed even on large datasets.
The Naive Bayes classifier assumes that the effect of a particular feature in a class is independent of the other features. For example, whether a loan applicant is desirable or not depends on his/her income, previous loan and transaction history, age, and location. Even if these features are interdependent, they are still considered independently. This assumption simplifies computation, and that's why it is considered naive. This assumption is called class conditional independence.

Bayes' theorem relates these quantities as P(h|D) = P(D|h) * P(h) / P(D), where:
P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.
P(D): the probability of the data (regardless of the hypothesis). This is known as the evidence, or the marginal probability of the data.
P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.
P(D|h): the probability of data D given that hypothesis h is true. This is known as the likelihood.
How Does the Naive Bayes Classifier Work?
Let's understand the working of Naive Bayes through an example. Given a record of weather conditions and whether sport was played, you need to classify whether players will play or not based on the weather condition. The Naive Bayes classifier calculates the probability of an event in the following steps:
 Step 1: Calculate the prior probability for the given class labels.
 Step 2: Find the likelihood probability of each attribute for each class.
 Step 3: Put these values into Bayes' formula and calculate the posterior probability.
 Step 4: See which class has the higher posterior probability; the input belongs to that class.
To simplify the prior and likelihood calculations, you can use two kinds of tables: a frequency table and likelihood tables. Both will help you calculate the probabilities needed. The frequency table contains the counts of each label for every feature value. There are two likelihood tables: Likelihood Table 1 shows the prior probabilities of the labels, and Likelihood Table 2 shows the conditional probabilities of each feature value given a label.

Now suppose you want to calculate the probability of playing when the weather is overcast.
Probability of playing:
P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P(Overcast) .....................(1)
Calculate the prior probabilities:
P(Overcast) = 4/14 = 0.29
P(Yes) = 9/14 = 0.64
Calculate the likelihood:
P(Overcast | Yes) = 4/9 = 0.44
Put the prior and likelihood probabilities into equation (1):
P(Yes | Overcast) = 0.44 * 0.64 / 0.29 = 0.97 (higher)
Similarly, you can calculate the probability of not playing:
Probability of not playing:
P(No | Overcast) = P(Overcast | No) P(No) / P(Overcast) .....................(2)
Calculate the prior probabilities:
P(Overcast) = 4/14 = 0.29
P(No) = 5/14 = 0.36
Calculate the likelihood:
P(Overcast | No) = 0/5 = 0
Put the prior and likelihood probabilities into equation (2):
P(No | Overcast) = 0 * 0.36 / 0.29 = 0
The probability of the 'Yes' class is higher, so you can conclude that if the weather is overcast, players will play the sport.
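
As a quick sanity check, the same arithmetic in Python, using the rounded values from the text:

# Posterior for playing given overcast weather, using the rounded values above
p_overcast = 0.29            # P(Overcast) = 4/14
p_yes = 0.64                 # P(Yes) = 9/14
p_overcast_given_yes = 0.44  # P(Overcast | Yes) = 4/9
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast
print(round(p_yes_given_overcast, 2))  # 0.97 (exact fractions give 1.0)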

Lab Task:
Generate synthetic data using scikit-learn and train and evaluate the Gaussian Naive Bayes
algorithm. Use the following outline for the data generation.
 Create a dataset with six features, three classes, and 800 samples using the
`make_classification` function.
 Use matplotlib.pyplot’s `scatter` function to visualize the dataset.
 Split the dataset into training and testing for model evaluation.
 Build a generic Gaussian Naive Bayes and train it on a training dataset.
 Predict the values for the test dataset and use them to calculate accuracy and F1 score.
 Visualize the confusion matrix.
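
A minimal sketch of this task; settings not stated above, such as n_informative and the 80/20 split, are illustrative assumptions:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, ConfusionMatrixDisplay

# Create a dataset: 800 samples, six features, three classes
X, y = make_classification(n_samples=800, n_features=6, n_classes=3,
                           n_informative=3, random_state=42)

# Visualize the first two features, colored by class
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build and train a Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict the test set and compute accuracy and F1 score
y_pred = gnb.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred, average="weighted"))

# Visualize the confusion matrix
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()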

Lab No. 04
Introduction
SVM is a powerful supervised algorithm that works best on smaller datasets, yet remains effective on complex ones. The Support Vector Machine, abbreviated as SVM, can be used for both regression and classification tasks, but generally it works best on classification problems. SVMs were very popular around the time they were created, during the 1990s, and remain a go-to method for high performance with relatively little tuning.

What is a Support Vector Machine?

It is a supervised machine learning algorithm in which we try to find a hyperplane that best separates the two classes. Note: don't confuse SVM with logistic regression. Both algorithms try to find the best hyperplane, but the main difference is that logistic regression takes a probabilistic approach, whereas the support vector machine takes a geometric, margin-based approach.


Types of Support Vector Machine Algorithms
1. Linear SVM
When the data is perfectly linearly separable, only then can we use Linear SVM. Perfectly linearly separable means that the data points can be classified into 2 classes by using a single straight line (if 2D).
2. Non-Linear SVM
When the data is not linearly separable, we can use Non-Linear SVM: when the data points cannot be separated into 2 classes by a straight line (if 2D), we use advanced techniques such as the kernel trick to classify them. In most real-world applications we do not find linearly separable data points, hence we use the kernel trick to solve them.
Important Terms
Now let’s define two main terms which will be repeated again and again in this article:

 Support Vectors: These are the points that are closest to the hyperplane. The separating line will be defined with the help of these data points.
 Margin: It is the distance between the hyperplane and the observations closest to the hyperplane (the support vectors). In SVM, a large margin is considered a good margin. There are two types of margins: hard margin and soft margin.
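
A minimal sketch of an SVM classifier on the Iris dataset, assuming a linear kernel (the kernel choice and the 80/20 split are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Create feature and target variables
X, y = load_iris(return_X_y=True)

# Split data into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Generate an SVM model; kernel="linear" assumes near linearly separable classes
clf = SVC(kernel="linear", C=1.0)

# Train (fit) the model on the training data
clf.fit(X_train, y_train)

# Predict unseen data and check accuracy
print(clf.predict(X_test[:5]))
print("Test accuracy:", clf.score(X_test, y_test))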

Lab Task:
Implement Support Vector Machine (SVM) algorithm for the Iris dataset. The implementation
should involve the following steps:
1. The SVM algorithm is imported from the scikit-learn package.
2. Create feature and target variables.
3. Split data into training and test data.
4. Generate an SVM model.
5. Train or fit the data into the model.
6. Predict the future.

Lab No. 05
K-means
K-means is an unsupervised learning method for clustering data points. The algorithm iteratively
divides data points into K clusters by minimizing the variance in each cluster. Here, we will show
you how to estimate the best value for K using the elbow method, then use K-means clustering to
group the data points into clusters.

How does it work?


First, each data point is randomly assigned to one of the K clusters. Then, we compute the centroid
(functionally the center) of each cluster and reassign each data point to the cluster with the closest
centroid. We repeat this process until the cluster assignments for each data point are no longer
changing.
K-means clustering requires us to select K, the number of clusters we want to group the data into.
The elbow method lets us graph the inertia (a distance-based metric) and visualize the point at
which it starts decreasing linearly. This point is referred to as the "elbow" and is a good estimate
for the best value for K based on our data.
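
A minimal sketch of the elbow method followed by the final clustering, on small made-up 2D points (the data and the choice K = 2 are illustrative):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Small made-up 2D dataset with two obvious groups
X = [[1, 2], [1, 4], [2, 3], [8, 8], [9, 10], [10, 9]]

# Elbow method: plot inertia for a range of K values
inertias = []
for k in range(1, 6):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(kmeans.inertia_)
plt.plot(range(1, 6), inertias, marker="o")
plt.xlabel("K")
plt.ylabel("Inertia")
plt.show()

# Fit the final model at the elbow (K = 2 for this data) and visualize the clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
xs, ys = zip(*X)
plt.scatter(xs, ys, c=kmeans.labels_)
plt.show()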

Lab Task:
Implement k-Means algorithm for any dataset. The implementation should involve the following
steps:
1. The k-Means algorithm is imported from the scikit-learn package.
2. Make clusters.
3. Implement k-Means algorithm.
4. Visualize the final outcome.

Lab No. 06
K-Medoids
K-Medoids is an unsupervised learning method for clustering data points. Like K-means, the algorithm iteratively divides data points into K clusters, but instead of centroids (cluster means) it uses medoids: actual data points chosen to minimize the total dissimilarity between each point and the medoid of its cluster. Because medoids are real data points, K-Medoids works with arbitrary distance measures and is more robust to outliers than K-means. Here, we will show how the algorithm works and how to estimate the best value for K, then use K-Medoids clustering to group the data points into clusters.

How does it work?


First, K data points are chosen as the initial medoids, and every data point is assigned to the cluster of its closest medoid. Then, within each cluster, we consider swapping the medoid with a non-medoid point, keeping the swap if it lowers the total distance from the cluster's points to its medoid. We repeat the assignment and swap steps until the medoids, and hence the cluster assignments, no longer change.
K-Medoids clustering requires us to select K, the number of clusters we want to group the data into. As with K-means, an elbow-style method can be used: graph the total cost (the sum of distances from points to their medoids) for different values of K and take the "elbow" point as a good estimate for the best value of K based on our data.
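
A minimal from-scratch sketch of this alternating assign-and-update idea, in the spirit of the lab task below (a simplified k-medoids, not the full PAM algorithm; the random data is illustrative):

import numpy as np
import matplotlib.pyplot as plt

class KMedoids:
    """Simplified k-medoids: alternate assignment and per-cluster medoid update."""

    def __init__(self, n_clusters=2, max_iter=100, random_state=0):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.random_state = random_state

    def fit(self, X):
        rng = np.random.default_rng(self.random_state)
        # Precompute all pairwise Euclidean distances
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        medoids = rng.choice(len(X), self.n_clusters, replace=False)
        for _ in range(self.max_iter):
            labels = np.argmin(dist[:, medoids], axis=1)  # assign to nearest medoid
            new_medoids = medoids.copy()
            for k in range(self.n_clusters):
                members = np.where(labels == k)[0]
                if len(members) == 0:
                    continue
                # pick the member minimizing total distance to the other members
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[np.argmin(costs)]
            if np.array_equal(new_medoids, medoids):
                break
            medoids = new_medoids
        self.medoid_indices_, self.labels_ = medoids, labels
        return self

# Try it on random 2D points and visualize the final outcome
X = np.random.default_rng(42).normal(size=(60, 2))
X[30:] += 4  # shift half the points to form a second blob
model = KMedoids(n_clusters=2).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=model.labels_)
plt.scatter(*X[model.medoid_indices_].T, marker="x", s=120)
plt.show()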

Lab Task:
Implement k-Medoids algorithm for any dataset. The implementation should involve the following
steps:
1. Import all the relevant libraries.
2. Make a k-Medoid class.
3. For a set of random number implement k-Medoid algorithm.
4. Visualize the final outcome.

Lab No. 07
Introduction
In data mining and statistics, hierarchical clustering analysis is a clustering method that seeks to build a hierarchy of clusters, i.e., a tree-like structure.
In machine learning, clustering is an unsupervised learning technique that groups data based on the similarity between data points. There are several types of clustering algorithms:
 Connectivity-based clustering: This type of algorithm builds clusters based on the connectivity between the data points. Example: hierarchical clustering.
 Centroid-based clustering: This type of algorithm forms clusters around the centroids of the data points. Examples: K-Means clustering, K-Modes clustering.
 Distribution-based clustering: This type of algorithm is modeled using statistical distributions. It assumes that the data points in a cluster are generated from a particular probability distribution, and it aims to estimate the parameters of that distribution in order to group similar data points into clusters. Example: Gaussian Mixture Models (GMM).
 Density-based clustering: This type of algorithm groups together data points that lie in high-density regions and separates points in low-density regions. The basic idea is that it identifies regions of the data space with a high density of data points and groups those points together into clusters. Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

Hierarchical Agglomerative Clustering


It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). It produces a structure that is more informative than the unstructured set of clusters returned by flat clustering, and it does not require us to prespecify the number of clusters. Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively merge pairs of clusters until all clusters have been merged into a single cluster that contains all of the data.
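
A minimal sketch of agglomerative clustering with scikit-learn on randomly generated blobs (the blob parameters and the choice of 3 clusters are illustrative):

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Generate a random dataset
X, _ = make_blobs(n_samples=100, centers=3, random_state=42)

# Decide the number of clusters and deploy agglomerative hierarchical clustering
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Print the class labels
print(labels)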

Lab Task:
Implement Agglomerative Hierarchical Clustering algorithm for any dataset. The implementation
should involve the following steps:
1. Import all the relevant libraries.
2. Generate a random dataset.
3. Decide the number of clusters.
4. Deploy agglomerative hierarchical clustering algorithm.
5. Print the class labels.

Lab No. 08
Regular Expression

Lab Task:
Write a program that looks for lines of the form "New Revision: 39772" in the file mbox.txt. The text file is uploaded on MS Teams along with the task.
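
Regular expressions (Python's built-in re module) let you match such patterns line by line. A minimal sketch, assuming mbox.txt sits in the working directory:

import re

# Scan mbox.txt for lines such as "New Revision: 39772"
# and print the revision number captured by the group
with open("mbox.txt") as handle:
    for line in handle:
        match = re.search(r"^New Revision: (\d+)", line)
        if match:
            print(match.group(1))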

Lab No. 09
Text Processing:
Whenever we have textual data, we need to apply several pre-processing steps to the data to transform
words into numerical features that work with machine learning algorithms. The pre-processing steps for a
problem depend mainly on the domain and the problem itself, hence, we don’t need to apply all steps to
every problem. We will be using the NLTK (Natural Language Toolkit) library here.

# import the necessary libraries
import nltk
import string
import re

def text_lowercase(text):
    return text.lower()

input_str = "Hey, did you know that the summer break is coming? Amazing right !! It's only 5 more days !!"
text_lowercase(input_str)

Example:
Input: “Hey, did you know that the summer break is coming? Amazing right!! It’s only 5 more days!!”
Output: “hey, did you know that the summer break is coming? amazing right!! it’s only 5 more days!!”

Remove numbers:
We can either remove numbers or convert the numbers into their textual representations. We can use
regular expressions to remove the numbers.
# Remove numbers
def remove_numbers(text):
    result = re.sub(r'\d+', '', text)
    return result

input_str = "There are 3 balls in this bag, and 12 in the other one."
remove_numbers(input_str)

Lab Task:
Convert the numbers into words:
Input: "There are 3 balls in this bag, and 12 in the other one."
Output: "There are three balls in this bag, and twelve in the other one."
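
A minimal sketch of one approach, assuming the third-party num2words package is available (pip install num2words):

import re
from num2words import num2words  # assumed third-party dependency

# Replace every run of digits with its English words
def numbers_to_words(text):
    return re.sub(r'\d+', lambda m: num2words(int(m.group())), text)

input_str = "There are 3 balls in this bag, and 12 in the other one."
print(numbers_to_words(input_str))
# There are three balls in this bag, and twelve in the other one.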

