Lab Manual
B.TECH
INFORMATION TECHNOLOGY
Artificial Intelligence with concepts of
Machine Learning & Deep Learning
(1010103322)
College Name:
Name of Student:
Roll No:
Semester: Division:
INFORMATION TECHNOLOGY
VISION
To establish the department as a pioneer in adaptive technical education and research in
Information Technology, cultivating an environment of innovation, entrepreneurship and
lifelong learning to address the dynamic needs of the digital era.
MISSION
1. To provide adaptive technical education in Information Technology, empowering
students with skills to meet the dynamic challenges of the digital era.
2. To foster innovation and entrepreneurial spirit through research-driven learning,
preparing graduates for leadership in technology and society.
3. To cultivate a culture of lifelong learning and collaboration, addressing
multidisciplinary problems with sustainable and impactful IT solutions.
Course Outcomes (CO):
CO1: Understand AI concepts, tools, and problem-solving, and gain knowledge of data science programming; utilize Python toolkits for data visualization and manipulation, and techniques such as web scraping and API usage.
CO-PO-Matrix:
CO No. | PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 | PSO1 PSO2
CO-1 | 2 2 3 1 1 1 1 2 2 3 2
CO-2 | 1 1 1 2 1 1 2 3 1
CO-3 | 1 2 2 1 1 2 2
BONAFIDE CERTIFICATE
This is to certify that Mr./Ms ................................................................. with Roll No.
................................. from Semester ……… Division .……. has successfully completed his/her
laboratory experiments in Artificial Intelligence with concepts of Machine Learning & Deep
Learning (1010103322) from the department of ......................................... during the academic year
.................
Sr. No | Experiment Title | Date of Start | Date of Completion | Page No (From–To) | Marks (out of 10) | Sign
10 | Implement RandomForest to classify the iris data set. Print both correct and wrong predictions. | | | 47–52 | |
Date: ____________
PRACTICAL-1
Theory:
Python, as a versatile and widely used programming language, offers a vast ecosystem of libraries and frameworks that address many domains of computing. Libraries in Python are collections of pre-written code that developers can use to simplify and speed up their programming tasks. They provide functionalities ranging from numerical computation and data analysis to web development, machine learning, and more.
● Modularity: Libraries are modular, allowing developers to import only the functions and classes they need.
● Reusability: Instead of rewriting common functionalities, libraries enable code reuse, saving time and effort.
● Domain-Specific Solutions: Many libraries are tailored for specific applications, such as NumPy for numerical computations or Pandas for data analysis.
● Community Support: Python libraries are often open source, supported by large communities of developers who contribute to their growth and maintenance.

● Understanding Capabilities: Each library has unique features tailored for specific tasks, and exploring them helps developers choose the right tools.
● Efficiency: Familiarity with libraries reduces the need to build solutions from scratch.
● Problem Solving: Libraries often address complex problems through well-tested algorithms and methods.
Program Code:
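A minimal illustrative sketch of the ideas above, assuming NumPy, Pandas, and Requests are installed (the API URL is only an example):

# Import only what is needed from each library (modularity).
import numpy as np
import pandas as pd
import requests

# NumPy: fast numerical computation on arrays.
arr = np.array([1, 2, 3, 4, 5])
print("Mean:", arr.mean(), "Std:", arr.std())

# Pandas: tabular data manipulation.
df = pd.DataFrame({"name": ["A", "B", "C"], "marks": [7, 9, 8]})
print(df.describe())

# Requests: consuming a web API (assumes network access; URL is illustrative).
response = requests.get("https://api.github.com")
print("Status code:", response.status_code)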
Output:
Post Practical Questions:
3. Which Python library is most commonly used for creating machine learning models?
A. TensorFlow
B. NumPy
C. Pandas
D. Matplotlib
Conclusion:
Marks out of 10
5
Date: ____________
PRACTICAL-2
AIM: Utilize Matplotlib to create bar charts, line charts, and scatter plots for effective data representation.
Theory: Data visualization is a fundamental step in data analysis, allowing users to gain insights,
identify patterns, and communicate findings effectively. Matplotlib, one of Python's most widely
used libraries, provides powerful tools for creating a variety of visualizations.
Core Functions:
● pyplot: A submodule of Matplotlib that provides a MATLAB-like interface for creating plots.
● Customization: Options to set titles, labels, legends, and colors.
Program Code:
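A minimal sketch of the three chart types named in the AIM, using small made-up sample datasets:

import matplotlib.pyplot as plt

# Bar chart: categorical comparison.
subjects = ["AI", "ML", "DL", "IoT"]
marks = [78, 85, 69, 90]
plt.figure()
plt.bar(subjects, marks, color="skyblue")
plt.title("Bar Chart: Marks per Subject")
plt.xlabel("Subject")
plt.ylabel("Marks")

# Line chart: trend over time.
years = [2020, 2021, 2022, 2023]
students = [120, 150, 170, 200]
plt.figure()
plt.plot(years, students, marker="o")
plt.title("Line Chart: Students per Year")
plt.xlabel("Year")
plt.ylabel("Students")

# Scatter plot: relationship between two variables.
heights = [150, 160, 165, 170, 175]
weights = [50, 58, 63, 68, 72]
plt.figure()
plt.scatter(heights, weights)
plt.title("Scatter Plot: Height vs Weight")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")

plt.show()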
Output:
Post Practical Questions:
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-3
Theory:
Data manipulation is an essential step in data analysis, enabling efficient handling,
transformation, and preparation of data for further analysis or visualization. Python toolkits like
NumPy and Pandas are powerful libraries designed to simplify these tasks.
Program Code:
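A minimal sketch of common manipulation tasks with NumPy and Pandas (the sample data is made up):

import numpy as np
import pandas as pd

# NumPy: reshaping, slicing, and element-wise transformation.
a = np.arange(12).reshape(3, 4)
print(a[:, 1:3])        # slice columns 1-2
print(a * 2)            # element-wise operation

# Pandas: handling missing values, derived columns, and aggregation.
df = pd.DataFrame({
    "dept": ["IT", "IT", "CE", "CE"],
    "marks": [78, np.nan, 69, 90],
})
df["marks"] = df["marks"].fillna(df["marks"].mean())  # impute missing value
df["grade"] = np.where(df["marks"] >= 75, "A", "B")   # derived column
print(df.groupby("dept")["marks"].mean())             # group-wise aggregation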
Output:
Post Practical Questions:
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-4
Theory:
Principal Component Analysis (PCA) is one of the most commonly used techniques for dimensionality
reduction, especially for linearly separable data.
PCA is a statistical method that transforms a dataset into a new coordinate system by identifying the
directions (principal components) of maximum variance in the data. The goal is to project the data into a
lower-dimensional space while preserving the essential patterns.
Standardization:
Scale the data to have a mean of zero and a standard deviation of one. This ensures that features with larger scales do not dominate the PCA:
z = (x − μ) / σ
where x is the data point, μ is the mean, and σ is the standard deviation.
Covariance Matrix:
Compute the covariance matrix of the data to understand the relationships between features. Covariance
measures how changes in one variable are associated with changes in another.
Eigenvalues and Eigenvectors:
Compute the eigenvalues and eigenvectors of the covariance matrix. Eigenvectors represent the
directions (principal components) of the new feature space. Eigenvalues indicate the magnitude of
variance captured by each principal component.
Dimensionality Reduction:
Sort the eigenvectors by their corresponding eigenvalues in descending order. Select the top 𝑘
eigenvectors (based on the desired number of dimensions) to form a projection matrix. Project the
original data onto the new k-dimensional space.
Program Code:
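A minimal sketch of the steps above using scikit-learn, which performs the standardization, eigendecomposition, and projection internally; the Iris dataset is assumed here only for illustration:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load data and standardize (zero mean, unit variance).
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Project onto the top k = 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Visualize the reduced 2-D space.
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Iris data after PCA")
plt.show()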
Output:
Post Practical Questions:
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-5
Methods like Non-Negative Matrix Factorization (NMF) are used to approximate the matrix with
reduced dimensions.
5. Random Projection
Projects the data into a lower-dimensional space using random linear transformations.
Retains pairwise distances between points, making it computationally efficient.
Applications
Natural Language Processing:
Reducing the dimensionality of term-document matrices (TF-IDF).
Recommender Systems:
Decomposing user-item matrices to discover latent factors.
Image Processing:
Simplifying large, sparse feature matrices in image recognition tasks.
Program Code:
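A minimal sketch, assuming scikit-learn, that builds a sparse TF-IDF term-document matrix and reduces it both with NMF and with random projection (the documents are made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.random_projection import GaussianRandomProjection

docs = [
    "machine learning with python",
    "deep learning and neural networks",
    "python libraries for data analysis",
    "clustering and dimensionality reduction",
]

# Build a sparse TF-IDF term-document matrix.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
print("Original shape:", X.shape)

# Non-Negative Matrix Factorization: approximate X ≈ W · H with k latent factors.
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)
print("NMF-reduced shape:", W.shape)

# Random projection: a random linear map that approximately preserves
# pairwise distances; n_components is set explicitly for this tiny example.
rp = GaussianRandomProjection(n_components=2, random_state=0)
X_rp = rp.fit_transform(X)
print("Randomly projected shape:", X_rp.shape)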
Output:
Post Practical Questions:
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-6
Theory:
Linear regression is a statistical method used to model the relationship between a dependent
variable and one or more independent variables. The goal is to find the best-fitting straight line
through the data points that represents this relationship.
y = mx + c
Where:
y is the dependent variable (output).
x is the independent variable (input).
m is the slope (how y changes with x).
c is the intercept (the value of y when x = 0).
Key Points:
Assumptions: The relationship is linear, errors are normally distributed, and variance of errors is
constant.
Objective: Minimize the sum of squared errors to find the best slope and intercept.
Metrics: Performance is evaluated using metrics like R-squared, Mean Squared Error (MSE), etc.
Applications: Used for predicting continuous outcomes in fields like economics, healthcare, and
engineering.
Program Code:
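A minimal sketch using scikit-learn's LinearRegression on made-up data, reporting the slope, intercept, and the metrics mentioned above:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Made-up sample data: hours studied (x) vs. exam score (y).
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([35, 45, 50, 62, 70, 78])

# Fit the best line by minimizing the sum of squared errors.
model = LinearRegression()
model.fit(X, y)
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)

# Evaluate the fit.
y_pred = model.predict(X)
print("MSE:", mean_squared_error(y, y_pred))
print("R-squared:", r2_score(y, y_pred))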
Output:
Post Practical Questions:
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-7
Theory:
Logistic regression is used for binary classification. It applies the sigmoid function to a linear combination of the independent variables and produces a probability value between 0 and 1.
For example, with two classes, Class 0 and Class 1: if the value of the logistic function for an input is greater than 0.5 (the threshold value), the input belongs to Class 1; otherwise it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.
Key Points:
● Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
● It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving exact values of 0 and 1, it gives probabilistic values which lie between 0 and 1.
● In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
Logistic Function – Sigmoid Function
● The sigmoid function is a mathematical function used to map the predicted values to probabilities: σ(z) = 1 / (1 + e^(−z)).
● It maps any real value into a value within the range 0 to 1. The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms a curve like the “S” shape.
● The S-shaped curve is called the sigmoid function or the logistic function.
● In logistic regression, we use the concept of a threshold value, which decides between the outputs 0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0.
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three types:
● Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
● Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dog”, or “sheep”.
● Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as “low”, “medium”, or “high”.
Program Code:
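A minimal sketch of binomial logistic regression with scikit-learn; the breast-cancer dataset is assumed here only as a convenient binary-classification example:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Binary classification dataset (two classes: malignant / benign).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)              # class labels via the 0.5 threshold
y_prob = model.predict_proba(X_test)[:, 1]  # sigmoid probabilities in (0, 1)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_prob))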
Output:
Post Practical Questions:
1. Logistic regression is best suited for which type of problem?
A. Regression
B. Clustering
C. Classification
D. Dimensionality reduction
2. Which of the following is the output of logistic regression?
A. A probability value
B. A categorical variable
C. A linear equation
D. A continuous value
3. What is the range of the sigmoid function used in logistic regression?
A. -1 to 1
B. 0 to 1
C. -∞ to ∞
D. 0 to ∞
4. Which performance metric is commonly used to evaluate logistic regression?
A. Mean Squared Error (MSE)
B. R-Squared
C. Accuracy or ROC-AUC
D. Mean Absolute Error (MAE)
5. Which Python function is used for logistic regression in scikit-learn?
A. LinearRegression()
B. LogisticRegression()
C. DecisionTreeClassifier()
D. KNeighborsClassifier()
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-8
AIM: Implement the Naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test datasets.
Theory:
Naive Bayes classifiers are supervised machine learning algorithms used for classification tasks. They are based on Bayes’ Theorem, which gives the posterior probability of a class C given the features x:
P(C | x) = P(x | C) · P(C) / P(x)
The classifier is called “naive” because it assumes the features are independent of each other given the class; during classification, it predicts the class with the highest posterior probability.
Program Code:
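A minimal sketch assuming scikit-learn and Pandas; the CSV file name and the assumption that the last column holds the class label are illustrative placeholders, not part of the original practical:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the training data from a CSV file (file name and column layout assumed).
data = pd.read_csv("training_data.csv")
X = data.iloc[:, :-1].values   # feature columns
y = data.iloc[:, -1].values    # class label column

# Hold out a few samples as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = GaussianNB()           # Gaussian variant, suited to numeric features
model.fit(X_train, y_train)

# Compute the accuracy of the classifier on the held-out test data.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))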
Output:
Post Practical Questions:
1. What assumption does the Naïve Bayes classifier make about the features?
A. They are dependent on each other.
B. They are independent of each other.
C. They follow a linear relationship.
D. They are mutually exclusive.
2. Which type of data is Naïve Bayes most commonly used for?
A. Continuous data
B. Text or categorical data
C. Image data
D. Audio data
3. Which Python library provides the Naïve Bayes implementation?
A. NumPy
B. scikit-learn
C. Pandas
D. TensorFlow
4. What is computed by the Naïve Bayes classifier during classification?
A. The highest probability class
B. The lowest probability class
C. The sum of probabilities of all classes
D. The average of probabilities of all classes
5. Which performance metric is best suited to evaluate a Naïve Bayes classifier?
A. Mean Squared Error
B. Confusion Matrix
C. R-Squared Value
D. Gradient Descent
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-9
Program Code:
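A minimal sketch of a Support Vector Machine classifier (the topic indicated by the post-practical questions below), assuming scikit-learn and the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The RBF kernel handles non-linear boundaries; C trades margin width
# against misclassification of training points.
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

print("Support vectors per class:", model.n_support_)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))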
Output:
Post Practical Questions:
1. What does SVM stand for?
A. Supervised Vector Machine
B. Support Vector Machine
C. Statistical Vector Model
D. Sequential Vector Mapping
2. What is the purpose of the kernel in an SVM?
A. To add bias
B. To perform non-linear transformations
C. To calculate accuracy
D. To compute gradients
3. Which kernel is most commonly used in SVM for non-linear classification?
A. Linear
B. Polynomial
C. Radial Basis Function (RBF)
D. Sigmoid
4. In SVM, what are support vectors?
A. Data points that do not affect the decision boundary
B. Data points closest to the decision boundary
C. Data points farthest from the decision boundary
D. Randomly selected data points
5. Which library in Python is commonly used to implement SVM?
A. scikit-learn
B. TensorFlow
C. Pandas
D. NumPy
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-10
AIM: Implement RandomForest to classify the iris data set. Print both correct and wrong
predictions.
Theory:
The Random Forest algorithm is a powerful tree-based learning technique in Machine Learning: many decision trees make individual predictions, and the forest then takes a vote of all the trees to make the final prediction. Random Forests are widely used for classification and regression tasks.
● It is a type of classifier that uses many decision trees to make predictions.
● It takes different random parts of the dataset to train each tree and then it combines the results by
averaging them. This approach helps improve the accuracy of predictions. Random Forest is
based on ensemble learning.
Imagine asking a group of friends for advice on where to go for vacation. Each friend gives their
recommendation based on their unique perspective and preferences (decision trees trained on different
subsets of data). You then make your final decision by considering the majority opinion or averaging
their suggestions (ensemble prediction).
The process starts with a dataset of rows (samples) and their corresponding class labels.
Then - Multiple Decision Trees are created from the training data. Each tree is trained on a random subset
of the data (with replacement) and a random subset of features. This process is known as bagging or
bootstrap aggregating.
Each Decision Tree in the ensemble learns to make predictions independently.
When presented with a new, unseen instance, each Decision Tree in the ensemble makes a prediction.
The final prediction is made by combining the predictions of all the Decision Trees. This is typically
done through a majority vote (for classification) or averaging (for regression).
Program Code:
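A minimal sketch matching the AIM, assuming scikit-learn; it trains a Random Forest on the Iris dataset and prints each test prediction as correct or wrong:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# n_estimators = number of trees; each tree is trained on a bootstrap sample.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Print each test sample as a correct or wrong prediction.
for features, actual, predicted in zip(X_test, y_test, y_pred):
    status = "CORRECT" if actual == predicted else "WRONG"
    print(f"{status}: {features} actual={iris.target_names[actual]} "
          f"predicted={iris.target_names[predicted]}")

print("Accuracy:", accuracy_score(y_test, y_pred))

The per-sample loop makes the majority-vote outcome visible: most predictions are correct, and the occasional wrong one typically falls near the boundary between versicolor and virginica.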
Output:
Post Practical Questions:
1. Random Forest is an ensemble method based on:
A. Decision Trees
B. Neural Networks
C. Support Vector Machines
D. Clustering Algorithms
2. What does the parameter n_estimators specify in Random Forest?
A. Maximum depth of trees
B. Number of trees in the forest
C. Minimum samples per split
D. Learning rate
3. Random Forest reduces overfitting by:
A. Using regularization
B. Combining multiple decision trees
C. Applying a sigmoid activation function
D. Increasing tree depth
4. Which metric is NOT used to evaluate a Random Forest classifier?
A. Accuracy
B. F1 Score
C. ROC-AUC
D. Gradient Descent
Answer: D
5. What is the purpose of random sampling in Random Forest?
A. To reduce computation time
B. To ensure diversity among the trees
C. To improve the learning rate
D. To eliminate outliers
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-11
AIM: Implement the K-Means clustering algorithm on a dataset and visualize the results.
Theory:
What is K-means Clustering?
Unsupervised Machine Learning is the process of teaching a computer to use unlabeled,
unclassified data and enabling the algorithm to operate on that data without supervision. Without
any previous data training, the machine’s job in this case is to organize unsorted data according to
parallels, patterns, and variations.
K-Means clustering assigns data points to one of K clusters depending on their distance from the centers of the clusters. It starts by randomly placing the cluster centroids in the space. Each data point is then assigned to one of the clusters based on its distance from that cluster's centroid. After every point has been assigned, new cluster centroids are computed. This process runs iteratively until the clusters stabilize. In this analysis we assume that the number of clusters is given in advance and we have to put the points into one of the groups.
In some cases, K is not clearly defined, and we have to think about the optimal value of K. K-Means performs best when the data is well separated; when data points overlap, this clustering is not suitable. K-Means is fast compared to other clustering techniques and provides strong coupling between the data points. However, K-Means does not provide clear information about the quality of the clusters, different initial assignments of the cluster centroids may lead to different clusters, the algorithm is sensitive to noise, and it may get stuck in local minima.
(It helps to think of the items as points in an n-dimensional space.) The algorithm categorizes the items into k groups, or clusters, of similarity. To calculate that similarity, we use the Euclidean distance as a measure.
Program Code:
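A minimal sketch, assuming scikit-learn and Matplotlib; a synthetic dataset from make_blobs stands in for the dataset, since the AIM does not name one:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Synthetic 2-D dataset with 3 well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with k = 3; n_init restarts guard against bad initial centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Visualize the clusters and their centroids.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="X", s=200, label="Centroids")
plt.title("K-Means Clustering (k = 3)")
plt.legend()
plt.show()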
Output:
Post Practical Questions:
1. k-Means clustering is an example of which type of machine learning?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Semi-supervised learning
2. In k-Means clustering, what does the value of "k" represent?
A. Number of iterations
B. Number of clusters
C. Number of features
D. Number of outliers
3. What is the primary objective of the k-Means algorithm?
A. Minimize the sum of squared distances within clusters
B. Maximize the number of clusters
C. Minimize the dimensionality of the data
D. Maximize inter-cluster similarity
4. Which visualization method is most commonly used to plot the results of k-Means
clustering?
A. Line chart
B. Scatter plot
C. Histogram
D. Bar chart
5.Which Python function is used to perform k-Means clustering in scikit-learn?
A. KMeans()
B. DBSCAN()
C. AgglomerativeClustering()
D. LinearRegression()
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-12
When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural Networks are used on various kinds of data, such as images, audio, and text. Different types of Neural Networks are used for different purposes: for example, for predicting a sequence of words we use Recurrent Neural Networks (more precisely, an LSTM); similarly, for image classification we use Convolutional Neural Networks. In this practical, we are going to build a basic building block for a CNN.
● Input Layers: It’s the layer in which we give input to our model. The number of neurons in
this layer is equal to the total number of features in our data (number of pixels in the case
of an image).
● Hidden Layer: The input from the input layer is then fed into the hidden layer. There can be many hidden layers depending on our model and data size. Each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by matrix multiplication of the output of the previous layer with the learnable weights of that layer, then by the addition of learnable biases, followed by an activation function, which makes the network nonlinear.
● Output Layer: The output from the hidden layer is then fed into a logistic function like
sigmoid or softmax which converts the output of each class into the probability score of
each class.
Feeding the data into the model and obtaining the output of each layer as described above is called feedforward. We then calculate the error using an error function; some common error functions are cross-entropy, squared loss, etc. The error function measures how well the network is performing. After that, we propagate backwards through the model by calculating the derivatives. This step, called backpropagation, is used to minimize the loss.
Convolutional Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like matrix datasets, for example visual datasets such as images or videos, where spatial patterns in the data play an extensive role.
CNN Architecture
The Convolutional layer applies filters to the input image to extract features, the Pooling layer
downsamples the image to reduce computation, and the fully connected layer makes the final prediction.
The network learns the optimal filters through backpropagation and gradient descent.
Program Code:
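A minimal sketch of the architecture described above, assuming TensorFlow/Keras and using MNIST as an example image dataset:

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize the MNIST handwritten-digit images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

# Convolution extracts features, pooling downsamples, dense layers classify.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # probability score per class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.1)
print("Test accuracy:", model.evaluate(x_test, y_test, verbose=0)[1])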
Output:
Post Practical Questions:
1. What does CNN stand for?
A. Convolutional Neural Network
B. Continuous Neural Network
C. Computational Neural Network
D. Connected Neural Network
2. What is the primary operation in CNNs used to extract features from an image?
A. Pooling
B. Convolution
C. Flattening
D. Fully connected layers
3. What is the purpose of pooling layers in CNNs?
A. To increase the dimensionality of the feature map
B. To reduce the dimensionality of the feature map
C. To apply non-linearity to the data
D. To train the CNN model
4. Which activation function is commonly used in CNNs?
A. Sigmoid
B. ReLU (Rectified Linear Unit)
C. Tanh
D. Linear
5. Which library in Python provides functions to implement CNN architectures easily?
A. NumPy
B. TensorFlow/Keras
C. Matplotlib
D. Pandas
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-13
Generative Adversarial Networks (GANs) can be broken down into three parts:
● Generative: To learn a generative model, which describes how data is generated in terms of a
probabilistic model.
● Adversarial: The word adversarial refers to setting one thing up against another. This means
that, in the context of GANs, the generative result is compared with the actual images in the data
set. A mechanism known as a discriminator is used to apply a model that attempts to distinguish
between real and fake images.
● Networks: Use deep neural networks as artificial intelligence (AI) algorithms for training
purposes.
Types of GANs
1. Vanilla GAN
2. Conditional GAN (CGAN)
3. Deep Convolutional GAN (DCGAN)
4. Laplacian Pyramid GAN (LAPGAN)
5. Super Resolution GAN (SRGAN)
Program Code:
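A minimal sketch of the generator-discriminator training loop, assuming TensorFlow/Keras; to keep it short, it learns a simple 1-D Gaussian distribution rather than images:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

latent_dim = 8

# Generator: maps random noise to a fake 1-D sample.
generator = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(1),
])

# Discriminator: outputs the probability that a sample is real.
discriminator = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(1,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer=optimizers.Adam(1e-3),
                      loss="binary_crossentropy")

# Combined model trains the generator to fool the (frozen) discriminator.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer=optimizers.Adam(1e-3), loss="binary_crossentropy")

batch = 64
for step in range(2000):
    # Real samples from N(4, 1); fake samples from the generator.
    real = np.random.normal(4.0, 1.0, size=(batch, 1))
    noise = np.random.normal(size=(batch, latent_dim))
    fake = generator.predict(noise, verbose=0)

    # 1. Train the discriminator: real data labeled 1, fake data labeled 0.
    discriminator.train_on_batch(real, np.ones((batch, 1)))
    discriminator.train_on_batch(fake, np.zeros((batch, 1)))

    # 2. Train the generator: make the discriminator output 1 on fakes.
    gan.train_on_batch(np.random.normal(size=(batch, latent_dim)),
                       np.ones((batch, 1)))

samples = generator.predict(np.random.normal(size=(1000, latent_dim)), verbose=0)
print("Generated mean:", samples.mean(), "std:", samples.std())  # should approach 4 and 1

The alternating steps implement the adversarial game: the discriminator improves at telling real from fake while the generator improves at fooling it.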
Output:
Post Practical Questions:
1. What does GAN stand for?
A. Generative Adversarial Network
B. Generalized Artificial Network
C. Gradient-Assisted Network
D. Generative Activation Network
2. A GAN consists of which two main components?
A. Generator and Transformer
B. Generator and Discriminator
C. Generator and Regressor
D. Generator and Optimizer
3. What is the primary objective of the generator in a GAN?
A. To classify input data
B. To generate data similar to the training data
C. To minimize the loss of the discriminator
D. To reduce dimensionality
4. What type of loss function is typically used in GANs?
A. Mean Squared Error
B. Cross-Entropy Loss
C. Hinge Loss
D. Binary Cross-Entropy Loss
5. What is one common application of GANs?
A. Data clustering
B. Image generation
C. Feature extraction
D. Regression analysis
Conclusion:
Marks out of 10
Date: ____________
PRACTICAL-14
Post Practical Questions:
1. Which of the following is a current trend in AI?
A. Symbolic AI
B. Foundation Models like GPT
C. Rule-Based Systems
D. Expert Systems
2. What is the primary purpose of transformers in modern AI?
A. Image classification
B. Natural Language Processing
C. Regression modeling
D. Clustering data
3. Which technology is driving advancements in generative AI?
A. GANs and Transformers
B. SVM and Random Forest
C. PCA and Clustering
D. CNN and RNN
4. Which term refers to AI systems that can generate human-like text, code, or images?
A. Discriminative AI
B. Generative AI
C. Predictive AI
D. Clustering AI
5. What is a significant ethical concern associated with recent advancements in AI?
A. Data clustering
B. Model overfitting
C. Bias in AI algorithms
D. Inability to train deep learning models
Conclusion:
Marks out of 10