
SCHOOL OF TECHNOLOGY, DESIGN

AND COMPUTER APPLICATION


COLLEGE OF TECHNOLOGY

B.TECH
INFORMATION TECHNOLOGY
Artificial Intelligence with concepts of
Machine Learning & Deep Learning
(1010103322)

College Name:
Name of Student:
Roll No:
Semester: Division:
INFORMATION TECHNOLOGY

VISION
To establish the department as a pioneer in adaptive technical education and research in
Information Technology, cultivating an environment of innovation, entrepreneurship and
lifelong learning to address the dynamic needs of the digital era.

MISSION
1.​ To provide adaptive technical education in Information Technology, empowering
students with skills to meet the dynamic challenges of the digital era.
2.​ To foster innovation and entrepreneurial spirit through research-driven learning,
preparing graduates for leadership in technology and society.
3.​ To cultivate a culture of lifelong learning and collaboration, addressing
multidisciplinary problems with sustainable and impactful IT solutions.
Course Outcomes (CO):
CO1: Understand AI concepts, tools, and problem-solving approaches; gain knowledge of data
science programming; and utilize Python toolkits for data visualization, manipulation, and
techniques such as web scraping and API usage.

CO2: Students will understand supervised, unsupervised, and reinforcement learning, and will
apply classification and regression algorithms. They will also gain proficiency in deep learning,
neural network design, and evaluation metrics (accuracy, precision, recall) across various neural
network types (CNN, RNN, LSTM, BLSTM, feedforward).

CO3: Learn about different neural network architectures, including CNNs, and become familiar
with recent trends in AI.

CO-PO-Matrix:

CO No. PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2

CO-1: 2 2 3 1 1 1 1 2 2 3 2

CO-2: 1 1 1 2 1 1 2 3 1

CO-3: 1 2 2 1 1 2 2
BONAFIDE CERTIFICATE
This is to certify that Mr./Ms ................................................................. with Roll No.
................................. from Semester ……… Division .……. has successfully completed his/her
laboratory experiments in Artificial Intelligence with concepts of Machine Learning & Deep
Learning (1010103322) from the department of ......................................... during the academic year
.................

Staff in charge: ...........................​ Head of Department: .............................

Date of Examination: ...........................

Internal Examiner: ...........................​ External Examiner: .............................


TABLE OF CONTENT

Sr. No | Experiment Title | Date of Start | Date of Completion | Page No (From, To) | Marks (out of 10) | Sign

1. Explore following Python libraries. (Pages 1 to 5)
2. Utilize Matplotlib to create bar charts, line charts, and scatter plots for effective data representation. (Pages 6 to 10)
3. Explore practicals on manipulating data using Python Toolkits (NumPy, Pandas). (Pages 11 to 15)
4. Use PCA to reduce the dimensions of linearly separable data. (Pages 16 to 20)
5. Write a program for reducing the dimensionality of sparse feature matrices. (Pages 21 to 26)
6. Perform Linear Regression on a sample dataset. (Pages 27 to 31)
7. Perform Logistic Regression on a sample dataset. (Pages 32 to 36)
8. Implement the Naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test datasets. (Pages 37 to 41)
9. Implement the SVM algorithm to classify the iris dataset. (Pages 42 to 46)
10. Implement Random Forest to classify the iris dataset. Print both correct and wrong predictions. (Pages 47 to 52)
11. Implement the K-Means Clustering Algorithm on a dataset. Visualize the results for the same. (Pages 53 to 58)
12. Case study about CNNs: implement any one architecture of CNN. (Pages 59 to 64)
13. Case study about GANs: implement any one GAN. (Pages 65 to 70)
14. Do a case study on any recent trends in AI. (Pages 71 to 74)
Date: ____________
PRACTICAL-1
AIM: Explore Following Python Libraries. (NumPy, Pandas, Matplotlib, SciPy, Scikit-learn,
TensorFlow, Keras)

Python, as a versatile and widely-used programming language, offers a vast ecosystem of libraries and
frameworks to address various domains of computing. Libraries in Python are collections of pre-written
code that developers can use to simplify and speed up their programming tasks. They provide
functionalities ranging from numerical computations and data analysis to web development, machine
learning, and more.

Key Aspects of Python Libraries:

Modularity: Libraries are modular, allowing developers to import only the functions and classes they
need.

Reusability: Instead of rewriting common functionalities, libraries enable code reuse, saving time and
effort.

Domain-Specific Solutions: Many libraries are tailored for specific applications, such as NumPy for
numerical computations or Pandas for data analysis.

Community Support: Python libraries are often open-source, supported by large communities of
developers who contribute to their growth and maintenance.

Importance of Exploring Libraries:

Understanding Capabilities: Each library has unique features tailored for specific tasks, and exploring
them helps developers choose the right tools.

Efficiency: Familiarity with libraries reduces the need to build solutions from scratch.

Problem Solving: Libraries often address complex problems through well-tested algorithms and methods.
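As a reference, a minimal sketch of one way to explore these libraries is given below: import each one, print its version, and exercise its core object. This assumes NumPy, Pandas, Matplotlib, SciPy, scikit-learn, and TensorFlow/Keras are installed; the program recorded in this practical may differ.

# Explore the core Python libraries by importing them and printing their versions.
import numpy as np
import pandas as pd
import matplotlib
import scipy
import sklearn
import tensorflow as tf

for name, module in [("NumPy", np), ("Pandas", pd), ("Matplotlib", matplotlib),
                     ("SciPy", scipy), ("scikit-learn", sklearn), ("TensorFlow", tf)]:
    print(f"{name} version: {module.__version__}")

# A tiny demonstration of each library's core object.
arr = np.array([1, 2, 3, 4])                        # NumPy ndarray
df = pd.DataFrame({"x": arr, "square": arr ** 2})   # Pandas DataFrame
print(df)
print(df.describe())                                # summary statistics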

1
Program Code:

2
Output:

3
4
Post Practical Questions:

1. Which of the following is a Python library used for numerical computation?


A. NumPy
B. Matplotlib
C. Pandas
D. Scikit-learn

2. What is the primary purpose of the Pandas library in Python?


A. Image processing
B. Data manipulation and analysis
C. Data visualization
D. Machine learning

3. Which Python library is most commonly used for creating machine learning models?
A. TensorFlow
B. NumPy
C. Pandas
D. Matplotlib

4. What does the .shape attribute in a Pandas DataFrame return?


A. The datatype of the DataFrame
B. The dimensions of the DataFrame
C. The column names of the DataFrame
D. The first few rows of the DataFrame

5. Which of the following is NOT a feature of the NumPy library?


A. Multidimensional arrays
B. Linear algebra functions
C. Interactive plotting
D. Fourier transforms

Conclusion:

Signature with Date of


Completion

Marks out of 10

5
Date: ____________
PRACTICAL-2

AIM: Utilize Matplotlib to create bar charts, line charts, and scatter plots for effective data
representation.
Theory: Data visualization is a fundamental step in data analysis, allowing users to gain insights,
identify patterns, and communicate findings effectively. Matplotlib, one of Python's most widely
used libraries, provides powerful tools for creating a variety of visualizations.

Importance of Data Visualization


Simplifying Complex Data: Visualizations make it easier to understand large datasets by
presenting information in a graphical format.
Pattern Recognition: Trends, correlations, and outliers become apparent through visual analysis.
Effective Communication: Graphs and charts provide a universal language to convey data-driven
insights to diverse audiences.
Overview of Matplotlib
Matplotlib is a versatile library for creating static, interactive, and dynamic visualizations in
Python. It is highly customizable, allowing users to adjust almost every aspect of a plot to suit
their needs.

Key Components of Matplotlib:


Figure: The entire canvas where plots are drawn.
Axes: Individual plotting areas within a figure.
Plot: The actual graph or chart drawn on the axes.
Styles: Predefined styles that can be applied to enhance the appearance of plots.

Core Functions:
pyplot: A submodule of Matplotlib that provides a MATLAB-like interface for creating plots.
Customization: Options to set titles, labels, legends, and colors.
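A minimal sketch of how the three chart types could be produced with pyplot is shown below. The sample data is invented purely for illustration; the program recorded in this practical may differ.

import matplotlib.pyplot as plt

# Sample (made-up) data for illustration.
categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 36]
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].bar(categories, values, color="steelblue")   # bar chart
axes[0].set_title("Bar chart")

axes[1].plot(x, y, marker="o")                       # line chart
axes[1].set_title("Line chart")
axes[1].set_xlabel("x")
axes[1].set_ylabel("y")

axes[2].scatter(x, y, color="crimson")               # scatter plot
axes[2].set_title("Scatter plot")

plt.tight_layout()
plt.show()                                           # display all three charts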

6
Program Code:

7
Output:

8
9
Post Practical Questions:

1. Which function in Matplotlib is used to create a line plot?


A. plt.bar()
B. plt.plot()
C. plt.scatter()
D. plt.pie()

2. To display a bar chart, which function should you use in Matplotlib?


A. plt.plot()
B. plt.scatter()
C. plt.bar()
D. plt.show()

3. What does the plt.xlabel() function do in a Matplotlib plot?


A. Adds a title to the chart
B. Labels the x-axis
C. Labels the y-axis
D. Adds a legend to the chart

4. In Matplotlib, what is the purpose of the plt.scatter() function?


A. To draw a line chart
B. To plot points on a scatter plot
C. To create a pie chart
D. To plot categorical data

5. Which method is used to display a chart after creating it in Matplotlib?


A. plt.show()
B. plt.display()
C. plt.run()
D. plt.print()

Conclusion:

Signature with Date of


Completion

Marks out of 10

10
Date: ____________
PRACTICAL-3

AIM: Explore practicals on manipulating data using Python Toolkits (NumPy, Pandas).

Theory:
Data manipulation is an essential step in data analysis, enabling efficient handling,
transformation, and preparation of data for further analysis or visualization. Python toolkits like
NumPy and Pandas are powerful libraries designed to simplify these tasks.

NumPy (Numerical Python)


NumPy is a fundamental library for numerical computing in Python. It provides support for large,
multi-dimensional arrays and matrices, along with a collection of high-level mathematical
functions to operate on these arrays.

Key Features of NumPy:


Ndarray: The core data structure, ndarray, allows for efficient storage and manipulation of
numerical data.
Vectorization: Enables mathematical operations to be applied directly to entire arrays, avoiding
the need for explicit loops.
Broadcasting: Facilitates operations on arrays of different shapes.
Mathematical Functions: Includes linear algebra, Fourier transforms, random number generation,
and more.

Pandas (Panel Data)


Pandas is a high-level library designed for data manipulation and analysis. It introduces two
primary data structures:

Series: A one-dimensional array with labeled indices.


DataFrame: A two-dimensional labeled data structure akin to a spreadsheet or SQL table.
Key Features of Pandas:
Flexible Indexing: Allows indexing by row labels or numbers.
Data Cleaning: Facilitates handling of missing data and duplicates.
Data Transformation: Supports operations like merging, grouping, and reshaping.
Integration: Works seamlessly with other libraries like Matplotlib and NumPy.

Applications of Data Manipulation


Data Cleaning: Handling inconsistent or missing data.
Feature Engineering: Transforming raw data into features suitable for machine learning.
Exploratory Data Analysis (EDA): Gaining insights through aggregation, filtering, and
visualization.
Data Integration: Combining datasets from various sources.
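A minimal sketch of typical NumPy and Pandas manipulation steps is shown below. The small DataFrame is invented for illustration; the program recorded in this practical may differ.

import numpy as np
import pandas as pd

# NumPy: vectorized operations and broadcasting.
a = np.arange(1, 7).reshape(2, 3)      # 2x3 array [[1, 2, 3], [4, 5, 6]]
print(a * 10)                          # vectorized multiplication
print(a.mean(axis=0))                  # column-wise mean

# Pandas: build a small DataFrame and manipulate it.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "marks": [78, np.nan, 91, 64],
    "dept": ["IT", "IT", "CE", "CE"],
})
df["marks"] = df["marks"].fillna(df["marks"].mean())   # handle missing data
print(df.describe())                                   # summary statistics
print(df.groupby("dept")["marks"].mean())              # grouping / aggregation
print(df.iloc[0:2, 0:2])                               # positional indexing with .iloc[]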

11
Program Code:

12
Output:

13
14
Post Practical Questions:

1. Which of the following is used to create a DataFrame in Pandas?


A. pd.DataFrame()
B. pd.Series()
C. np.array()
D. pd.plot()
2. What does the function np.mean() calculate?
A. Median of an array
B. Mean of an array
C. Mode of an array
D. Variance of an array
3. In Pandas, what does the .iloc[] function do?
A. Access rows and columns by labels
B. Access rows and columns by integer indices
C. Drop columns from the DataFrame
D. Add new rows to the DataFrame
4. What is the output of df.describe() in Pandas?
A. A plot of the DataFrame
B. A summary of statistics for numerical columns
C. A list of column names
D. A histogram of numerical columns
5. Which library is commonly used alongside Pandas for efficient numerical computations?
A. Matplotlib
B. NumPy
C. TensorFlow
D. Scikit-learn

Conclusion:

Signature with Date of


Completion

Marks out of 10

15
Date: ____________
PRACTICAL-4

AIM: Use PCA to reduce the dimensions of linearly separable data.

Theory:

Introduction to Dimensionality Reduction


Dimensionality reduction is a process used in data analysis to reduce the number of input variables
(features) in a dataset while retaining as much information as possible.
High-dimensional data often suffers from issues like:
Curse of Dimensionality: Increased computational complexity and difficulty in modeling.
Redundancy: Correlated features that do not add unique information.
Visualization Challenges: Difficulty in interpreting data with more than 3 dimensions.

Principal Component Analysis (PCA) is one of the most commonly used techniques for dimensionality
reduction, especially for linearly separable data.

PCA is a statistical method that transforms a dataset into a new coordinate system by identifying the
directions (principal components) of maximum variance in the data. The goal is to project the data into a
lower-dimensional space while preserving the essential patterns.

How PCA Works


Standardization:

Scale the data to have a mean of zero and a standard deviation of one. This ensures that features with
larger scales do not dominate the PCA.

Formula for standardization:

z = (x − μ) / σ

where x is the data point, μ is the mean, and σ is the standard deviation.

Covariance Matrix:
Compute the covariance matrix of the data to understand the relationships between features. Covariance
measures how changes in one variable are associated with changes in another.

16
Eigenvalues and Eigenvectors:
Compute the eigenvalues and eigenvectors of the covariance matrix. Eigenvectors represent the
directions (principal components) of the new feature space. Eigenvalues indicate the magnitude of
variance captured by each principal component.
Dimensionality Reduction:
Sort the eigenvectors by their corresponding eigenvalues in descending order. Select the top 𝑘
eigenvectors (based on the desired number of dimensions) to form a projection matrix. Project the
original data onto the new k-dimensional space.
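A minimal sketch of these steps with scikit-learn is given below. The Iris dataset is used here only as a convenient example of roughly linearly separable data; the program recorded in this practical may differ.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# 1. Standardize the features (zero mean, unit variance).
X_std = StandardScaler().fit_transform(X)

# 2. Project the 4-dimensional data onto the top 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print("Explained variance ratio:", pca.explained_variance_ratio_)

# 3. Visualize the reduced data.
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="viridis")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("Iris data projected onto 2 principal components")
plt.show()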
Program Code:

17
Output:

18
19
Post Practical Questions:

1.What does PCA stand for in machine learning?


A. Principal Cluster Analysis
B. Principal Component Analysis
C. Partial Component Analysis
D. Partition Component Analysis
2. PCA is primarily used for:
A. Data visualization
B. Dimensionality reduction
C. Feature scaling
D. Model training
3. Which of the following is maximized by PCA?
A. Variance
B. Mean
C. Correlation
D. Covariance
4. PCA transforms data into:
A. A higher-dimensional space
B. A lower-dimensional space
C. A non-linear space
D. A clustered space
5. The first principal component is the direction that:
A. Has the least variance
B. Maximizes variance
C. Minimizes correlation
D. Matches the mean of the data

Conclusion:

Signature with Date of


Completion

Marks out of 10

20
Date: ____________
PRACTICAL-5

AIM: Write a program for reducing the dimensionality of sparse feature matrices.


Theory:
Sparse feature matrices are a common occurrence in data analysis, particularly in domains such as
text processing, recommender systems, and image data. A sparse matrix is one in which the
majority of elements are zero. While these matrices are efficient in terms of storage, their high
dimensionality can pose challenges for computational performance and model training.

Dimensionality reduction is a crucial preprocessing step to address these challenges. It simplifies


the dataset by reducing the number of features while retaining as much information as possible.

What are Sparse Matrices?


A sparse matrix is a representation of data where most of the elements are zeros. Examples
include:

Text Data: Represented as a term-document matrix in Natural Language Processing (NLP).


User Ratings: Sparse matrices in recommender systems.
Challenges of Sparse Matrices:
High memory usage when processed in dense format.
Increased computational complexity.
Potential for overfitting due to many irrelevant features.

Techniques for Reducing Dimensionality of Sparse Matrices


1. Principal Component Analysis (PCA)
PCA is used for reducing dimensions in dense matrices, but it can also handle sparse matrices
when combined with efficient implementations like Truncated SVD.
2. Truncated Singular Value Decomposition (Truncated SVD)
Specifically designed for sparse matrices.
Reduces dimensionality by decomposing the sparse matrix into lower-dimensional latent
components.
Retains the most significant singular values while discarding the rest.
Advantages of Truncated SVD:

Works directly with sparse matrices.


Preserves sparsity in the output.
3. Feature Selection
Techniques like removing low-variance features or selecting the top k features based on
importance can reduce the matrix size.
4. Matrix Factorization

21
Methods like Non-Negative Matrix Factorization (NMF) are used to approximate the matrix with
reduced dimensions.
5. Random Projection
Projects the data into a lower-dimensional space using random linear transformations.
Retains pairwise distances between points, making it computationally efficient.

Applications
Natural Language Processing:
Reducing the dimensionality of term-document matrices (TF-IDF).
Recommender Systems:
Decomposing user-item matrices to discover latent factors.
Image Processing:
Simplifying large, sparse feature matrices in image recognition tasks.
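A minimal sketch using Truncated SVD on a sparse TF-IDF matrix is given below. The tiny text corpus is invented for illustration; real term-document matrices are far larger, and the program recorded in this practical may differ.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# A tiny, made-up corpus for illustration.
corpus = [
    "machine learning with python",
    "deep learning and neural networks",
    "sparse matrices in text processing",
    "python libraries for data analysis",
]

# TF-IDF produces a sparse feature matrix (documents x terms).
tfidf = TfidfVectorizer()
X_sparse = tfidf.fit_transform(corpus)
print("Original shape:", X_sparse.shape, "- stored as", type(X_sparse).__name__)

# Truncated SVD works directly on the sparse matrix, unlike plain PCA.
svd = TruncatedSVD(n_components=2, random_state=42)
X_reduced = svd.fit_transform(X_sparse)

print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", svd.explained_variance_ratio_)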

Program Code:

22
Output:

23
24
25
Post Practical Questions:

1. What is a sparse matrix?


A. A matrix with many non-zero elements
B. A matrix with many zero elements
C. A matrix with equal rows and columns
D. A matrix with no zero elements
2. Which library is commonly used to work with sparse matrices in Python?
A. NumPy
B. Scipy
C. Matplotlib
D. Pandas
3.Which of the following is an advantage of reducing the dimensionality of sparse matrices?
A. Increased memory usage
B. Reduced computational cost
C. Improved data redundancy
D. Higher dimensional feature space
4. Which technique is commonly used for dimensionality reduction?
A. Clustering
B. Regression
C. PCA
D. Classification
5.Sparse feature matrices are often used in:
A. Text data processing
B. Image segmentation
C. Dimensionality reduction
D. All of the above

Conclusion:

Signature with Date of


Completion

Marks out of 10

26
Date: ____________
PRACTICAL-6

AIM: Perform Linear Regression on a sample dataset.

Theory:
Linear regression is a statistical method used to model the relationship between a dependent
variable and one or more independent variables. The goal is to find the best-fitting straight line
through the data points that represents this relationship.

For simple linear regression, the model is represented as:

y=mx+c

Where:
y is the dependent variable (output).
x is the independent variable (input).
m is the slope (how y changes with 𝑥).
c is the intercept (value of y when x = 0).

Key Points:
Assumptions: The relationship is linear, errors are normally distributed, and variance of errors is
constant.
Objective: Minimize the sum of squared errors to find the best slope and intercept.
Metrics: Performance is evaluated using metrics like R-squared, Mean Squared Error (MSE), etc.
Applications: Used for predicting continuous outcomes in fields like economics, healthcare, and
engineering.
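A minimal sketch of simple linear regression with scikit-learn is shown below. The data is generated synthetically for illustration (roughly following y = 3x + 5 with noise); the program recorded in this practical may differ.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data that roughly follows y = 3x + 5 with noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 5 + rng.normal(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)           # learn slope m and intercept c

y_pred = model.predict(X_test)
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))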

Program Code:

27
28
Output:

29
30
Post Practical Questions:

1. Linear regression predicts which type of variable?


A. Categorical
B. Continuous
C. Binary
D. Ordinal
2. Which metric is commonly used to evaluate linear regression models?
A. Accuracy
B. Mean Squared Error (MSE)
C. F1 Score
D. Precision
3. In a simple linear regression model, the relationship is assumed to be:
A. Exponential
B. Quadratic
C. Linear
D. Non-linear
4. What does the coefficient in a linear regression model represent?
A. The error in the prediction
B. The weight of the input variable
C. The bias term
D. The residual
5. In Python, which library provides a function to perform linear regression?
A. Matplotlib
B. sklearn
C. NumPy
D. Pandas

Conclusion:

Signature with Date of


Completion

Marks out of 10

31
Date: ____________
PRACTICAL-7

AIM: Perform Logistic Regression on a sample dataset.

Theory:
Logistic regression is used for binary classification. It applies the sigmoid function to a linear
combination of the independent variables and produces a probability value between 0 and 1.

For example, with two classes, Class 0 and Class 1, if the value of the logistic function for an
input is greater than 0.5 (the threshold value) then the input belongs to Class 1; otherwise it
belongs to Class 0. It is referred to as regression because it is an extension of linear regression,
but it is mainly used for classification problems.

Key Points:
●​ Logistic regression predicts the output of a categorical dependent variable. Therefore, the
outcome must be a categorical or discrete value.
●​ It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving the exact value as
0 and 1, it gives the probabilistic values which lie between 0 and 1.
●​ In Logistic regression, instead of fitting a regression line, we fit an “S” shaped logistic
function, which predicts two maximum values (0 or 1).
Logistic Function – Sigmoid Function
●​ The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
●​ It maps any real value into another value within a range of 0 and 1. The value of the
logistic regression must be between 0 and 1, which cannot go beyond this limit, so it
forms a curve like the “S” form.
●​ The S-form curve is called the Sigmoid function or the logistic function.
●​ In logistic regression, we use the concept of the threshold value, which defines the
probability of either 0 or 1. Such as values above the threshold value tends to 1, and a
value below the threshold values tends to 0.
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three types:
●​ Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.
●​ Multinomial: In multinomial Logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as “cat”, “dogs”, or “sheep”
●​ Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as “low”, “Medium”, or “High”.
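A minimal sketch of binomial logistic regression with scikit-learn is given below. The breast cancer dataset is used here only as a convenient binary-class sample; the program recorded in this practical may differ.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize the features, then fit the sigmoid-based classifier.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression(max_iter=1000)
clf.fit(scaler.transform(X_train), y_train)

proba = clf.predict_proba(scaler.transform(X_test))[:, 1]  # probabilities in (0, 1)
pred = (proba >= 0.5).astype(int)                          # apply the 0.5 threshold

print("Accuracy:", accuracy_score(y_test, pred))
print("ROC-AUC:", roc_auc_score(y_test, proba))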

32
Program Code:

33
Output:

34
35
Post Practical Questions:
1. Logistic regression is best suited for which type of problem?
A. Regression
B. Clustering
C. Classification
D. Dimensionality reduction
2. Which of the following is the output of logistic regression?
A. A probability value
B. A categorical variable
C. A linear equation
D. A continuous value
3. What is the range of the sigmoid function used in logistic regression?
A. -1 to 1
B. 0 to 1
C. -∞ to ∞
D. 0 to ∞
4. Which performance metric is commonly used to evaluate logistic regression?
A. Mean Squared Error (MSE)
B. R-Squared
C. Accuracy or ROC-AUC
D. Mean Absolute Error (MAE)
5. Which Python function is used for logistic regression in scikit-learn?
A. LinearRegression()
B. LogisticRegression()
C. DecisionTreeClassifier()
D. KNeighborsClassifier()

Conclusion:

Signature with Date of


Completion

Marks out of 10

36
Date: ____________
PRACTICAL-8

AIM: Implement the Naïve Bayesian classifier for a sample training dataset stored as a .CSV
file. Compute the accuracy of the classifier, considering a few test datasets.

Theory:
Naive Bayes classifiers are supervised machine learning algorithms used for classification tasks. They are
based on Bayes' Theorem, which is used to compute the probability of each class given the observed
features.

Key Features of Naive Bayes Classifiers


The main idea behind the Naive Bayes classifier is to use Bayes’ Theorem to classify data based on the
probabilities of different classes given the features of the data. It is used mostly in high-dimensional text
classification
●​ The Naive Bayes Classifier is a simple probabilistic classifier and it has very few number of
parameters which are used to build the ML models that can predict at a faster speed than other
classification algorithms.
●​ It is a probabilistic classifier because it assumes that one feature in the model is independent of
existence of another feature. In other words, each feature contributes to the predictions with no
relation between each other.
●​ Naïve Bayes Algorithm is used in spam filtration, Sentimental analysis, classifying articles and
many more.
Why is it Called Naive Bayes?
It is named "Naive" because it assumes that the presence of one feature does not affect the other features.
The "Bayes" part of the name refers to its basis in Bayes' Theorem.

Assumption of Naive Bayes


The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome. Specifically:
Feature independence: This means that when we are trying to classify something, we assume that each
feature (or piece of information) in the data does not affect any other feature.
Continuous features are normally distributed: If a feature is continuous, then it is assumed to be normally
distributed within each class.
Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a
multinomial distribution within each class.
Features are equally important: All features are assumed to contribute equally to the prediction of the
class label.
No missing data: The data should not contain any missing values.
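A minimal sketch of a Gaussian Naive Bayes classifier trained from a CSV file is given below. The file name "train_data.csv" and its column layout (numeric feature columns plus a final "label" column) are hypothetical placeholders for illustration; adapt them to the dataset actually used in the lab.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# "train_data.csv" is a hypothetical file: all columns are numeric features
# except the last column, "label", which holds the class.
data = pd.read_csv("train_data.csv")
X = data.drop(columns=["label"]).values
y = data["label"].values

# Hold out a few samples as the test dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))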

37
Program Code:

38
Output:

39
40
Post Practical Questions:
1. What assumption does the Naïve Bayes classifier make about the features?
A. They are dependent on each other.
B. They are independent of each other.
C. They follow a linear relationship.
D. They are mutually exclusive.
2. Which type of data is Naïve Bayes most commonly used for?
A. Continuous data
B. Text or categorical data
C. Image data
D. Audio data
3. Which Python library provides the Naïve Bayes implementation?
A. NumPy
B. scikit-learn
C. Pandas
D. TensorFlow
4. What is computed by the Naïve Bayes classifier during classification?
A. The highest probability class
B. The lowest probability class
C. The sum of probabilities of all classes
D. The average of probabilities of all classes
5. Which performance metric is best suited to evaluate a Naïve Bayes classifier?
A. Mean Squared Error
B. Confusion Matrix
C. R-Squared Value
D. Gradient Descent

Conclusion:

Signature with Date of


Completion

Marks out of 10

41
Date: ____________
PRACTICAL-9

AIM: Implement the SVM algorithm to classify the iris dataset.


Theory:
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification
and regression tasks. While it can be applied to regression problems, SVM is best suited for classification
tasks. The primary objective of the SVM algorithm is to identify the optimal hyperplane in an
N-dimensional space that can effectively separate data points into different classes in the feature space.
The algorithm ensures that the margin between the closest points of different classes, known as support
vectors, is maximized.
The dimension of the hyperplane depends on the number of features. For instance, if there are two input
features, the hyperplane is simply a line, and if there are three input features, the hyperplane becomes a
2-D plane. As the number of features increases beyond three, the complexity of visualizing the
hyperplane also increases.
Consider two independent variables, x1 and x2, and one dependent variable represented as either a blue
circle or a red circle.
●​ In this scenario, the hyperplane is a line because we are working with two features (x1 and x2).
●​ There are multiple lines (or hyperplanes) that can separate the data points.
●​ The challenge is to determine the best hyperplane that maximizes the separation margin between
the red and blue circles.
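A minimal sketch of an SVM classifier on the Iris dataset with scikit-learn is given below. The RBF kernel and default hyperparameters are assumptions for illustration; the program recorded in this practical may differ.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# The RBF kernel handles the mildly non-linear class boundaries of Iris.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=load_iris().target_names))
print("Number of support vectors per class:", clf.n_support_)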

42
Program Code:

43
Output:

44
45
Post Practical Questions:
1. What does SVM stand for?
A. Supervised Vector Machine
B. Support Vector Machine
C. Statistical Vector Model
D. Sequential Vector Mapping
2. What is the purpose of the kernel in an SVM?
A. To add bias
B. To perform non-linear transformations
C. To calculate accuracy
D. To compute gradients
3. Which kernel is most commonly used in SVM for non-linear classification?
A. Linear
B. Polynomial
C. Radial Basis Function (RBF)
D. Sigmoid
4. In SVM, what are support vectors?
A. Data points that do not affect the decision boundary
B. Data points closest to the decision boundary
C. Data points farthest from the decision boundary
D. Randomly selected data points
5. Which library in Python is commonly used to implement SVM?
A. scikit-learn
B. TensorFlow
C. Pandas
D. NumPy

Conclusion:

Signature with Date of


Completion

Marks out of 10

46
Date: ____________
PRACTICAL-10

AIM: Implement Random Forest to classify the iris dataset. Print both correct and wrong
predictions.
Theory:
The Random Forest algorithm is a powerful tree-based ensemble learning technique in Machine Learning:
many decision trees each make a prediction, and the predictions of all the trees are combined by voting to
produce the final prediction. Random Forests are widely used for classification and regression tasks.
● It is a type of classifier that uses many decision trees to make predictions.
● It trains each tree on a different random part of the dataset and then combines the results by voting
(for classification) or averaging (for regression). This approach helps improve the accuracy of
predictions. Random Forest is based on ensemble learning.
Imagine asking a group of friends for advice on where to go for vacation. Each friend gives their
recommendation based on their unique perspective and preferences (decision trees trained on different
subsets of data). You then make your final decision by considering the majority opinion or averaging
their suggestions (ensemble prediction).

47
The process starts with a dataset consisting of rows (samples) and their corresponding class labels
(columns).

Then - Multiple Decision Trees are created from the training data. Each tree is trained on a random subset
of the data (with replacement) and a random subset of features. This process is known as bagging or
bootstrap aggregating.
Each Decision Tree in the ensemble learns to make predictions independently.
When presented with a new, unseen instance, each Decision Tree in the ensemble makes a prediction.
The final prediction is made by combining the predictions of all the Decision Trees. This is typically
done through a majority vote (for classification) or averaging (for regression).

Key Features of Random Forest


●​ Handles Missing Data: Automatically handles missing values during training, eliminating the
need for manual imputation.
●​ Algorithm ranks features based on their importance in making predictions offering valuable
insights for feature selection and interpretability.
●​ Scales Well with Large and Complex Data without significant performance degradation.
●​ Algorithm is versatile and can be applied to both classification tasks (e.g., predicting categories)
and regression tasks (e.g., predicting continuous values).

Assumptions of Random Forest


●​ Each tree makes its own decisions: Every tree in the forest makes its own predictions without
relying on others.
●​ Random parts of the data are used: Each tree is built using random samples and features to reduce
mistakes.
●​ Enough data is needed: Sufficient data ensures the trees are different and learn unique patterns
and variety.
● Different predictions improve accuracy: Combining the predictions from different trees leads to a
more accurate final result.
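A minimal sketch of a Random Forest classifier on the Iris dataset that prints correct and wrong predictions is given below. The choice of 100 trees and a 70/30 split is illustrative; the program recorded in this practical may differ.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=7)

# 100 trees, each trained on a bootstrap sample with random feature subsets.
clf = RandomForestClassifier(n_estimators=100, random_state=7)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Print each test sample as a correct or wrong prediction.
for features, actual, predicted in zip(X_test, y_test, y_pred):
    status = "CORRECT" if actual == predicted else "WRONG"
    print(f"{status}: features={features}, "
          f"actual={iris.target_names[actual]}, predicted={iris.target_names[predicted]}")

print("Accuracy:", accuracy_score(y_test, y_pred))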

48
Program Code:

49
Output:

50
51
Post Practical Questions:
1. Random Forest is an ensemble method based on:
A. Decision Trees
B. Neural Networks
C. Support Vector Machines
D. Clustering Algorithms
2. What does the parameter n_estimators specify in Random Forest?
A. Maximum depth of trees
B. Number of trees in the forest
C. Minimum samples per split
D. Learning rate
3. Random Forest reduces overfitting by:
A. Using regularization
B. Combining multiple decision trees
C. Applying a sigmoid activation function
D. Increasing tree depth
4. Which metric is NOT used to evaluate a Random Forest classifier?
A. Accuracy
B. F1 Score
C. ROC-AUC
D. Gradient Descent
5. What is the purpose of random sampling in Random Forest?
A. To reduce computation time
B. To ensure diversity among the trees
C. To improve the learning rate
D. To eliminate outliers

Conclusion:

Signature with Date of


Completion

Marks out of 10

52
Date: ____________
PRACTICAL-11

AIM: Implement the K-Means Clustering Algorithm on a dataset. Visualize the results for the
same.
Theory:
What is K-means Clustering?
Unsupervised Machine Learning is the process of teaching a computer to use unlabeled,
unclassified data and enabling the algorithm to operate on that data without supervision. Without
any previous data training, the machine’s job in this case is to organize unsorted data according to
parallels, patterns, and variations.

K-means clustering assigns data points to one of K clusters depending on their distance from
the centers of the clusters. It starts by randomly placing the cluster centroids in the space. Each
data point is then assigned to one of the clusters based on its distance from the cluster centroids.
After assigning each point to a cluster, new cluster centroids are computed. This process runs
iteratively until good clusters are found. In this analysis we assume that the number of clusters is
given in advance and we have to put the points into one of the groups.

In some cases, K is not clearly defined, and we have to determine the optimal number of
clusters. K-means performs best when the data is well separated; when data points overlap, this
clustering is not suitable. K-means is faster compared to other clustering techniques and provides
strong coupling between the data points, but it does not provide clear information about the
quality of the clusters. Different initial assignments of the cluster centroids may lead to different
clusters. Also, the K-means algorithm is sensitive to noise and may get stuck in local minima.

What is the objective of k-means clustering?


The goal of clustering is to divide the population or set of data points into a number of groups so
that the data points within each group are more comparable to one another and different from the
data points within the other groups. It is essentially a grouping of things based on how similar and
different they are to one another.

How k-means clustering works?


We are given a data set of items, with certain features, and values for these features (like a vector).
The task is to categorize those items into groups. To achieve this, we will use the K-means
algorithm, an unsupervised learning algorithm. ‘K’ in the name of the algorithm represents the
number of groups/clusters we want to classify our items into.

53
(It will help if you think of items as points in an n-dimensional space). The algorithm will
categorize the items into k groups or clusters of similarity. To calculate that similarity, we will use
the Euclidean distance as a measurement.

The algorithm works as follows:


●​ First, we randomly initialize k points, called means or cluster centroids.
●​ We categorize each item to its closest mean, and we update the mean’s coordinates, which
are the averages of the items categorized in that cluster so far.
●​ We repeat the process for a given number of iterations and at the end, we have our
clusters.
The “points” mentioned above are called means because they are the mean values of the items
categorized in them. To initialize these means, we have a lot of options. An intuitive method is to
initialize the means at random items in the data set. Another method is to initialize the means at
random values between the boundaries of the data set (if for a feature x, the items have values in
[0,3], we will initialize the means with values for x at [0,3]).
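A minimal sketch of K-means with scikit-learn, including a scatter-plot visualization of the resulting clusters, is given below. The synthetic blob data and the choice of k = 3 are assumptions for illustration; the program recorded in this practical may differ.

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate well-separated synthetic data with 3 natural groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-means with k = 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Visualize points colored by assigned cluster, with centroids marked.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="X", s=200, label="Centroids")
plt.title("K-means clustering (k = 3)")
plt.legend()
plt.show()

print("Inertia (sum of squared distances within clusters):", kmeans.inertia_)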

Program Code:

54
55
Output:

56
57
Post Practical Questions:
1. k-Means clustering is an example of which type of machine learning?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Semi-supervised learning
2. In k-Means clustering, what does the value of "k" represent?
A. Number of iterations
B. Number of clusters
C. Number of features
D. Number of outliers
3. What is the primary objective of the k-Means algorithm?
A. Minimize the sum of squared distances within clusters
B. Maximize the number of clusters
C. Minimize the dimensionality of the data
D. Maximize inter-cluster similarity
4. Which visualization method is most commonly used to plot the results of k-Means
clustering?
A. Line chart
B. Scatter plot
C. Histogram
D. Bar chart
5.Which Python function is used to perform k-Means clustering in scikit-learn?
A. KMeans()
B. DBSCAN()
C. AgglomerativeClustering()
D. LinearRegression()

Conclusion:

Signature with Date of


Completion

Marks out of 10

58
Date: ____________
PRACTICAL-12

AIM: Case study about CNNs: implement any one architecture of CNN.


Theory:
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture
commonly used in Computer Vision. Computer vision is a field of Artificial Intelligence that
enables a computer to understand and interpret the image or visual data.

When it comes to Machine Learning, Artificial Neural Networks perform really well. Neural
networks are used on various kinds of data such as images, audio, and text. Different types of
neural networks are used for different purposes: for example, for predicting a sequence of words
we use Recurrent Neural Networks (more precisely, an LSTM), and for image classification we
use Convolutional Neural Networks. In this practical, we build a basic building block of a
CNN.

Neural Networks: Layers and Functionality


In a regular Neural Network there are three types of layers:

●​ Input Layers: It’s the layer in which we give input to our model. The number of neurons in
this layer is equal to the total number of features in our data (number of pixels in the case
of an image).
●​ Hidden Layer: The input from the Input layer is then fed into the hidden layer. There can
be many hidden layers depending on our model and data size. Each hidden layer can have
different numbers of neurons which are generally greater than the number of features. The
output from each layer is computed by matrix multiplication of the output of the previous
layer with learnable weights of that layer and then by the addition of learnable biases
followed by activation function which makes the network nonlinear.
●​ Output Layer: The output from the hidden layer is then fed into a logistic function like
sigmoid or softmax which converts the output of each class into the probability score of
each class.

Feeding the data through the model and obtaining the output of each layer, as described above, is called
the feedforward pass. We then calculate the error using an error function; some common error functions
are cross-entropy, squared loss, etc. The error function measures how well the network is performing.
After that, we backpropagate through the model by calculating the derivatives. This step, called
backpropagation, is used to minimize the loss.

59
Convolution Neural Network
A Convolutional Neural Network (CNN) is an extended version of the artificial neural network
(ANN) that is predominantly used to extract features from grid-like matrix data, for example visual
datasets such as images or videos, where spatial patterns play an extensive role.

CNN Architecture

The Convolutional layer applies filters to the input image to extract features, the Pooling layer
downsamples the image to reduce computation, and the fully connected layer makes the final prediction.
The network learns the optimal filters through backpropagation and gradient descent.
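A minimal sketch of one possible CNN architecture, a small LeNet-style network on the MNIST handwritten-digit dataset built with TensorFlow/Keras, is shown below. The dataset and layer sizes are assumptions for illustration; the architecture implemented in the lab may differ.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize MNIST; add a channel dimension for the Conv2D layers.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# A small LeNet-style CNN: convolution + pooling blocks, then dense layers.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # feature extraction
    layers.MaxPooling2D((2, 2)),                    # downsampling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),         # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy:", test_acc)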

Program Code:

60
61
Output:

62
63
Post Practical Questions:
1. What does CNN stand for?
A. Convolutional Neural Network
B. Continuous Neural Network
C. Computational Neural Network
D. Connected Neural Network
2. What is the primary operation in CNNs used to extract features from an image?
A. Pooling
B. Convolution
C. Flattening
D. Fully connected layers
3. What is the purpose of pooling layers in CNNs?
A. To increase the dimensionality of the feature map
B. To reduce the dimensionality of the feature map
C. To apply non-linearity to the data
D. To train the CNN model
4. Which activation function is commonly used in CNNs?
A. Sigmoid
B. ReLU (Rectified Linear Unit)
C. Tanh
D. Linear
5. Which library in Python provides functions to implement CNN architectures easily?
A. NumPy
B. TensorFlow/Keras
C. Matplotlib
D. Pandas

Conclusion:

Signature with Date of


Completion

Marks out of 10

64
Date: ____________
PRACTICAL-13

AIM: Case study about GANs: implement any one GAN.


Theory:
What is a Generative Adversarial Network?
Generative Adversarial Networks (GANs) are a powerful class of neural networks used for
unsupervised learning. GANs are made up of two neural networks, a discriminator and a generator. They
use adversarial training to produce artificial data that closely resembles actual data.
● The Generator turns random noise samples into synthetic data and attempts to fool the
Discriminator, which is tasked with accurately distinguishing between generated and genuine data.
●​ Realistic, high-quality samples are produced as a result of this competitive interaction, which
drives both networks toward advancement.
●​ GANs are proving to be highly versatile artificial intelligence tools, as evidenced by their
extensive use in image synthesis, style transfer, and text-to-image synthesis.
●​ They have also revolutionized generative modeling.
Through adversarial training, these models engage in a competitive interplay until the generator
becomes adept at creating realistic samples, fooling the discriminator approximately half the time.

Generative Adversarial Networks (GANs) can be broken down into three parts:
●​ Generative: To learn a generative model, which describes how data is generated in terms of a
probabilistic model.
●​ Adversarial: The word adversarial refers to setting one thing up against another. This means
that, in the context of GANs, the generative result is compared with the actual images in the data
set. A mechanism known as a discriminator is used to apply a model that attempts to distinguish
between real and fake images.
●​ Networks: Use deep neural networks as artificial intelligence (AI) algorithms for training
purposes.
Types of GANs
1.​ Vanilla GAN
2.​ Conditional GAN (CGAN)
3.​ Deep Convolutional GAN (DCGAN)
4.​ Laplacian Pyramid GAN (LAPGAN)
5.​ Super Resolution GAN (SRGAN)
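A minimal sketch of a vanilla GAN, written with TensorFlow/Keras, is given below. As an assumption for illustration it learns a simple 1-D Gaussian distribution rather than images; it only demonstrates the generator/discriminator adversarial loop, and the GAN implemented in the lab may differ.

import tensorflow as tf

latent_dim, batch = 8, 64

# Generator: maps latent noise to a single real value.
generator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Discriminator: outputs the probability that its input came from the real data.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)
bce = tf.keras.losses.BinaryCrossentropy()

for step in range(2000):
    # Real samples come from N(mean=4, std=1); fake samples come from the generator.
    real = tf.random.normal((batch, 1), mean=4.0, stddev=1.0)
    noise = tf.random.normal((batch, latent_dim))

    # 1. Train the discriminator to tell real from fake.
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        d_loss = bce(tf.ones((batch, 1)), discriminator(real, training=True)) + \
                 bce(tf.zeros((batch, 1)), discriminator(fake, training=True))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # 2. Train the generator to fool the discriminator (labels flipped to "real").
    noise = tf.random.normal((batch, latent_dim))
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        g_loss = bce(tf.ones((batch, 1)), discriminator(fake, training=True))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))

samples = generator(tf.random.normal((1000, latent_dim)), training=False).numpy()
print("Generated mean/std:", samples.mean(), samples.std())  # should approach 4 and 1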

65
Program Code:

66
67
Output:

68
69
Post Practical Questions:
1. What does GAN stand for?
A. Generative Adversarial Network
B. Generalized Artificial Network
C. Gradient-Assisted Network
D. Generative Activation Network
2. A GAN consists of which two main components?
A. Generator and Transformer
B. Generator and Discriminator
C. Generator and Regressor
D. Generator and Optimizer
3. What is the primary objective of the generator in a GAN?
A. To classify input data
B. To generate data similar to the training data
C. To minimize the loss of the discriminator
D. To reduce dimensionality
4. What type of loss function is typically used in GANs?
A. Mean Squared Error
B. Cross-Entropy Loss
C. Hinge Loss
D. Binary Cross-Entropy Loss
5. What is one common application of GANs?
A. Data clustering
B. Image generation
C. Feature extraction
D. Regression analysis

Conclusion:

Signature with Date of


Completion

Marks out of 10

70
Date: ____________
PRACTICAL-14

AIM: Do a case study on any recent trends in AI.


Output:

71
72
73
Post Practical Questions:
1. Which of the following is a current trend in AI?
A. Symbolic AI
B. Foundation Models like GPT
C. Rule-Based Systems
D. Expert Systems
2. What is the primary purpose of transformers in modern AI?
A. Image classification
B. Natural Language Processing
C. Regression modeling
D. Clustering data
3. Which technology is driving advancements in generative AI?
A. GANs and Transformers
B. SVM and Random Forest
C. PCA and Clustering
D. CNN and RNN
4. Which term refers to AI systems that can generate human-like text, code, or images?
A. Discriminative AI
B. Generative AI
C. Predictive AI
D. Clustering AI
5. What is a significant ethical concern associated with recent advancements in AI?
A. Data clustering
B. Model overfitting
C. Bias in AI algorithms
D. Inability to train deep learning models

Conclusion:

Signature with Date of


Completion

Marks out of 10

74
