ML Merged

Uploaded by

begoj22622

DEPARTMENT OF COMPUTER ENGINEERING

A Laboratory Manual for


Machine Learning Lab (CSL701)
ACADEMIC YEAR: 2024-25

Course Name: Machine Learning Lab Course Code: CSL701

Name: __________________________________________________________

Semester: VII (Seventh) Roll No.: ___________________


Div.: ____________________________ Exam. Seat No.: _____________

Email ID: _________________________ Mobile No.: _________________


DEPARTMENT OF COMPUTER ENGINEERING

VISION AND MISSION


Institution's

Vision: To be a world class institute and a front runner in educational and socioeconomic
development of the nation by providing high quality technical education to students
from all sections of society.

Mission: To provide superior learning experiences in a caring and conducive environment so
as to empower students to be successful in life & contribute positively to society.

Quality Policy: We, at SHREE L. R. TIWARI COLLEGE OF ENGINEERING, shall dedicate and
strive hard to continuously achieve academic excellence in the field of Engineering
and to produce the most competent Engineers through objective & innovative
teaching methods, consistent updating of facilities, welfare & quality improvement of
the faculty & a system of continual process improvement.

Computer Engineering Department's

Vision: To be a department of high repute focused on quality education, training and skill
development in the field of computer engineering to prepare professionals and
entrepreneurs of high calibre with human values to serve our nation and globe.

Mission:
M1: To develop technical, analytical and theoretical competencies, managerial skills
and practical exposure.
M2: Overall development of students, faculty and staff by providing an encouraging
environment and infrastructure for learning, skill development and research.
M3: To strengthen versatility, adaptability and the chase for excellence amongst
students with highest ethical values as their core strength.

Program Educational Objectives:
PEO-1: Be employed in industry, government, or entrepreneurial endeavours to
demonstrate professional advancement through significant technical achievements
and expanded leadership responsibility by exhibiting ethical attitude and good
communication skills.
PEO-2: Demonstrate the ability to work effectively as a team member and/or leader
in an ever-changing professional environment.
PEO-3: To pursue higher studies, engage in professional development, research and
entrepreneurship and adapt to emerging technologies.

_______________
Student’s Signature
DEPARTMENT OF COMPUTER ENGINEERING

Certificate

This is to certify that Mr. /Ms.________________________________________

Class ________________ Roll No. __________ Exam Seat No. ___________ of

Seventh Semester of Degree in Computer Engineering has completed the

required number of Practicals / Term Work / Sessional in the subject Machine

Learning Lab from the Department of Computer Engineering during the

academic year of 2024-2025 as prescribed in the curriculum.

Lecturer in-Charge Head of the Department Principal


Date:

Seal of
Institution
INSTRUCTION FOR STUDENTS

Students shall read the points given below for understanding the theoretical concepts and
practical applications.
1) Listen carefully to the lecture given by the teacher about the importance of the subject,
curriculum philosophy, learning structure, skills to be developed, information about
equipment, instruments, procedure, method of continuous assessment, tentative plan of
work in the laboratory and total amount of work to be done in a semester.
2) Students shall undertake a study visit of the laboratory to see the types of equipment,
instruments and software to be used, before performing experiments.
3) Read the write-up of each experiment a day before it is to be performed.
4) Organize the work in the group and make a record of all observations.
5) Understand the purpose of the experiment and its practical implications.
6) Write the answers to the questions allotted by the teacher during practical hours if
possible, or immediately afterwards.
7) Students should not hesitate to ask about any difficulty faced while conducting a
practical/exercise.
8) Students shall study all the questions given in the laboratory manual and practice
writing the answers to these questions.
9) Students shall develop maintenance skills as expected by the industries.
10) Students should develop the habit of pocket discussion/group discussion related to the
experiments/exercises so that exchange of knowledge/skills can take place.
11) Students shall attempt to develop related hands-on skills and gain confidence.
12) Students shall focus on development of skills rather than theoretical or codified
knowledge.
13) Students shall visit nearby workshops, workstations, industries, laboratories, technical
exhibitions, trade fairs etc., even if not included in the Lab manual. In short, students
should gain exposure to the area of work during their student years.
14) Students shall insist on the completion of recommended laboratory work, industrial
visits, answers to the given questions, etc.
15) Students shall develop the habit of evolving more ideas, innovations, skills etc. beyond
those included in the scope of the manual.
16) Students shall refer to technical magazines, proceedings of seminars and websites
related to the scope of the subject and update their knowledge and skills.
17) Students should develop the habit of not depending totally on teachers and should
develop self-learning techniques.
18) Students should develop the habit of interacting with the teacher without hesitation
with respect to the academics involved.
19) Students should develop the habit of submitting the practicals and exercises
continuously and progressively on the scheduled dates and should get the assessment done.
20) Students should be well prepared while submitting the write-up of the exercise. This will
develop the continuity of the studies and he/she will not be overloaded at the end of the
term.
GUIDELINES FOR TEACHERS

Teachers shall discuss the following points with students before the start of practicals of the subject.
1) Learning Overview: To develop better understanding of the importance of the subject. To
know related skills to be developed such as Intellectual skills and Motor skills.
2) Learning Structure: In this, topics and sub-topics are organized in a systematic way so that
the ultimate purpose of learning the subject is achieved. This is arranged in the form of
fact, concept, principle, procedure, application and problem.
3) Know your Laboratory Work: To understand the layout of the laboratory, specifications of
equipment/Instruments/Materials, procedure, working in groups, planning time etc.
Also to know the total amount of work to be done in the laboratory.
4) Teacher shall ensure that the required equipment is in working condition before the start
of the experiment, and also keep the operating instruction manual available.
5) Explain prior concepts to the students before starting each experiment.
6) Involve students actively at the time of conduct of each experiment.
7) While taking readings/observations each student shall be given a chance to perform or
observe the experiment.
8) If the experimental set-up has variations in the specifications of the equipment, the
teachers are advised to make the necessary changes, wherever needed.
9) Teacher shall assess the performance of students continuously as per norms prescribed
by University of Mumbai and guidelines provided by IQAC.
10) Teacher should ensure that the respective skills and competencies are developed in the
students after the completion of the practical exercise.
11) Teacher is expected to share the skills and competencies to be developed in the students.
12) Teacher may provide additional knowledge and skills to the students even though not
covered in the manual but expected from the students by the industries.
13) Teachers shall ensure that industrial visits, if recommended in the manual, are covered.
14) Teacher may suggest that students refer to additional related literature such as Technical
papers/Reference books/Seminar proceedings, etc.
15) During assessment the teacher is expected to ask questions to the students to tap their
achievements regarding related knowledge and skills so that students can prepare while
submitting the record of the practicals. Focus should be given to development of enlisted
skills rather than theoretical/codified knowledge.
16) Teacher should enlist the skills to be developed in the students that are expected by the
industry.
17) Teacher should organize Group discussions / brainstorming sessions / Seminars to
facilitate the exchange of knowledge amongst the students.
18) Teacher should ensure that revised assessment norms are followed simultaneously and
progressively.
19) Teacher should give more focus to hands-on skills and should actually share the same.
20) Teacher shall also refer to the circulars related to supervision and assessment of
practicals for additional guidelines.
DEPARTMENT OF COMPUTER ENGINEERING
Student’s Progress Assessments
Student Name: __________________________________ Roll No.: ______________________
Class/Semester: BE CS/SEM-VII Academic Year: 2024-2025
Course Name: Machine Learning Laboratory Course Code: CSL701
Assessment Parameters for Practicals/Mini Project/Assignments

Criteria for Grading: PE, KT, DR, DN, PL (each out of 3)

Exp. No. | Title of Experiment | PE | KT | DR | DN | PL | Total (out of 15) | Average (out of 3) | Lab Objectives
1 | To Implement Linear Regression | | | | | | | | LO1
2 | To Study and Implement Logistic Regression | | | | | | | | LO1
3 | To Implement ensemble learning bagging and boosting | | | | | | | | LO2
4 | To Implement multivariate Linear Regression | | | | | | | | LO1
5 | To Implement SVM | | | | | | | | LO1
6 | To Implement PCA | | | | | | | | LO2
7 | To Implement Graph Based Clustering | | | | | | | | LO5
8 | To Implement DB Scan | | | | | | | | LO2
9 | To Implement CART | | | | | | | | LO4
10 | To Implement LDA | | | | | | | | LO6
11 | Mini Project | | | | | | | | LO1-LO6
Average Marks

Criteria for Grading – Preparedness and Efforts (PE), Knowledge of tools (KT), Debugging and results (DR),
Documentation (DN), Punctuality & Lab Ethics (PL).

Criteria for Grading: TS, OM, NT, IS (each out of 3)

Assignments | TS | OM | NT | IS | Total (out of 12) | Average (out of 3) | Covered COs
Assignment No. 1 | | | | | | | CO1-CO3
Assignment No. 2 | | | | | | | CO4-CO6
Average Marks

Criteria for Grading – Timely submission (TS), Originality of the material (OM), Neatness (NT), Innovative solution (IS)
Grades – Meet Expectations (3 Marks), Moderate Expectations (2 Marks), Below Expectations (1 Mark)

_______________ _________________ _______________


Student’s Signature Subject In-charge Head of Department
DEPARTMENT OF COMPUTER ENGINEERING
RECORD OF PROGRESSIVE ASSESSMENTS
Student Name: __________________________________ Roll No.: ________ (BE CS SEM-VII)
Course Name : Machine Learning Laboratory Course Code: CSL701

Assessment of Experiments (A)

Sr. no. | Name of Experiments | Page No. | Date of Performance | Date of Submission | Assessment (out of 15) | Teacher's Signature and Remark | CO Covered
1 | To Implement Linear Regression | | | | | | LO1
2 | To Study and Implement Logistic Regression | | | | | | LO1
3 | To Implement ensemble learning bagging and boosting | | | | | | LO2
4 | To Implement multivariate Linear Regression | | | | | | LO1
5 | To Implement SVM | | | | | | LO1
6 | To Implement PCA | | | | | | LO2
7 | To Implement Graph Based Clustering | | | | | | LO5
8 | To Implement DB Scan | | | | | | LO2
9 | To Implement CART | | | | | | LO4
10 | To Implement LDA | | | | | | LO6
11 | Mini Project | | | | | | LO1-LO6

Average Marks (Out of 15)


Assessment of Assignments (B)

Sr. no. | Assignment | Page No. | Date of Display | Date of Completion | Assessment (Out of 12) | Teacher's Signature and Remark | CO Covered
1 | Assignment No. 1 | | | | | | CO1-CO3
2 | Assignment No. 2 | | | | | | CO4-CO6

Average Marks (Out of 12)

Converted Marks (Out of 5) (B)

Assessments of Attendance (C)


Machine Learning Theory Attendance: TH (out of) | TH attend. | TH %
Machine Learning Practical Attendance: PR (out of) | PR Attend. | PR %
AVG. Attendance % (TH+PR)
Attendance Marks (C) (Out of 5)

80

Total Term Work Marks: A+B+C = _________ (Out of 25)

_______________ _________________ _______________


Student Signature Subject In-charge Head of the Department
DEPARTMENT OF COMPUTER ENGINEERING
Course Objectives and Outcomes
Academic Year: 2024-2025 Class: BE Course Code: CSL701
Program: Computer Engineering Div: A Course Name: ML
Department: Computer Engineering Sem.: VII Faculty: Prof. Rajesh Gaikwad
Course Objectives:
Sr. No. | Statement
1 | To introduce the basic concepts and techniques of Machine Learning.
2 | To acquire an in-depth understanding of various supervised and unsupervised algorithms.
3 | To be able to apply various ensemble techniques for combining ML models.
4 | To demonstrate dimensionality reduction techniques.

Course Outcomes:
CO's No. | Abbre. | Statement
CSL701.1 | CO1 | To acquire fundamental knowledge of developing machine learning models.
CSL701.2 | CO2 | To select, apply and evaluate an appropriate machine learning model for the given application.
CSL701.3 | CO3 | To demonstrate ensemble techniques to combine predictions from different models.
CSL701.4 | CO4 | To demonstrate learning with Classification.
CSL701.5 | CO5 | To demonstrate learning with Clustering.
CSL701.6 | CO6 | To demonstrate the dimensionality reduction techniques.

Course Prerequisite:
Sr. No. | Pre-requisite
1 | Data Structures, Analysis of Algorithms.
Teaching and Examination Scheme:

Teaching Scheme (Hrs): Theory 3 | Pract 2 | Tut -
Credits Assigned: Theory 3 | TW/Pract 1 | Tut - | Total 4

Examination Scheme:
Theory, Internal Assessment: Test 1: 20 | Test 2: 20 | Avg.: 20
Theory, End Sem. Exam: - | Exam Duration (in Hrs): -
TW: 25 | Oral: -- | Oral & Pract: 25 | Total: 50

Term Work (Total 25 Marks) = (Experiments: 15 marks + Assignments: 05 marks + Attendance:
05 marks (TH+PR)).
DEPARTMENT OF COMPUTER ENGINEERING

Course Exit Form


Student Name: __________________________________ Roll No.: ______________________
Class/Semester: _______________________________ Academic Year: ________________
Course Name: ___________________________________ Course Code: __________________

Judge your ability with regard to the following points by putting a (√), on the scale of 1 (lowest) to
5 (highest), based on the knowledge and skills you attained from this course.

Sr. No. | Your ability to | 1 (Lowest) | 2 | 3 | 4 | 5 (Highest)
1 | CSC701.1: To acquire fundamental knowledge of developing machine learning models. | | | | |
2 | CSC701.2: To select, apply and evaluate an appropriate machine learning model for the given application. | | | | |
3 | CSC701.3: To demonstrate ensemble techniques to combine predictions from different models. | | | | |
4 | CSC701.4: To demonstrate learning with Classification. | | | | |
5 | CSC701.5: To demonstrate learning with Clustering. | | | | |
6 | CSC701.6: To demonstrate the dimensionality reduction techniques. | | | | |

_______________ _______________
Student’s Signature Date
DEPARTMENT OF COMPUTER ENGINEERING
Programme Outcome (PO & PSOs)
Programme Outcomes are the skills and knowledge which the students have at the time of graduation. This will indicate
what student can do from subject-wise knowledge acquired during the programme.
PO | Short title of the PO | Description of the Programme outcome as defined by the NBA
PO-1 | Engineering knowledge | Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO-2 | Problem analysis | Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO-3 | Design/development of solutions | Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO-4 | Conduct investigations of complex problems | Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO-5 | Modern tool usage | Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO-6 | The engineer and society | Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO-7 | Environment and sustainability | Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO-8 | Ethics | Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO-9 | Individual and team work | Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO-10 | Communication | Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO-11 | Project management and finance | Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO-12 | Life-long learning | Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.

Program Specific Outcomes (PSOs) defined by the programme. Baseline-Rational Unified Process (RUP)
PSO-1 | Computing solution to solve real life problem | The graduate must be able to develop, deploy, test and maintain the software or computing hardware solutions to solve real life problems using state of the art technologies, standards, tools and programming paradigms.
PSO-2 | Computer Engineering knowledge and skills | The graduate should be able to adapt Computer Engineering knowledge and skills to create career paths in industries or business organizations or institutes of repute.
DEPARTMENT OF COMPUTER ENGINEERING
CSL701 Machine Learning lab
Seventh Semester, 2024-2025 (Odd Semester)

Name of Student :

Roll No. :

Division :

Assignment No. :

Outcome :

Task :

Date of Assignment :

Date of Submission :

Particulars | Max. Marks | Marks Obtained
Timely Submission (TS) | 3 |
Originality of material (OM) | 3 |
Neatness (NT) | 3 |
Innovative Solution (IS) | 3 |
Total | 12 |
Grades – Meet Expectations (3 Marks), Moderate Expectations (2 Marks), Below Expectations (1 Mark)

Checked and Verified by


Name of Faculty : Prof. Rajesh Gaikwad
Signature :
Date :
Assignment 2
1. What is constrained optimization?
Ans: Constrained optimization is a mathematical approach used to identify the best solution
(maximum or minimum) of an objective function while adhering to specific constraints.
These constraints can be represented as equations or inequalities that restrict the values of the
decision variables. For example, a business may aim to maximize profit while staying within
budget limits or resource availability. This process is essential across various fields,
including economics, engineering, and logistics, where decision-making must align with
practical limitations.
To solve constrained optimization problems, several methods are employed, including
Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions. Lagrange multipliers
introduce one additional variable per equality constraint, which folds the constraints into a
single function (the Lagrangian) whose stationary points are the candidate solutions. The
KKT conditions generalize this idea to inequality constraints. Linear programming is a
specialized form where both the objective function and constraints are linear, making it easier
to analyze and solve. Overall, constrained optimization provides a structured way to make
informed decisions within the boundaries of real-world limitations.
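
As an illustration of the Lagrange multiplier method, the sketch below minimizes x^2 + y^2 subject to x + y = 1. The problem itself and the helper name solve_linear are assumptions chosen for this example, not part of the assignment. Setting the partial derivatives of the Lagrangian to zero gives a small linear system, which is solved here by elimination.

```python
# Minimal sketch: minimize f(x, y) = x^2 + y^2 subject to x + y = 1.
# The problem and the helper name solve_linear are illustrative assumptions.

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations act on both.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the arithmetic stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Lagrangian: L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1).
# Setting each partial derivative to zero gives a linear system:
#   dL/dx   = 2x + lam  = 0
#   dL/dy   = 2y + lam  = 0
#   dL/dlam = x + y - 1 = 0
coefficients = [
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 0.0],
]
right_hand_side = [0.0, 0.0, 1.0]

x, y, lam = solve_linear(coefficients, right_hand_side)
print(x, y, lam)
```

The solution x = y = 0.5 is the point on the line x + y = 1 that is closest to the origin, which matches the geometric reading of the problem.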

2. What are the popular algorithms for multiclass classification?

Ans: Multiclass classification is a common machine learning task where you need to classify
data points into one of multiple possible classes or categories. There are several popular
algorithms and approaches for multiclass classification, each with its own strengths and
weaknesses. Here are some of the most popular algorithms for multiclass classification:
Logistic Regression: While often used for binary classification, logistic regression can be
extended to multiclass problems through one-vs-all (OvA), which trains one binary classifier
per class, or through softmax regression, which trains a single classifier with one output per
class.
Decision Trees: A decision tree partitions the feature space into regions and assigns a class
to each region, so it handles multiple classes directly. Tree ensembles such as Random Forest
and Gradient Boosting build on decision trees and can also be used for multiclass
classification.
k-Nearest Neighbors (k-NN): k-NN is a simple yet effective algorithm for multiclass
classification. It assigns a data point to the majority class among its k-nearest neighbors in the
feature space.
Naive Bayes: Naive Bayes algorithms, such as Gaussian Naive Bayes or Multinomial Naive
Bayes, are probabilistic classifiers that work well for multiclass problems, especially in text
classification.
Support Vector Machines (SVM): SVMs can handle multiclass problems through one-vs-
one (OvO) or one-vs-all (OvA) strategies. SVMs aim to find a hyperplane that best separates
the classes.
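
The two multiclass strategies mentioned for logistic regression can be sketched in a few lines. The scores below are made-up values standing in for the outputs of already trained classifiers, and the function names are chosen for this example only.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_one_vs_all(scores_by_class):
    """One-vs-all decision rule: pick the class whose binary scorer is most confident."""
    return max(scores_by_class, key=scores_by_class.get)

# Hypothetical scores for one sample from three classifiers (made-up values).
scores = {"cat": 2.0, "dog": 0.5, "bird": -1.0}
labels = list(scores)
probabilities = softmax([scores[label] for label in labels])
predicted = predict_one_vs_all(scores)
print(dict(zip(labels, probabilities)), predicted)
```

Both rules pick the class with the highest score; softmax additionally turns the scores into a probability distribution over the classes.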
3. What is Graph Based Clustering?
Ans: Graph-based clustering is a technique that organizes data points into groups (or
clusters) by modeling them as nodes in a graph, where edges represent the relationships or
similarities between these points. This method capitalizes on the connectivity of the graph to
identify clusters, making it particularly effective for complex datasets where traditional
clustering algorithms may struggle. For instance, in social network analysis, individuals can
be represented as nodes, and their interactions (like friendships or collaborations) as edges.
By analyzing the structure of this graph, one can identify tightly-knit communities or groups
of users with similar interests.
A common algorithm used in graph-based clustering is Spectral Clustering. This method
constructs a graph Laplacian from the similarity matrix of the data, computes its eigenvalues
and eigenvectors, and then clusters the points in the space spanned by a few of those
eigenvectors. For example, consider a dataset of images where each image is a node, and
edges represent similarity based on visual features. Applying Spectral Clustering groups
similar images together, which supports tasks like automatic categorization or retrieval based
on visual similarity. Because the clusters come from the connectivity of the graph rather than
from distances to a centroid, the result also shows how the data points relate to one another.
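
The simplest form of graph-based clustering treats every connected component of the graph as one cluster. The sketch below applies this to a small, hypothetical friendship graph; the names and edges are invented for illustration, and real community detection usually needs more than connected components.

```python
from collections import deque

def connected_components(nodes, edges):
    """Group nodes into clusters, one cluster per connected component."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for other in sorted(neighbours[node]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        clusters.append(sorted(members))
    return clusters

# Hypothetical friendship graph: two groups with no link between them.
people = ["asha", "bina", "chirag", "dev", "esha"]
friendships = [("asha", "bina"), ("bina", "chirag"), ("dev", "esha")]
communities = connected_components(people, friendships)
print(communities)
```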
4. Write short note on Epsilon neighborhood graph.

Ans: An Epsilon Neighborhood Graph, often referred to as an Epsilon Graph or
Epsilon-Nearest Neighbors Graph, is a data structure used in machine learning and data
analysis for tasks like clustering and density estimation. It is particularly associated with the
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, which is
used for clustering spatial data.
Definition: An Epsilon Neighborhood Graph is constructed by connecting data points that are
within a specified distance, called "epsilon" (ε), of each other. In other words, for each data
point, the graph includes edges to all other data points that are within a radius of ε units from
it.
DBSCAN Application: Epsilon Neighborhood Graphs are used in the DBSCAN clustering
algorithm to identify dense regions in a dataset. DBSCAN assigns core points, border points,
and noise points based on the connectivity of data points in the Epsilon Neighborhood Graph.
Core points have a minimum number of data points (defined by the "min_samples" parameter
in DBSCAN) within ε distance, border points are within ε distance of core points but do not
meet the minimum count requirement, and noise points have no core points within ε distance.
Variable Density Data: The graph reflects the local density of the data: in regions with high
data density it will have many edges, while in sparse regions it will have fewer edges. Because
ε is a single global value, datasets whose clusters have very different densities can be hard to
cover with one choice of ε.
Tuning Parameter: The choice of the ε parameter is critical in constructing the Epsilon
Neighborhood Graph. A smaller ε will result in more fine-grained clusters, while a larger ε
may combine multiple clusters into one. Finding the right ε value often requires domain
knowledge or experimentation.
Efficiency: Constructing the Epsilon Neighborhood Graph efficiently can be a challenge,
especially for large datasets. Various data structures, like KD-trees and R-trees, are used to
speed up the search for neighboring points within ε distance.
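
The definition and the DBSCAN roles described above can be sketched as follows. This is a brute-force version that compares every pair of points; the sample points, eps and min_samples values are made up, and counting a point as a member of its own neighbourhood is an assumption of this sketch.

```python
import math

def epsilon_graph(points, eps):
    """Return adjacency lists linking every pair of points at distance <= eps."""
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                graph[i].append(j)
                graph[j].append(i)
    return graph

def label_points(graph, min_samples):
    """Classify each point as 'core', 'border', or 'noise' in the DBSCAN sense."""
    # A point counts itself, so it needs min_samples - 1 neighbours to be core.
    core = {i for i, nbrs in graph.items() if len(nbrs) + 1 >= min_samples}
    roles = {}
    for i, nbrs in graph.items():
        if i in core:
            roles[i] = "core"
        elif any(n in core for n in nbrs):
            roles[i] = "border"
        else:
            roles[i] = "noise"
    return roles

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (8.0, 8.0)]
graph = epsilon_graph(points, eps=1.0)
roles = label_points(graph, min_samples=3)
print(graph)
print(roles)
```

The pairwise loop costs O(n^2) distance computations, which is the cost that spatial indexes such as KD-trees are meant to reduce on large datasets.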

5. Explain K-means and Spectral Clustering.

Ans: K-Means and Spectral Clustering are two different approaches to clustering data, each
with its own strengths and weaknesses. Let's explore each of them in more detail:
K-Means Clustering:
Basic Idea: K-Means is a partition-based clustering algorithm that aims to group data points
into K clusters, where K is a predefined number of clusters.
Clustering Process:
Initialization: K initial cluster centroids are randomly or strategically chosen from the data
points.
Assignment: Each data point is assigned to the cluster whose centroid is closest (usually based
on Euclidean distance).
Update: The centroids of the clusters are recalculated as the mean of all data points assigned
to that cluster.
Repeat Assignment and Update: The assignment and update steps are repeated until
convergence (i.e., when the centroids no longer change significantly) or for a specified number
of iterations.
Strengths:
Simplicity and efficiency: K-Means is computationally efficient and easy to implement.
Works well for spherical clusters: It performs well when clusters are roughly spherical, evenly
sized, and have similar densities.
Weaknesses:
Sensitive to initialization: The choice of initial centroids can affect the final clustering result,
leading to suboptimal solutions.
Assumes equal-sized, spherical clusters: K-Means may struggle with non-convex clusters,
uneven cluster sizes, and clusters with varying densities.
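
The K-Means process described above (initialization, assignment, update, repeat) can be sketched as follows. The data points and the choice of starting centroids are made up for illustration, and stopping when the labels no longer change is one possible convergence test.

```python
import math

def k_means(points, centroids, max_iters=100):
    """Lloyd's algorithm: alternate assignment and update until labels stop changing."""
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Initialization: the first and fourth points are used as starting centroids.
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

Starting from different centroids can give a different result on less well separated data, which is the initialization sensitivity listed under the weaknesses.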
Spectral Clustering:
Basic Idea: Spectral Clustering is a graph-based clustering algorithm that leverages spectral
graph theory to find clusters in data.
Clustering Process:
Construct Similarity Graph: A similarity graph (e.g., Epsilon Neighborhood Graph or K-
Nearest Neighbors Graph) is created based on pairwise similarities between data points.
Graph Laplacian: A graph Laplacian matrix is derived from the similarity graph.
Eigenvector Decomposition: The eigenvectors and eigenvalues of the Laplacian matrix are
computed.
Dimension Reduction: A subset of the eigenvectors (usually corresponding to the smallest
eigenvalues) is selected to reduce the dimensionality of the data.
Clustering: Traditional clustering techniques like K-Means are applied in the reduced-
dimensional space.
Strengths:
Handles non-convex clusters: Spectral Clustering is effective at finding clusters with complex
shapes, as it captures the underlying data structure.
Less sensitive to initialization: The spectral embedding itself does not depend on an initial
choice of cluster centers, although the final K-Means step still does.
Can uncover hidden structures: It can discover clusters that may not be apparent in the original
feature space.
Weaknesses:
Parameter tuning: Choosing the number of clusters (K) and graph-related parameters (e.g.,
epsilon or the number of nearest neighbors) can be challenging.
Computationally intensive: Spectral Clustering can be computationally expensive, especially
for large datasets, due to eigenvalue decomposition.
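
For two clusters, the spectral clustering steps above can be sketched without any external library. This is a simplified illustration under stated assumptions: the graph is a made-up example, power iteration stands in for a proper eigenvalue solver, and the sign of the second-smallest eigenvector (the Fiedler vector) replaces the final K-Means step.

```python
import math

def fiedler_vector(adjacency, iterations=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    # Unnormalized graph Laplacian: L = D - W.
    laplacian = [
        [(degrees[i] if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]
    # Power iteration on (shift * I - L) turns the smallest eigenvalues of L
    # into the largest ones; shift = 2 * max degree bounds L's spectrum.
    shift = 2.0 * max(degrees)
    vector = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        # Remove the constant component (the eigenvector for eigenvalue 0).
        mean = sum(vector) / n
        vector = [v - mean for v in vector]
        vector = [
            shift * vector[i] - sum(laplacian[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    return vector

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1.0

embedding = fiedler_vector(adjacency)
# With two clusters, the sign of each entry stands in for the final K-Means step.
labels = [0 if value < 0 else 1 for value in embedding]
print(labels)
```

The split falls on the single bridging edge, which is the weakest connection in the graph. For more than two clusters, several eigenvectors would be kept and K-Means applied to the rows of the resulting embedding.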

6. Why is dimension reduction a very important step in machine learning?

Ans: Dimension reduction is a crucial step in machine learning for several reasons:
Curse of Dimensionality: As the number of features (dimensions) in a dataset increases, the
amount of data required to adequately cover that space grows exponentially. This phenomenon
is known as the "curse of dimensionality." With high-dimensional data, the dataset can become
sparse, making it challenging to find meaningful patterns and relationships. Dimension
reduction helps mitigate this problem by reducing the number of features while retaining
important information.
Computational Efficiency: High-dimensional data requires more computational resources for
training machine learning models, making the process slow and resource-intensive. Dimension
reduction can significantly speed up training and prediction times by reducing the feature
space's dimensionality.
Overfitting Reduction: High-dimensional datasets are more prone to overfitting, where a
model fits the noise in the data rather than the underlying patterns. Reducing the dimensionality
can help reduce overfitting and improve a model's generalization to unseen data.
Visualization: Visualizing data in high dimensions is challenging. Humans are limited in their
ability to comprehend and visualize data beyond three dimensions. Dimension reduction
techniques, such as Principal Component Analysis (PCA) or t-SNE, project data into lower-
dimensional spaces that can be visualized more easily, helping analysts and data scientists gain
insights.
Feature Engineering: Dimension reduction can assist in feature engineering by identifying
which features contribute the most to explaining the data's variance or target variable. This
knowledge can guide feature selection and the creation of more informative features.
Improved Model Performance: Removing irrelevant or redundant features through
dimension reduction can lead to a simpler and more interpretable model, improving model
performance and reducing the risk of overfitting.
Noise Reduction: High-dimensional data often contains noise or irrelevant information.
Dimension reduction methods aim to preserve the most informative features while discarding
less useful ones, effectively reducing the impact of noise.
Interpretability: Simplifying the dataset through dimension reduction can make it easier to
interpret and understand the relationships between variables. This is especially important in
fields like healthcare and finance, where interpretability is crucial.
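
As a small illustration of dimension reduction, the sketch below applies Principal Component Analysis to two-dimensional data and projects it onto a single axis. The data is made up, the function name pca_2d is chosen for this example, and the closed-form eigenvalue formula only works for the 2x2 case; the sketch also assumes the points are not all identical.

```python
import math

def pca_2d(points):
    """Return (first principal axis, explained-variance ratio, 1-D projections)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    trace, det = sxx + syy, sxx * syy - sxy * sxy
    largest = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))

    # Matching eigenvector; fall back to an axis when the features are uncorrelated.
    if sxy != 0:
        axis = (sxy, largest - sxx)
    else:
        axis = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    length = math.hypot(*axis)
    axis = (axis[0] / length, axis[1] / length)

    projections = [x * axis[0] + y * axis[1] for x, y in centered]
    return axis, largest / trace, projections

# Made-up data lying exactly on the line y = 2x, so one dimension suffices.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
axis, explained, projections = pca_2d(data)
print(axis, explained, projections)
```

Because the two features are perfectly correlated here, the first principal component explains all of the variance and the second feature carries no extra information.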
DEPARTMENT OF COMPUTER ENGINEERING
CSL701 Machine Learning Lab
Seventh Semester, 2024-25 (Odd Semester)

Name of Student :

Roll No. :

Batch :

Title of Mini Project :

Date of Implementation:

Date of Submission :

Particulars | POs covered | Max. Marks | Marks Obtained
Knowledge regarding design and analysis of basic electronics circuits (KD) | PO-2 | 3 |
Working in a group (WG) | PO-9 | 3 |
Presentation skill (PS) | PO-10 | 3 |
Time Management (TM) | PO-11 | 3 |
Lifelong learning (LL) | PO-12 | 3 |
Ethics (ET) | PO-8 | 3 |
Total | | 18 |
Total (out of 5)
Grades – Meet Expectations (3 Marks), Moderate Expectations (2 Marks), Below Expectations (1 Mark)

Checked and Verified by

Name of Faculty : Prof. Rajesh Gaikwad


Signature :
Date :
