ML File - 1

The document provides a comprehensive introduction to various supervised and unsupervised learning techniques using the Scikit-learn library in Python. It covers exercises on K-Nearest Neighbors, K-Means Clustering, Linear Regression, Logistic Regression, and Decision Trees, detailing their theoretical foundations, code implementations, and practical applications. Each exercise aims to enhance understanding of machine learning concepts and their real-world applications.

Exercise 1: Introduction to Scikit-learn for Supervised Learning

Objective: To gain hands-on experience with implementing supervised learning algorithms using the Scikit-learn library in Python.

Solution:

Theory:

Supervised Learning is a type of machine learning where the algorithm is trained on labeled
data, i.e., input features (X) and corresponding output labels (y). The goal is to learn a mapping
function that can predict the output for new, unseen data.

Scikit-learn (sklearn) is a Python library that provides simple and efficient tools for data
mining and machine learning. It includes several classification, regression, and clustering
algorithms.

In this exercise, we use the K-Nearest Neighbors (KNN) classifier on the Iris dataset, a popular dataset containing measurements of three species of iris flowers; a minimal sketch of this workflow appears after the lists below.
Key Features of Scikit-learn:

• Supervised learning: Classification and regression algorithms like:
  o Decision Tree
  o Random Forest
  o K-Nearest Neighbors (KNN)
  o Support Vector Machine (SVM)
  o Logistic Regression
• Unsupervised learning: Clustering and dimensionality reduction algorithms like:
  o K-Means
  o PCA (Principal Component Analysis)
• Model selection: Tools for cross-validation and hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
• Preprocessing: Functions for:
  o Data scaling (e.g., StandardScaler)
  o Handling missing values
  o Encoding categorical variables
• Datasets: Includes built-in sample datasets like:
  o Iris
  o Digits
  o Boston Housing (deprecated)
  o Wine, etc.

Why Use Scikit-learn?


• Easy to learn and use

• Well-documented

• Integrates well with other libraries like NumPy, Pandas, and Matplotlib

• Widely used in education, research, and industry
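
Since this exercise applies KNN to the Iris dataset, a minimal sketch of the workflow (load, split, fit, evaluate) might look like the following; the split ratio and k = 5 are illustrative choices, not prescribed by the exercise:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the labeled Iris data (features X, labels y)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Fit a KNN classifier on the training data; k=5 is an illustrative choice
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict labels for unseen data and measure accuracy
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))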

---------------------------------------------Page End-----------------------------------------------------

Exercise 2: Exploring Unsupervised Learning with K-Means Clustering

Objective: To explore the concepts of unsupervised learning by applying the K-Means clustering algorithm using the Scikit-learn library. This exercise helps understand how to group similar data points when no labels are provided.

Solution:

Theory:

What is Unsupervised Learning?


Unsupervised learning is a type of machine learning where the model is not given any labeled
output data. The goal is to discover hidden patterns or groupings in the data.
What is K-Means Clustering?

K-Means is a popular unsupervised learning algorithm used for clustering data into K groups
(clusters) based on feature similarity. It works by:
1. Choosing the number of clusters (K).

2. Randomly selecting initial cluster centroids.

3. Assigning data points to the nearest centroid.

4. Updating centroids based on the mean of points in each cluster.

5. Repeating steps 3 and 4 until convergence.
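
Steps 2–5 can be sketched directly in NumPy. This is a minimal illustration of the loop, not the Scikit-learn implementation, and it assumes no cluster ever becomes empty:

import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (assumes every cluster keeps at least one point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids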

Code Implementation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the customer data
df = pd.read_csv('/content/income.csv')
df.info()
df.describe()

# Feature scaling: bring Age and Income($) onto a common scale
scaler = StandardScaler()
df['Age'] = scaler.fit_transform(df[['Age']])
df['Income($)'] = scaler.fit_transform(df[['Income($)']])
print(df)

# Plot the data points
plt.figure(figsize=(10, 4))
plt.scatter(df['Age'], df['Income($)'], s=100)
plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Customer Data')
plt.show()

# Select the features for clustering
X = df[['Age', 'Income($)']]

# Fit K-Means with 3 clusters, matching the three groups plotted below
km = KMeans(n_clusters=3)
ypred = km.fit_predict(X)
df['cluster'] = ypred
print(df)

# Get the centroid of each cluster
centroid = km.cluster_centers_
print(centroid)

# Separate the points belonging to each cluster
df1 = df[df['cluster'] == 0]
df2 = df[df['cluster'] == 1]
df3 = df[df['cluster'] == 2]

plt.scatter(df1['Age'], df1['Income($)'], color='green', label='cluster1', s=150)
plt.scatter(df2['Age'], df2['Income($)'], color='red', label='cluster2', s=150)
plt.scatter(df3['Age'], df3['Income($)'], color='blue', label='cluster3', s=150)

# Draw the centroids
plt.scatter(centroid[:, 0], centroid[:, 1], s=200, marker='*', color='purple', label='centroid')
plt.legend()
plt.show()

Output:

Fig no.1

Fig no.2
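
The choice of K = 3 above is illustrative. A common heuristic for picking K, not part of the original exercise, is the elbow method: fit K-Means for a range of K values and plot the inertia (sum of squared distances to the nearest centroid), looking for the "elbow" where improvement levels off. A minimal sketch, reusing the scaled feature frame X from the code above:

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

sse = []
k_range = range(1, 10)
for k in k_range:
    km = KMeans(n_clusters=k, n_init=10)
    km.fit(X)  # X = df[['Age', 'Income($)']] from the code above
    sse.append(km.inertia_)  # inertia_: sum of squared distances to nearest centroid

plt.plot(k_range, sse, marker='o')
plt.xlabel('K')
plt.ylabel('Sum of squared errors')
plt.title('Elbow Method')
plt.show()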

---------------------------------------------------Page End-----------------------------------------------

Exercise 3: Implementing Linear Regression from Scratch.

Objective: To gain a deeper understanding of linear regression by implementing it from scratch using Python.
Solution:
Theory:

What is Linear Regression?

Linear regression is a supervised learning algorithm used for predicting continuous values. It
assumes a linear relationship between the input feature x and the output y, modeled by the
equation:

y = mx + c

Where:
• m is the slope (also called weight or coefficient),

• c is the intercept (bias),

• x is the independent variable,

• y is the dependent variable (target).
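
Since the exercise is titled "from scratch" while the code below uses Scikit-learn's LinearRegression, here is a minimal sketch of the closed-form least-squares fit for a single feature; the function name and sample values are illustrative:

import numpy as np

def fit_line(x, y):
    # Least-squares estimates for y = m*x + c (one feature)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by variance of x
    m = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # Intercept: the fitted line passes through the point of means
    c = y_mean - m * x_mean
    return m, c

m, c = fit_line(np.array([1.0, 2.0, 3.0]), np.array([2.1, 3.9, 6.0]))
print(m, c)  # slope and intercept of the fitted line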


Code Implementation:

import numpy as np
import matplotlib.pyplot as mtp
import pandas as pd

# Load the dataset
data_set = pd.read_csv("Salary_Data.csv")

# Feature (years of experience) and target (salary)
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 1].values
print(x)
print(y)

# Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)

# Fitting the simple linear regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()  # 'regressor' is just a variable name; any identifier works
regressor.fit(x_train, y_train)

# Prediction on the test and training sets
y_pred = regressor.predict(x_test)
print(y_pred)
x_pred = regressor.predict(x_train)  # training-set predictions, used to draw the fitted line
print(x_pred)

# Visualizing the training set results
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training set)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary (in Rupees)")
mtp.show()

# Visualizing the test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test set)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary (in Rupees)")
mtp.show()
Output:

Fig no.1

Fig no.2
---------------------------------------------Page End-----------------------------------------------------

Exercise 4: Binary Classification with Logistic Regression.

Objective: To implement logistic regression for binary classification tasks and understand its
application in real-world scenarios.

Solution:

Theory:

What is Binary Classification?

Binary classification is a supervised learning task where the output variable has only two
possible classes, e.g., yes/no, 0/1, true/false.

What is Logistic Regression?

Logistic Regression is a statistical model used for classification tasks. It estimates the
probability that a given input point belongs to a certain class using the sigmoid (logistic)
function:

σ(z) = 1 / (1 + e^(−z)), where z = w·x + b

The output is a probability between 0 and 1, and a threshold (usually 0.5) is used to assign class
labels.
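
As a small illustration of this decision rule (the weights, bias, and input below are made-up values, not taken from the Titanic model that follows):

import numpy as np

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, for illustration only
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([1.5, 2.0])

prob = sigmoid(np.dot(w, x) + b)  # estimated P(class = 1 | x)
label = int(prob >= 0.5)          # apply the usual 0.5 threshold
print(prob, label)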

Real-world Applications:

• Email spam detection (Spam or Not Spam)


• Disease diagnosis (Positive or Negative)

• Credit risk assessment (Default or Not)

Code Implementation:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

# Load the Titanic data
data = pd.read_csv("/content/Titanic-Dataset.csv")
print(data)

# Select features and target
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
data = data[features + ['Survived']]

# Handle missing values in 'Age' with the median
data['Age'] = data['Age'].fillna(data['Age'].median())

# Convert the categorical column 'Sex' to numeric
le = LabelEncoder()
data['Sex'] = le.fit_transform(data['Sex'])  # female = 0, male = 1

# Split features and target
X = data[features]
Y = data['Survived']

# Train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Train logistic regression
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)

# Predictions
Y_pred = model.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(Y_test, Y_pred))
print("\nClassification Report:\n", classification_report(Y_test, Y_pred))

# Confusion matrix heatmap
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

cm = confusion_matrix(Y_test, Y_pred)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Did Not Survive', 'Survived'],
            yticklabels=['Did Not Survive', 'Survived'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

Output:

Fig no.1

Exercise 5: Decision Tree Classifier for Multiclass Classification

Objective: To understand the working of decision tree classifiers and their application in
multiclass classification problems.

Solution:
Theory:

What is a Decision Tree Classifier?

A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It works by splitting the dataset into branches based on feature values,
forming a tree structure. Each node represents a decision based on a feature, and each leaf node
represents a class label.
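
This tree structure can be inspected directly. As a small illustration using Scikit-learn's export_text on the Iris data (max_depth=2 is chosen here only to keep the printout short, and is not part of the exercise below):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# Each indented line is a node's feature test; each 'class:' line is a leaf's label
print(export_text(tree, feature_names=iris.feature_names))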

What is Multiclass Classification?

Multiclass classification involves classifying inputs into more than two categories, unlike
binary classification. For example:

• Classifying flowers as Setosa, Versicolor, or Virginica


• Digit recognition (0–9)

Use Case:

Classify iris flowers into three species using Decision Tree.

Code Implementation:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Decision Tree classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Output:

Fig no.1
