ML File - 1

The document provides a comprehensive introduction to various supervised and unsupervised learning techniques using the Scikit-learn library in Python. It covers exercises on K-Nearest Neighbors, K-Means Clustering, Linear Regression, Logistic Regression, and Decision Trees, detailing their theoretical foundations, code implementations, and practical applications. Each exercise aims to enhance understanding of machine learning concepts and their real-world applications.

Exercise 1: Introduction to Scikit-learn for Supervised Learning

Objective: To gain hands-on experience with implementing supervised learning algorithms using the Scikit-learn library in Python.

Solution:

Theory:

Supervised Learning is a type of machine learning where the algorithm is trained on labeled
data, i.e., input features (X) and corresponding output labels (y). The goal is to learn a mapping
function that can predict the output for new, unseen data.

Scikit-learn (sklearn) is a Python library that provides simple and efficient tools for data
mining and machine learning. It includes several classification, regression, and clustering
algorithms.

In this exercise, we use the K-Nearest Neighbors (KNN) classifier on the Iris dataset, a popular dataset containing measurements of three species of iris flowers; a minimal sketch of this workflow appears after the lists below.
Key Features of Scikit-learn:

• Supervised learning: Classification and regression algorithms like:
  o Decision Tree
  o Random Forest
  o K-Nearest Neighbors (KNN)
  o Support Vector Machine (SVM)
  o Logistic Regression
• Unsupervised learning: Clustering and dimensionality reduction algorithms like:
  o K-Means
  o PCA (Principal Component Analysis)
• Model selection: Tools for cross-validation and hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
• Preprocessing: Functions for:
  o Data scaling (e.g., StandardScaler)
  o Handling missing values
  o Encoding categorical variables
• Datasets: Includes built-in sample datasets like:
  o Iris
  o Digits
  o Boston Housing (deprecated)
  o Wine, etc.

Why Use Scikit-learn?


• Easy to learn and use

• Well-documented

• Integrates well with other libraries like NumPy, Pandas, and Matplotlib

• Widely used in education, research, and industry
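
Since this exercise applies KNN to the Iris dataset, a minimal sketch of the workflow (load, split, fit, evaluate) might look like the following; the split ratio and k = 5 are illustrative choices, not prescribed by the exercise:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the labeled Iris data (features X, labels y)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Fit a KNN classifier on the training data; k=5 is an illustrative choice
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict labels for unseen data and measure accuracy
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))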

---------------------------------------------Page End-----------------------------------------------------

Exercise 2: Exploring Unsupervised Learning with K-Means Clustering

Objective: To explore the concepts of unsupervised learning by applying the K-Means clustering algorithm using the Scikit-learn library. This exercise helps understand how to group similar data points when no labels are provided.

Solution:

Theory:

What is Unsupervised Learning?


Unsupervised learning is a type of machine learning where the model is not given any labeled
output data. The goal is to discover hidden patterns or groupings in the data.
What is K-Means Clustering?

K-Means is a popular unsupervised learning algorithm used for clustering data into K groups
(clusters) based on feature similarity. It works by:
1. Choosing the number of clusters (K).

2. Randomly selecting initial cluster centroids.

3. Assigning data points to the nearest centroid.

4. Updating centroids based on the mean of points in each cluster.

5. Repeating steps 3 and 4 until convergence.
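
Steps 2–5 can be sketched directly in NumPy. This is a minimal illustration of the loop, not the Scikit-learn implementation, and it assumes no cluster ever becomes empty:

import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (assumes every cluster keeps at least one point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids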

Code Implementation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the customer data
df = pd.read_csv('/content/income.csv')
df.info()
df.describe()

# Feature scaling: bring Age and Income($) onto a common scale
scaler = StandardScaler()
df['Age'] = scaler.fit_transform(df[['Age']])
df['Income($)'] = scaler.fit_transform(df[['Income($)']])
print(df)

# Plot the data points
plt.figure(figsize=(10, 4))
plt.scatter(df['Age'], df['Income($)'], s=100)
plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Customer Data')
plt.show()

# Select the features for clustering
X = df[['Age', 'Income($)']]

# Fit K-Means with 3 clusters, matching the three groups plotted below
km = KMeans(n_clusters=3)
ypred = km.fit_predict(X)
df['cluster'] = ypred
print(df)

# Get the centroid of each cluster
centroid = km.cluster_centers_
print(centroid)

# Separate the points belonging to each cluster
df1 = df[df['cluster'] == 0]
df2 = df[df['cluster'] == 1]
df3 = df[df['cluster'] == 2]

plt.scatter(df1['Age'], df1['Income($)'], color='green', label='cluster1', s=150)
plt.scatter(df2['Age'], df2['Income($)'], color='red', label='cluster2', s=150)
plt.scatter(df3['Age'], df3['Income($)'], color='blue', label='cluster3', s=150)

# Draw the centroids
plt.scatter(centroid[:, 0], centroid[:, 1], s=200, marker='*', color='purple', label='centroid')
plt.legend()
plt.show()

Output:

Fig no.1

Fig no.2
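
The choice of K = 3 above is illustrative. A common heuristic for picking K, not part of the original exercise, is the elbow method: fit K-Means for a range of K values and plot the inertia (sum of squared distances to the nearest centroid), looking for the "elbow" where improvement levels off. A minimal sketch, reusing the scaled feature frame X from the code above:

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

sse = []
k_range = range(1, 10)
for k in k_range:
    km = KMeans(n_clusters=k, n_init=10)
    km.fit(X)  # X = df[['Age', 'Income($)']] from the code above
    sse.append(km.inertia_)  # inertia_: sum of squared distances to nearest centroid

plt.plot(k_range, sse, marker='o')
plt.xlabel('K')
plt.ylabel('Sum of squared errors')
plt.title('Elbow Method')
plt.show()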

---------------------------------------------------Page End-----------------------------------------------

Exercise 3: Implementing Linear Regression from Scratch.

Objective: To gain a deeper understanding of linear regression by implementing it from scratch using Python.
Solution:
Theory:

What is Linear Regression?

Linear regression is a supervised learning algorithm used for predicting continuous values. It
assumes a linear relationship between the input feature x and the output y, modeled by the
equation:

y = mx + c

Where:
• m is the slope (also called weight or coefficient),

• c is the intercept (bias),

• x is the independent variable,

• y is the dependent variable (target).
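
Since the exercise is titled "from scratch" while the code below uses Scikit-learn's LinearRegression, here is a minimal sketch of the closed-form least-squares fit for a single feature; the function name and sample values are illustrative:

import numpy as np

def fit_line(x, y):
    # Least-squares estimates for y = m*x + c (one feature)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by variance of x
    m = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # Intercept: the fitted line passes through the point of means
    c = y_mean - m * x_mean
    return m, c

m, c = fit_line(np.array([1.0, 2.0, 3.0]), np.array([2.1, 3.9, 6.0]))
print(m, c)  # slope and intercept of the fitted line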


Code Implementation:

import numpy as np
import matplotlib.pyplot as mtp
import pandas as pd

# Load the dataset
data_set = pd.read_csv("Salary_Data.csv")

# Feature (years of experience) and target (salary)
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 1].values
print(x)
print(y)

# Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)

# Fitting the simple linear regression model to the training dataset
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()  # 'regressor' is just a variable name; any identifier works
regressor.fit(x_train, y_train)

# Prediction on the test and training sets
y_pred = regressor.predict(x_test)
print(y_pred)
x_pred = regressor.predict(x_train)  # training-set predictions, used to draw the fitted line
print(x_pred)

# Visualizing the training set results
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training set)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary (in Rupees)")
mtp.show()

# Visualizing the test set results
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test set)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary (in Rupees)")
mtp.show()
Output:

Fig no.1

Fig no.2
---------------------------------------------Page End-----------------------------------------------------

Exercise 4: Binary Classification with Logistic Regression.

Objective: To implement logistic regression for binary classification tasks and understand its
application in real-world scenarios.

Solution:

Theory:

What is Binary Classification?

Binary classification is a supervised learning task where the output variable has only two
possible classes, e.g., yes/no, 0/1, true/false.

What is Logistic Regression?

Logistic Regression is a statistical model used for classification tasks. It estimates the
probability that a given input point belongs to a certain class using the sigmoid (logistic)
function:

σ(z) = 1 / (1 + e^(−z)), where z = w·x + b

The output is a probability between 0 and 1, and a threshold (usually 0.5) is used to assign class
labels.
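
As a small illustration of this decision rule (the weights, bias, and input below are made-up values, not taken from the Titanic model that follows):

import numpy as np

def sigmoid(z):
    # Map any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input, for illustration only
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([1.5, 2.0])

prob = sigmoid(np.dot(w, x) + b)  # estimated P(class = 1 | x)
label = int(prob >= 0.5)          # apply the usual 0.5 threshold
print(prob, label)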

Real-world Applications:

• Email spam detection (Spam or Not Spam)


• Disease diagnosis (Positive or Negative)

• Credit risk assessment (Default or Not)

Code Implementation:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

# Load the Titanic data
data = pd.read_csv("/content/Titanic-Dataset.csv")
print(data)

# Select features and target
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
data = data[features + ['Survived']]

# Handle missing values in 'Age' with the median
data['Age'] = data['Age'].fillna(data['Age'].median())

# Convert the categorical column 'Sex' to numeric
le = LabelEncoder()
data['Sex'] = le.fit_transform(data['Sex'])  # female = 0, male = 1

# Split features and target
X = data[features]
Y = data['Survived']

# Train-test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Train logistic regression
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)

# Predictions
Y_pred = model.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(Y_test, Y_pred))
print("\nClassification Report:\n", classification_report(Y_test, Y_pred))

# Confusion matrix heatmap
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

cm = confusion_matrix(Y_test, Y_pred)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Did Not Survive', 'Survived'],
            yticklabels=['Did Not Survive', 'Survived'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

Output:

Fig no.1

Exercise 5: Decision Tree Classifier for Multiclass Classification

Objective: To understand the working of decision tree classifiers and their application in
multiclass classification problems.

Solution:
Theory:

What is a Decision Tree Classifier?

A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It works by splitting the dataset into branches based on feature values,
forming a tree structure. Each node represents a decision based on a feature, and each leaf node
represents a class label.
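
This tree structure can be inspected directly. As a small illustration using Scikit-learn's export_text on the Iris data (max_depth=2 is chosen here only to keep the printout short, and is not part of the exercise below):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# Each indented line is a node's feature test; each 'class:' line is a leaf's label
print(export_text(tree, feature_names=iris.feature_names))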

What is Multiclass Classification?

Multiclass classification involves classifying inputs into more than two categories, unlike
binary classification. For example:

• Classifying flowers as Setosa, Versicolor, or Virginica


• Digit recognition (0–9)

Use Case:

Classify iris flowers into three species using Decision Tree.

Code Implementation:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Decision Tree classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Output:

Fig no.1
