ML File - 1
ML File - 1
Solution:
Theory:
Supervised Learning is a type of machine learning where the algorithm is trained on labeled
data, i.e., input features (X) and corresponding output labels (y). The goal is to learn a mapping
function that can predict the output for new, unseen data.
Scikit-learn (sklearn) is a Python library that provides simple and efficient tools for data
mining and machine learning. It includes several classification, regression, and clustering
algorithms.
In this exercise, we use the K-Nearest Neighbors (KNN) classifier on the Iris dataset, a
popular dataset containing measurements of three types of iris flowers.
Key Features of Scikit-learn:
o Decision Tree
o Random Forest
o K-Means
1|Page
o Digits
o Wine, etc.
• Well-documented
• Integrates well with other libraries like NumPy, Pandas, and Matplotlib
---------------------------------------------Page End-----------------------------------------------------
Objective: To explore the concepts of unsupervised learning and clustering using the K-Means
algorithm.
Solution:
Objective:
To explore the concepts of unsupervised learning by applying the K-Means clustering
algorithm using the Scikit-learn library. This exercise helps understand how to group similar
data points when no labels are provided.
Theory:
K-Means is a popular unsupervised learning algorithm used for clustering data into K groups
(clusters) based on feature similarity. It works by:
1. Choosing the number of clusters (K).
Code Implementation:
import pandas as pd
2|Page
import numpy as np
df = pd.read_csv('/content/income.csv')
df.info()
df.describe()
# features scalling
scaler = StandardScaler()
df['Age']=scaler.fit_transform(df[['Age']])
df['Income($)']=scaler.fit_transform(df[['Income($)']])
print(df)
# plot data point
plt.figure(figsize=(10,4))
plt.scatter(df['Age'],df['Income($)'],s=100)
plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Customer Data')
plt.show()
# Assuming 'df' is your DataFrame containing the 'Age' and 'Income($)' columns
km = KMeans(n_clusters=5)
ypred
df['cluster'] = ypred
print(df)
3|Page
centroid
df1=df[df['cluster']==0]
df1
df2=df[df['cluster']==1]
df3=df[df['cluster']==2]
plt.scatter(df1['Age'],df1['Income($)'],color='green',label='cluster1',s=150)
plt.scatter(df2['Age'],df2['Income($)'],color='red',label='cluster2',s=150)
plt.scatter(df3['Age'],df3['Income($)'],color='blue',label='cluster3',s=150)
plt.scatter(centroid[:,0],centroid[:,1],s=200,marker="*",color='purple',label='centroid'
)
plt.show()
Output:
Fig no.1
4|Page
Fig no.2
---------------------------------------------------Page End-----------------------------------------------
Solution:
Theory:
Linear regression is a supervised learning algorithm used for predicting continuous values. It
assumes a linear relationship between the input feature x and the output y, modeled by the
equation:
y=mx+cy = mx + cy=mx+c
Where:
• m is the slope (also called weight or coefficient),
import numpy as np
import matplotlib.pyplot as mtp
5|Page
import pandas as pd
data_set= pd.read_csv("Salary_Data.csv")
x=data_set.iloc[:,:-1].values
y=data_set.iloc[:,1].values
print(x)
print(y)
regressor.fit(x_train,y_train)
# prediction of test and training set result
y_pred= regressor.predict(x_test)
print(y_pred)
x_pred= regressor.predict(x_train)
print(x_pred)
mtp.scatter(x_train,y_train,color="green")
mtp.plot(x_train,x_pred,color="red")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
mtp.scatter(x_test,y_test,color="blue")
mtp.plot(x_train,x_pred,color="red")
6|Page
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary(In Rupees)")
mtp.show()
Output:
Fig no.1
Fig no.2
------------------------------------------------------------
7|Page
Exercise 4: Binary Classification with Logistic Regression.
Objective: To implement logistic regression for binary classification tasks and understand its
application in real-world scenarios.
Solution:
Theory:
Binary classification is a supervised learning task where the output variable has only two
possible classes, e.g., yes/no, 0/1, true/false.
Logistic Regression is a statistical model used for classification tasks. It estimates the
probability that a given input point belongs to a certain class using the sigmoid (logistic)
function:
The output is a probability between 0 and 1, and a threshold (usually 0.5) is used to assign class
labels.
Real-world Applications:
Code Implementation:
import pandas as pd
data = pd.read_csv("/content/Titanic-Dataset.csv")
print(data)
8|Page
data = data[features + ['Survived']]
data['Age'].fillna(data['Age'].median(), inplace=True)
X = data[features]
Y = data['Survived']
# Train-test split
model.fit(X_train,Y_train)
# Predictions
Y_pred = model.predict(X_test)
# Evaluation
plt.figure(figsize=(6, 5))
plt.xlabel('Predicted')
plt.ylabel('Actual')
9|Page
plt.title('Confusion Matrix')
plt.show()
Output:
Fig no.1
Objective: To understand the working of decision tree classifiers and their application in
multiclass classification problems.
Solution:
Theory:
A Decision Tree is a supervised machine learning algorithm used for both classification and
regression tasks. It works by splitting the dataset into branches based on feature values,
forming a tree structure. Each node represents a decision based on a feature, and each leaf node
represents a class label.
Multiclass classification involves classifying inputs into more than two categories, unlike
binary classification. For example:
10 | P a g e
Use Case:
Code Implementation:
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
Output:
11 | P a g e
Fig no.1
12 | P a g e