ML N PY Programs

Ankur Madan 2201921570019

Q1. Write a Python program to load the iris data from a given CSV file into a data frame and print the shape of the data, the type of the data, and the first 3 rows.

In [7]: import pandas as pd

# Load the iris data into a data frame
df = pd.read_csv('iris.csv')

# Print the shape, type and first 3 rows
print(df.shape)
print(type(df))
print(df.head(3))

(150, 5)
<class 'pandas.core.frame.DataFrame'>
sepal.length sepal.width petal.length petal.width variety
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa

Q2. Write a Python program using Scikit-learn to print the keys, number of rows and columns, feature names and the description of the Iris data.
In [8]: from sklearn.datasets import load_iris

iris = load_iris()

# Print the keys of the iris Bunch object
print(iris.keys())

# Print the number of rows and columns
print(iris.data.shape)

# Print the feature names
print(iris.feature_names)

# Print the dataset description
print(iris.DESCR)

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
(150, 4)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
.. _iris_dataset:

Iris plants dataset

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica

:Summary Statistics:

============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length: 4.3 7.9 5.84 0.83 0.7826
sepal width: 2.0 4.4 3.05 0.43 -0.4194
petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
============== ==== ==== ======= ===== ====================

Q3. Write a Python program to split the iris dataset into its
attributes (X) and labels (y). The X variable contains the first
four columns (i.e. attributes) and y contains the labels of the
dataset.

In [9]: from sklearn.datasets import load_iris

iris = load_iris()

# Get the Iris data attributes
X = iris.data

# Get the Iris labels (0, 1, 2)
y = iris.target

print("Attributes shape:", X.shape)
print("Labels shape:", y.shape)

Attributes shape: (150, 4)
Labels shape: (150,)
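
Scikit-learn can also return the attributes and labels directly in one call; a one-line equivalent that yields the same X and y as above:

# Equivalent one-liner: return_X_y unpacks data and target directly
X, y = load_iris(return_X_y=True)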

Q4. Write a Python program to draw a scatterplot, then add a joint density estimate to describe individual distributions on the same plot between Sepal length and Sepal width.
In [10]: import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset from seaborn
iris = sns.load_dataset('iris')

# Create a scatterplot and overlay a kernel density estimate
sns.scatterplot(x='sepal_length', y='sepal_width', data=iris, color='blue')
sns.kdeplot(x='sepal_length', y='sepal_width', data=iris, color='red', fill=True)

# Show the plot
plt.title('Scatterplot with Joint Density Estimate')
plt.show()
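
An alternative worth knowing: seaborn's jointplot builds the joint density and the marginal distributions in a single call. A minimal sketch, assuming the same seaborn-loaded iris frame (kind='kde' draws filled density contours plus marginal KDEs):

# One-call alternative: joint density plus marginal density estimates
g = sns.jointplot(data=iris, x='sepal_length', y='sepal_width', kind='kde', fill=True)
plt.show()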

Q5. Write a Python program using Scikit-learn to split the iris dataset into 70% train data and 30% test data. Out of the total 150 records, the training set will contain 105 records and the test set the remaining 45 records. Print both datasets.
In [11]: from sklearn.model_selection import train_test_split
import seaborn as sns

# Load the Iris dataset from seaborn
iris = sns.load_dataset('iris')

# Split the dataset into 70% train and 30% test
train_data, test_data = train_test_split(iris, test_size=0.3, random_state=42)

# Display the shapes of the training and test sets
print(f"Training Set Shape: {train_data.shape}")
print(f"Test Set Shape: {test_data.shape}")

# Display the training set
print("\nTraining Set:")
print(train_data)

# Display the test set
print("\nTest Set:")
print(test_data)
Training Set Shape: (105, 5)
Test Set Shape: (45, 5)

Training Set:
sepal_length sepal_width petal_length petal_width species
81 5.5 2.4 3.7 1.0 versicolor
133 6.3 2.8 5.1 1.5 virginica
137 6.4 3.1 5.5 1.8 virginica
75 6.6 3.0 4.4 1.4 versicolor
109 7.2 3.6 6.1 2.5 virginica
.. ... ... ... ... ...
71 6.1 2.8 4.0 1.3 versicolor
106 4.9 2.5 4.5 1.7 virginica
14 5.8 4.0 1.2 0.2 setosa
92 5.8 2.6 4.0 1.2 versicolor
102 7.1 3.0 5.9 2.1 virginica

[105 rows x 5 columns]

Test Set:
sepal_length sepal_width petal_length petal_width species
73 6.1 2.8 4.7 1.2 versicolor
18 5.7 3.8 1.7 0.3 setosa
118 7.7 2.6 6.9 2.3 virginica
78 6.0 2.9 4.5 1.5 versicolor
76 6.8 2.8 4.8 1.4 versicolor
31 5.4 3.4 1.5 0.4 setosa
64 5.6 2.9 3.6 1.3 versicolor
141 6.9 3.1 5.1 2.3 virginica
68 6.2 2.2 4.5 1.5 versicolor
82 5.8 2.7 3.9 1.2 versicolor
110 6.5 3.2 5.1 2.0 virginica
12 4.8 3.0 1.4 0.1 setosa
36 5.5 3.5 1.3 0.2 setosa
9 4.9 3.1 1.5 0.1 setosa
19 5.1 3.8 1.5 0.3 setosa
56 6.3 3.3 4.7 1.6 versicolor
104 6.5 3.0 5.8 2.2 virginica
69 5.6 2.5 3.9 1.1 versicolor
55 5.7 2.8 4.5 1.3 versicolor
132 6.4 2.8 5.6 2.2 virginica
29 4.7 3.2 1.6 0.2 setosa
127 6.1 3.0 4.9 1.8 virginica
26 5.0 3.4 1.6 0.4 setosa
128 6.4 2.8 5.6 2.1 virginica
131 7.9 3.8 6.4 2.0 virginica
145 6.7 3.0 5.2 2.3 virginica
108 6.7 2.5 5.8 1.8 virginica
143 6.8 3.2 5.9 2.3 virginica
45 4.8 3.0 1.4 0.3 setosa
30 4.8 3.1 1.6 0.2 setosa
22 4.6 3.6 1.0 0.2 setosa
15 5.7 4.4 1.5 0.4 setosa
65 6.7 3.1 4.4 1.4 versicolor
11 4.8 3.4 1.6 0.2 setosa
42 4.4 3.2 1.3 0.2 setosa
146 6.3 2.5 5.0 1.9 virginica
51 6.4 3.2 4.5 1.5 versicolor
27 5.2 3.5 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
32 5.2 4.1 1.5 0.1 setosa
142 5.8 2.7 5.1 1.9 virginica
85 6.0 3.4 4.5 1.6 versicolor
86 6.7 3.1 4.7 1.5 versicolor
16 5.4 3.9 1.3 0.4 setosa
10 5.4 3.7 1.5 0.2 setosa
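
An aside, not required by the assignment: passing stratify keeps the three species in the same proportions in both splits, which is often preferable for small datasets. A minimal variant of the call above:

# Class-proportional 70/30 split (stratified on the species column)
train_data, test_data = train_test_split(iris, test_size=0.3, random_state=42, stratify=iris['species'])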

Q6. Implement and demonstrate any suitable algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
In [14]: import pandas as pd

def find_most_specific_hypothesis(training_data):
    # Check whether there are any positive examples
    positive_examples = training_data[training_data['label'] == 'Y']

    if positive_examples.empty:
        print("No positive examples in the training data. Setting the hypothesis to a default value.")
        # Fall back to a default hypothesis of all '?'
        return ['?'] * (len(training_data.columns) - 1)

    # Initialize the hypothesis with the first positive example
    hypothesis = positive_examples.iloc[0, :-1].tolist()

    # Generalize the hypothesis over the remaining positive examples
    for index, row in training_data.iterrows():
        if row['label'] == 'Y':
            for i in range(len(hypothesis)):
                if hypothesis[i] != row.iloc[i]:
                    hypothesis[i] = '?'

    return hypothesis

# Use the Iris dataset as an example
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
iris.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label']

# Display the training data
print("Training Data:")
print(iris)

# Apply the algorithm to find the most specific hypothesis
specific_hypothesis = find_most_specific_hypothesis(iris)

# Display the most specific hypothesis
print("\nMost Specific Hypothesis:")
print(specific_hypothesis)
Training Data:
sepal_length sepal_width petal_length petal_width label
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

[150 rows x 5 columns]


No positive examples in the training data. Setting the hypothesis to a default value.

Most Specific Hypothesis:
['?', '?', '?', '?']
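
The all-'?' result is expected: the Iris labels are species names rather than 'Y'/'N', so no positive examples are found and the default is returned. A minimal sketch with a hypothetical binary-labeled table shows FIND-S actually generalizing:

# Hypothetical toy data labeled 'Y'/'N' so positive examples exist
toy = pd.DataFrame({
    'sky': ['Sunny', 'Sunny', 'Rainy', 'Sunny'],
    'air_temp': ['Warm', 'Warm', 'Cold', 'Warm'],
    'humidity': ['Normal', 'High', 'High', 'High'],
    'label': ['Y', 'Y', 'N', 'Y'],
})
print(find_most_specific_hypothesis(toy))
# Prints ['Sunny', 'Warm', '?']: attributes shared by every positive
# example stay specific; the one that varies is generalized to '?'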

Q7. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
In [15]: import pandas as pd

def initialize_hypothesis(data):
    n_attributes = len(data.columns) - 1
    # S starts maximally specific ('0'), G maximally general ('?')
    hypothesis = [['0'] * n_attributes, ['?'] * n_attributes]
    return hypothesis

def candidate_elimination(training_data):
    # Simplified single-boundary version of Candidate-Elimination:
    # hypothesis_space[0] is the specific boundary S,
    # hypothesis_space[1] is the general boundary G.
    hypothesis_space = initialize_hypothesis(training_data)

    for _, row in training_data.iterrows():
        instance, label = row.iloc[:-1].tolist(), row.iloc[-1]
        if label == 'Y':
            # Positive example: generalize S just enough to cover it
            for i in range(len(instance)):
                if hypothesis_space[0][i] == '0':
                    hypothesis_space[0][i] = instance[i]
                elif hypothesis_space[0][i] != instance[i]:
                    hypothesis_space[0][i] = '?'
        else:
            # Negative example: specialize G where S disagrees with it
            for i in range(len(instance)):
                if hypothesis_space[0][i] not in ('0', '?') and instance[i] != hypothesis_space[0][i]:
                    hypothesis_space[1][i] = hypothesis_space[0][i]

    return hypothesis_space

iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
iris.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label']

print("Training Data:")
print(iris)

hypotheses = candidate_elimination(iris)

print("\nSet of Hypotheses Consistent with Training Examples:")
for h in hypotheses:
    print(h)

Training Data:
sepal_length sepal_width petal_length petal_width label
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

[150 rows x 5 columns]

Set of Hypotheses Consistent with Training Examples:
['0', '0', '0', '0']
['?', '?', '?', '?']

Q8. Write a program to demonstrate the working of the decision tree using any suitable algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
In [16]: from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris data and split it into train and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train the decision tree and evaluate it on the test set
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on Test Set: {accuracy * 100:.2f}%")

# Classify a new sample (measurements assumed for illustration;
# the original values were not shown in the source)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = clf.predict(new_sample)
class_name = iris.target_names[predicted_class][0]

print("\nNew Sample Classification:")
print(f"Predicted Class: {class_name}")

Accuracy on Test Set: 100.00%

New Sample Classification:
Predicted Class: setosa

Q9. Build an Artificial Neural Network by implementing the Backpropagation algorithm.
In [17]: import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.weights_input_hidden = np.random.rand(input_size, hidden_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.weights_hidden_output = np.random.rand(hidden_size, output_size)
        self.bias_output = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # x is already a sigmoid output, so the derivative is x * (1 - x)
        return x * (1 - x)

    def forward(self, inputs):
        # Forward pass
        self.hidden_layer_activation = np.dot(inputs, self.weights_input_hidden) + self.bias_hidden
        self.hidden_layer_output = self.sigmoid(self.hidden_layer_activation)
        self.output_activation = np.dot(self.hidden_layer_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)
        return self.predicted_output

    def backward(self, inputs, targets, learning_rate):
        # Backward pass
        error = targets - self.predicted_output

        # Output layer
        output_delta = error * self.sigmoid_derivative(self.predicted_output)
        hidden_error = output_delta.dot(self.weights_hidden_output.T)

        # Update output-layer weights and biases
        self.weights_hidden_output += self.hidden_layer_output.T.dot(output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate

        # Hidden layer
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)
        self.weights_input_hidden += inputs.T.dot(hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

    def train(self, inputs, targets, epochs, learning_rate):
        for epoch in range(epochs):
            for i in range(len(inputs)):
                input_data = np.array([inputs[i]])
                target_data = np.array([targets[i]])
                self.forward(input_data)
                self.backward(input_data, target_data, learning_rate)
        return self.forward(inputs)

    def predict(self, inputs):
        return self.forward(inputs)

# XOR training data
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=2, output_size=1)
nn.train(inputs, targets, epochs=10000, learning_rate=0.1)

for i in range(len(inputs)):
    prediction = nn.predict(np.array([inputs[i]]))
    print(f"Input: {inputs[i]}, Predicted Output: {prediction}")

Input: [0 0], Predicted Output: [[0.05346176]]
Input: [0 1], Predicted Output: [[0.95140656]]
Input: [1 0], Predicted Output: [[0.95124283]]
Input: [1 1], Predicted Output: [[0.05207599]]
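
Thresholding these outputs at 0.5 reproduces the XOR truth table (0, 1, 1, 0), so the backpropagation training has learned the target function.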

Q10. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file.
In [18]: import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris data into a DataFrame
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train the Gaussian naive Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Evaluate on the test set
y_pred = nb_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{confusion_mat}')
print(f'Classification Report:\n{classification_rep}')

Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Classification Report:
precision recall f1-score support

0 1.00 1.00 1.00 10
1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30

Q11. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
In [1]:
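The code for this cell is missing from the source. Below is a minimal sketch of what likely produced the output, assuming the pgmpy library and a local heart_disease.csv with the columns shown; both the file name and the network structure are assumptions:

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Assumed file name; columns match the table printed below
data = pd.read_csv('heart_disease.csv')
print(data)

# Assumed structure: each risk factor is a parent of heartdisease
model = BayesianNetwork([
    ('age', 'heartdisease'), ('Gender', 'heartdisease'),
    ('Family', 'heartdisease'), ('diet', 'heartdisease'),
    ('Lifestyle', 'heartdisease'), ('cholestrol', 'heartdisease'),
])
model.fit(data, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
print('For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4')
print('For Gender enter Male:0, Female:1')
print('For Family History enter Yes:1, No:0')
print('For Diet enter High:0, Medium:1')
print('for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3')
print('for Cholesterol enter High:0, BorderLine:1, Normal:2')

# Query the posterior over heartdisease given the entered evidence
q = infer.query(variables=['heartdisease'], evidence={
    'age': int(input('Enter Age: ')),
    'Gender': int(input('Enter Gender: ')),
    'Family': int(input('Enter Family History: ')),
    'diet': int(input('Enter Diet: ')),
    'Lifestyle': int(input('Enter Lifestyle: ')),
    'cholestrol': int(input('Enter Cholestrol: ')),
})
print(q)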
age Gender Family diet Lifestyle cholestrol heartdisease
0 0 0 1 1 3 0 1
1 0 1 1 1 3 0 1
2 1 0 0 0 2 1 1
3 4 0 1 1 3 2 0
4 3 1 1 0 0 2 0
5 2 0 1 1 1 0 1
6 4 0 1 0 2 0 1
7 0 0 1 1 3 0 1
8 3 1 1 0 0 2 0
9 1 1 0 0 0 2 1
10 4 1 0 1 2 0 1
11 4 0 1 1 3 2 0
12 2 1 0 0 0 0 0
13 2 0 1 1 1 0 1
14 3 1 1 0 0 1 0
15 0 0 1 0 0 2 1
16 1 1 0 1 2 1 1
17 3 1 1 1 0 1 0
18 4 0 1 1 3 2 0
For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4
For Gender enter Male:0, Female:1
For Family History enter Yes:1, No:0
For Diet enter High:0, Medium:1
for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3
for Cholesterol enter High:0, BorderLine:1, Normal:2
Enter Age: 0
Enter Gender: 0
Enter Family History: 0
Enter Diet: 0
Enter Lifestyle: 3
Enter Cholestrol: 0
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.5000 |
+-----------------+---------------------+
| heartdisease(1) | 0.5000 |
+-----------------+---------------------+
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]

Q12. Apply any suitable algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.
In [23]: import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Load and standardize the Iris data
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(scaled_data)

# Agglomerative (hierarchical) clustering
hierarchical = AgglomerativeClustering(n_clusters=3)
hierarchical_labels = hierarchical.fit_predict(scaled_data)

data['KMeans_Cluster'] = kmeans_labels
data['Hierarchical_Cluster'] = hierarchical_labels

# Plot the two cluster assignments side by side
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
sns.scatterplot(data=data, x='sepal length (cm)', y='sepal width (cm)', hue='KMeans_Cluster')
plt.title('K-Means Clustering')

plt.subplot(1, 2, 2)
sns.scatterplot(data=data, x='sepal length (cm)', y='sepal width (cm)', hue='Hierarchical_Cluster')
plt.title('Hierarchical Clustering')

plt.show()

# Compare clustering quality with silhouette scores
kmeans_silhouette = silhouette_score(scaled_data, kmeans_labels)
hierarchical_silhouette = silhouette_score(scaled_data, hierarchical_labels)

print(f"Silhouette Score - K-Means: {kmeans_silhouette}")
print(f"Silhouette Score - Hierarchical: {hierarchical_silhouette}")

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1436: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(
Silhouette Score - K-Means: 0.45994823920518635
Silhouette Score - Hierarchical: 0.446689041028591
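
Comment: both algorithms recover a similar three-cluster structure, and K-Means scores slightly higher on the silhouette measure (0.460 vs 0.447), suggesting marginally tighter, better-separated clusters. Neither score is high in absolute terms, reflecting the well-known overlap between the versicolor and virginica groups.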

Q13. Write a program to implement the k-Nearest Neighbor algorithm to classify the iris data set.
In [29]: import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris data
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the k-NN classifier (k assumed to be 5; the original value
# was not shown in the source)
k_value = 5
knn_classifier = KNeighborsClassifier(n_neighbors=k_value)
knn_classifier.fit(X_train_scaled, y_train)

# Evaluate on the test set
y_pred = knn_classifier.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{confusion_mat}')
print(f'Classification Report:\n{classification_rep}')
Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Classification Report:
precision recall f1-score support

0 1.00 1.00 1.00 10
1 1.00 1.00 1.00 9
2 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30

Q14. Implement the non-parametric Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

In [22]: import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor

# Generate synthetic data
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Fit a k-NN regression model
k_value = 5
knn_regressor = KNeighborsRegressor(n_neighbors=k_value)
knn_regressor.fit(X, y)

# Predict using the model
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_pred = knn_regressor.predict(X_test)

# Plot the results
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X_test, y_pred, color='navy', label='prediction', linewidth=2)
plt.xlabel('data')
plt.ylabel('target')
plt.title('k-NN Regression')
plt.legend()
plt.show()
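
k-NN regression is one non-parametric choice; another classical one is Nadaraya-Watson kernel regression, which replaces the hard k-neighbour cutoff with smooth Gaussian weights. A minimal NumPy sketch on the same synthetic data (the bandwidth h=0.2 is an assumed value, not tuned):

def nadaraya_watson(X_train, y_train, X_query, h=0.2):
    # Gaussian-kernel weighted average of the training targets;
    # h is the bandwidth: larger h gives a smoother fit
    diffs = X_query - X_train.ravel()          # (n_query, n_train) via broadcasting
    weights = np.exp(-0.5 * (diffs / h) ** 2)
    return (weights * y_train).sum(axis=1) / weights.sum(axis=1)

y_kernel = nadaraya_watson(X, y, X_test, h=0.2)
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X_test, y_kernel, color='green', label='kernel prediction', linewidth=2)
plt.title('Nadaraya-Watson Kernel Regression')
plt.legend()
plt.show()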
Q15. Write a Python program to get the accuracy of the Logistic Regression.
In [21]: import pandas as pd
from warnings import simplefilter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.exceptions import ConvergenceWarning

simplefilter("ignore", category=ConvergenceWarning)

# Load the Iris data
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
X, y = iris.data, iris.target

# Standardize the features and split into train and test sets
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train the logistic regression model
logistic_regression_model = LogisticRegression(max_iter=1000)
logistic_regression_model.fit(X_train, y_train)

# Evaluate on the test set
y_pred = logistic_regression_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

Accuracy: 1.0
