ML Project Assignment
(KMBA352)
PROJECT FILE
Of
By
Vinay Kumar
2301921570138
Ms. Rekha
S.No.  PROGRAMME                                                      PAGE NO.  DATE        SIGN
1      Write a Python program to load the iris data from a given      4         24-09-2024
       CSV file into a data frame and print the shape of the data,
       the type of the data and the first 3 rows.
8      Write a program to demonstrate the working of the decision     16-17     18-10-2024
       tree using any suitable algorithm. Use an appropriate data
       set for building the decision tree and apply this knowledge
       to classify a new sample.
Q1. Write a Python program to load the iris data from a
given CSV file into a data frame and print the shape of
the data, the type of the data and the first 3 rows.
(150, 5)
<class 'pandas.core.frame.DataFrame'>
sepal.length sepal.width petal.length petal.width variety
0 5.1 3.5 1.4 0.2 Setosa
1 4.9 3.0 1.4 0.2 Setosa
2 4.7 3.2 1.3 0.2 Setosa
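A minimal sketch of one way to produce output of this kind, assuming pandas is installed. The helper name `summarize_csv` and the inline four-row sample are illustrative assumptions (the sample only keeps the block runnable without a file on disk); in practice `source` would be the path to the iris CSV.

```python
import io

import pandas as pd


def summarize_csv(source):
    """Load a CSV into a DataFrame and print its shape, type and first 3 rows."""
    df = pd.read_csv(source)
    print(df.shape)    # (rows, columns)
    print(type(df))    # the pandas DataFrame class
    print(df.head(3))  # first three rows
    return df


# Hypothetical stand-in for the CSV file, using the column names shown above
sample = io.StringIO(
    "sepal.length,sepal.width,petal.length,petal.width,variety\n"
    "5.1,3.5,1.4,0.2,Setosa\n"
    "4.9,3.0,1.4,0.2,Setosa\n"
    "4.7,3.2,1.3,0.2,Setosa\n"
    "4.6,3.1,1.5,0.2,Setosa\n"
)
df = summarize_csv(sample)
```

Passing a file path such as `summarize_csv("iris.csv")` works the same way, since `pd.read_csv` accepts both paths and file-like objects.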
Q2. Write a Python program using Scikit-learn to print
the keys, number of rows and columns, feature names and
the description of the Iris data.
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
:Summary Statistics:
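The listing above is an excerpt of the dataset description. A short sketch of how the requested items can be printed, assuming the bundled copy of the data from `sklearn.datasets.load_iris` is acceptable (the variable names are illustrative):

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Keys of the Bunch object returned by load_iris
print(iris.keys())

# Number of rows and columns of the feature matrix
n_rows, n_cols = iris.data.shape
print(f"Rows: {n_rows}, Columns: {n_cols}")

# Feature names and the full text description
print(iris.feature_names)
print(iris.DESCR)
```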
Q3. Write a Python program to split the iris dataset into
its attributes (X) and labels (y). The X variable contains
the first four columns (i.e. attributes) and y contains the
labels of the dataset
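One possible way to do the split, sketched with positional indexing. Loading the data through `load_iris(as_frame=True)` is an assumption made so the block is self-contained; a DataFrame read from a CSV with the label in the fifth column can be sliced the same way.

```python
from sklearn.datasets import load_iris

# DataFrame with four feature columns followed by the label column
iris_df = load_iris(as_frame=True).frame

X = iris_df.iloc[:, :4]  # first four columns: the attributes
y = iris_df.iloc[:, 4]   # fifth column: the labels

print(X.head())
print(y.head())
```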
Q4. Write a Python program to draw a scatterplot, then
add a joint density estimate to describe individual
distributions on the same plot between Sepal length and
Sepal width.
Q5. Write a Python program using Scikit-learn to split
the iris dataset into 70% train data and 30% test data.
Out of total 150 records, the training set will contain
105 records and the test set contains 45 of those
records. Print both datasets.
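A minimal sketch of the split using `train_test_split`. The data source (`load_iris(as_frame=True)`) and `random_state=42` are assumptions made for a self-contained, repeatable example; with 150 records a 30% test size gives 105 training rows and 45 test rows.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_df = load_iris(as_frame=True).frame

# 70% of the rows go to the training set, 30% to the test set
train_set, test_set = train_test_split(iris_df, test_size=0.3, random_state=42)

print("Training Set Shape:", train_set.shape)
print("Test Set Shape:", test_set.shape)
print("Training Set:")
print(train_set)
print("Test Set:")
print(test_set)
```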
Training Set Shape: (105, 5)
Test Set Shape: (45, 5)
Training Set:
     sepal_length  sepal_width  petal_length  petal_width     species
81            5.5          2.4           3.7          1.0  versicolor
16            5.4          3.9           1.3          0.4      setosa
10            5.4          3.7           1.5          0.2      setosa
133           6.3          2.8           5.1          1.5   virginica
137           6.4          3.1           5.5          1.8   virginica
75            6.6          3.0           4.4          1.4  versicolor
109           7.2          3.6           6.1          2.5   virginica
..            ...          ...           ...          ...         ...
71            6.1          2.8           4.0          1.3  versicolor
106           4.9          2.5           4.5          1.7   virginica
14            5.8          4.0           1.2          0.2      setosa
92            5.8          2.6           4.0          1.2  versicolor
102           7.1          3.0           5.9          2.1   virginica
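def find_most_specific_hypothesis(training_data):
    # Check if there are positive examples
    positive_examples = training_data[training_data['label'] == 'Y']
    if positive_examples.empty:
        print("No positive examples in the training data. "
              "Setting the hypothesis to a default value.")
        # Set the hypothesis to a default value, such as all '?'
        return ['?'] * (len(training_data.columns) - 1)
    # The update loop is missing from the source; the lines below are the
    # standard Find-S generalisation step and are an assumption.
    attributes = positive_examples.drop(columns='label')
    hypothesis = list(attributes.iloc[0])
    for _, row in attributes.iterrows():
        hypothesis = [h if h == v else '?' for h, v in zip(hypothesis, row)]
    return hypothesis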
Training Data:
     sepal_length  sepal_width  petal_length  petal_width           label
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica
Q7. For a given set of training data examples stored in a
.CSV file, implement and demonstrate the Candidate-
Elimination algorithm to output a description of the set
of all hypotheses consistent with the training examples
In [15]:
def initialize_hypothesis(data):
return hypothesis
def candidate_elimination(training_data):
hypothesis_space = initialize_hypothesis(training_data)
hypothesis_space[0][i] = instance[i]
elif hypothesis_space[0][i] != instance[i]:
hypothesis_space[1][i] = instance[i]
if instance[i] != hypothesis_space[1][i]:
return hypothesis_space
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
iris.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'lab
print("Training Data:")
print(iris)
hypotheses = candidate_elimination(iris)
Training Data:
     sepal_length  sepal_width  petal_length  petal_width           label
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica
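For reference, a self-contained sketch of the Candidate-Elimination update on nominal attributes. It is not the code used for the run above: the function names, the small weather-style sample and the simplifications (a single specific boundary S, and a first example that is positive so that S can guide the specialisation of G) are all assumptions made for illustration.

```python
def matches(hypothesis, instance):
    """True if the hypothesis covers the instance ('?' accepts any value)."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))


def more_general(a, b):
    """True if hypothesis a covers everything hypothesis b covers."""
    return all(x == "?" or x == y for x, y in zip(a, b))


def candidate_elimination(examples):
    """Return (S, G) for a list of (attribute_tuple, is_positive) examples."""
    n = len(examples[0][0])
    S = [None] * n                      # None: no value accepted yet
    G = [tuple("?" for _ in range(n))]  # most general hypothesis
    for instance, positive in examples:
        if positive:
            # Drop general hypotheses that reject the positive example
            G = [g for g in G if matches(g, instance)]
            # Minimally generalise S so that it covers the example
            S = [v if s is None else (s if s == v else "?")
                 for s, v in zip(S, instance)]
        else:
            candidates = []
            for g in G:
                if not matches(g, instance):
                    candidates.append(g)  # already excludes the negative
                    continue
                # Minimally specialise g using values that S still requires
                for i in range(n):
                    if g[i] == "?" and S[i] not in (None, "?") and S[i] != instance[i]:
                        candidates.append(g[:i] + (S[i],) + g[i + 1:])
            candidates = list(dict.fromkeys(candidates))  # remove duplicates
            # Keep only the maximally general members
            G = [g for g in candidates
                 if not any(o != g and more_general(o, g) for o in candidates)]
    return S, G


# Hypothetical nominal training data (sky, air temp, humidity, wind, water, forecast)
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples)
print("Specific boundary S:", S)
print("General boundary G:", G)
```

The version space is every hypothesis lying between S and G. The algorithm is defined for discrete attribute values, so continuous measurements such as the iris columns would need to be binned into categories before it can say anything useful.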
Q8. Write a program to demonstrate the working of the
decision tree using any suitable algorithm. Use an
appropriate data set for building the decision tree and
apply this knowledge to classify a new sample
In [16]:
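from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data loading, split ratio and new_sample are not shown in the source; assumed here
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

new_sample = [[5.1, 3.5, 1.4, 0.2]]  # hypothetical measurements in cm
predicted_class = clf.predict(new_sample)
class_name = iris.target_names[predicted_class][0]
print(f'Predicted class: {class_name}')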
Q9. Build an Artificial Neural Network by implementing the
Backpropagation algorithm
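import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights and biases
        self.weights_input_hidden = np.random.rand(input_size, hidden_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.weights_hidden_output = np.random.rand(hidden_size, output_size)
        self.bias_output = np.zeros((1, output_size))
    # sigmoid, forward and train are missing from the source; reconstructed as assumptions
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    def sigmoid_derivative(self, x):
        return x * (1 - x)  # x is already a sigmoid activation
    def forward(self, inputs):
        self.hidden_layer_output = self.sigmoid(inputs.dot(self.weights_input_hidden) + self.bias_hidden)
        self.predicted_output = self.sigmoid(self.hidden_layer_output.dot(self.weights_hidden_output) + self.bias_output)
        return self.predicted_output
    def backward(self, inputs, targets, learning_rate):
        error = targets - self.predicted_output
        # Output layer
        output_delta = error * self.sigmoid_derivative(self.predicted_output)
        hidden_error = output_delta.dot(self.weights_hidden_output.T)
        # Hidden layer
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)
        self.weights_hidden_output += self.hidden_layer_output.T.dot(output_delta) * learning_rate
        self.bias_output += output_delta.sum(axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += inputs.T.dot(hidden_delta) * learning_rate
        self.bias_hidden += hidden_delta.sum(axis=0, keepdims=True) * learning_rate
    def train(self, inputs, targets, epochs, learning_rate):
        for _ in range(epochs):
            for i in range(len(inputs)):
                input_data = np.array([inputs[i]])
                target_data = np.array([targets[i]])
                self.forward(input_data)
                self.backward(input_data, target_data, learning_rate)
    def predict(self, inputs):
        return self.forward(inputs)

# XOR data matches the printed inputs; hidden size, epochs and learning rate are assumed
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(inputs, targets, epochs=10000, learning_rate=0.1)
for i in range(len(inputs)):
    prediction = nn.predict(np.array([inputs[i]]))
    print(f"Input: {inputs[i]}, Predicted Output: {prediction}")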
Input: [0 0], Predicted Output: [[0.05346176]]
Input: [0 1], Predicted Output: [[0.95140656]]
Input: [1 0], Predicted Output: [[0.95124283]]
Input: [1 1], Predicted Output: [[0.05207599]]
Q10. Write a program to implement the naïve Bayesian
classifier for a sample training data set stored as a
.CSV file
In [18]:
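from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Data loading and split are not shown in the source; assumed here.
# A 20% test split gives the 30 test samples reported in the output below.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
y_pred = nb_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{confusion_mat}')
print(f'Classification Report:\n{classification_rep}')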
Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
Q11. Write a program to construct a Bayesian network
considering medical data. Use this model to
demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set.
In [1]:
age Gender Family diet Lifestyle cholestrol heartdisease
0 0 0 1 1 3 0 1
1 0 1 1 1 3 0 1
2 1 0 0 0 2 1 1
3 4 0 1 1 3 2 0
4 3 1 1 0 0 2 0
5 2 0 1 1 1 0 1
6 4 0 1 0 2 0 1
7 0 0 1 1 3 0 1
8 3 1 1 0 0 2 0
9 1 1 0 0 0 2 1
10 4 1 0 1 2 0 1
11 4 0 1 1 3 2 0
12 2 1 0 0 0 0 0
13 2 0 1 1 1 0 1
14 3 1 1 0 0 1 0
15 0 0 1 0 0 2 1
16 1 1 0 1 2 1 1
17 3 1 1 1 0 1 0
18 4 0 1 1 3 2 0
For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4
For Gender enter Male:0, Female:1
For Family History enter Yes:1, No:0
For Diet enter High:0, Medium:1
for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3
for Cholesterol enter High:0, BorderLine:1, Normal:2
Enter Age: 0
Enter Gender: 0
Enter Family History: 0
Enter Diet: 0
Enter Lifestyle: 3
Enter Cholestrol: 0
+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.5000 |
+-----------------+---------------------+
| heartdisease(1) |              0.5000 |
+-----------------+---------------------+
Finding Elimination Order: : : 0it [00:00, ?it/s]
0it [00:00, ?it/s]
Q12. Apply any suitable algorithm to cluster a set of
data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results
of these two algorithms and comment on the quality of
clustering.
In [23]:
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
scaled_data = scaler.fit_transform(data)
hierarchical = AgglomerativeClustering(n_clusters=3)
hierarchical_labels = hierarchical.fit_predict(scaled_data)
data['KMeans_Cluster'] = kmeans_labels
data['Hierarchical_Cluster'] = hierarchical_labels
plt.figure(figsize=(12, 6))
C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1412: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  super()._check_params_vs_input(X, default_n_init=10)
C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1436: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(
Silhouette Score - K-Means: 0.45994823920518635
Silhouette Score - Hierarchical: 0.446689041028591
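A compact, self-contained sketch of the comparison, assuming the iris features from `load_iris` and three clusters for both algorithms. Setting `n_init=10` explicitly avoids the `FutureWarning` about the changing default; `random_state=42` is an arbitrary choice for repeatability.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
scaled_data = StandardScaler().fit_transform(iris.data)

# k-Means with an explicit n_init
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(scaled_data)

# Agglomerative (hierarchical) clustering with the default Ward linkage
hierarchical = AgglomerativeClustering(n_clusters=3)
hierarchical_labels = hierarchical.fit_predict(scaled_data)

# Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters
kmeans_score = silhouette_score(scaled_data, kmeans_labels)
hierarchical_score = silhouette_score(scaled_data, hierarchical_labels)
print(f"Silhouette Score - K-Means: {kmeans_score}")
print(f"Silhouette Score - Hierarchical: {hierarchical_score}")
```

In the output reported above, k-Means scores slightly higher (about 0.460 against 0.447), so on this scaled data it produces marginally more compact and better separated clusters, although both values indicate only moderate cluster structure.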
Q13. Write a program to implement k-Nearest Neighbor
algorithm to classify the iris data set
In [29]:
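from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Data loading, split, scaler and k_value are not shown in the source; assumed here
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
k_value = 3

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
knn_classifier = KNeighborsClassifier(n_neighbors=k_value)
knn_classifier.fit(X_train_scaled, y_train)
y_pred = knn_classifier.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{confusion_mat}')
print(f'Classification Report:\n{classification_rep}')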
Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
Q14. Implement the non-parametric Regression
algorithm in order to fit data points. Select appropriate
data set for your experiment and draw graphs
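One common choice for this exercise is locally weighted regression, sketched below with NumPy only. The function names, the synthetic noisy sine data and the bandwidth `tau=0.3` are assumptions made for illustration. For each query point the code fits a straight line by weighted least squares, with Gaussian weights that favour nearby training points.

```python
import numpy as np


def locally_weighted_regression(x_train, y_train, x_query, tau=0.3):
    """Fit a weighted straight line around each query point and evaluate it there."""
    design = np.column_stack([np.ones_like(x_train), x_train])  # columns [1, x]
    predictions = np.empty(len(x_query))
    for k, x0 in enumerate(x_query):
        # Gaussian kernel weights centred on the query point
        weights = np.exp(-((x_train - x0) ** 2) / (2.0 * tau ** 2))
        weighted_design = design * weights[:, None]  # W @ X without building W
        # Solve the weighted normal equations (X^T W X) theta = X^T W y
        theta = np.linalg.pinv(design.T @ weighted_design) @ (weighted_design.T @ y_train)
        predictions[k] = theta[0] + theta[1] * x0
    return predictions


def plot_fit(x, y, y_hat):
    """Draw the data points and the fitted curve (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=10, label="data")
    ax.plot(x, y_hat, color="red", label="locally weighted fit")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig


# Synthetic data: a sine curve with Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

y_hat = locally_weighted_regression(x, y, x, tau=0.3)
print("Mean absolute error against the noise-free curve:",
      np.mean(np.abs(y_hat - np.sin(x))))
```

Calling `plot_fit(x, y, y_hat)` draws the graph. A smaller `tau` follows the data more closely and a larger one gives a smoother curve.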
Q15. Write a Python program to get the accuracy of the
Logistic Regression.
In [21]:
simplefilter("ignore", category=ConvergenceWarning)
X_scaled = scaler.fit_transform(X)
logistic_regression_model = LogisticRegression(max_iter=1000)
logistic_regression_model.fit(X_train, y_train)
y_pred = logistic_regression_model.predict(X_test)
print(f'Accuracy: {accuracy}')
Accuracy: 1.0
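A self-contained sketch of the full pipeline, assuming the iris data from `load_iris` and an 80/20 split with `random_state=42` (neither is shown in the cell above). Unlike the `X_scaled = scaler.fit_transform(X)` line above, this version fits the scaler on the training rows only, which keeps test-set statistics out of the preprocessing step.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardise features using statistics from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

logistic_regression_model = LogisticRegression(max_iter=1000)
logistic_regression_model.fit(X_train_scaled, y_train)

y_pred = logistic_regression_model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```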