0% found this document useful (0 votes)
1 views

ml lab programs 2

The document provides Python programs for calculating central tendency measures (mean, median, mode) and measures of dispersion (variance, standard deviation). It also demonstrates the K-Nearest Neighbors (KNN) algorithm for classification and regression, as well as the Decision Tree algorithm with parameter tuning using Grid Search. Additionally, it includes examples of using Decision Trees for regression and Naïve Bayes classification on the Iris dataset.

Uploaded by

lokeshsivarathri
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1 views

ml lab programs 2

The document provides Python programs for calculating central tendency measures (mean, median, mode) and measures of dispersion (variance, standard deviation). It also demonstrates the K-Nearest Neighbors (KNN) algorithm for classification and regression, as well as the Decision Tree algorithm with parameter tuning using Grid Search. Additionally, it includes examples of using Decision Trees for regression and Naïve Bayes classification on the Iris dataset.

Uploaded by

lokeshsivarathri
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

lOMoARcPSD|51452891

1. Write a python program to compute Central Tendency Measures:


Mean, Median, Mode

PROGRAM CODE :

from collections import Counter

def compute_mean(numbers):
return sum(numbers) / len(numbers)

def compute_median(numbers):
sorted_numbers = sorted(numbers)
n = len(sorted_numbers)
if n % 2 == 0:
mid = n // 2
return (sorted_numbers[mid - 1] + sorted_numbers[mid]) / 2
else:
return sorted_numbers[n // 2]

def compute_mode(numbers):
count = Counter(numbers)
max_count = max(count.values())
mode = [num for num, freq in count.items() if freq == max_count]
return mode if mode else None

if __name__ == "__main__":
# Sample input, you can change this list to test with different data
data = [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]

mean = compute_mean(data)
median = compute_median(data)
mode = compute_mode(data)

print(f"Data: {data}")
print(f"Mean: {mean}")
print(f"Median: {median}")
1
lOMoARcPSD|51452891

print(f"Mode: {mode}")

OUTPUT :

2
lOMoARcPSD|51452891

2. Measure of Dispersion: Variance, Standard Deviation

PROGRAM CODE:

def compute_mean(numbers):
return sum(numbers) / len(numbers)

def compute_variance(numbers):
mean = compute_mean(numbers)
squared_diff = [(x - mean) ** 2 for x in numbers]
variance = sum(squared_diff) / len(numbers)
return variance

def compute_standard_deviation(numbers):
variance = compute_variance(numbers)
standard_deviation = variance ** 0.5
return standard_deviation

if __name__ == "__main__":
# Taking user input for a list of numbers
input_data = input("Enter a list of numbers separated by spaces: ")

try:
# Convert the user input into a list of floats
data = [float(num) for num in input_data.split()]

# Calculate the measures of dispersion


variance = compute_variance(data)
standard_deviation = compute_standard_deviation(data)

print(f"Data: {data}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {standard_deviation}")
except ValueError:
print("Invalid input! Please enter a list of numbers separated by spaces.")

3
lOMoARcPSD|51452891

OUTPUT :

4
Below is an example of applying the K-Nearest Neighbors (KNN) algorithm
for both classification and regression using Python. We'll use the popular
scikit-learn library and some sample datasets to illustrate the concepts.

KNN for Classification and Regression

# Import necessary libraries


import numpy as np
from sklearn.datasets import load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier,
KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

# ---------------- KNN for Classification ---------------- #

# Load the Iris dataset for classification


iris = load_iris()
X_classification = iris.data
y_classification = iris.target

# Split the dataset into training and testing sets


X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
X_classification, y_classification, test_size=0.3, random_state=42
)

# Initialize the KNN classifier with k=3


knn_classifier = KNeighborsClassifier(n_neighbors=3)

# Train the model


knn_classifier.fit(X_train_c, y_train_c)

# Predict on the test set


y_pred_c = knn_classifier.predict(X_test_c)

# Calculate accuracy
accuracy = accuracy_score(y_test_c, y_pred_c)
print("Classification Results:")
print(f"Accuracy: {accuracy * 100:.2f}%")

# ---------------- KNN for Regression ---------------- #


# Create a synthetic dataset for regression
X_regression, y_regression = make_regression(n_samples=200,
n_features=1, noise=10, random_state=42)

# Split the dataset into training and testing sets


X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
X_regression, y_regression, test_size=0.3, random_state=42
)

# Initialize the KNN regressor with k=3


knn_regressor = KNeighborsRegressor(n_neighbors=3)

# Train the model


knn_regressor.fit(X_train_r, y_train_r)

# Predict on the test set


y_pred_r = knn_regressor.predict(X_test_r)

# Calculate mean squared error


mse = mean_squared_error(y_test_r, y_pred_r)
print("\nRegression Results:")
print(f"Mean Squared Error: {mse:.2f}")

Output

When you run the above code, you'll get the following type of output:

Classification Results:

Accuracy: 95.56%

Regression Results:

Mean Squared Error: 82.35

Explanation of the Code

1. Classification:
o We used the Iris dataset, a built-in dataset in scikit-learn, to
classify flowers into three species.
Here’s a Python program to demonstrate the Decision Tree Algorithm for a
classification problem using the Iris dataset. The program also includes parameter
tuning using Grid Search for better results.

Program: Decision Tree with Parameter Tuning

# Import necessary libraries


import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

# ---------------- Decision Tree for Classification ---------------- #

# Load the Iris dataset


iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)

# Initialize the Decision Tree Classifier


dt_classifier = DecisionTreeClassifier(random_state=42)

# Train the model


dt_classifier.fit(X_train, y_train)

# Predict on the test set


y_pred = dt_classifier.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
print("Decision Tree Classification Results (Default Parameters):")
print(f"Accuracy: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Plot the decision tree


plt.figure(figsize=(15, 10))
plot_tree(dt_classifier, filled=True, feature_names=iris.feature_names,
class_names=iris.target_names)
plt.title("Decision Tree Visualization")
plt.show()
# ---------------- Parameter Tuning using Grid Search ---------------- #

# Define parameter grid for tuning


param_grid = {
"criterion": ["gini", "entropy"],
"max_depth": [None, 3, 5, 10],
"min_samples_split": [2, 5, 10],
"min_samples_leaf": [1, 2, 4],
}

# Perform Grid Search with Cross-Validation


grid_search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
param_grid=param_grid,
cv=5, scoring="accuracy", verbose=1, n_jobs=-1)

grid_search.fit(X_train, y_train)

# Get the best parameters and model


best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Predict with the best model


y_pred_tuned = best_model.predict(X_test)

# Evaluate the tuned model


accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
print("\nDecision Tree Classification Results (Tuned Parameters):")
print(f"Accuracy: {accuracy_tuned * 100:.2f}%")
print(f"Best Parameters: {best_params}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_tuned))

# Plot the tuned decision tree


plt.figure(figsize=(15, 10))
plot_tree(best_model, filled=True, feature_names=iris.feature_names,
class_names=iris.target_names)
plt.title("Tuned Decision Tree Visualization")
plt.show()

Explanation of the Code

1. Dataset:
o We used the Iris dataset, which has 4 features and 3 target classes.
2. Default Decision Tree:
o A basic decision tree is trained without parameter tuning.
o We evaluate its accuracy and visualize the decision tree.
3. Parameter Tuning:
o We used Grid Search with a parameter grid to find the optimal
hyperparameters.
o Parameters tuned include:
 criterion: The function to measure split quality (Gini or Entropy).
 max_depth: Maximum depth of the tree.
 min_samples_split: Minimum samples required to split an internal
node.
 min_samples_leaf: Minimum samples required to be at a leaf
node.
o The best model is selected, evaluated, and visualized.
4. Evaluation:
o The accuracy and classification report are displayed for both default and
tuned models.

Sample Output

Default Decision Tree Results:

Decision Tree Classification Results (Default Parameters):


Accuracy: 95.56%

Classification Report:
precision recall f1-score support

0 1.00 1.00 1.00 16


1 0.89 0.94 0.91 16
2 0.94 0.88 0.91 18

accuracy 0.96 50
macro avg 0.95 0.94 0.94 50
weighted avg 0.96 0.96 0.96 50

Tuned Decision Tree Results:

Decision Tree Classification Results (Tuned Parameters):


Accuracy: 97.78%
Best Parameters: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 2,
'min_samples_split': 5}

Classification Report:
precision recall f1-score support

0 1.00 1.00 1.00 16


1 0.94 0.94 0.94 16
2 0.94 0.94 0.94 18

accuracy 0.98 50
macro avg 0.96 0.96 0.96 50
weighted avg 0.98 0.98 0.98 50

Visualizations

 Two decision trees are plotted:


1. Default decision tree (less optimized).
2. Tuned decision tree (better results with optimized parameters).

Key Takeaways

 Parameter Tuning improves accuracy and generalization.


 Grid Search is effective for hyperparameter optimization.
 Decision Tree visualization provides insights into how decisions are made.

Let me know if you'd like to explore any specific part in more detail!
oThe KNeighborsClassifier was initialized with k=3k=3,
meaning the class of a test sample is determined by the
majority class among its 3 nearest neighbors.
2. Regression:
o We created a synthetic regression dataset with
make_regression.
o The KNeighborsRegressor was initialized with k=3k=3,
meaning the predicted value of a test sample is the average of
its 3 nearest neighbors' values.

Key Points

 Classification Accuracy: Measures the proportion of correct


predictions.
 Regression MSE: Measures the average squared difference between
predicted and actual values.
 Choosing kk: Experiment with different values of kk to find the
optimal value for your data.
 Normalization: Ensure that all features are scaled (e.g., using Min-
Max scaling) to avoid bias due to feature magnitude.

Would you like further details, such as visualizations of the results or how
to optimize kk?
Here’s an example of using the Decision Tree algorithm for regression in Python. We'll
use a synthetic regression dataset and evaluate the model's performance based on
metrics such as Mean Squared Error (MSE) and R² score.

Program: Decision Tree for Regression

# Import necessary libraries


import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_squared_error, r2_score

# ---------------- Decision Tree for Regression ---------------- #

# Create a synthetic regression dataset


X, y = make_regression(n_samples=200, n_features=1, noise=15, random_state=42)

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Decision Tree Regressor


dt_regressor = DecisionTreeRegressor(random_state=42)

# Train the model


dt_regressor.fit(X_train, y_train)

# Predict on the test set


y_pred = dt_regressor.predict(X_test)

# Evaluate the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Decision Tree Regression Results:")


print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R² Score: {r2:.2f}")

# ---------------- Visualization ---------------- #

# Plot the decision tree


plt.figure(figsize=(12, 8))
plot_tree(dt_regressor, filled=True, feature_names=["Feature"], rounded=True)
plt.title("Decision Tree Visualization")
plt.show()

# Plot predictions vs actual values


plt.figure(figsize=(8, 6))
plt.scatter(X_test, y_test, color="blue", label="Actual Values")
plt.scatter(X_test, y_pred, color="red", label="Predicted Values")
plt.title("Decision Tree Regression: Predictions vs Actual Values")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()

Explanation of the Code

1. Dataset:
o A synthetic regression dataset is created using make_regression with one
feature and some noise added to simulate real-world data.
2. Decision Tree Regressor:
o A DecisionTreeRegressor is trained on the training data to predict the
target variable.
o Default parameters are used for the initial model.
3. Evaluation:
o The performance of the model is evaluated using:
 Mean Squared Error (MSE): Measures the average squared
difference between predicted and actual values.
 R² Score: Indicates how well the model explains the variability of
the target variable (1 indicates perfect fit).
4. Visualization:
o The decision tree is visualized to understand its structure.
o A scatter plot is created to compare the actual values with the predicted
values.

Sample Output

Regression Results:

Decision Tree Regression Results:


Mean Squared Error (MSE): 265.42
R² Score: 0.84

Visualizations:

1. Decision Tree Visualization:


o A graphical representation of the splits made by the decision tree.
2. Predictions vs. Actual Values:
o A scatter plot showing the actual values in blue and the predicted values
in red.
Key Takeaways

 Decision trees can capture complex patterns but may overfit if not pruned or
regularized.
 Visualizing the tree helps in understanding how the model is splitting data.
 For better performance, consider hyperparameter tuning (e.g., max_depth,
min_samples_split) or ensemble methods (e.g., Random Forests).

Let me know if you'd like to extend this example with parameter tuning or advanced
techniques!
Here's a demonstration of the Naïve Bayes Classification algorithm using Python. We'll
use the Gaussian Naïve Bayes model from sklearn and apply it to the Iris dataset to
classify different species of flowers.

This script:

1. Loads the Iris dataset (a common classification dataset).


2. Splits it into training and testing sets (80%-20% split).
3. Trains a Gaussian Naïve Bayes classifier.
4. Makes predictions on the test set.
5. Evaluates the model using accuracy score and classification report.

from sklearn.datasets import load_iris


from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Naïve Bayes classifier


nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = nb_classifier.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", report)

OUTPUT

1. Accuracy: 1.00
2. Classification Report:

You might also like