Machine Learning (BCSL606) Lab Manual
Laboratory Components
Experiment-01
Develop a program to create histograms for all numerical features and analyze the
distribution of each feature. Generate box plots for all numerical features and identify
any outliers. Use the California Housing dataset.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
def load_and_prepare_data():
    housing = fetch_california_housing()
    df = pd.DataFrame(housing.data, columns=housing.feature_names)
    df['PRICE'] = housing.target
    return df

def create_distribution_plots(df, save_plots=False):
    numerical_features = df.columns
    n_features = len(numerical_features)
    n_rows = (n_features + 1) // 2
    # Create histograms
    plt.figure(figsize=(15, 5*n_rows))
    for idx, feature in enumerate(numerical_features, 1):
        plt.subplot(n_rows, 2, idx)
        sns.histplot(df[feature], kde=True)
        plt.title(f'Distribution of {feature}')
        plt.xlabel(feature)
        plt.ylabel('Count')
    plt.tight_layout()
    if save_plots:
        plt.savefig('histograms.png')
    plt.show()
    # Create box plots
    plt.figure(figsize=(15, 5*n_rows))
    for idx, feature in enumerate(numerical_features, 1):
        plt.subplot(n_rows, 2, idx)
        sns.boxplot(data=df[feature])
        plt.title(f'Box Plot of {feature}')
    plt.tight_layout()
    if save_plots:
        plt.savefig('boxplots.png')
    plt.show()
def analyze_distributions(df):
    stats_summary = df.describe()
    # Detect outliers with the 1.5 * IQR rule
    outlier_summary = {}
    for column in df.columns:
        Q1 = df[column].quantile(0.25)
        Q3 = df[column].quantile(0.75)
        IQR = Q3 - Q1
        outliers = df[(df[column] < Q1 - 1.5 * IQR) |
                      (df[column] > Q3 + 1.5 * IQR)][column]
        outlier_summary[column] = {
            'number_of_outliers': len(outliers),
            'percentage_of_outliers': 100 * len(outliers) / len(df)
        }
    return stats_summary, outlier_summary
def main():
    df = load_and_prepare_data()
    create_distribution_plots(df)
    stats_summary, outlier_summary = analyze_distributions(df)
    print("\nStatistical Summary:")
    print(stats_summary)
    print("\nOutlier Analysis:")
    for feature, info in outlier_summary.items():
        print(f"\n{feature}:")
        print(f"  Number of outliers: {info['number_of_outliers']}")

if __name__ == "__main__":
    main()
Output
Explanation
Introduction
The code performs an exploratory data analysis (EDA) on California housing data. EDA is a
crucial first step in understanding your dataset before performing any advanced analysis or
modeling. This analysis focuses on understanding the distribution of housing features and
prices across California.
The California Housing dataset is a standard dataset in scikit-learn containing housing prices
and related features. The data preparation step converts this into a pandas DataFrame, which is
a table-like structure where:
Each column represents a different feature (like house price, income, population)
Each row represents one California district (census block group)
Distribution Analysis
1. Visual Analysis The distribution plots help understand how values are spread across
each feature:
o Histograms show the shape of each feature's distribution (skewness, peaks, spread)
o Box plots reveal the median, quartiles, and potential outliers in the data
o Outlier detection uses the 1.5 × IQR rule: any point beyond 1.5 times the IQR
from the quartiles is considered an outlier
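A minimal sketch of the rule on toy values (the numbers are invented for the illustration):

import pandas as pd
s = pd.Series([1, 2, 2, 3, 3, 3, 4, 100])             # 100 is an obvious outlier
Q1, Q3 = s.quantile(0.25), s.quantile(0.75)
IQR = Q3 - Q1
outliers = s[(s < Q1 - 1.5 * IQR) | (s > Q3 + 1.5 * IQR)]
print(outliers.tolist())                              # [100]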
Visualization System
The analysis produces three kinds of output:
1. Distribution Plots (histograms and box plots for every feature)
2. Statistical Summary (descriptive statistics from describe(): count, mean, std, quartiles)
3. Outlier Analysis (per-feature outlier counts from the IQR rule)
Expected Insights
Typical findings: median income and house prices are right-skewed, prices are capped at the
top of the range, and ratio features such as AveRooms and AveOccup contain extreme outliers.
Experiment-02
Develop a program to compute the correlation matrix to understand the relationships between
pairs of features in the California Housing dataset. Visualize the correlation matrix using a
heatmap to identify strong positive and negative correlations, and generate a pair plot to
visualize pairwise relationships between features.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
def load_and_prepare_data():
    housing = fetch_california_housing()
    df = pd.DataFrame(housing.data, columns=housing.feature_names)
    df['PRICE'] = housing.target
    return df

def compute_correlation_matrix(df):
    correlation_matrix = df.corr()
    return correlation_matrix

def plot_correlation_heatmap(correlation_matrix):
    plt.figure(figsize=(12, 10))
    sns.heatmap(correlation_matrix, annot=True, fmt='.2f',
                cmap='coolwarm', center=0, vmin=-1, vmax=1)
    plt.title('Correlation Matrix Heatmap')
    plt.tight_layout()
    plt.show()
def create_pair_plot(df):
    # Pair plot of all features (can be slow on the full dataset)
    sns.pairplot(df, diag_kind='kde', plot_kws={'alpha': 0.5})
    plt.tight_layout()
    plt.show()

def analyze_correlations(correlation_matrix):
    # Keep only the upper triangle so each feature pair is counted once
    upper_tri = correlation_matrix.where(
        np.triu(np.ones(correlation_matrix.shape), k=1).astype(bool))
    strong_correlations = []
    for col in upper_tri.columns:
        for row in upper_tri.index:
            value = upper_tri.loc[row, col]
            if pd.notna(value) and abs(value) > 0.5:
                strong_correlations.append({
                    'features': (row, col),
                    'correlation': value
                })
    return strong_correlations
def main():
    df = load_and_prepare_data()
    correlation_matrix = compute_correlation_matrix(df)
    plot_correlation_heatmap(correlation_matrix)
    create_pair_plot(df)
    strong_correlations = analyze_correlations(correlation_matrix)
    # Print results
    print("\nStrong correlations (|r| > 0.5):")
    for corr in strong_correlations:
        feature1, feature2 = corr['features']
        correlation = corr['correlation']
        print(f"{feature1} vs {feature2}: {correlation:.3f}")

if __name__ == "__main__":
    main()
Output
Explanation
This code analyzes the California Housing dataset to understand how different features are
related to each other. The correlation coefficient ranges from -1 to +1:
+1 means perfect positive correlation (when one goes up, the other goes up)
-1 means perfect negative correlation (when one goes up, the other goes down)
The code produces two visualizations:
1. A Correlation Heatmap: a color-coded grid of every pairwise correlation
2. A Pair Plot: a scatter-plot matrix showing each pair of features against each other
The code also automatically finds strong correlations (values above 0.5 or
below -0.5) and prints them, telling you which features are strongly related and
whether the relationship is positive or negative.
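A toy illustration of how corr() reflects these relationships (the data is invented for the
example):

import pandas as pd
toy = pd.DataFrame({'x': [1, 2, 3, 4],
                    'y': [2, 4, 6, 8],    # moves exactly with x    -> corr = +1
                    'z': [8, 6, 4, 2]})   # moves exactly against x -> corr = -1
print(toy.corr().round(2))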
1. Function: load_and_prepare_data()
o Steps: fetch the dataset, build a DataFrame, append the PRICE target column
2. Function: compute_correlation_matrix(df)
o Computes Pearson correlations for every pair of columns; 0 means no correlation
3. Function: plot_correlation_heatmap(correlation_matrix)
o Settings: annotated cells, diverging colormap centered at 0, range -1 to 1
4. Function: create_pair_plot(df)
o Settings: KDE plots on the diagonal, semi-transparent scatter points elsewhere
5. Function: analyze_correlations(correlation_matrix)
o Steps: take the upper triangle of the matrix, keep pairs with |r| > 0.5
6. Function: main()
o Process: load data, compute correlations, draw both plots, analyze and print findings
7. Output Format
o Visual outputs: correlation heatmap and pair plot
o Text output: list of strongly correlated feature pairs with their r values
Experiment-03
Develop a program to implement Principal Component Analysis (PCA) for reducing the
dimensionality of the Iris dataset from 4 features to 2.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
def load_and_prepare_data():
    iris = load_iris()
    df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    df['target'] = iris.target
    df['target_names'] = iris.target_names[iris.target]
    return df, iris.feature_names

def perform_pca(data, feature_names):
    # Separate features
    X = data[feature_names]
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    # Apply PCA
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)
    explained_variance_ratio = pca.explained_variance_ratio_
    loadings = pca.components_
    return pca, X_pca, explained_variance_ratio, loadings

def plot_pca_results(X_pca, data):
    # Create figure
    plt.figure(figsize=(10, 8))
    targets = sorted(data['target'].unique())
    target_names = sorted(data['target_names'].unique())
    for target, target_name in zip(targets, target_names):
        mask = data['target'] == target
        plt.scatter(X_pca[mask, 0], X_pca[mask, 1],
                    label=target_name, alpha=0.8)
    plt.xlabel('First Principal Component')
    plt.ylabel('Second Principal Component')
    plt.title('PCA of the Iris Dataset')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

def plot_explained_variance(pca):
    plt.figure(figsize=(10, 6))
    cumsum = np.cumsum(pca.explained_variance_ratio_)
    plt.plot(range(1, len(cumsum) + 1), cumsum, marker='o')
    plt.xlabel('Number of Components')
    plt.ylabel('Cumulative Explained Variance')
    plt.grid(True, alpha=0.3)
    plt.show()

def visualize_feature_importance(loadings, feature_names):
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.bar(feature_names, loadings[0])
    plt.title('PC1 Loadings')
    plt.xticks(rotation=45)
    plt.subplot(1, 2, 2)
    plt.bar(feature_names, loadings[1])
    plt.title('PC2 Loadings')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
def main():
    df, feature_names = load_and_prepare_data()
    # Perform PCA
    print("\nPerforming PCA...")
    pca, X_pca, explained_variance_ratio, loadings = perform_pca(df, feature_names)
    print(f"PC1: {explained_variance_ratio[0]:.2%}")
    print(f"PC2: {explained_variance_ratio[1]:.2%}")
    print(f"Total: {sum(explained_variance_ratio):.2%}")
    # Plot results
    print("\nCreating visualizations...")
    plot_pca_results(X_pca, df)
    plot_explained_variance(pca)
    visualize_feature_importance(loadings, feature_names)
    print("\nPC1 loadings:")
    for fname, weight in zip(feature_names, loadings[0]):
        print(f"{fname}: {weight:.3f}")

if __name__ == "__main__":
    main()
Output
Explanation
Basic Theory:
PCA rotates the data onto orthogonal directions (principal components) ordered by the variance
they capture; keeping the first two components compresses the four iris measurements into a
2-D view with minimal information loss.
Code Functions:
1. load_and_prepare_data()
o Loads the Iris dataset; each row represents one flower with its features and species type
2. perform_pca()
o Standardizes the features, then fits a 2-component PCA
o Returns:
Transformed data
Explained variance ratios and component loadings
3. plot_pca_results()
4. plot_explained_variance()
5. visualize_feature_importance()
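The whole pipeline above condenses to a few lines; a minimal sketch of the same steps:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # zero mean, unit variance per feature
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)             # shape (150, 2)
print(pca.explained_variance_ratio_)           # roughly [0.73, 0.23]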
Experiment-04
For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Find-S algorithm to output a description of the set of all hypotheses consistent with the
training examples.
Code:
import pandas as pd
import numpy as np

class FindS:
    def __init__(self):
        self.hypothesis = None
        self.features = None
"""
"""
new_hypothesis = []
if hyp_val == 'ϕ':
new_hypothesis.append(ex_val)
new_hypothesis.append(hyp_val)
else:
new_hypothesis.append('?')
return new_hypothesis
"""
Find the most specific hypothesis consistent with the training examples
Parameters:
"""
X = data.drop(columns=[target_column])
y = data[target_column]
self.features = X.columns.tolist()
# Initialize hypothesis
self.hypothesis = self.initialize_hypothesis(len(self.features))
Page 39
BCSL606 | Machine Learning Lab|
if self.is_positive_example(y[index]):
self.hypothesis = self.generalize_hypothesis(
row.values.tolist(),
self.hypothesis
return self.hypothesis
    def print_hypothesis(self):
        if self.hypothesis is not None:
            print("\nFinal Hypothesis:")
            print("〈" + ", ".join(str(v) for v in self.hypothesis) + "〉")
        else:
            print("\nNo hypothesis learned yet; call fit() first.")
def load_data(filename):
    try:
        return pd.read_csv(filename)
    except FileNotFoundError:
        print(f"Error: file '{filename}' not found.")
        return None
    except Exception as e:
        print(f"Error reading '{filename}': {e}")
        return None
def main():
    # Sample PlayTennis-style data (illustrative values; the original table was not preserved)
    sample_data = {
        'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain'],
        'Temperature': ['Hot', 'Hot', 'Hot', 'Mild'],
        'Humidity': ['High', 'High', 'High', 'High'],
        'Wind': ['Weak', 'Strong', 'Weak', 'Weak'],
        'PlayTennis': ['No', 'No', 'Yes', 'Yes']
    }
    df = pd.DataFrame(sample_data)
    print("\nTraining Data:")
    print(df)
    find_s = FindS()
    find_s.fit(df, target_column='PlayTennis')
    # Print results
    find_s.print_hypothesis()
    print("\nHypothesis Interpretation:")
    for feature, value in zip(find_s.features, find_s.hypothesis):
        print(f"{feature}: {value}")
if __name__ == "__main__":
main()
Output
Explanation
1. Purpose
Find-S aims to find the most specific hypothesis that is consistent with the positive
training examples
2. Hypothesis Space
Each attribute in a hypothesis is a specific value, '?' (any value is acceptable), or 'ϕ'
(no value is acceptable)
3. Working Principle
Start from the most specific hypothesis (all 'ϕ') and generalize it only as far as each
positive example requires; negative examples are ignored
4. Generalization Rules
'ϕ' is replaced by the example's value; a value that conflicts with the example is replaced by '?'
5. Advantages
Simple to implement and computationally efficient
6. Limitations
Ignores negative examples, cannot detect noisy or inconsistent data, and returns only one of
the possibly many consistent hypotheses
7. Applications
Pattern recognition and simple concept-learning tasks
8. Example Scenario
Learning the weather conditions under which tennis is played, as in the sample data above
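A minimal usage sketch of the FindS class above, on two invented positive examples, shows
the generalization in action:

import pandas as pd
toy = pd.DataFrame({
    'Sky':  ['Sunny', 'Sunny'],
    'Temp': ['Warm', 'Cold'],      # disagrees across positives -> becomes '?'
    'Wind': ['Strong', 'Strong'],
    'Play': ['Yes', 'Yes']
})
learner = FindS()
print(learner.fit(toy, target_column='Play'))   # -> ['Sunny', '?', 'Strong']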
Experiment-05
Implement the k-Nearest Neighbour algorithm to classify the randomly generated 100 values
of x in the range [0, 1].
a. Label the first 50 points {x1,……,x50} as follows: if (xi ≤ 0.5), then xi ∊ Class1,
else xi ∊ Class2
b. Classify the remaining points, x51,……,x100 using KNN. Perform this for
k = 1, 2, 3, 4, 5, 20, 30
Code:
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
class KNN:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y
    def predict(self, X):
        predictions = []
        for x in X:
            # 1-D distance to every training point
            distances = np.abs(self.X_train - x)
            k_nearest_indices = np.argsort(distances)[:self.k]
            k_nearest_labels = self.y_train[k_nearest_indices]
            most_common = Counter(k_nearest_labels).most_common(1)
            predictions.append(most_common[0][0])
        return np.array(predictions)
def generate_data():
    np.random.seed(42)
    X = np.random.rand(100)
    y = np.zeros(100)
    # Label the first 50 points: Class 1 if x <= 0.5, else Class 2
    y[:50] = np.where(X[:50] <= 0.5, 1, 2)
    return X, y
def plot_results(X_train, y_train, X_test, y_pred, k):
    # One-dimensional scatter of training labels and predictions (plot details assumed)
    plt.figure(figsize=(12, 4))
    plt.scatter(X_train, np.zeros_like(X_train), c=y_train, marker='o', label='Training points')
    plt.scatter(X_test, np.ones_like(X_test) * 0.1, c=y_pred, marker='x', label='Predicted points')
    plt.title(f'KNN Classification with k={k}')
    plt.xlabel('x')
    plt.yticks([])
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
def analyze_boundary_points(X_test, y_pred, k):
    print(f"\nDecision boundaries for k={k}:")
    boundary_points = []
    for i in range(1, len(y_pred)):
        if y_pred[i] != y_pred[i-1]:
            boundary_points.append(X_test[i])
    if boundary_points:
        for point in boundary_points:
            print(f"x = {point:.3f}")
    else:
        print("No class change detected in the test range.")
def main():
    # Generate data
    print("Generating dataset...")
    X, y = generate_data()
    # First 50 labelled points train the model; the rest are classified
    X_train, y_train = X[:50], y[:50]
    X_test = X[50:]
    # Sort the test points so predictions can be scanned left to right
    sort_idx = np.argsort(X_test)
    X_test = X_test[sort_idx]
    k_values = [1, 2, 3, 4, 5, 20, 30]
    for k in k_values:
        knn = KNN(k=k)
        knn.fit(X_train, y_train)
        # Make predictions
        y_pred = knn.predict(X_test)
        # Plot results
        plot_results(X_train, y_train, X_test, y_pred, k)
        analyze_boundary_points(X_test, y_pred, k)
        class1_pred = np.sum(y_pred == 1)
        class2_pred = np.sum(y_pred == 2)
        print(f"k={k}: {class1_pred} points assigned to Class 1, {class2_pred} to Class 2")

if __name__ == "__main__":
    main()
Output
Explanation
The KNN class implements the K-Nearest Neighbors algorithm with two main methods:
fit() simply stores the training points and labels
predict(): for each test point, it finds the k closest training points and takes a majority vote
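A minimal sketch of the distance-and-vote step for one 1-D test point (toy numbers,
illustrative):

import numpy as np
from collections import Counter
X_train = np.array([0.1, 0.2, 0.6, 0.9])
y_train = np.array([1, 1, 2, 2])
x, k = 0.55, 3
nearest = np.argsort(np.abs(X_train - x))[:k]            # indices of the k closest points
print(Counter(y_train[nearest].tolist()).most_common(1)) # [(2, 2)]: Class 2 wins 2 votes to 1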
2. Data Generation
100 uniform random values in [0, 1]; the first 50 are labelled by the 0.5 threshold and used
for training, the remaining 50 are classified
3. Visualization Components
Each value of k gets a one-dimensional scatter plot of the training labels and the resulting
predictions
4. The analyze_boundary_points function:
o Scans the sorted test predictions and reports every x where the predicted class changes
5. main():
o Runs the whole experiment for k = 1, 2, 3, 4, 5, 20, 30 and visualizes results
6. Key Features
o Small k produces flexible but noise-sensitive boundaries; large k (20, 30) smooths the
decision boundary at the cost of local detail
Experiment-06
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate dataset for your experiment and draw graphs.
Code:
import numpy as np
import matplotlib.pyplot as plt

def generate_sample_data(n_samples=100, noise=10):
    # Noisy sinusoidal data (an illustrative generator; the original was not preserved)
    X = np.linspace(0, 10, n_samples)
    y = 2 * np.sin(X) * X + np.random.normal(0, noise / 10, n_samples)
    return X, y
"""
    Parameters:
    -----------
    X : array-like
        Training inputs
    y : array-like
        Target values
    x_pred : array-like
        Points at which predictions are required
    tau : float
        Bandwidth of the Gaussian weighting kernel
    Returns:
    --------
    array-like
        Predicted values at x_pred
    """
    X = np.ravel(X)
    y = np.ravel(y)
    x_pred = np.ravel(x_pred)
    y_pred = []
    # Augment inputs with a bias term for the local linear fit
    X_aug = np.column_stack([np.ones_like(X), X])
    for x in x_pred:
        # Gaussian kernel weights centred at the query point
        weights = np.exp(-(X - x) ** 2 / (2 * tau ** 2))
        W = np.diag(weights)
        # Weighted least squares: theta = (X^T W X)^-1 X^T W y
        theta = np.linalg.pinv(X_aug.T @ W @ X_aug) @ X_aug.T @ W @ y
        x_aug = np.array([1.0, x])
        # Make prediction
        y_pred.append(float(x_aug @ theta))
    return np.array(y_pred)
np.random.seed(42)
X, y = generate_sample_data(n_samples=100, noise=10)
x_pred = np.linspace(X.min(), X.max(), 200)
y_pred = locally_weighted_regression(X, y, x_pred, tau=0.5)
# Plotting
plt.figure(figsize=(12, 6))
plt.scatter(X, y, alpha=0.5, label='Data')
plt.plot(x_pred, y_pred, color='red', linewidth=2, label='LOWESS fit')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()
Output
Explanation
1. Data Generation
Creates noisy one-dimensional sample data for the regression to fit
2. Kernel Function
A Gaussian kernel w_i = exp(-(x_i - x)^2 / (2 tau^2)) gives nearby training points more
influence; tau controls how quickly that influence decays
3. LOWESS Implementation
Key steps: weight every training point relative to the query point, solve a weighted
least-squares problem, and evaluate the local line at the query (see the sketch below)
4. Visualization Setup
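The heart of the method is one weighted least-squares solve per query point; a compact
sketch (1-D inputs, Gaussian kernel as described above):

import numpy as np
np.random.seed(0)
X = np.linspace(0, 10, 50)
y = np.sin(X) + 0.1 * np.random.randn(50)
x0, tau = 5.0, 0.5
w = np.exp(-(X - x0) ** 2 / (2 * tau ** 2))     # Gaussian weights centred at x0
A = np.column_stack([np.ones_like(X), X])       # design matrix: bias + slope
theta = np.linalg.pinv(A.T @ (w[:, None] * A)) @ A.T @ (w * y)
print(float(np.array([1.0, x0]) @ theta))       # local prediction, close to sin(5)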
Disadvantages:
Computationally expensive: a separate weighted least-squares problem is solved for every
prediction point, and the full training set must be kept in memory
Experiment-07
Develop a program to demonstrate the working of Linear Regression and Polynomial
Regression. Use the Boston Housing dataset for Linear Regression and the Auto MPG dataset
(vehicle fuel efficiency prediction) for Polynomial Regression.
Code:
import pandas as pd

url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
boston_df = pd.read_csv(url)
print(boston_df.columns.tolist())
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')

print("Linear Regression on the Boston Housing dataset")
print("-" * 50)
url = "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
boston_df = pd.read_csv(url)
print("\nDataset Information:")
boston_df.info()
print("\nFeatures:")
print(f"- {name}")
scaler = StandardScaler()
X_train_boston_scaled = scaler.fit_transform(X_train_boston)
X_test_boston_scaled = scaler.transform(X_test_boston)
lr_model = LinearRegression()
lr_model.fit(X_train_boston_scaled, y_train_boston)
# Make predictions
y_pred_boston = lr_model.predict(X_test_boston_scaled)
# Calculate metrics
mse_boston = mean_squared_error(y_test_boston, y_pred_boston)
rmse_boston = np.sqrt(mse_boston)
r2_boston = r2_score(y_test_boston, y_pred_boston)
print(f"\nRMSE: {rmse_boston:.3f}, R^2: {r2_boston:.3f}")

feature_importance = pd.DataFrame({
    'Feature': X_boston.columns,
    'Coefficient': lr_model.coef_
})
feature_importance['Abs_Coefficient'] = abs(feature_importance['Coefficient'])
feature_importance = feature_importance.sort_values('Abs_Coefficient', ascending=False)
print("\nFeature Importance:")
print(feature_importance[['Feature', 'Coefficient']].to_string(index=False))
plt.figure(figsize=(12, 6))
plt.bar(feature_importance['Feature'], feature_importance['Coefficient'])
plt.xticks(rotation=45)
plt.xlabel('Features')
plt.ylabel('Coefficient Value')
plt.tight_layout()
plt.show()
# Predicted vs actual values (plot details assumed)
plt.figure(figsize=(10, 6))
plt.scatter(y_test_boston, y_pred_boston, alpha=0.5)
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.tight_layout()
plt.show()
print("-" * 50)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-
mpg.data'
df = df.replace('?', np.nan)
df = df.dropna()
df['Horsepower'] = df['Horsepower'].astype(float)
X_mpg = df[['Horsepower']].values
y_mpg = df['MPG'].values
scaler_mpg = StandardScaler()
X_mpg_scaled = scaler_mpg.fit_transform(X_mpg)
X_train_mpg, X_test_mpg, y_train_mpg, y_test_mpg = train_test_split(
    X_mpg_scaled, y_mpg, test_size=0.2, random_state=42)

degrees = [1, 2, 3]
plt.figure(figsize=(15, 5))
for i, degree in enumerate(degrees, 1):
    poly_features = PolynomialFeatures(degree=degree)
    X_train_poly = poly_features.fit_transform(X_train_mpg)
    X_test_poly = poly_features.transform(X_test_mpg)
    # Train model
    poly_model = LinearRegression()
    poly_model.fit(X_train_poly, y_train_mpg)
    # Make predictions
    y_pred_poly = poly_model.predict(X_test_poly)
    # Calculate metrics
    mse_poly = mean_squared_error(y_test_mpg, y_pred_poly)
    rmse_poly = np.sqrt(mse_poly)
    print(f"Degree {degree}: RMSE = {rmse_poly:.3f}")
    # Plot results
    plt.subplot(1, 3, i)
    plt.scatter(X_test_mpg, y_test_mpg, alpha=0.5, label='Data')
    # Sort the inputs so the fitted curve draws smoothly
    X_sort = np.sort(X_test_mpg, axis=0)
    X_sort_poly = poly_features.transform(X_sort)
    y_sort_pred = poly_model.predict(X_sort_poly)
    plt.plot(X_sort, y_sort_pred, color='red', label=f'Degree {degree}')
    plt.xlabel('Horsepower (scaled)')
    plt.ylabel('MPG')
    plt.legend()
plt.tight_layout()
plt.show()
Output
Explanation
1. Linear Regression (Boston Housing)
Key Components: standardized features, a single LinearRegression model, and coefficient-based
feature importance
Implementation Steps:
# Data Preparation: load the CSV, split into train and test sets, scale the features
# Model Training: fit LinearRegression on the scaled training data
# Evaluation: RMSE, R^2, and coefficient inspection on the test set
2. Polynomial Regression (Auto MPG)
Key Components: PolynomialFeatures to expand Horsepower into powers, one model per
degree (1, 2, 3)
Implementation Steps:
# Data Preparation: download the data, drop missing values, scale Horsepower
# Model Training: expand features with PolynomialFeatures, then fit LinearRegression
# Evaluation: RMSE per degree, with fitted curves plotted side by side
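The PolynomialFeatures-plus-LinearRegression pair can also be written as a single
scikit-learn Pipeline; a minimal sketch on invented quadratic data:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.linspace(-2, 2, 30).reshape(-1, 1)
y = 1 + 2 * X.ravel() - 3 * X.ravel() ** 2           # noiseless quadratic target
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.0]]))                        # close to 1 + 2 - 3 = 0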
3. Key Visualizations: coefficient bar chart, predicted-vs-actual scatter, and fitted
polynomial curves for each degree
4. Key Insights: degree 1 underfits the MPG-Horsepower relationship, degree 2 captures its
curvature, and higher degrees risk overfitting
Experiment-08
Develop a program to demonstrate the working of the decision tree algorithm. Use the
Breast Cancer dataset for building the decision tree and apply this knowledge to classify
a new sample.
Code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import classification_report, confusion_matrix

data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)
y_pred = dt_classifier.predict(X_test)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

plt.figure(figsize=(10, 8))
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
plt.figure(figsize=(20, 10))
plot_tree(dt_classifier, feature_names=data.feature_names,
          class_names=data.target_names, filled=True, max_depth=3)
plt.show()

def classify_new_sample(sample):
    """
    Classify a single new sample with the trained decision tree.
    Parameters:
    sample (list or array): List of feature values in the same order as the training data
    Returns:
    None; prints the predicted class and its probability
    """
    sample = np.array(sample).reshape(1, -1)
    prediction = dt_classifier.predict(sample)
    probability = dt_classifier.predict_proba(sample)
    print("\nClassification Results:")
    print(f"Predicted class: {data.target_names[prediction[0]]}")
    print(f"Class probabilities: {probability[0]}")

print("\nFeature Importances:")
for feature, importance in zip(data.feature_names, dt_classifier.feature_importances_):
    print(f"{feature}: {importance:.4f}")

example_sample = X_train.mean(axis=0)
print("\nExample Classification:")
classify_new_sample(example_sample)
Output
Explanation
1. Dataset: 569 samples with 30 numeric features, labelled malignant or benign
2. Model Configuration
o random_state=42 (reproducibility)
3. Evaluation: classification report and confusion matrix on a 20% held-out test set
4. Visualization Elements:
o Confusion-matrix heatmap and a depth-limited rendering of the fitted tree
5. classify_new_sample()
Provides: the predicted class and class probabilities for one feature vector
6. Key Features:
o Probability Estimates via predict_proba
o Feature importances showing which measurements drive the splits
7. Use Cases:
o Risk Assessment and other medical decision-support tasks where interpretable rules matter
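A compressed train-and-classify sketch mirroring the code above (the mean feature vector
stands in for a new sample):

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
clf = DecisionTreeClassifier(random_state=42).fit(data.data, data.target)
new_sample = data.data.mean(axis=0).reshape(1, -1)   # stand-in for a new patient
print(data.target_names[clf.predict(new_sample)[0]])
print(clf.predict_proba(new_sample))                 # per-class probability estimates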
Experiment-09
Develop a program to implement the Naive Bayesian classifier, considering the Olivetti Face
dataset for training. Compute the accuracy of the classifier, considering a few test data sets.
Code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

faces = fetch_olivetti_faces()
X = faces.data
y = faces.target
def display_sample_faces(X, y, n_samples=5):
    fig, axes = plt.subplots(1, n_samples, figsize=(12, 3))
    for i, ax in enumerate(axes):
        ax.imshow(X[i].reshape(64, 64), cmap='gray')
        ax.set_title(f'Person {y[i]}')
        ax.axis('off')
    plt.tight_layout()
    plt.show()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
# Make predictions
y_pred = nb_classifier.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
# Perform cross-validation
cv_scores = cross_val_score(nb_classifier, X, y, cv=5)
print("Performance Metrics:")
print(f"Accuracy: {accuracy:.4f}")
print("\nCross-validation scores:")
print(cv_scores, f"(mean = {cv_scores.mean():.4f})")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
plt.figure(figsize=(12, 8))
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

def display_predictions(classifier, X_test, y_test, num_samples=5):
    # Pick a few random test faces (function shape reconstructed; details assumed)
    indices = np.random.choice(len(X_test), num_samples, replace=False)
    X_samples = X_test[indices]
    y_true = y_test[indices]
    # Make predictions
    y_pred = classifier.predict(X_samples)
    probabilities = classifier.predict_proba(X_samples)
    # Display results
    fig, axes = plt.subplots(2, num_samples, figsize=(15, 6))
    for i in range(num_samples):
        axes[0, i].imshow(X_samples[i].reshape(64, 64), cmap='gray')
        axes[0, i].axis('off')
        axes[1, i].text(0.5, 0.5,
                        f'True: {y_true[i]}\nPred: {y_pred[i]}\nConf: {probabilities[i].max():.2f}',
                        ha='center', va='center')
        axes[1, i].axis('off')
        if y_true[i] == y_pred[i]:
            axes[0, i].set_title('Correct', color='green')
        else:
            axes[0, i].set_title('Wrong', color='red')
    plt.tight_layout()
    plt.show()
display_sample_faces(X, y)
display_predictions(nb_classifier, X_test, y_test)

# Show up to five misclassified faces
misclassified_idx = np.where(y_pred != y_test)[0]
num_display = min(5, len(misclassified_idx))
if num_display > 0:
    fig, axes = plt.subplots(1, num_display, figsize=(12, 3))
    for i in range(num_display):
        idx = misclassified_idx[i]
        if num_display == 1:
            ax = axes
        else:
            ax = axes[i]
        ax.imshow(X_test[idx].reshape(64, 64), cmap='gray')
        ax.set_title(f'True: {y_test[idx]}, Pred: {y_pred[idx]}')
        ax.axis('off')
    plt.tight_layout()
    plt.show()
# Analyze misclassifications
print("\nAnalyzing misclassifications:")
print(f"Total misclassified test faces: {len(misclassified_idx)}")
Output
Explanation
1. Dataset: 400 grayscale images (64 x 64 pixels) of 40 people, ten images per person
2. Key Functions:
a) Display Sample Faces: shows a few faces with their person IDs
b) Display Predictions: shows test faces with predicted IDs and confidence values
c) Analyze Misclassifications: counts and displays wrongly identified faces
3. Model Implementation
GaussianNB models each of the 4096 pixel intensities as an independent Gaussian per person
4. Performance Evaluation:
Test accuracy
Cross-validation scores
Misclassification analysis
5. Visualization Components:
Sample faces, confusion matrix, and misclassified examples
6. Key Features:
Probability estimation
Error analysis
Cross-validation performance
7. Notable Aspects:
Despite its strong independence assumption, Naive Bayes trains in seconds on this dataset
and gives a reasonable face-recognition baseline
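The full train-and-evaluate loop reduces to a few lines; a minimal sketch with the same
scikit-learn pieces:

from sklearn.datasets import fetch_olivetti_faces
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

faces = fetch_olivetti_faces()
X_tr, X_te, y_tr, y_te = train_test_split(
    faces.data, faces.target, test_size=0.2, random_state=42)
model = GaussianNB().fit(X_tr, y_tr)
print(f"Test accuracy: {accuracy_score(y_te, model.predict(X_te)):.3f}")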
Experiment-10
Develop a program to implement k-means clustering using the Wisconsin Breast Cancer
dataset and visualize the clustering result.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

data = load_breast_cancer()
X = data.data
y = data.target
df = pd.DataFrame(X, columns=data.feature_names)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

def plot_elbow_curve(X, max_k=10):
    k_values = range(2, max_k + 1)
    inertias = []
    silhouette_scores = []
    for k in k_values:
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeans.fit(X)
        inertias.append(kmeans.inertia_)
        silhouette_scores.append(silhouette_score(X, kmeans.labels_))
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    # Inertia plot
    ax1.plot(list(k_values), inertias, marker='o')
    ax1.set_xlabel('Number of Clusters (k)')
    ax1.set_ylabel('Inertia')
    ax1.set_title('Elbow Method')
    # Silhouette plot
    ax2.plot(list(k_values), silhouette_scores, marker='o')
    ax2.set_xlabel('Number of Clusters (k)')
    ax2.set_ylabel('Silhouette Score')
    ax2.set_title('Silhouette Analysis')
    plt.tight_layout()
    plt.show()
    return list(k_values)[np.argmax(silhouette_scores)]
# Find optimal k
optimal_k = plot_elbow_curve(X_scaled)
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(X_scaled)

# Project to 2-D with PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
plt.figure(figsize=(12, 8))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels, cmap='viridis', alpha=0.6)
plt.colorbar(scatter, label='Cluster')
plt.title('K-Means Clusters (PCA projection)')
plt.show()
# Compare clusters with the actual diagnosis labels
comparison_df = pd.DataFrame({
    'Cluster': cluster_labels,
    'Actual_Diagnosis': y
})
print(pd.crosstab(comparison_df['Cluster'], comparison_df['Actual_Diagnosis']))
def analyze_clusters(df, labels):
    df_analysis = df.copy()
    df_analysis['Cluster'] = labels
    cluster_means = df_analysis.groupby('Cluster').mean()
    plt.figure(figsize=(15, 8))
    sns.heatmap(cluster_means, cmap='coolwarm', center=0,
                xticklabels=True, yticklabels=True)
    plt.xticks(rotation=45, ha='right')
    plt.title('Mean Feature Values per Cluster')
    plt.tight_layout()
    plt.show()
    return cluster_means

cluster_means = analyze_clusters(df, cluster_labels)
def plot_feature_importance(kmeans, feature_names):
    # Rank features by how much the cluster centroids differ on them
    centroid_variance = np.var(kmeans.cluster_centers_, axis=0)
    feature_importance = pd.DataFrame({
        'Feature': feature_names,
        'Importance': centroid_variance
    }).sort_values('Importance', ascending=False)
    plt.figure(figsize=(12, 6))
    plt.bar(feature_importance['Feature'], feature_importance['Importance'])
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()
    return feature_importance

feature_importance = plot_feature_importance(kmeans, data.feature_names)
def predict_cluster(sample):
    """Assign a new sample (30 feature values) to one of the learned clusters."""
    if isinstance(sample, list):
        sample = np.array(sample)
    sample = sample.reshape(1, -1)
    sample_scaled = scaler.transform(sample)
    # Predict cluster
    cluster = kmeans.predict(sample_scaled)[0]
    # Distance from the sample to each centroid
    distances = kmeans.transform(sample_scaled)[0]
    print(f"Assigned cluster: {cluster}")
    print(f"Distances to centroids: {np.round(distances, 3)}")
Output
Explanation
1. Data Preparation: all 30 numeric features are standardized so that no single scale
dominates the distance computations
2. Key Functions:
a) Elbow/Silhouette Analysis: selects k using inertia and silhouette score
b) Cluster Analysis: compares per-cluster feature means with a heatmap
c) Feature Importance: ranks features by the variance of the cluster centroids
3. Visualization Components: elbow curve, silhouette curve, PCA scatter of the clusters,
cluster-mean heatmap, and feature-importance bars
4. Model Implementation: KMeans with the selected k and a fixed random_state for reproducibility
5. Cluster Prediction: a new sample is scaled with the same scaler and assigned to the
nearest centroid
6. Key Features:
7. Analysis Components: the cluster-versus-diagnosis crosstab shows how closely the
unsupervised clusters recover the malignant/benign split
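For reference, the pipeline condenses to a few lines (k=2 chosen here to mirror the two
diagnoses; a minimal sketch):

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_scaled)
print(pd.crosstab(labels, data.target))   # clusters vs. actual diagnoses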