
DPG Institute of Technology and Management

Gurugram 122004 Haryana

Machine Learning Essentials


PCC-DS-302G
6th Semester
CSE-DS ENGINEERING

Submitted to:
Ms. Renu Vadhera
Asst. Professor

Submitted by:
Ishant
B.Tech. (CSE-DS)
Roll No. – 22DS18
6th Semester
INDEX

S.NO NAME OF EXPERIMENT DATE SIGN.


RUBRICS

Evaluation Parameters      Max. Marks    Marks Allotted

Attendance                     5
Conduction of Program         10
Record Maintenance             5
Internal Viva                  5
Total                         25
PROGRAM – 1

Aim: To develop a machine learning model using Linear Regression for predicting house prices.
Objective: Perform EDA to analyze dataset patterns, preprocess data by handling missing values
and normalizing features, train a Linear Regression model, and evaluate accuracy.
Outcomes: Gain insights into housing price factors, develop a predictive regression model,
assess its accuracy using statistical metrics, and visualize predictions for better interpretation.
Dataset Used: The California Housing Dataset (from sklearn.datasets).

Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Step 1: Load the California Housing Dataset


housing = fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['Target'] = housing.target # Add the target variable

# Step 2: Perform Exploratory Data Analysis (EDA)


print("First 5 rows of dataset:\n", df.head())
print("\nDataset Summary:\n", df.describe())
print("\nChecking for missing values:\n", df.isnull().sum())
# Step 3: Visualizations for EDA
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Feature Correlation Heatmap")
plt.subplot(1, 2, 2)
sns.histplot(df['Target'], bins=30, kde=True, color='blue')
plt.title("Distribution of House Prices")
plt.tight_layout()
plt.show()
# Step 4: Split Features & Target Variable
X = df.drop(columns=['Target'])
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Normalize Features (Standardization)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 6: Apply Linear Regression Model


model = LinearRegression()
model.fit(X_train, y_train)

# Step 7: Model Evaluation


y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"R² Score: {r2:.4f}")
# Step 8: Plot Predictions vs Actual Values
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, color="blue")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color="red") # Ideal fit line
plt.xlabel("Actual House Prices")
plt.ylabel("Predicted House Prices")
plt.title("Actual vs Predicted Prices (Linear Regression)")
plt.show()
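
As a small optional extension (not part of the original program), the learned coefficients can be inspected to connect the model back to the aim of understanding housing-price factors. This sketch reuses the model and housing objects defined above; because the features were standardized, the coefficient magnitudes are directly comparable.

# Optional: inspect which standardized features push predicted prices up or down.
coef = pd.Series(model.coef_, index=housing.feature_names).sort_values()
print("Linear Regression coefficients (standardized features):")
print(coef)
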
PROGRAM – 2
Aim: To build and evaluate a linear regression model to predict medical costs based on patient
features.
Objective: Perform EDA, preprocess categorical and numerical features, train a linear
regression model, and assess performance using MSE, MAE, and R² score.
Outcome: Gained insights into the factors influencing medical costs, evaluated model
accuracy, and visualized predictions to understand the model's effectiveness.
Dataset: Medical Cost Personal Dataset (Kaggle).
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Step 1: Load the dataset


url = "https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv"
df = pd.read_csv(url)

# Step 2: Perform Exploratory Data Analysis (EDA)


print("First 5 rows of dataset:\n", df.head())

print("\nDataset Summary:\n", df.describe())


print("\nChecking for missing values:\n", df.isnull().sum())

# Step 3: Visualizations for EDA


plt.figure(figsize=(12, 6))

# Correlation heatmap
plt.subplot(1, 2, 1)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Feature Correlation Heatmap")
# Distribution of Charges (Target)
plt.subplot(1, 2, 2)
sns.histplot(df['charges'], bins=30, kde=True, color='blue')
plt.title("Distribution of Medical Charges")
plt.tight_layout()
plt.show()

# Step 4: Preprocessing - Convert Categorical Data using One-Hot Encoding


categorical_features = ['sex', 'smoker', 'region']
numeric_features = ['age', 'bmi', 'children']
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_features),
    ('cat', OneHotEncoder(drop='first'), categorical_features)
])

# Step 5: Split Data into Train & Test Sets


X = df.drop(columns=['charges']) # Features
y = df['charges'] # Target Variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 6: Apply Preprocessing


X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)
# Step 7: Apply Linear Regression Model
model = LinearRegression()
model.fit(X_train, y_train) # Train the model

# Step 8: Model Evaluation


y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.2f}")

print(f"Mean Absolute Error (MAE): {mae:.2f}")

print(f"R² Score: {r2:.4f}")

# Step 9: Plot Predictions vs Actual Values


plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, color="blue")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color="red") # Ideal fit line
plt.xlabel("Actual Medical Charges")
plt.ylabel("Predicted Medical Charges")
plt.title("Actual vs Predicted Charges (Linear Regression)")
plt.show()
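
As an optional follow-up (not in the original program), the fitted coefficients can be mapped back to readable feature names. This sketch assumes a scikit-learn version (1.0+) where ColumnTransformer provides get_feature_names_out.

# Optional: relate coefficients to the encoded feature names
# (requires scikit-learn 1.0+ for get_feature_names_out).
feature_names = preprocessor.get_feature_names_out()
coef = pd.Series(model.coef_, index=feature_names).sort_values()
print("Coefficients by feature:")
print(coef)
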
PROGRAM – 3
Aim: To develop a machine learning model using Ridge Regression for predicting California
housing prices.
Objective: Perform data preparation, split the dataset, train a Ridge Regression model, and
evaluate its accuracy using the R² score.
Outcome: Understand the impact of features on housing prices, build a predictive model, and
assess its performance with accuracy metrics.
Dataset Used: California Housing Dataset (from sklearn.datasets).

Program:
# Importing library
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Importing the Dataset


california = fetch_california_housing()

# Preparing the dataset


df_california = pd.DataFrame(california.data, columns=california.feature_names)
df_california['Price'] = california.target

# Assigning the Independent Variables and Dependent Variable


X = df_california.drop(columns='Price')
y = df_california['Price']

# Splitting the dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
# Ridge Regression
ridge = Ridge()

# Training the model


ridge.fit(X_train, y_train)

# Finding the Accuracy


accuracy = ridge.score(X_test, y_test)
print(f"Model Accuracy (R² Score): {accuracy:.2f}")
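
An optional extension (not part of the original program) is to tune the regularization strength alpha by cross-validation instead of accepting the default alpha=1.0; a minimal sketch reusing the split above:

# Optional: choose alpha by cross-validation with RidgeCV.
from sklearn.linear_model import RidgeCV

ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
ridge_cv.fit(X_train, y_train)
print("Best alpha:", ridge_cv.alpha_)
print("Test R² with tuned alpha:", ridge_cv.score(X_test, y_test))
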
PROGRAM – 4
Aim: To develop a machine learning model using Lasso Regression for predicting California
housing prices.
Objective: Load and preprocess the dataset, split it into training and test sets, train a Lasso
Regression model, and evaluate its accuracy using the R² score.
Outcome: Understand the influence of different features on housing prices, build a sparse
predictive model, and assess its performance with accuracy metrics.
Dataset Used: California Housing Dataset (from sklearn.datasets).

Program:
# Importing libraries
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Load the Dataset


california = fetch_california_housing()

# Preparing the dataset


df_california = pd.DataFrame(california.data, columns=california.feature_names)
df_california['Price'] = california.target

# Assigning the Independent Variables and Dependent Variable


X = df_california.drop(columns='Price')
y = df_california['Price']

# Splitting the Dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Lasso Regression
lasso = Lasso()
# Training the Model
lasso.fit(X_train, y_train)

# Finding the Accuracy


accuracy = lasso.score(X_test, y_test)
print(f"Model Accuracy (R² Score): {accuracy:.2f}")
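
Since the outcome mentions a sparse model, an optional check (not in the original program) is to list which coefficients Lasso has driven to zero; with the default alpha=1.0 on unscaled features, several features are typically eliminated.

# Optional: inspect the sparsity introduced by Lasso.
coef = pd.Series(lasso.coef_, index=california.feature_names)
print("Lasso coefficients:")
print(coef)
print("Features eliminated (coefficient = 0):", list(coef[coef == 0].index))
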
PROGRAM – 5
Aim: Develop a Logistic Regression model to predict handwritten digits (0–9).
Objective: Perform EDA to understand dataset patterns, preprocess features using
standardization, train a Logistic Regression model, and evaluate performance using accuracy
and confusion matrix.
Outcomes: Analyze digit classification patterns, develop a predictive model, assess accuracy
with statistical metrics, and visualize actual vs. predicted results for better interpretation.
Dataset Used: The Digits Dataset (sklearn.datasets).

Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the Digits dataset


digits = load_digits()
X, y = digits.data, digits.target

# Convert dataset into a DataFrame for better understanding


df = pd.DataFrame(X, columns=[f"Pixel_{i}" for i in range(X.shape[1])])
df["Target"] = y

# Display dataset information


print("Dataset Shape:", X.shape)
print("Unique Classes:", np.unique(y))

# Visualizing some digit samples


fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.set_title(f"Label: {digits.target[i]}")
    ax.axis('off')
plt.show()
# Plot the class distribution
plt.figure(figsize=(8, 5))
sns.countplot(x=y, palette="viridis")
plt.title("Distribution of Digits in Dataset")
plt.xlabel("Digit Class")
plt.ylabel("Count")
plt.show()
# Visualizing the average image of each digit
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for digit, ax in zip(range(10), axes.flat):
    mean_image = digits.images[y == digit].mean(axis=0)  # Average image of each digit
    ax.imshow(mean_image, cmap='gray')
    ax.set_title(f"Mean {digit}")
    ax.axis('off')
plt.show()

# Splitting dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardizing features for better performance


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Training the Logistic Regression model


model = LogisticRegression(max_iter=5000, solver='lbfgs', multi_class='auto')
model.fit(X_train, y_train)

# Making predictions
y_pred = model.predict(X_test)
# Model evaluation

accuracy = accuracy_score(y_test, y_pred)


conf_matrix = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", report)
# Visualizing some predictions
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for i, ax in enumerate(axes.flat):
    # Note: X_test was standardized, so pixel intensities are rescaled versions
    # of the original 8x8 images, but the digit shapes remain visible.
    ax.imshow(X_test[i].reshape(8, 8), cmap='gray')
    ax.set_title(f"Pred: {y_pred[i]}")
    ax.axis('off')
plt.show()
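
As an optional addition (mirroring the heatmaps used in the KNN and SVM programs later in this file), the confusion matrix printed above can also be visualized:

# Optional: heatmap of the confusion matrix computed above.
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted Digit")
plt.ylabel("True Digit")
plt.title("Logistic Regression Confusion Matrix")
plt.show()
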
PROGRAM – 6

Aim: Develop a Naïve Bayes model for wine classification.


Objective: Perform EDA to understand dataset patterns, visualize class distributions, preprocess
data, train a Gaussian Naïve Bayes model, and evaluate performance using accuracy metrics.
Outcomes: Analyze wine classification patterns, develop a predictive model, assess accuracy
with statistical metrics, and visualize class relationships for better interpretation.
Dataset Used: The Wine Dataset (sklearn.datasets).

Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

wine = datasets.load_wine()
X, y = wine.data, wine.target
feature_names = wine.feature_names

df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

# Exploratory Data Analysis (EDA)


print("Dataset Overview:")
print(df.head())
print("\nSummary Statistics:")
print(df.describe())
# Visualize class distribution
sns.countplot(x='target', data=df)
plt.title("Class Distribution")
plt.show()
# Pairplot to visualize feature relationships
sns.pairplot(df, hue='target', diag_kind='kde')
plt.show()
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Naïve Bayes classifier


classifier = GaussianNB()

# Train the classifier


classifier.fit(X_train, y_train)

# Make predictions
y_pred = classifier.predict(X_test)

# Evaluate the accuracy


accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
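
Beyond overall accuracy, an optional addition (not part of the original program) is a per-class report, which shows precision, recall, and F1-score for each wine class:

# Optional: per-class metrics for the Naïve Bayes predictions.
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, target_names=wine.target_names))
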
PROGRAM – 7

Aim: Apply and evaluate KNN models on the Wine dataset.


Objective: Implement and compare KNN classifiers on the Wine dataset using accuracy, F1-
score, and confusion matrix.
Outcome: Gained insights into model performance, strengths, and limitations for classifying
wine types, aiding in selecting the optimal algorithm.
Dataset: Wine dataset from sklearn.datasets.

Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report

# Load the Wine dataset


wine = datasets.load_wine()
X, y = wine.data, wine.target
feature_names = wine.feature_names
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a K-Nearest Neighbors (KNN) classifier


knn_classifier = KNeighborsClassifier(n_neighbors=5)
# Train the classifier
knn_classifier.fit(X_train, y_train)

# Make predictions
y_pred = knn_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"KNN Accuracy: {accuracy:.2f}")

# Calculate F1-score (macro-averaged for multi-class classification)


f1 = f1_score(y_test, y_pred, average='macro')
print(f"KNN F1-score: {f1:.2f}")

# Compute and display the confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
# Visualizing the Confusion Matrix using a heatmap
plt.figure(figsize=(6,5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=wine.target_names, yticklabels=wine.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("KNN Confusion Matrix")
plt.show()

# Display classification report for detailed metrics


print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))
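
Since the objective is to compare KNN classifiers, an optional sweep over k (not in the original program) shows how the neighborhood size affects test accuracy; a minimal sketch reusing the split above:

# Optional: compare several values of k on the same train/test split.
for k in [1, 3, 5, 7, 9, 11]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f"k={k}: accuracy={knn.score(X_test, y_test):.2f}")

Because KNN is distance-based, standardizing the features first (as done in Programs 1 and 5) would likely improve these scores; the sketch deliberately keeps the program's unscaled setup for a like-for-like comparison.
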
PROGRAM – 8

Aim: Build and evaluate an SVM classifier on the Wine dataset.
Objective: Load and split the data, train an SVM with an RBF kernel, and evaluate
performance using accuracy, F1-score, confusion matrix, and classification report.
Outcome: Trained an SVM, computed performance metrics, visualized the confusion
matrix, and generated a detailed classification report.
Dataset: Wine dataset from scikit-learn.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report

# Load the Wine dataset


wine = datasets.load_wine()
X, y = wine.data, wine.target
feature_names = wine.feature_names
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an SVM classifier with RBF kernel


# You can experiment with different kernels like 'linear', 'poly', 'sigmoid'
svm_classifier = SVC(kernel='rbf', C=1.0, gamma='scale')
# Train the classifier
svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred = svm_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Accuracy: {accuracy:.2f}")

# Calculate F1-score (macro-averaged for multi-class classification)


f1 = f1_score(y_test, y_pred, average='macro')
print(f"SVM F1-score: {f1:.2f}")

# Compute and display the confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Visualizing the Confusion Matrix using a heatmap


plt.figure(figsize=(6,5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=wine.target_names, yticklabels=wine.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("SVM Confusion Matrix")
plt.show()

# Display classification report for detailed metrics


print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))
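
Following up on the kernel comment in the program, an optional comparison (not in the original) trains one SVM per kernel on the same split; note that with unscaled features the non-linear kernels may underperform, so standardizing first would likely change the ranking.

# Optional: compare SVM kernels on the same train/test split.
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf = SVC(kernel=kernel, C=1.0, gamma='scale')
    clf.fit(X_train, y_train)
    print(f"kernel={kernel}: accuracy={clf.score(X_test, y_test):.2f}")
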
PROGRAM – 9

Aim: To implement and evaluate a Random Forest Classifier on the Iris dataset.

Objective: Train a Random Forest model using the Iris dataset, evaluate its performance using
accuracy and classification report, and interpret the results.

Outcome: Successfully trained a Random Forest model, achieved high classification accuracy,
and analyzed detailed class-wise performance metrics.

Dataset: Iris dataset from scikit-learn with 150 samples, 4 features (sepal/petal length & width),
and 3 classes of Iris flowers.
Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris

# Load dataset (Iris dataset as an example)


data = load_iris()
X = data.data
y = data.target

# Split data into training (80%) and testing (20%) sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest model


rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model


rf_model.fit(X_train, y_train)
# Make predictions
y_pred = rf_model.predict(X_test)

# Evaluate model performance


accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:')
print(classification_report(y_test, y_pred))
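
As an optional interpretive step (not in the original program), a Random Forest exposes impurity-based feature importances, which indicate which of the four measurements drive the classification:

# Optional: impurity-based feature importances of the trained forest.
importances = pd.Series(rf_model.feature_importances_, index=data.feature_names)
print("Feature importances:")
print(importances.sort_values(ascending=False))
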
PROGRAM – 10

Aim: To implement a feedforward neural network using the sigmoid activation function and
backpropagation algorithm to learn the XOR logical function.

Objective: Build and train a feedforward neural network with one hidden layer to learn the XOR
logic gate using sigmoid activation and gradient descent.

Outcome: Successfully trained a neural network to output correct XOR results through forward
and backpropagation.

Dataset: XOR truth table with 2 input features and 1 output (4 samples total).

Program:
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid, written in terms of the sigmoid output
# (x here is already sigmoid(z), so the derivative is x * (1 - x))
def sigmoid_derivative(x):
    return x * (1 - x)

# Training data (XOR problem)


X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

# Initialize weights randomly


np.random.seed(42)
weights_input_hidden = np.random.rand(2, 2)
weights_hidden_output = np.random.rand(2, 1)

# Training process
epochs = 10000
learning_rate = 0.5
for _ in range(epochs):
    # Forward propagation
    hidden_layer_input = np.dot(X, weights_input_hidden)
    hidden_layer_output = sigmoid(hidden_layer_input)
    final_input = np.dot(hidden_layer_output, weights_hidden_output)
    final_output = sigmoid(final_input)

    # Backpropagation
    error = y - final_output
    d_output = error * sigmoid_derivative(final_output)
    hidden_layer_error = d_output.dot(weights_hidden_output.T)
    d_hidden_layer = hidden_layer_error * sigmoid_derivative(hidden_layer_output)

    # Updating weights
    weights_hidden_output += hidden_layer_output.T.dot(d_output) * learning_rate
    weights_input_hidden += X.T.dot(d_hidden_layer) * learning_rate
print("Final Output after training:\n", final_output)
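
To read the result as XOR outputs, an optional final step (not in the original program) thresholds the continuous outputs at 0.5. Note that this network has no bias terms, so with some random seeds it may fail to separate XOR; re-running with a different seed is a simple workaround.

# Optional: threshold outputs at 0.5 to obtain binary XOR predictions.
# Caveat: no bias terms are used, so convergence depends on initialization.
predictions = (final_output > 0.5).astype(int)
print("Inputs:\n", X)
print("Predicted XOR:\n", predictions)
print("Expected XOR:\n", y)
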
PROGRAM – 11

Aim: To implement a neural network using the Adam optimizer for binary classification tasks.

Objective: Build and compile a simple feedforward neural network with hidden layers using
the ReLU and Sigmoid activation functions. Use the Adam optimizer and binary cross-entropy
as the loss function.

Outcome: Successfully created and compiled a neural network model using TensorFlow Keras
and viewed its architecture using .summary().

Dataset: Dummy input shape assumed with 10 features for binary classification (actual data
loading not included).

Program:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Create a simple neural network


model = Sequential([
Dense(64, activation='relu', input_shape=(10,)),
Dense(32, activation='relu'),
Dense(1, activation='sigmoid')
])

# Compile the model using Adam optimizer


model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print("Neural Network Summary:")
model.summary()
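
Since the dataset note above says no real data is loaded, the compiled model can be exercised on synthetic data purely for illustration; the 200 random samples below are made up to match the assumed 10-feature input shape.

# Illustration only: train briefly on random synthetic data (no real dataset).
import numpy as np
X_dummy = np.random.rand(200, 10)
y_dummy = np.random.randint(0, 2, size=(200, 1))
model.fit(X_dummy, y_dummy, epochs=5, batch_size=16, verbose=1)
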
PROGRAM – 12

Aim: To implement the K-Means clustering algorithm for grouping data points into clusters based
on similarity.
Objective: Apply the K-Means algorithm to partition sample data into two clusters and print
the cluster labels and centers.
Outcome: Successfully applied K-Means clustering, obtained cluster assignments for data
points, and identified the cluster centers.
Dataset: Sample 2D data points used for clustering.

Program:
from sklearn.cluster import KMeans
import numpy as np

# Sample data
X = np.array([[1, 2], [2, 3], [3, 4], [10, 11], [11, 12], [12, 13]])

# Apply K-Means clustering


kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(X)
print("Cluster labels:", kmeans.labels_)
print("Cluster centers:", kmeans.cluster_centers_)
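
As an optional final step (not in the original program), the fitted model can assign new, unseen points to the learned clusters:

# Optional: predict the cluster of a new point.
new_point = np.array([[2, 2]])
print("Cluster for [2, 2]:", kmeans.predict(new_point))
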
