Stats Lab (10-12)

The document outlines three programs implementing data analysis techniques using Python: PCA on the Wisconsin breast cancer dataset, LDA on the Iris dataset, and multiple linear regression on the Iris dataset. Each program includes data loading, transformation, visualization, and evaluation of results. The PCA and LDA visualizations display the separation of classes, while the regression analysis provides metrics like Mean Squared Error and R-squared for model performance.


10. Program to implement PCA on the Wisconsin breast cancer dataset, visualize and analyze the results.

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer

# Load the dataset and build a DataFrame for inspection
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
X = data.data    # Feature matrix
y = data.target  # Class labels (0 = malignant, 1 = benign)

print(df)

# Standardize the data

X_mean = np.mean(X, axis=0)

X_std = np.std(X, axis=0)

X_standardized = (X - X_mean) / X_std

# Compute the covariance matrix

cov_matrix = np.cov(X_standardized, rowvar=False)

print(cov_matrix)

# Compute eigenvalues and eigenvectors (eigh, since the covariance matrix is symmetric)

eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort the eigenvalues and eigenvectors

sorted_indices = np.argsort(eigenvalues)[::-1]

eigenvalues_sorted = eigenvalues[sorted_indices]

eigenvectors_sorted = eigenvectors[:, sorted_indices]

# Select the top 2 principal components

k=2

eigenvectors_subset = eigenvectors_sorted[:, :k]

# Transform the data

X_pca = X_standardized.dot(eigenvectors_subset)

# Visualize the PCA results


plt.figure(figsize=(10, 6))

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k', s=50)

plt.title('PCA of Wisconsin Breast Cancer Dataset')

plt.xlabel('Principal Component 1')

plt.ylabel('Principal Component 2')

plt.colorbar(label='Class Label')

plt.grid()

plt.show()
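As an optional cross-check (a sketch, not part of the original program), the manually computed projection can be compared against scikit-learn's PCA, and the sorted eigenvalues give the proportion of variance captured by the two components. Component signs may differ, since eigenvectors are only defined up to sign.

# Optional sketch: verify the manual PCA against scikit-learn and report explained variance
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca_sklearn = pca.fit_transform(X_standardized)
print("Explained variance ratio (sklearn):", pca.explained_variance_ratio_)
print("Explained variance ratio (manual):", eigenvalues_sorted[:2] / eigenvalues_sorted.sum())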

11. Program to implement linear discriminant analysis (LDA) on the Iris dataset and visualize the results.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn import datasets

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Load the iris dataset

iris = datasets.load_iris()

X = iris.data # Features

y = iris.target # Target classes

print(y)

# Create an instance of LDA

lda = LDA(n_components=2)

# Fit and transform the data

X_lda = lda.fit_transform(X, y)

# Create a DataFrame for visualization


lda_df = pd.DataFrame(data=X_lda, columns=['LD1', 'LD2'])

lda_df['target'] = y

# Map target values to class names

lda_df['target'] = lda_df['target'].map({0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'})

# Plotting

plt.figure(figsize=(10, 6))

sns.scatterplot(data=lda_df, x='LD1', y='LD2', hue='target', palette='viridis', s=100)

plt.title('LDA of Iris Dataset')

plt.xlabel('Linear Discriminant 1')

plt.ylabel('Linear Discriminant 2')

plt.legend(title='Species')

plt.grid()

plt.show()
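As a brief follow-up sketch (not part of the original program), the explained variance ratio of the discriminants and a cross-validated accuracy estimate give a quantitative sense of the class separation visible in the plot.

# Optional analysis sketch: quantify the separation shown in the LDA plot
from sklearn.model_selection import cross_val_score

print("Explained variance ratio of the discriminants:", lda.explained_variance_ratio_)
scores = cross_val_score(LDA(), X, y, cv=5)
print("5-fold cross-validated accuracy:", scores.mean())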

12. Program to implement multiple linear regression on the Iris dataset, visualize and analyze the results.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the iris dataset
iris = sns.load_dataset('iris')
print(iris.head())
# Define independent variables (features) and dependent variable (target)
X = iris[['sepal_length', 'sepal_width', 'petal_width']]
y = iris['petal_length']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
# Visualize the results
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linewidth=2)
plt.title('Actual vs Predicted Petal Length')
plt.xlabel('Actual Petal Length')
plt.ylabel('Predicted Petal Length')
plt.grid()
plt.show()
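As a short follow-up (a sketch, not part of the original program), printing the fitted coefficients and intercept shows how each feature contributes to the predicted petal length and complements the MSE and R-squared metrics above.

# Optional sketch: inspect the fitted regression coefficients
coefficients = pd.DataFrame({'feature': X.columns, 'coefficient': model.coef_})
print(coefficients)
print(f'Intercept: {model.intercept_:.3f}')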
