KNN Imputation
Overview:
KNN Imputation is a technique used to handle missing values in a dataset. It replaces missing
values with the mean value of the 'k' nearest neighbors' corresponding feature values.
How it works:
1. Find Neighbors:
- For each missing value, the algorithm finds the 'k' nearest neighbors based on the other feature values.
2. Impute:
- The missing value is replaced with a weighted average of the corresponding values from its 'k' nearest neighbors.
1. Replacing Zero Values:
- Zero values in the 'resting bp s' and 'cholesterol' columns are replaced with NaN to indicate missing values.
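The replacement itself is not shown in the extract; a minimal sketch, assuming NumPy's NaN is used as the missing-value marker:
import numpy as np
# Treat physiologically impossible zeros as missing values
df['resting bp s'] = df['resting bp s'].replace(0, np.nan)
df['cholesterol'] = df['cholesterol'].replace(0, np.nan)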
2. Checking Missing Values:
df.isnull().sum()
3. Importing KNNImputer:
from sklearn.impute import KNNImputer
knn = KNNImputer(n_neighbors=3, weights='distance')
df = knn.fit_transform(df)
- The KNNImputer is imported and initialized with n_neighbors=3, meaning it will consider the 3 nearest neighbors when imputing each missing value.
- weights='distance' means that closer neighbors have a greater influence on the imputed value.
- The fit_transform method is used to impute missing values in the dataframe (df).
df = pd.DataFrame(df, columns=['age', 'sex', 'chest pain type', 'resting bp s', 'cholesterol',
                               'fasting blood sugar', 'resting ecg', 'max heart rate', 'exercise angina',
                               'oldpeak', 'ST slope', 'target'])
df.head()
- The imputed data is converted back into a pandas DataFrame with the appropriate column
names.
df.isnull().sum()
K-Nearest Neighbors (KNN) Classification
Overview:
K-Nearest Neighbors (KNN) is a simple, supervised machine learning algorithm used for classification
and regression tasks. It classifies a data point based on how its neighbors are classified.
How it works:
1. Training Phase:
- There is no actual training process in KNN; it is a lazy learner, meaning it doesn't learn an explicit model during training but simply stores the training data.
2. Prediction Phase:
- When a new data point needs to be classified, KNN looks at the 'k' nearest data points in the training set.
- 'k' is a user-defined constant. For example, if k=3, the three closest data points to the new point
are considered.
- The class with the majority vote among these 'k' neighbors is assigned to the new data point.
1. Importing KNeighborsClassifier:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
2. Training the Model:
model.fit(X_train, Y_train)
- Although KNN doesn't explicitly train a model, the fit method stores the training data within the
classifier.
3. Cross-Validation:
- 10-fold cross-validation (cv=10) is performed on the training data.
- Accuracy, precision, recall, and F1-score are calculated and printed to assess the classifier's
performance.
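The cross-validation calls themselves are not included in the extract; a minimal sketch using sklearn's cross_val_score (the scoring names are assumptions):
from sklearn.model_selection import cross_val_score
accuracy = cross_val_score(model, X_train, Y_train, cv=10, scoring='accuracy')
precision = cross_val_score(model, X_train, Y_train, cv=10, scoring='precision')
recall = cross_val_score(model, X_train, Y_train, cv=10, scoring='recall')
f1 = cross_val_score(model, X_train, Y_train, cv=10, scoring='f1')
print("accuracy=", accuracy.mean(), "precision=", precision.mean(), "recall=", recall.mean(), "f1=", f1.mean())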
4. Making Predictions:
y_pred = model.predict(X_test)
5. Evaluating Predictions:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(Y_test, y_pred))
- A confusion matrix is generated to evaluate the predictions. It provides insight into the true
positives, true negatives, false positives, and false negatives.
PCA Analysis, Selection, and Model Training
PCA Overview:
- PCA is a technique used to reduce the dimensionality of a dataset while retaining most of the
variance in the data. It transforms the data into a new set of variables called principal components.
- Each principal component is a linear combination of the original variables and is orthogonal to the other components.
1. Importing Libraries:
from sklearn.decomposition import PCA
accuracies = []
# For each number of components from 1 to 11 (the steps below run inside this loop)
for n_components in range(1, 12):
    pca = PCA(n_components=n_components)
    # Reduce dimensionality
    X_reduced = pca.fit_transform(X)
- A loop is created to iterate over different numbers of principal components (from 1 to 11).
- For each iteration, the PCA model is initialized with a specific number of components.
- The `fit_transform` method is applied to the dataset `X` to reduce its dimensionality.
Data Splitting:
- The reduced dataset is split into training and testing sets. 80% of the data is used for training, and 20% for testing.
- The `random_state` parameter ensures that the data is split in the same way each time the code is
run.
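The split itself is not shown; a minimal sketch using sklearn's train_test_split (the target vector name Y and the random_state value are assumptions):
from sklearn.model_selection import train_test_split
# 80/20 split with a fixed random_state for reproducibility
X_train, X_test, Y_train, Y_test = train_test_split(X_reduced, Y, test_size=0.2, random_state=0)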
Model Training:
model = KNeighborsClassifier()
model.fit(X_train, Y_train)
- A k-nearest neighbors (KNN) classifier is initialized and trained using the training data.
Model Evaluation:
y_pred = model.predict(X_test)
# Calculate accuracy via 10-fold cross-validation on the training set
accuracy = cross_val_score(model, X_train, Y_train, cv=10, scoring='accuracy')
accuracies.append(np.mean(accuracy))
- The accuracy of the model is calculated using 10-fold cross-validation on the training set.
- The mean accuracy for each number of components is stored and printed.
Explained Variance:
explained_variance_ratio = pca.explained_variance_ratio_  # PCA from the last iteration (all 11 components)
total_explained_variance_ratio = explained_variance_ratio.sum()
- The total explained variance ratio is the sum of the explained variance ratios of all principal components.
cumulative_explained_variance = np.cumsum(explained_variance_ratio)
- The cumulative explained variance is calculated to understand how many components are needed to retain a given share of the variance.
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(cumulative_explained_variance) + 1), cumulative_explained_variance, marker='o',
         linestyle='--')
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')
plt.grid()
plt.show()
- The cumulative explained variance is plotted to visualize the relationship between the number of components and the variance retained.
threshold = 0.90
- A threshold (e.g., 90%) is set to decide how much variance should be retained.
- The number of components needed to retain at least the threshold amount of variance is determined.
X_pca.head()
- Principal components that are not needed based on the explained variance threshold are dropped from the data.
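Those selection lines are not included in the extract; a minimal sketch, assuming the component scores are stored in a DataFrame X_pca:
# Smallest number of components whose cumulative explained variance reaches the threshold
n_components_needed = np.argmax(cumulative_explained_variance >= threshold) + 1
# Keep only the first n_components_needed principal components
X_pca = X_pca.iloc[:, :n_components_needed]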
model = KNeighborsClassifier()
model.fit(X_train, Y_train)
Cross-validation:
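The cross_val_score calls feeding the print statements below are not shown; a plausible sketch (variable names are taken from the prints, and note that accuracy_score here shadows sklearn's function of the same name):
from sklearn.model_selection import cross_val_score
accuracy_score = cross_val_score(model, X_train, Y_train, cv=10, scoring='accuracy')
precision = cross_val_score(model, X_train, Y_train, cv=10, scoring='precision')
recall = cross_val_score(model, X_train, Y_train, cv=10, scoring='recall')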
print("accuracy_score=", np.mean(accuracy_score))
print("precision=", np.mean(precision))
print("recall=", np.mean(recall))
- Cross-validation is performed to calculate the accuracy, precision, and recall of the final model on the reduced training data.
y_pred = model.predict(X_test)
cm = confusion_matrix(Y_test, y_pred)
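# The scalar accuracy, precision, and recall printed below are not computed in the shown lines;
# a minimal sketch (assumption), using the sklearn.metrics module to avoid the accuracy_score name clash above:
from sklearn import metrics
accuracy = metrics.accuracy_score(Y_test, y_pred)
precision = metrics.precision_score(Y_test, y_pred)
recall = metrics.recall_score(Y_test, y_pred)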
print("Confusion Matrix:\n", cm)
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
df0 = df[df['target'] == 0]
df1 = df[df['target'] == 1]
df0.head()
• Splits the DataFrame df into two DataFrames, df0 and df1, based on
the target values (0 and 1). This allows separate analysis for each target
group.
df_dependent = df['target']
df_independent = df.drop(columns='target', axis=1)
• Separates the target column (dependent variable) from the rest of the
dataset (independent variables).
Applying PCA
model = PCA(n_components=11)
# Fit the PCA model on the independent variables
model.fit(df_independent)
Summary of Steps
1. Splitting Data Based on Target Values:
o Data is split into two groups based on the target variable.
2. Removing the Target Column:
o The target column is removed from the data to focus on the
features.
3. Converting to Numpy Arrays:
o The data is converted to numpy arrays for mathematical operations.
4. Calculating Frobenius Norm:
o The Frobenius norm is calculated to measure the difference between the original and target matrices.
5. Computing Neutrality Target:
o The difference between the Frobenius norms is squared to get the
neutrality target.
This approach helps in understanding the differences in the data before and after
applying transformations, such as SVD, by measuring the Frobenius norm,
which provides an overall error measure.
df0 = df[df['target'] == 0]
df1 = df[df['target'] == 1]
df0.head()
• df0 and df1 are created by splitting the dataset df based on the target
values (0 and 1).
• The head() method is used to display the first few rows of df0.
df0 = df0.to_numpy()
X_target0 = X_target0.to_numpy()
df1 = df1.to_numpy()
X_target1 = X_target1.to_numpy()
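# The per-group Frobenius norms used below are not shown in the extract;
# a minimal sketch, mirroring the per-group norm pattern used later in these notes:
frobenius_norm0_for_ed = np.linalg.norm(df0 - X_target0)
frobenius_norm1_for_ed = np.linalg.norm(df1 - X_target1)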
neutrality_target_for_ed = (frobenius_norm1_for_ed - frobenius_norm0_for_ed) ** 2
neutrality_target_for_ed
X_reconstructed = model.inverse_transform(X_d)
X_reconstructed
These explanations cover the main steps in the code, focusing on data preprocessing, applying PCA, and reconstructing the data.
1. Explained Variance Ratio
Explained Variance Ratio (EVR) indicates the proportion of the dataset's variance that is captured by each principal component in PCA. It is a measure of how much information each component carries.
Variance Calculation: Variance is calculated for each feature to understand its contribution to the total variance.
EVR Calculation: Each feature's variance is divided by the total variance to get the proportion of variance it explains.
Sorting: Sorting EVR in descending order helps identify which features contribute most to the
variance.
Example Calculation:
EV = df.var()
s=sum(EV)
EVR = EV / s
EVR.sort_values(ascending=False)
This gives a sorted list of features based on their contribution to the total variance.
2. Binning
Binning is a process of converting continuous data into categorical data by dividing it into intervals (bins).
Define Bin Edges: Set boundaries for the bins.
Value Counts: Count the number of occurrences in each bin to understand the distribution.
Example Calculation:
df['Cholesterol_bin'].value_counts()
This categorizes cholesterol levels and counts how many values fall into each category.
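The line creating the 'Cholesterol_bin' column is not shown; a minimal sketch using pandas' cut, with bin edges assumed from the cholesterol ranges (<200, 200-240, >240) used later in these notes:
bins = [0, 200, 240, float('inf')]
bin_labels = ['<200', '200-240', '>240']
df['Cholesterol_bin'] = pd.cut(df['cholesterol'], bins=bins, labels=bin_labels)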
3. Label Encoding
Label Encoding converts categorical data into numerical format, which is necessary for most machine learning algorithms.
Fit and Transform: Encode the binned cholesterol data into numerical labels.
Add Encoded Labels: Create a new column with these labels in the DataFrame.
Example Calculation:
from sklearn.preprocessing import LabelEncoder
label_encode = LabelEncoder()
labels = label_encode.fit_transform(df.Cholesterol_bin)
df['Cholesterol_group'] = labels
new_df = df.drop(columns='Cholesterol_bin')
new_df.info()
This converts the binned cholesterol data into numerical values and adds it to the DataFrame.
4. Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD) decomposes a matrix into three other matrices and is used for dimensionality reduction.
Centering Data: Subtract the mean from each feature to center the data.
SVD Reconstruction: Use the inverse transform to approximate the original data from the reduced dimensions.
Example Calculation:
# X_centered is the centered feature matrix (each column minus its mean, as described above)
U, S, Vt = np.linalg.svd(X_centered)
X_d = X_centered.dot(Vt.T)   # project the centered data onto the right singular vectors
X_d = X_d.to_numpy()
X_reconstructed = model1.inverse_transform(X_d)
X_reconstructed
This process reduces the dimensionality and then reconstructs the data to approximate the original data.
5. Creating DataFrame from Reconstructed Data
After reconstruction, the data needs to be converted back into a DataFrame for further analysis.
DataFrame Creation: Convert the numpy array of reconstructed data into a DataFrame.
Add Labels: Add the encoded cholesterol group labels back as a new column.
Example Calculation:
X = pd.DataFrame(X_reconstructed, columns=['sex', 'chest pain type', 'resting bp s', 'cholesterol',
                                           'fasting blood sugar', 'resting ecg', 'max heart rate',
                                           'exercise angina', 'oldpeak', 'ST slope', 'age_group'])
X['Cholesterol_group'] = new_Y
X.head()
6. Splitting Data by Group
Splitting data based on the categorical labels helps in comparing different groups.
Example Calculation:
X_cholesterol0 = X[X['Cholesterol_group'] == 0]
X_cholesterol1 = X[X['Cholesterol_group'] == 1]
X_cholesterol2 = X[X['Cholesterol_group'] == 2]
X_cholesterol0.head()
This segregates the data based on cholesterol groups for further analysis.
7. Frobenius Norm
The Frobenius Norm measures the difference between two matrices (the square root of the sum of squared entries of the difference matrix); it is often used to quantify reconstruction error.
Compute Norm: Calculate the Frobenius norm between the original and reconstructed data for each group.
Example Calculation:
df_Cholesterol0 = df_Cholesterol0.to_numpy()
X_cholesterol0 = X_cholesterol0.to_numpy()
frobenius_norm0 = np.linalg.norm(df_Cholesterol0 - X_cholesterol0)
frobenius_norm0
This quantifies how well the reconstructed data matches the original data.
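The same pattern would be repeated for the other two cholesterol groups (a sketch; the df_Cholesterol1 and df_Cholesterol2 names are assumed by analogy with group 0):
df_Cholesterol1 = df_Cholesterol1.to_numpy()
X_cholesterol1 = X_cholesterol1.to_numpy()
frobenius_norm1 = np.linalg.norm(df_Cholesterol1 - X_cholesterol1)
df_Cholesterol2 = df_Cholesterol2.to_numpy()
X_cholesterol2 = X_cholesterol2.to_numpy()
frobenius_norm2 = np.linalg.norm(df_Cholesterol2 - X_cholesterol2)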
8. Neutrality Calculation
Squared Differences: Compute the squared differences of the Frobenius norms between groups.
Average Squared Differences: Average these squared differences to obtain the neutrality value.
Example Calculation:
neutrality_cholesterol
This provides a measure of how uniformly the reconstruction error is distributed across different groups.
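The line computing neutrality_cholesterol is not shown; a minimal sketch, assuming neutrality is the average of the pairwise squared differences between the three group norms:
neutrality_cholesterol = ((frobenius_norm0 - frobenius_norm1) ** 2 +
                          (frobenius_norm0 - frobenius_norm2) ** 2 +
                          (frobenius_norm1 - frobenius_norm2) ** 2) / 3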
9. Plotting the Frobenius Norms
Visualizing the Frobenius norms helps in comparing the reconstruction errors across groups.
Set Labels and Values: Define labels and norms for plotting.
Plot: Use matplotlib to create the plot (incomplete in the given code).
Example Calculation:
labels = ['cholesterol range >240', 'cholesterol range 200-240', 'cholesterol range <200']
frobenius_norms = [frobenius_norm0, frobenius_norm1, frobenius_norm2]  # group norms (order assumed to match the labels)
x = np.arange(len(labels))  # label locations
width = 0.35  # width of the bars
# Plotting
fig, ax = plt.subplots()
rects = ax.bar(x, frobenius_norms, width, label='SVD')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Frobenius Norms')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
fig.tight_layout()
plt.show()
This would create a bar chart comparing the Frobenius norms for different cholesterol groups.
Each of these steps plays a crucial role in the overall analysis and understanding of the data, ensuring the results are reliable and interpretable.
This line removes the 'target' column from the df DataFrame, creating a new
DataFrame called new_df.
These lines filter new_df to create four new DataFrames, each containing rows
where the 'chest pain type' column equals 1, 2, 3, and 4, respectively.
This line extracts the 'chest pain type' column from new_df and stores it in
new_Y.
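The code these three notes describe is not included in the extract; a plausible reconstruction (variable names taken from the later cells):
new_df = df.drop(columns='target')
df_chest_pain_type1 = new_df[new_df['chest pain type'] == 1]
df_chest_pain_type2 = new_df[new_df['chest pain type'] == 2]
df_chest_pain_type3 = new_df[new_df['chest pain type'] == 3]
df_chest_pain_type4 = new_df[new_df['chest pain type'] == 4]
new_Y = new_df['chest pain type']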
model2 = PCA(n_components=11)
# Fit transform
model2.fit(new_df)
X_centered = new_df - new_df.mean()
U, S, Vt = np.linalg.svd(X_centered)
X_d = X_centered.dot(Vt.T[:, :])
X_d = X_d.to_numpy()
X_reconstructed = model2.inverse_transform(X_d)  # inverse-transform with the PCA model fitted above
X_reconstructed
X['chest_pain_type_group'] = new_Y
X.head()
These lines filter X to create four new DataFrames, each containing rows where
the 'chest_pain_type_group' column equals 1, 2, 3, and 4, respectively.
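Those filtering lines are not shown; a plausible reconstruction (variable names taken from the norm calculations below):
X_chest_pain_type1 = X[X['chest_pain_type_group'] == 1]
X_chest_pain_type2 = X[X['chest_pain_type_group'] == 2]
X_chest_pain_type3 = X[X['chest_pain_type_group'] == 3]
X_chest_pain_type4 = X[X['chest_pain_type_group'] == 4]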
df_chest_pain_type1 = df_chest_pain_type1.to_numpy()
X_chest_pain_type1 = X_chest_pain_type1.to_numpy()
frobenius_norm1 = np.linalg.norm(df_chest_pain_type1 - X_chest_pain_type1)
frobenius_norm1
df_chest_pain_type2 = df_chest_pain_type2.to_numpy()
X_chest_pain_type2 = X_chest_pain_type2.to_numpy()
frobenius_norm2 = np.linalg.norm(df_chest_pain_type2 - X_chest_pain_type2)
frobenius_norm2
df_chest_pain_type3 = df_chest_pain_type3.to_numpy()
X_chest_pain_type3 = X_chest_pain_type3.to_numpy()
frobenius_norm3 = np.linalg.norm(df_chest_pain_type3 - X_chest_pain_type3)
frobenius_norm3
df_chest_pain_type4 = df_chest_pain_type4.to_numpy()
X_chest_pain_type4 = X_chest_pain_type4.to_numpy()
frobenius_norm4 = np.linalg.norm(df_chest_pain_type4 - X_chest_pain_type4)
frobenius_norm4
labels = ['chest pain scenario 1', 'chest pain scenario 2', 'chest pain scenario 3', 'chest pain scenario 4']
svd_norms = [frobenius_norm1, frobenius_norm2, frobenius_norm3, frobenius_norm4]
x = np.arange(len(labels)) # label locations
width = 0.35 # width of the bars
1. labels = [...]: Defines labels for the bar plot.
2. svd_norms = [...]: Creates a list of Frobenius norms for each chest pain
type.
3. x = np.arange(len(labels)): Creates an array of label locations.
4. width = 0.35: Sets the width of the bars.
# Plotting
fig, ax = plt.subplots()
rects = ax.bar(x, svd_norms, width, label='SVD')
# Add some text for labels, title, and custom x-axis tick labels, etc.
ax.set_xlabel('Different chest pain scenario')
ax.set_ylabel('Frobenius Norm')
ax.set_title('Comparison of Frobenius Norms for SVD on chest pain type attribute')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
autolabel(rects)
fig.tight_layout()
plt.show()
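autolabel is called above but never defined in the extract; a minimal sketch of the standard matplotlib helper it presumably refers to (it would need to be defined before it is called):
def autolabel(rects):
    """Attach a text label above each bar showing its height."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}',
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3-point vertical offset
                    textcoords='offset points',
                    ha='center', va='bottom')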