
ML Solution

The document contains multiple Python programming tasks related to data analysis and machine learning, including the implementation of algorithms like Apriori, Linear Regression, Logistic Regression, K-means, and K-Nearest Neighbors. It covers various datasets such as groceries, Iris, house prices, wholesale customers, crash data, mall customers, fuel consumption, Boston housing, and employee income data. Each task includes code snippets for loading data, preprocessing, model training, predictions, and visualizations.

Uploaded by

Faizal Shaikh


Slip1

1. Use the Apriori algorithm on the groceries dataset to find which items are bought
together.
Use minimum support = 0.25
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt
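data = pd.read_csv(r'C:\csv\groceries.csv') # raw string keeps the backslashes literal
print(data)
# Turn each row into the list of items in that basket
data = data.apply(lambda row: row.dropna().tolist(), axis=1)
# One-hot encode; max() keeps every cell at 0/1 even if an item repeats in a basket
one_hot_data = pd.get_dummies(data.apply(pd.Series).stack()).groupby(level=0).max().astype(bool)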
frequent_itemsets = apriori(one_hot_data,
min_support=0.25,use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
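To make the min_support=0.25 threshold and the lift metric concrete, the following is a small hand computation of support, confidence and lift. The baskets are made up for illustration and are not taken from groceries.csv; the helper name support is also just an illustrative choice.

```python
# Toy baskets (hypothetical data, not from groceries.csv)
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

sup_milk = support({"milk"}, transactions)
sup_bread = support({"bread"}, transactions)
sup_both = support({"milk", "bread"}, transactions)

# confidence(milk -> bread) estimates P(bread | milk)
confidence = sup_both / sup_milk
# lift compares that with the baseline P(bread); above 1 means a positive association
lift = confidence / sup_bread

print(f"support(milk, bread) = {sup_both:.2f}")
print(f"confidence(milk -> bread) = {confidence:.2f}")
print(f"lift(milk -> bread) = {lift:.2f}")
```

With min_support=0.25, the pair {milk, bread} would count as frequent here because it appears in half of the baskets, yet its lift is below 1, so the rule would be filtered out by min_threshold=1.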

2. Write a Python program to prepare a scatter plot for the Iris dataset. Convert
categorical values to numeric format.
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset from a CSV file

iris_df = pd.read_csv('iris.csv')

# Convert categorical 'species' column to numeric format

iris_df['species'] = pd.Categorical(iris_df['species']).codes

# Create a scatter plot with numeric species values

plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)',
hue='species', palette='viridis', s=100)
plt.title('Iris Dataset Scatter Plot with Numeric Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.grid(True)
plt.show()
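The numeric codes produced by pd.Categorical(...).codes follow, by default, the position of each value in the sorted list of categories. A plain-Python sketch of that mapping, using a made-up species column rather than iris.csv, looks like this:

```python
# Hypothetical species column (illustrative values, not read from iris.csv)
species = ["setosa", "virginica", "versicolor", "setosa", "virginica"]

# Categories are taken in sorted order, and each value is replaced by its index
categories = sorted(set(species))
code_of = {name: code for code, name in enumerate(categories)}
codes = [code_of[name] for name in species]

print(categories)
print(codes)
```

Keeping the code_of dictionary around is useful when the numeric codes later need to be translated back into species names, for example in a plot legend.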

Slip2
Q.1. Write a python program to implement simple Linear Regression for predicting
house price. First find all null values in a given dataset and remove them. [15 M]
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

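# Load dataset, report null values per column, then drop the rows that contain them
df = pd.read_csv(r'C:\csv\house-prices.csv')
print("Null values per column:\n", df.isnull().sum())
df = df.dropna()
print(df)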

# Define features and target variable

X = df[['Bedrooms']] # Adjust column names as necessary
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)

# Train the model

model = LinearRegression().fit(X_train, y_train)
# Predict and visualize
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, model.predict(X_test), color='red', label='Predicted Prices')
plt.title('House Price Prediction')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Price of House')
plt.legend()
plt.show()

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
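For a single feature, the coefficient and intercept that LinearRegression reports can be reproduced with the closed-form least-squares formulas. The sketch below uses made-up (bedrooms, price) pairs, not values from house-prices.csv.

```python
# Hypothetical (bedrooms, price) pairs, for illustration only
bedrooms = [2, 3, 3, 4, 5]
prices = [100, 150, 160, 200, 250]

n = len(bedrooms)
mean_x = sum(bedrooms) / n
mean_y = sum(prices) / n

# slope = sum of cross-deviations / sum of squared x-deviations
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bedrooms, prices))
s_xx = sum((x - mean_x) ** 2 for x in bedrooms)
slope = s_xy / s_xx

# The fitted line always passes through the point of means
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(bedrooms, prices)]
print(f"Coefficient: {slope:.3f}, Intercept: {intercept:.3f}")
```

A quick sanity check on any least-squares fit with an intercept is that the residuals sum to zero, up to floating-point error.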

Q.2. The data set refers to clients of a wholesale distributor. It includes the annual
spending in monetary units on diverse product categories. Using the Wholesale
customers dataset, apply agglomerative clustering to group clients in the same
region by annual spending.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Load the dataset

df = pd.read_csv(r'C:\csv\Wholesale customers data.csv').dropna()

# Prepare features (drop the categorical Channel and Region codes)

features = df.drop(['Channel', 'Region'], axis=1)

# Standardize the data

features_scaled = StandardScaler().fit_transform(features)
# Perform agglomerative clustering
clusters = AgglomerativeClustering(n_clusters=3).fit_predict(features_scaled)

# Add cluster labels to the DataFrame

df['Cluster'] = clusters

# Visualize clusters
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Fresh', y='Milk', hue='Cluster', palette='viridis', s=100)
plt.title('Agglomerative Clustering of Wholesale Customers')
plt.xlabel('Annual Spending on Fresh Products')
plt.ylabel('Annual Spending on Milk Products')
plt.legend(title='Cluster')
plt.grid(True)
plt.show()

# Display cluster means

print("\nCluster Distribution:")
print(df.groupby('Cluster').mean())

Slip3
Q.1. Write a python program to implement multiple Linear Regression for a house
price dataset. Divide the dataset into training and testing data. [15 M]
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset

# Replace the path below with the actual location of your dataset
df = pd.read_csv(r'C:\csv\house-prices.csv')

# Display the first few rows

print("Dataset Preview:")
print(df.head())

# Define features (independent variables) and target (dependent variable)

# 'SqFt', 'Bedrooms' and 'Bathrooms' are the features, and 'Price' is the target
X = df[['SqFt', 'Bedrooms', 'Bathrooms']] # Adjust column names as necessary
y = df['Price']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)

# Create and train the model

model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test data

y_pred = model.predict(X_test)

# Display results
print("\nModel Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
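The two metrics printed above can be checked by hand. The sketch below computes the mean squared error and the R² score from their definitions on made-up actual and predicted values (not output from the house price model).

```python
# Hypothetical actual and predicted values, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 8.0]

n = len(y_true)
mean_true = sum(y_true) / n

# Residual sum of squares: error left after using the model
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
# Total sum of squares: error of always predicting the mean
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

mse = ss_res / n
r2 = 1 - ss_res / ss_tot

print("Mean Squared Error:", mse)
print("R2 Score:", r2)
```

An R² of 1 means a perfect fit, 0 means the model does no better than predicting the mean, and negative values are possible on test data when the model does worse than that baseline.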

Q.2. Use the dataset crash.csv, an accident survivors dataset for the USA hosted on
the data.gov portal. The dataset contains the passengers' age and the speed of the
vehicle (mph) at the time of impact, and the fate of the passengers (1 for survived
and 0 for not survived) after a crash. Use logistic regression to decide if the age and
speed can predict the survivability of the passengers.
# Import necessary libraries

import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset

df = pd.read_csv('crash.csv')

# Display the first few rows of the dataset

print("Dataset Preview:\n", df.head())

# Define features and target

X = df[['age', 'speed']]

y = df['fate'] # Assuming 'fate' is 1 for survived and 0 for not survived

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train a logistic regression model

model = LogisticRegression()

model.fit(X_train, y_train)

# Make predictions

y_pred = model.predict(X_test)

# Display accuracy and classification report

print("\nAccuracy:", accuracy_score(y_test, y_pred))

print("Classification Report:\n", classification_report(y_test, y_pred))

# Confusion matrix

conf_matrix = confusion_matrix(y_test, y_pred)

print("Confusion Matrix:\n", conf_matrix)

# Plot the results

plt.figure(figsize=(10, 6))

sns.scatterplot(data=df, x='age', y='speed', hue='fate', palette='viridis', s=100)

plt.title('Logistic Regression Prediction of Passenger Survivability')

plt.xlabel('Age')

plt.ylabel('Speed (mph)')

plt.legend(title='Fate (0 = Not Survived, 1 = Survived)')

plt.grid(True)

plt.show()
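A fitted logistic regression model turns a linear score into a probability with the sigmoid function, and predict() labels a binary case as 1 when that probability exceeds 0.5. The sketch below shows the calculation with made-up coefficients; in the program above the real values would come from model.intercept_[0] and model.coef_[0] after fitting on crash.csv.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration, not fitted from crash.csv
intercept = 4.0
coef_age = -0.03
coef_speed = -0.05

def survival_probability(age, speed):
    # Linear score first, then squash it into a probability
    z = intercept + coef_age * age + coef_speed * speed
    return sigmoid(z)

p_slow = survival_probability(age=30, speed=20)
p_fast = survival_probability(age=30, speed=90)

print(f"P(survived | age=30, speed=20) = {p_slow:.3f}")
print(f"P(survived | age=30, speed=90) = {p_fast:.3f}")
```

With these illustrative negative coefficients, a higher impact speed lowers the predicted survival probability; whether age and speed actually have that effect is exactly what the fitted coefficients and the test accuracy are meant to show.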

Slip 4
Q.1. Write a python program to implement k-means algorithm on a
mall_customers dataset.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load dataset
df = pd.read_csv(r'C:\csv\mall_customers.csv')

# Select features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

# Standardize the data

X_scaled = StandardScaler().fit_transform(X)
# Apply K-means
kmeans = KMeans(n_clusters=5, random_state=0)
df['Cluster'] = kmeans.fit_predict(X_scaled)

# Plot clusters
plt.figure(figsize=(10, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=df['Cluster'], cmap='viridis', s=100)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
c='red', label='Centroids')
plt.title('K-means Clustering of Mall Customers')
plt.xlabel('Annual Income (scaled)')
plt.ylabel('Spending Score (scaled)')
plt.legend()
plt.grid(True)
plt.show()

# Display cluster counts

print("\nNumber of customers in each cluster:")
print(df['Cluster'].value_counts())
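What KMeans does internally can be sketched as alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The version below is a simplified illustration on made-up points with a fixed starting guess; scikit-learn's KMeans chooses its starting centroids with k-means++ by default and also handles edge cases, such as empty clusters, that this sketch ignores.

```python
# Hypothetical (income, score) points forming two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Simple deterministic starting centroids (an assumption made for this sketch)
centroids = [points[0], points[3]]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

for _ in range(10):
    # Assignment step: index of the nearest centroid for every point
    labels = [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in points
    ]
    # Update step: each centroid moves to the mean of its assigned points
    new_centroids = []
    for k in range(len(centroids)):
        members = [p for p, lab in zip(points, labels) if lab == k]
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    if new_centroids == centroids:
        break  # converged: centroids stopped moving
    centroids = new_centroids

print("Labels:", labels)
print("Centroids:", centroids)
```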

Q.2. Write a python program to implement Simple Linear Regression for predicting
house price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset
df = pd.read_csv(r'C:\csv\house-prices.csv').dropna()
print(df)

# Define features and target variable

X = df[['Bedrooms']] # Adjust column names as necessary
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)

# Train the model

model = LinearRegression().fit(X_train, y_train)

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")

Slip 5
Q.1. Write a python program to implement Multiple Linear Regression for Fuel
Consumption dataset.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset

# Replace 'fuel_consumption.csv' with the actual path to your dataset
df = pd.read_csv('fuel_consumption.csv')

# Display the first few rows of the dataset

print("Dataset Preview:\n", df.head())
# Define features (independent variables) and target (dependent variable)
X = df[['engine_size', 'horsepower', 'weight']] # Adjust based on your dataset
y = df['fuel_consumption']

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)

# Create and train the model

model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test data

y_pred = model.predict(X_test)

# Display results
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))

# Plot actual vs predicted values

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.plot(y_test, y_test, color='red', linewidth=2) # Ideal line
plt.title('Actual vs Predicted Fuel Consumption')
plt.xlabel('Actual Fuel Consumption')
plt.ylabel('Predicted Fuel Consumption')
plt.grid(True)
plt.show()
Q.2. Write a python program to implement k-nearest Neighbors ML algorithm to
build prediction model (Use iris Dataset)
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train K-NN model

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions and evaluation

y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Example prediction
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])
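The idea behind KNeighborsClassifier(n_neighbors=3) can be sketched in a few lines: find the three training points closest to the query and take a majority vote over their labels. The training points below are made up (two features only) and the function name knn_predict is just an illustrative choice.

```python
import math
from collections import Counter

# Hypothetical training points: (petal length, petal width) -> species
train = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.8, 2.2), "virginica"),
]

def knn_predict(query, data, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_predict((4.6, 1.6), train, k=3)
print("Predicted Class:", prediction)
```

Because the vote depends on raw distances, features measured on very different scales are usually standardized before applying k-NN.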

Slip6
Q.1. Write a python program to implement Polynomial Linear Regression for
Boston Housing Dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset (replace the path with the actual location of your CSV file)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)

# Use 'rm' (average number of rooms per dwelling) as the feature and 'medv'
# (median house value) as the target
X = data[['rm']]
y = data['medv']

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Transform the feature into polynomial terms (e.g., degree 2)

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train the Polynomial Regression model

model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on the test set

y_pred = model.predict(X_test_poly)

# Calculate and print the Mean Squared Error

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Plot the results

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
# Evaluate the fitted curve on an evenly spaced grid of room counts
X_plot = pd.DataFrame({'rm': np.linspace(X['rm'].min(), X['rm'].max(), 100)})
y_plot = model.predict(poly.transform(X_plot))
plt.plot(X_plot['rm'], y_plot, color='red', label='Polynomial Regression Fit')
plt.xlabel('Average number of rooms (RM)')
plt.ylabel('Median value of homes (MEDV)')
plt.title('Polynomial Linear Regression on Boston Housing Dataset')
plt.legend()
plt.show()
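Polynomial regression is still linear regression, only applied to expanded features. For one input column, PolynomialFeatures(degree=2) with its default settings produces the columns 1, x and x². The sketch below reproduces that expansion by hand; the coefficients are made up to show how a prediction is assembled and are not values learned from the Boston data.

```python
def poly_features(x, degree=2):
    """Return [x**0, x**1, ..., x**degree] for a single input value."""
    return [x ** d for d in range(degree + 1)]

rooms = [5.0, 6.0, 7.0]
expanded = [poly_features(r) for r in rooms]

# Hypothetical intercept and coefficients; LinearRegression would learn these
intercept = 10.0
coef = [0.0, -2.0, 0.5]

# Each prediction is the intercept plus a weighted sum of the expanded features
predictions = [intercept + sum(c * f for c, f in zip(coef, row)) for row in expanded]

print(expanded)
print(predictions)
```

The curve in the plot bends only because of the x² column; the model itself remains linear in its coefficients.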

Q.2. Use K-means clustering model and classify the employees into various
income groups or clusters. Preprocess data if required (i.e. drop missing or null
values).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the dataset (replace 'employees.csv' with your actual CSV file name)
data = pd.read_csv('C:\\csv\\employee_data.csv')
# Display the first few rows and info about the dataset
print(data.head())
print(data.info())

# Preprocess: Drop missing values

data = data.dropna()

# Select relevant features ('Income' is assumed to be the income column; adjust to your dataset)

X = data[['Income']]

# Standardize the data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply K-means clustering

kmeans = KMeans(n_clusters=3, random_state=42) # Adjust n_clusters as needed
data['Cluster'] = kmeans.fit_predict(X_scaled)

# Display the cluster assignments

print(data[['EmpID', 'Income', 'Cluster']])
# Visualize the clusters along the income axis
plt.figure(figsize=(10, 6))
plt.scatter(data['Income'], np.zeros(len(data)), c=data['Cluster'],
cmap='viridis', marker='o')
plt.title('K-means Clustering of Employees by Income')
plt.xlabel('Income')
plt.yticks([]) # Hide y-axis ticks
plt.grid(True)
plt.show()

Slip7
Q.1. Fit the simple linear regression model to Salary_positions.csv data. Predict the salaries
of level 11 and level 12 employees.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load and preview dataset

df = pd.read_csv(r'C:\csv\Position_Salaries.csv')
X = df[['Level']]
y = df['Salary']

# Train the model

model = LinearRegression()
model.fit(X, y)

# Predict salaries for levels 11 and 12

predicted_salaries = model.predict([[11], [12]])
print(f"Predicted salary for level 11: ${predicted_salaries[0]:.2f}")
print(f"Predicted salary for level 12: ${predicted_salaries[1]:.2f}")

# Plot data and predictions

plt.scatter(X, y, color='blue', label='Actual Salaries')
plt.plot(X, model.predict(X), color='red', label='Regression Line')
plt.scatter([11, 12], predicted_salaries, color='green', marker='x', s=100, label='Predictions')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.legend()
plt.show()

Q.2.Write a python program to implement Naive Bayes on weather forecast dataset.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Load the dataset

data = pd.read_csv('C:\\csv\\weather_forecast.csv')

# Display the first few rows

print(data.head())

# Preprocess: Encode categorical variables

label_encoders = {}
for column in data.columns:
    if data[column].dtype == 'object':
        le = LabelEncoder()
        data[column] = le.fit_transform(data[column])
        label_encoders[column] = le

# Separate features and target variable

X = data.drop('weather', axis=1) # 'weather' is the target variable here
y = data['weather']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Naive Bayes classifier

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict on the test data

y_pred = nb_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Slip8

Q.1. Write a python program to categorize the given news text into one of the
available 20 categories of news groups, using multinomial Naïve Bayes machine
learning model. [15 M]
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the 20 Newsgroups dataset

newsgroups = fetch_20newsgroups(subset='all')

# Prepare the data

X = newsgroups.data # News text
y = newsgroups.target # Corresponding categories

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Convert text data to feature vectors using Count Vectorization

vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes model

model = MultinomialNB()
model.fit(X_train_vectorized, y_train)

# Make predictions
y_pred = model.predict(X_test_vectorized)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
print('Classification Report:')
print(classification_report(y_test, y_pred,
target_names=newsgroups.target_names))

# Example of predicting a new article

new_article = ["This is a new article about sports and fitness."]
new_article_vectorized = vectorizer.transform(new_article)
predicted_category = model.predict(new_article_vectorized)
print(f'Predicted Category: {newsgroups.target_names[predicted_category[0]]}')
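The feature vectors that MultinomialNB is trained on are word counts. The sketch below builds such a count matrix by hand for two made-up sentences. It is a simplification: CountVectorizer's default tokenizer also strips punctuation and drops one-character tokens such as "a", which this split-based version keeps.

```python
from collections import Counter

# Hypothetical mini-corpus, for illustration only
docs = ["The game was a great game", "The election results"]

# Lowercase and split on whitespace (simpler than CountVectorizer's tokenizer)
tokenized = [doc.lower().split() for doc in docs]

# The vocabulary is the sorted set of all words; each word gets one column
vocab = sorted({word for tokens in tokenized for word in tokens})

# One row per document, one count per vocabulary word
matrix = [[Counter(tokens)[word] for word in vocab] for tokens in tokenized]

print(vocab)
for row in matrix:
    print(row)
```

Multinomial Naive Bayes then estimates, for each category, how likely each column's word is, and scores a new document by combining those per-word likelihoods with the category prior.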

Q.2. Write a python program to implement a Decision Tree to decide whether or not
to play Tennis
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import matplotlib.pyplot as plt

# Load the dataset

data = pd.read_csv('C:\\csv\\play_tennis.csv')
print(data)

# Preprocess the data

X = data.drop('play', axis=1) # Features
y = data['play'] # Target variable

# Convert categorical variables to numerical using one-hot encoding

X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train the Decision Tree model

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_report(y_test, y_pred))

# Display the decision tree rules

tree_rules = export_text(model, feature_names=list(X.columns))
print("Decision Tree Rules:\n", tree_rules)

# Visualize the decision tree

plt.figure(figsize=(12, 8))
tree.plot_tree(model, feature_names=list(X.columns), class_names=['No', 'Yes'],
filled=True)
plt.title("Decision Tree for Play Tennis")
plt.show()
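DecisionTreeClassifier chooses its splits by impurity reduction, using Gini impurity by default. The sketch below computes Gini impurity before and after a candidate three-way split. The label counts are made up for illustration and are not read from play_tennis.csv, and scikit-learn itself only makes binary splits, so this shows the impurity arithmetic rather than the library's exact procedure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(groups):
    """Average impurity of the child groups, weighted by group size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Hypothetical labels before the split: 9 "Yes" and 5 "No"
parent = ["Yes"] * 9 + ["No"] * 5

# Hypothetical split of those 14 rows into three branches
split = [
    ["Yes"] * 2 + ["No"] * 3,
    ["Yes"] * 4,
    ["Yes"] * 3 + ["No"] * 2,
]

parent_impurity = gini(parent)
child_impurity = weighted_gini(split)

print(f"Gini before split: {parent_impurity:.4f}")
print(f"Weighted Gini after split: {child_impurity:.4f}")
print(f"Impurity reduction: {parent_impurity - child_impurity:.4f}")
```

The middle branch contains only "Yes" labels, so its impurity is zero; a split is attractive when it creates such pure branches.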
Slip 9
Q.1. Implement Ridge Regression and Lasso regression model using
boston_houses.csv and take only ‘RM’ and ‘Price’ of the houses. Divide the data
into training and testing data. Fit a line using Ridge regression, find the price of a
house that contains 5 rooms, and compare the results.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso

# Load the dataset

data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)
# Select relevant features
X = data[['rm']]
y = data['medv']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Fit Ridge Regression

ridge_model = Ridge(alpha=1.0) # You can adjust the alpha value as needed
ridge_model.fit(X_train, y_train)

# Fit Lasso Regression

lasso_model = Lasso(alpha=1.0) # You can adjust the alpha value as needed
lasso_model.fit(X_train, y_train)

# Predict price for a house with 5 rooms
# (a DataFrame with the training column name avoids scikit-learn's feature-name warning)

rooms = pd.DataFrame({'rm': [5]})
ridge_prediction = ridge_model.predict(rooms)
lasso_prediction = lasso_model.predict(rooms)

# Display the results (medv is recorded in thousands of dollars)

print(f"Ridge Regression Prediction for 5 rooms: ${ridge_prediction[0] * 1000:,.2f}")
print(f"Lasso Regression Prediction for 5 rooms: ${lasso_prediction[0] * 1000:,.2f}")
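When comparing the two predictions, it helps to see how each penalty shrinks the slope. For a single feature the solutions have closed forms, sketched below on made-up (rooms, price) pairs. The formulas assume scikit-learn's documented objectives: Ridge minimizes the squared error plus alpha times the squared coefficient, and Lasso minimizes the squared error divided by 2n plus alpha times the absolute coefficient.

```python
import math

# Hypothetical (rooms, price) pairs, for illustration only
x = [4.0, 5.0, 6.0, 7.0, 8.0]
y = [10.0, 15.0, 21.0, 24.0, 30.0]
alpha = 1.0

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Ordinary least squares slope, for reference
ols_slope = s_xy / s_xx
# Ridge adds alpha to the denominator, pulling the slope toward zero
ridge_slope = s_xy / (s_xx + alpha)
# Lasso soft-thresholds the numerator by n * alpha; a weak enough signal becomes exactly 0
shrunk = max(abs(s_xy) - n * alpha, 0.0)
lasso_slope = math.copysign(shrunk, s_xy) / s_xx

def predict(slope, rooms):
    # With an intercept, each fitted line passes through the point of means
    return mean_y + slope * (rooms - mean_x)

for name, slope in [("OLS", ols_slope), ("Ridge", ridge_slope), ("Lasso", lasso_slope)]:
    print(f"{name}: slope={slope:.4f}, prediction for 5 rooms={predict(slope, 5):.4f}")
```

Both penalized slopes are smaller in magnitude than the unpenalized one, so their predictions for a 5-room house sit closer to the mean price.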

Q.2. Write a python program to implement Linear SVM using UniversalBank.csv

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score

# Load the dataset

data = pd.read_csv('C:\\csv\\UniversalBank.csv')

# Display the first few rows of the dataset

print(data.head())

# Preprocess the data

# 'Personal Loan' is the target variable; the other columns (except 'ID') are features
X = data.drop(['Personal Loan', 'ID'], axis=1) # Drop target and non-feature columns
y = data['Personal Loan'] # Target variable
# Convert categorical variables to dummy variables (if any)
X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear SVM model

model = SVC(kernel='linear', random_state=42)
model.fit(X_train_scaled, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test_scaled)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Slip 10
Q.1. Write a python program to transform data with Principal Component Analysis
(PCA). Use iris dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset

data = pd.read_csv('C:\\csv\\iris.csv') # Ensure your CSV is named correctly
print(data)

# Separate features and target variable

X = data.drop('Species', axis=1) # Drop the target column
y = data['Species'] # Target variable

# Standardize the features

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2) # Reduce to 2 dimensions
X_pca = pca.fit_transform(X_scaled)

# Create a DataFrame with the PCA results

pca_df = pd.DataFrame(data=X_pca, columns=['PC1', 'PC2'])
pca_df['species'] = y

# Plot the PCA results

plt.figure(figsize=(8, 6))
for species in np.unique(y):
    subset = pca_df[pca_df['species'] == species]
    plt.scatter(subset['PC1'], subset['PC2'], label=species)

plt.title('PCA of Iris Dataset')

plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()

Q.2. Write a Python program to prepare a scatter plot for the Iris dataset. Convert
categorical values into numeric format.
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load and prepare the Iris dataset

iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
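iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
# Numeric encoding of the categorical species column, as the task asks
iris_df['species_code'] = iris_df['species'].cat.codes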

# Create a scatter plot

plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)',
hue='species', style='species', s=100)
plt.title('Iris Dataset Scatter Plot')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.grid(True)
plt.show()

Slip11
Q.1. Write a python program to implement Polynomial Regression for Boston
Housing Dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset

data = pd.read_csv('C:\\csv\\BostonHousing.csv') # Replace with the correct path to your dataset

# Use 'rm' (average number of rooms) as the feature and 'medv' (median house
# value) as the target
X = data[['rm']] # Features
y = data['medv'] # Target variable

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Transform the feature into polynomial terms (degree 2)

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train the Polynomial Regression model

model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on the test set

y_pred = model.predict(X_test_poly)

# Calculate Mean Squared Error

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Plot the results

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
# Evaluate the fitted curve on an evenly spaced grid of room counts
X_plot = pd.DataFrame({'rm': np.linspace(X['rm'].min(), X['rm'].max(), 100)})
y_plot = model.predict(poly.transform(X_plot))
plt.plot(X_plot['rm'], y_plot, color='red', label='Polynomial Regression Fit')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('Median Value of Homes (MEDV)')
plt.title('Polynomial Regression on Boston Housing Dataset')
plt.legend()
plt.grid()
plt.show()

Q.2. Write a python program to implement a Decision Tree classifier model on data
extracted from images that were taken from genuine and forged banknote-like
specimens.
(refer UCI dataset https://archive.ics.uci.edu/dataset/267/banknote+authentication)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset from the UCI repository

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt"
column_names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
data = pd.read_csv(url, header=None, names=column_names)

# Separate features and target variable

X = data.drop('Class', axis=1)
y = data['Class']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Decision Tree Classifier

classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Plot the Decision Tree

plt.figure(figsize=(20, 10))
plot_tree(classifier, feature_names=list(X.columns), class_names=['Not Authenticated',
'Authenticated'], filled=True, rounded=True)
plt.show()

Slip12
Q.1. Write a python program to implement k-nearest Neighbors ML algorithm to
build prediction model (Use iris Dataset). [15 M]
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train K-NN model

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions and evaluation

y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Example prediction
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])

Q.2. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one fits the given data more accurately.
Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset

data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable

X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Fit Simple Linear Regression

simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train_poly, X_test_poly, y_train, y_test = train_test_split(X_poly, y,
test_size=0.2, random_state=42)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models

mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)

print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')

print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')

# Predict salaries for levels 11 and 12

level_11_salary = poly_model.predict(poly.transform([[11]]))[0]
level_12_salary = poly_model.predict(poly.transform([[12]]))[0]
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')

# Plotting the results

plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red', label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green',
label='Polynomial Regression Predictions', alpha=0.5)

# Plot Polynomial Regression Fit

X_plot = pd.DataFrame({'Level': np.linspace(X['Level'].min(), X['Level'].max(), 100)})
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot['Level'], y_plot, color='orange', label='Polynomial Regression Fit')

plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
Slip 13
Q.1. Create RNN model and analyze the Google stock price dataset. Find out
increasing or decreasing trends of stock price for the next day. [15 M]
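An RNN for this task is usually trained on sliding windows of past prices, each labelled with whether the next day's price rose or fell. The following is a minimal sketch of that windowing step only, in plain Python. The helper name `make_windows` and the prices are hypothetical; the Google dataset and the network itself are not shown.

```python
def make_windows(prices, window=3):
    """Return (windows, labels): each window holds `window` consecutive prices,
    and its label is 1 if the following day's price is higher than the
    window's last price, else 0."""
    windows, labels = [], []
    for start in range(len(prices) - window):
        past = prices[start:start + window]
        next_price = prices[start + window]
        windows.append(past)
        labels.append(1 if next_price > past[-1] else 0)
    return windows, labels

# Made-up closing prices, for illustration only
prices = [100.0, 101.5, 101.0, 102.2, 103.0, 102.1]
windows, labels = make_windows(prices, window=3)
for w, lab in zip(windows, labels):
    print(w, "->", "increase" if lab else "decrease")
```

Each window would become one input sequence for the recurrent layer, and the 0/1 label its training target.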
Q.2. Write a python program to implement simple Linear Regression for predicting
house price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset
df = pd.read_csv('C:\\csv\\house-prices.csv').dropna()
print(df)

# Define features and target variable

X = df[['Bedrooms']] # Adjust column names as necessary
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")

Slip 14
Q.1. Create a CNN model and train it on the MNIST handwritten digit dataset. Use the
model to identify the digit written by hand in a given image.
Import the MNIST dataset from tensorflow.keras.datasets. [15 M]
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import image

# Load the MNIST dataset

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess the data

# Normalize to [0, 1] and reshape to (samples, 28, 28, 1)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255

y_train = to_categorical(y_train, 10) # One-hot encode target labels

y_test = to_categorical(y_test, 10)

# Build the CNN model

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model

model.compile(optimizer='adam', loss='categorical_crossentropy',
metrics=['accuracy'])

# Train the model

model.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model

test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

# Function to predict a digit in a new image

def predict_digit(img_path):
    # Load as a 28x28 grayscale image and scale pixels to [0, 1]
    img = image.load_img(img_path, target_size=(28, 28), color_mode='grayscale')
    img = image.img_to_array(img).astype('float32') / 255
    img = np.expand_dims(img, axis=0)  # Add a batch axis to match the model's input shape
    prediction = model.predict(img)
    digit = np.argmax(prediction)
    print(f"Predicted Digit: {digit}")
    plt.imshow(img.reshape(28, 28), cmap='gray')
    plt.title(f"Predicted Digit: {digit}")
    plt.show()

# Example usage
# Save a test image (28x28 grayscale image of a digit) in your working directory
# with the name 'digit.png'
predict_digit('digit.png')
Q.2. Write a python program to find all null values in a given dataset and remove
them.
Create your own dataset.
import pandas as pd
import numpy as np

# Create a sample dataset with some null values

data = {
'Name': ['Alice', 'Bob', 'Charlie', np.nan, 'Eve'],
'Age': [25, 30, np.nan, 28, 22],
'City': ['New York', 'Los Angeles', np.nan, 'Chicago', 'Houston']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the original DataFrame

print("Original DataFrame:")
print(df)

# Check for null values

null_values = df.isnull().sum()
print("\nNull Values in Each Column:")
print(null_values)

# Remove rows with null values

df_cleaned = df.dropna()

# Display the cleaned DataFrame

print("\nDataFrame after removing null values:")
print(df_cleaned)

Slip15
Q.1. Create an ANN and train it on a house price dataset to classify whether the house
price is above average or below average. [15 M]
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

# Load the dataset

data = pd.read_csv('C:\\csv\\house-prices.csv')

# Create binary labels based on average price

data['Price_Label'] = (data['Price'] > data['Price'].mean()).astype(int)

# Features and target

X = data[['SqFt', 'Bedrooms', 'Bathrooms']]
y = data['Price_Label']

# Split and scale data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # Reuse the training statistics on the test set

# Build the ANN model

model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# Compile and train the model

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=10,
validation_split=0.2)

# Evaluate the model

accuracy = model.evaluate(X_test_scaled, y_test)[1]
print(f'Accuracy: {accuracy:.4f}')

# Plot training history

plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
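Standardization for a held-out test set is conventionally fitted on the training rows and then reused unchanged on the test rows, so the test data never influences the statistics. A minimal pure-Python sketch of that pattern follows; the helpers `fit_scaler` and `apply_scaler` and the (SqFt, Bedrooms) rows are hypothetical.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return means, stds

def apply_scaler(rows, means, stds):
    """Standardize rows with statistics that were computed elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Made-up (SqFt, Bedrooms) rows, for illustration only
train = [[1000.0, 2.0], [2000.0, 4.0], [3000.0, 3.0]]
test = [[2000.0, 3.0]]

means, stds = fit_scaler(train)                 # statistics come from training rows only
train_scaled = apply_scaler(train, means, stds)
test_scaled = apply_scaler(test, means, stds)   # same statistics reused on the test rows
print(test_scaled)
```

The test row here equals the training mean in both columns, so it maps to zero in both.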
Q.2. Write a python program to implement multiple Linear Regression for a house
price dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the house price dataset

data = pd.read_csv('C:\\csv\\house-prices.csv')
print(data)

# Features and target variable

X = data[['SqFt', 'Bedrooms', 'Bathrooms']] # Independent variables
y = data['Price'] # Dependent variable

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Create and fit the model

model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display the coefficients

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")

# Visualize the results

plt.scatter(y_test, y_pred)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices')
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=2)  # Line for perfect predictions
plt.show()
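To make the two reported metrics concrete, the sketch below computes mean squared error and the R^2 score by hand in plain Python. The function names and the price values are made up for illustration; they are not taken from the house price dataset.

```python
def mse(actual, predicted):
    """Mean of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up prices, for illustration only
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(mse(actual, predicted))
print(r2(actual, predicted))
```

An R^2 close to 1 means the predictions explain most of the variation in the actual prices; MSE is in squared price units, so it is mainly useful for comparing models on the same data.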

Slip 16
Q.1. Create a two layered neural network with relu and sigmoid activation
function.
import numpy as np

# Activation functions
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Initialize parameters
np.random.seed(0)
weights_input_hidden = np.random.rand(3, 4) # 3 inputs, 4 hidden neurons
weights_hidden_output = np.random.rand(4, 1) # 4 hidden neurons, 1 output

# Forward pass
def forward_pass(X):
    hidden_output = relu(np.dot(X, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    return hidden_output, final_output

# Example input
X = np.array([[0.1, 0.2, 0.3]])
hidden_output, final_output = forward_pass(X)
print("Hidden Layer Output:", hidden_output)
print("Final Output:", final_output)

Q.2. Write a python program to implement Simple Linear Regression for Boston
housing dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the dataset from a CSV file

data = pd.read_csv('C:\\csv\\BostonHousing.csv')  # Replace with your file path

# Display the first few rows of the dataset (optional)

print(data.head())

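# Define the features (X) and the target variable (y)
# Note: using every remaining column makes this a multiple regression; for a
# strictly simple regression keep a single feature, e.g. X = data[['rm']]

X = data.drop('medv', axis=1) # Drop the target column
y = data['medv'] # Target column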

# Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Create a linear regression model

model = LinearRegression()

# Train the model

model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)

print("R^2 Score:", r2)

# Plotting the true values vs predicted values

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', edgecolor='k', alpha=0.7)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', lw=2)  # Line for perfect predictions
plt.title('True Values vs Predicted Values')
plt.xlabel('True Values (MEDV)')
plt.ylabel('Predicted Values (MEDV)')
plt.grid(True)
plt.show()

Slip17
Q.1. Implement Ensemble ML algorithm on Pima Indians Diabetes Database with
bagging (random forest), boosting, voting and Stacking methods and display
analysis accordingly. Compare result. [15 M]
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              VotingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load the Pima Indians Diabetes dataset

data = pd.read_csv('C:\\csv\\diabetes.csv')
X = data.drop('Outcome', axis=1) # Features
y = data['Outcome'] # Target variable

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Standardize the features

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize models
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100,
                                                    random_state=42),
    'Voting Classifier': VotingClassifier(estimators=[
        ('rf', RandomForestClassifier(n_estimators=100)),
        ('gb', GradientBoostingClassifier(n_estimators=100))
    ], voting='soft'),
    'Stacking Classifier': StackingClassifier(estimators=[
        ('rf', RandomForestClassifier(n_estimators=100)),
        ('gb', GradientBoostingClassifier(n_estimators=100))
    ], final_estimator=LogisticRegression())
}

# Train models and evaluate accuracy

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = accuracy

# Display results
print("Model Accuracies:")
for model_name, accuracy in results.items():
    print(f"{model_name}: {accuracy:.4f}")

# Plotting the results

plt.bar(results.keys(), results.values(), color=['blue', 'orange', 'green', 'red'])
plt.ylabel('Accuracy')
plt.title('Ensemble Learning Model Comparison')
plt.ylim([0, 1])
plt.xticks(rotation=45)
plt.show()
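The voting ensemble in this program uses soft voting, which averages the class probabilities of its member models; hard voting instead counts predicted labels. A minimal pure-Python sketch of both rules follows. The function names and probabilities are made up for illustration, and this is not how scikit-learn implements them internally.

```python
def soft_vote(probabilities):
    """Average per-class probabilities from several models and pick the
    class with the highest mean probability."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return averaged.index(max(averaged)), averaged

def hard_vote(predictions):
    """Return the most frequently predicted class label."""
    return max(set(predictions), key=predictions.count)

# Made-up probabilities [P(not diabetic), P(diabetic)] from three models
model_probs = [[0.60, 0.40], [0.45, 0.55], [0.20, 0.80]]
label, averaged = soft_vote(model_probs)
print(label, averaged)
print(hard_vote([0, 1, 1]))
```

Soft voting lets a confident model outweigh two lukewarm ones, which is one reason its accuracy can differ from that of the individual members.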
Q.2. Write a python program to implement Multiple Linear Regression for a house
price dataset.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset from CSV

data = pd.read_csv('C:\\csv\\house-prices.csv')  # Replace with your dataset file path
print(data.head())

# Separate features and target variable

X = data[['SqFt', 'Bedrooms', 'Bathrooms']] # Replace with relevant features
y = data['Price'] # Target variable

# Split data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Create and train the model

model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set

y_pred = model.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")

# Plot actual vs. predicted prices

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=2)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()

Slip18
Q.1. Write a python program to implement k-means algorithm on a Diabetes
dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Pima Indians Diabetes dataset

data = pd.read_csv('C:\\csv\\diabetes.csv')
X = data.drop('Outcome', axis=1) # Features

# Standardizing the features

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Determine the optimal number of clusters using the elbow method

wcss = []
for i in range(2, 11):  # Testing cluster sizes from 2 to 10
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)

# Plotting the elbow method

plt.plot(range(2, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()

# Choosing an optimal number of clusters based on elbow plot

optimal_clusters = 3 # Adjust this based on the plot

# Fit the K-Means model with the chosen number of clusters

kmeans_final = KMeans(n_clusters=optimal_clusters, random_state=42)
y_kmeans = kmeans_final.fit_predict(X_scaled)

# Add the cluster labels to the original dataset

data['Cluster'] = y_kmeans

# Visualize the clusters

plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.scatter(kmeans_final.cluster_centers_[:, 0], kmeans_final.cluster_centers_[:, 1],
s=200, c='red', label='Centroids')
plt.title('K-Means Clustering of Diabetes Dataset')
plt.xlabel('Feature 1 (Standardized)')
plt.ylabel('Feature 2 (Standardized)')
plt.legend()
plt.show()
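The elbow method relies on the within-cluster sum of squares (WCSS) dropping sharply until the number of clusters matches the structure in the data, then flattening. The sketch below shows this with a tiny one-dimensional version of Lloyd's algorithm in plain Python. The function `kmeans_1d`, the points, and the starting centroids are all hypothetical; scikit-learn's KMeans uses a more careful initialization.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    wcss = sum((p - centroids[i]) ** 2 for i, c in enumerate(clusters) for p in c)
    return centroids, wcss

# Made-up points forming two obvious groups
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
for k, start in [(1, [0.0]), (2, [0.0, 5.0]), (3, [0.0, 5.0, 12.0])]:
    centroids, wcss = kmeans_1d(points, start)
    print(k, centroids, wcss)
```

With these points the WCSS falls from 125.5 at k=1 to 4.0 at k=2 and only to 2.5 at k=3, so the bend at k=2 marks the natural number of clusters.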

Q.2. Write a python program to implement Polynomial Linear Regression for
salary_positions dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Load the dataset from a local CSV file

# Ensure this CSV file has 'Level' and 'Salary' columns
data = pd.read_csv('C:\\csv\\Position_Salaries.csv')

# Define the features and target variable

X = data[['Level']] # Features (Level)
y = data['Salary'] # Target variable (Salary)

# Polynomial Feature Transformation

poly_degree = 4 # Degree of polynomial features
poly_features = PolynomialFeatures(degree=poly_degree)
X_poly = poly_features.fit_transform(X)

# Fit Polynomial Regression

poly_model = LinearRegression()
poly_model.fit(X_poly, y)

# Predict using the Polynomial Model

y_pred = poly_model.predict(X_poly)

# Calculate Mean Squared Error for evaluation

mse = mean_squared_error(y, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Plotting the Polynomial Regression Results

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Actual Salaries')  # Scatter plot of actual salaries
plt.plot(X, y_pred, color='red', label='Polynomial Regression (Degree 4)')  # Polynomial fit line
plt.title('Polynomial Regression for Salary vs Position Level')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.legend()
plt.grid(True)
plt.show()

Slip 19
Q.1. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given
data. Also predict the salaries of level 11 and level 12 employees
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset

data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable

X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Fit Simple Linear Regression

simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train_poly, X_test_poly, y_train, y_test = train_test_split(X_poly, y,
test_size=0.2, random_state=42)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models

mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')
print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')
# Predict salaries for levels 11 and 12
level_11_salary = poly_model.predict(poly.transform([[11]]))[0]
level_12_salary = poly_model.predict(poly.transform([[12]]))[0]
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')

# Plotting the results

plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red',
            label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green',
            label='Polynomial Regression Predictions', alpha=0.5)

# Plot Polynomial Regression Fit

X_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot, y_plot, color='orange', label='Polynomial Regression Fit')

plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()

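Q.2. Write a python program to implement Naive Bayes on weather forecast dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset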

data = pd.read_csv('C:\\csv\\weather_forecast.csv')

# Display the first few rows

print(data.head())

# Preprocess: Encode categorical variables

label_encoders = {}
for column in data.columns:
    if data[column].dtype == 'object':
        le = LabelEncoder()
        data[column] = le.fit_transform(data[column])
        label_encoders[column] = le

# Separate features and target variable

X = data.drop('weather', axis=1)  # 'weather' is the target variable here
y = data['weather']

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Naive Bayes classifier

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict on the test data

y_pred = nb_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Slip 20
Q.1. Implement Ridge Regression and Lasso Regression models using boston_houses.csv and
take only ‘RM’ and ‘Price’ of the houses. Divide the data into training and testing
data. Fit a line using Ridge regression, find the price of a house that contains 5
rooms, and compare the results.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso

# Load the dataset

data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)

# Select relevant features

X = data[['rm']]
y = data['medv']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Fit Ridge Regression
ridge_model = Ridge(alpha=1.0) # You can adjust the alpha value as needed
ridge_model.fit(X_train, y_train)

# Fit Lasso Regression

lasso_model = Lasso(alpha=1.0) # You can adjust the alpha value as needed
lasso_model.fit(X_train, y_train)

# Predict price for a house with 5 rooms

rooms = pd.DataFrame({'rm': [5]})  # Same column name as the training feature
ridge_prediction = ridge_model.predict(rooms)
lasso_prediction = lasso_model.predict(rooms)

# Display the results

print(f"Ridge Regression Prediction for 5 rooms: ${ridge_prediction[0]:.2f}")
print(f"Lasso Regression Prediction for 5 rooms: ${lasso_prediction[0]:.2f}")
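With a single feature, both penalties have a closed-form slope, which makes the difference between them easy to see. The sketch below assumes the objectives documented for scikit-learn's `Ridge` (squared error plus `alpha * w**2`) and `Lasso` (squared error divided by `2n`, plus `alpha * |w|`), with an unpenalized intercept. The function name and the room/price values are made up for illustration.

```python
def fit_single_feature(x, y, alpha, penalty):
    """Closed-form slope and intercept for one feature with an unpenalized
    intercept. 'ridge' minimizes sum of squared errors + alpha * w**2;
    'lasso' minimizes sum of squared errors / (2n) + alpha * |w|."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if penalty == 'ridge':
        slope = sxy / (sxx + alpha)
    else:  # lasso: soft-threshold the covariance term
        rho = sxy / n
        shrunk = max(abs(rho) - alpha, 0.0) * (1 if rho > 0 else -1)
        slope = shrunk / (sxx / n)
    return slope, y_mean - slope * x_mean

rooms = [4.0, 5.0, 6.0, 7.0, 8.0]       # made-up room counts
price = [10.0, 15.0, 20.0, 25.0, 30.0]  # made-up prices, exactly 5 per room
for penalty in ('ridge', 'lasso'):
    slope, intercept = fit_single_feature(rooms, price, alpha=1.0, penalty=penalty)
    print(penalty, slope, intercept + slope * 5)
```

Ordinary least squares would recover a slope of exactly 5 on this data; both penalties shrink it, so the two predictions for 5 rooms land slightly above the unpenalized value of 15.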

Q.2. Write a python program to implement Decision Tree whether or not to play Tennis.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import matplotlib.pyplot as plt

# Load the dataset

data = pd.read_csv('C:\\csv\\play_tennis.csv')
print(data)

# Preprocess the data

X = data.drop('play', axis=1) # Features
y = data['play'] # Target variable

# Convert categorical variables to numerical using one-hot encoding

X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Decision Tree model

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_report(y_test, y_pred))

# Display the decision tree rules

tree_rules = export_text(model, feature_names=list(X.columns))
print("Decision Tree Rules:\n", tree_rules)

# Visualize the decision tree

plt.figure(figsize=(12, 8))
tree.plot_tree(model, feature_names=list(X.columns), class_names=['No', 'Yes'], filled=True)
plt.title("Decision Tree for Play Tennis")
plt.show()
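A decision tree chooses each split by measuring how much it purifies the class labels. The sketch below illustrates that idea with entropy and information gain in plain Python. Note the assumptions: `DecisionTreeClassifier` uses Gini impurity by default, so entropy here is a swapped-in criterion for illustration, and the rows and function names are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy reduction from splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up rows: (outlook, windy), with whether tennis was played
rows = [('sunny', 'no'), ('sunny', 'yes'), ('rain', 'no'), ('rain', 'yes')]
labels = ['yes', 'yes', 'no', 'no']
print(information_gain(rows, labels, 0))  # split on outlook
print(information_gain(rows, labels, 1))  # split on windy
```

In this toy data outlook separates the classes perfectly while windy tells nothing, so a tree would split on outlook first.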

Slip21
Q.1. Create a multiple linear regression model for a house price dataset. Divide the dataset
into train and test data before giving it to the model, and predict house prices.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset from a local CSV file

data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Split data into features and target

X = data.drop('medv', axis=1) # Features
y = data['medv'] # Target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model

model = LinearRegression()
model.fit(X_train, y_train)

# Predict prices
y_pred = model.predict(X_test)

# Calculate and display mean squared error

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
# Display predictions for the first 5 houses
print("Predicted Prices for the first 5 houses:", y_pred[:5])

# Plotting the predicted prices vs actual prices

plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', lw=2)  # Diagonal line
plt.title('Predicted Prices vs Actual Prices')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.grid(True)
plt.show()

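Q.2. Write a python program to implement Linear SVM using UniversalBank.csv.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset

data = pd.read_csv('C:\\csv\\UniversalBank.csv')

# Display the first few rows of the dataset

print(data.head())

# Preprocess the data
# Assumption: 'Personal Loan' is the target column; adjust the name to match your CSV
X = data.drop('Personal Loan', axis=1)
y = data['Personal Loan']

# Convert categorical variables to dummy variables (if any)

X = pd.get_dummies(X, drop_first=True)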

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear SVM model

model = SVC(kernel='linear', random_state=42)
model.fit(X_train_scaled, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Slip22
Q.1. Write a python program to implement simple Linear Regression for predicting house
price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset
df = pd.read_csv('C:\\csv\\house-prices.csv').dropna()
print(df)

# Define features and target variable

X = df[['Bedrooms']] # Adjust column names as necessary
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)

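# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")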

Q.2. Use Apriori algorithm on groceries dataset to find which items are brought together.
Use minimum support =0.25
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt

# Load the dataset

data = pd.read_csv('C:\\csv\\groceries.csv')
print(data)

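# Convert each transaction into a one-hot encoded DataFrame
# Each column represents an item, and each row represents a transaction
data = data.apply(lambda row: row.dropna().tolist(), axis=1)
one_hot_data = pd.get_dummies(data.apply(pd.Series).stack()).groupby(level=0).sum()
one_hot_data = one_hot_data.astype(bool)  # apriori expects boolean (or 0/1) values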

# Apply Apriori algorithm with a minimum support of 0.25

frequent_itemsets = apriori(one_hot_data, min_support=0.25,
use_colnames=True)

# Generate association rules with lift metric

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# Display results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
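To show what "support" and "lift" measure, the sketch below counts them directly in plain Python for itemsets of size 1 and 2. It is a brute-force count over made-up transactions, not the pruned search that `apriori` performs, and the function name is hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all 1- and 2-item sets whose support
    (fraction of transactions containing the set) is at least min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in (1, 2):
        for candidate in combinations(items, size):
            support = sum(set(candidate).issubset(t) for t in transactions) / n
            if support >= min_support:
                result[candidate] = support
    return result

# Made-up transactions, for illustration only
transactions = [
    ['bread', 'milk'],
    ['bread', 'butter', 'milk'],
    ['butter', 'eggs'],
    ['bread', 'milk', 'eggs'],
]
found = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in found.items():
    print(itemset, support)

# Lift of bread -> milk: joint support divided by the product of individual supports
lift = found[('bread', 'milk')] / (found[('bread',)] * found[('milk',)])
print(lift)
```

A lift above 1 means the two items appear together more often than they would if they were bought independently, which is what the `min_threshold=1` filter on the rules selects for.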

Slip23
Q.1. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given
data. Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset

data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable

X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Fit Simple Linear Regression

simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train_poly, X_test_poly, y_train, y_test = train_test_split(X_poly, y,
test_size=0.2, random_state=42)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models

mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)

print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')

print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')

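# Predict salaries for levels 11 and 12

level_11_salary = poly_model.predict(poly.transform([[11]]))[0]
level_12_salary = poly_model.predict(poly.transform([[12]]))[0]
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')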

# Plotting the results

plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red',
            label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green',
            label='Polynomial Regression Predictions', alpha=0.5)

# Plot Polynomial Regression Fit

X_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot, y_plot, color='orange', label='Polynomial Regression Fit')

plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
Q.2. Write a python program to find all null values from a dataset and remove them
import pandas as pd
# Load dataset (replace with your dataset path)
data = pd.read_csv('C:\\csv\\iris.csv')

# Display initial null value counts

print("Null values before removal:")
print(data.isnull().sum())

# Remove rows with any null values

cleaned_data = data.dropna()

# Display null value counts after removal

print("\nNull values after removal:")
print(cleaned_data.isnull().sum())

# Optionally, save the cleaned dataset

cleaned_data.to_csv('iris.csv', index=False)
Slip24
Q.1. Write a python program to Implement Decision Tree classifier model on Data which is
extracted from images that were taken from genuine and forged banknote-like
specimens.
(refer UCI dataset https://archive.ics.uci.edu/dataset/267/banknote+authentication)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

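# Load dataset from the UCI repository
# Assumption: data_banknote_authentication.txt (no header row) has been downloaded
# to the working directory; the column names below are labels chosen for this script
columns = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
data = pd.read_csv('data_banknote_authentication.txt', header=None, names=columns)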

# Separate features and target variable

X = data.drop('Class', axis=1)
y = data['Class']

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Decision Tree Classifier

classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)

# Make predictions on the test set

y_pred = classifier.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Plot the Decision Tree

plt.figure(figsize=(20, 10))
plot_tree(classifier, feature_names=list(X.columns),
          class_names=['Not Authenticated', 'Authenticated'], filled=True, rounded=True)
plt.show()

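Q.2. Write a python program to implement linear SVM using UniversalBank.csv.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset

data = pd.read_csv('C:\\csv\\UniversalBank.csv')

# Display the first few rows of the dataset

print(data.head())

# Preprocess the data
# Assumption: 'Personal Loan' is the target column; adjust the name to match your CSV
X = data.drop('Personal Loan', axis=1)
y = data['Personal Loan']

# Convert categorical variables to dummy variables (if any)

X = pd.get_dummies(X, drop_first=True)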

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear SVM model

model = SVC(kernel='linear', random_state=42)
model.fit(X_train_scaled, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test_scaled)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Slip25
Q.1. Write a python program to implement Polynomial Regression for house price dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset

data = pd.read_csv('C:\\csv\\BostonHousing.csv')  # Replace with the correct path to your dataset

# Use 'rm' (average number of rooms) as the feature and 'medv' (median house
# value) as the target
X = data[['rm']] # Features
y = data['medv'] # Target variable

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Transform the feature into polynomial terms (degree 2)

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
# Train the Polynomial Regression model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on the test set

y_pred = model.predict(X_test_poly)

# Calculate Mean Squared Error

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Plot the results

plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
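# Evenly spaced room counts, kept in a DataFrame so the column name matches the fitted feature
X_plot = pd.DataFrame({'rm': np.linspace(X['rm'].min(), X['rm'].max(), 100)})
y_plot = model.predict(poly.transform(X_plot))
plt.plot(X_plot['rm'], y_plot, color='red', label='Polynomial Regression Fit')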
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('Median Value of Homes (MEDV)')
plt.title('Polynomial Regression on Boston Housing Dataset')
plt.legend()
plt.grid()
plt.show()

Q.2. Create a two layered neural network with relu and sigmoid activation function.
import numpy as np

# Activation functions
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Initialize parameters
np.random.seed(0)
weights_input_hidden = np.random.rand(3, 4) # 3 inputs, 4 hidden neurons
weights_hidden_output = np.random.rand(4, 1) # 4 hidden neurons, 1 output

# Forward pass
def forward_pass(X):
    hidden_output = relu(np.dot(X, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    return hidden_output, final_output

# Example input
X = np.array([[0.1, 0.2, 0.3]])
hidden_output, final_output = forward_pass(X)
print("Hidden Layer Output:", hidden_output)
print("Final Output:", final_output)

Slip26
Q.1. Create KNN model on Indian diabetes patient’s database and predict whether a new
patient is diabetic (1) or not (0). Find optimal value of K.

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset

data = pd.read_csv('C:\\csv\\groceries.csv')
print(data)

# Convert each transaction into a one-hot encoded DataFrame
# Each column represents an item, and each row represents a transaction

data = data.apply(lambda row: row.dropna().tolist(), axis=1)
# max() keeps every cell at 0/1 even if an item repeats within a transaction
one_hot_data = pd.get_dummies(data.apply(pd.Series).stack()).groupby(level=0).max().astype(bool)

# Apply Apriori algorithm with a minimum support of 0.25

frequent_itemsets = apriori(one_hot_data, min_support=0.25, use_colnames=True)

# Generate association rules with lift metric

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Display results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
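The question asks for a KNN classifier with a tuned K, while the listing above applies Apriori to a groceries file. A minimal KNN sketch follows. It assumes a synthetic stand-in table generated with `make_classification` (8 numeric features, a 0/1 label), because the diabetes CSV and its column names are not given here. The K range (odd values 1 to 21) and the 5-fold cross-validation are illustrative choices, not part of the original solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 8 numeric features and a 0/1 label.
# With a real file, load it with pd.read_csv(...) and separate the
# outcome column from the feature columns instead.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# KNN is distance based, so features are standardized first
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Score each candidate K with 5-fold cross-validation on the training set
k_values = list(range(1, 22, 2))
cv_scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores.append(cross_val_score(knn, X_train_scaled, y_train, cv=5).mean())

best_k = k_values[int(np.argmax(cv_scores))]
print("Best K:", best_k)

# Refit with the chosen K and evaluate on the held-out test set
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X_train_scaled, y_train)
test_accuracy = final_model.score(X_test_scaled, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

# Placeholder "new patient": one row with the same 8 features
new_patient = X_test[:1]
prediction = final_model.predict(scaler.transform(new_patient))[0]
print("Predicted class (1 = diabetic, 0 = not):", prediction)
```

Choosing K on cross-validated training folds rather than on the test set keeps the reported test accuracy an honest estimate.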

Slip 27
Q.1. Create a multiple linear regression model for the house price dataset. Divide the dataset into
train and test data before giving it to the model, and predict house prices.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset from a local CSV file

data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Split data into features and target

X = data.drop('medv', axis=1) # Features
y = data['medv'] # Target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model

model = LinearRegression()
model.fit(X_train, y_train)
# Predict prices
y_pred = model.predict(X_test)

# Calculate and display mean squared error

mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Display predictions for the first 5 houses

print("Predicted Prices for the first 5 houses:", y_pred[:5])

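# Plotting the predicted prices vs actual prices

plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
# Dashed diagonal marks where predicted equals actual
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.xlabel('Actual Price (medv)')
plt.ylabel('Predicted Price')
plt.title('Predicted vs Actual House Prices')
plt.show()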

Q.2. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given data.
Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset

data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable

X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Fit Simple Linear Regression
simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train_poly, X_test_poly, y_train, y_test = train_test_split(X_poly, y,
test_size=0.2, random_state=42)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models

mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)

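print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')
print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')

# The model with the lower test MSE fits the data more accurately

better = 'Polynomial' if mse_poly < mse_simple else 'Simple'
print(f'{better} Linear Regression fits the data more accurately')

# Predict salaries for levels 11 and 12

new_levels = pd.DataFrame({'Level': [11, 12]})
print('Simple model predictions:', simple_model.predict(new_levels))
print('Polynomial model predictions:', poly_model.predict(poly.transform(new_levels)))

# Plotting the results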

plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red', label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green', label='Polynomial Regression Predictions', alpha=0.5)

# Plot Polynomial Regression Fit

X_plot = pd.DataFrame({'Level': np.linspace(X['Level'].min(), X['Level'].max(), 100)})
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot['Level'], y_plot, color='orange', label='Polynomial Regression Fit')
plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()

Slip 28
Q.1. Write a python program to categorize the given news text into one of the available 20
categories of news groups, using multinomial Naïve Bayes machine learning model.
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the 20 Newsgroups dataset

newsgroups = fetch_20newsgroups(subset='all')

# Prepare the data

X = newsgroups.data # News text
y = newsgroups.target # Corresponding categories

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Convert text data to feature vectors using Count Vectorization

vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes model

model = MultinomialNB()
model.fit(X_train_vectorized, y_train)

# Make predictions
y_pred = model.predict(X_test_vectorized)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
print('Classification Report:')
print(classification_report(y_test, y_pred,
target_names=newsgroups.target_names))

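# Example of predicting a new article (the text below is an illustrative input)

new_article = ["The space agency launched a new satellite to study the atmosphere of Mars."]
new_article_vectorized = vectorizer.transform(new_article)
predicted_index = model.predict(new_article_vectorized)[0]
print("Predicted category:", newsgroups.target_names[predicted_index])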

Q.2. Classify the iris flowers dataset using SVM and find out the flower type depending on
the given input data like sepal length, sepal width, petal length and petal width. Find
accuracy of all SVM kernels.
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset

data=pd.read_csv('C:\\csv\\iris.csv')
X = data[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']] # Features
y = data['Species'] # Target

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# List of SVM kernels to evaluate

kernels = ['linear', 'poly', 'rbf', 'sigmoid']
accuracies = {}
models = {} # Keep each trained model so it can be reused for prediction

# Train and evaluate SVM with different kernels

for kernel in kernels:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies[kernel] = accuracy
    models[kernel] = model
    print(f"Accuracy with {kernel} kernel: {accuracy:.4f}")

# Example input for prediction (sepal length, sepal width, petal length, petal width)
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns) # Example input
predictions = {kernel: models[kernel].predict(new_flower)[0] for kernel in kernels}

# Display predictions for the new flower

print("\nPredicted flower types for the new input:")
for kernel, flower_type in predictions.items():
    print(f"Kernel: {kernel}, Predicted type: {flower_type}")

Slip 29
Q.1. Take iris flower dataset and reduce 4D data to 2D data using PCA. Then train the
model and predict new flower with given measurements.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset from CSV

data = pd.read_csv('C:\\csv\\iris.csv')
print(data)
X = data[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']] # Features
y = data['Species'] # Target

# Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVM Kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
models = {} # Keep each trained model so it can be reused for prediction

# Train and evaluate models

for kernel in kernels:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    models[kernel] = model
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"Accuracy with {kernel} kernel: {accuracy:.4f}")

# Prediction for a new sample

new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)
print("\nPredictions for the new flower input:")
for kernel in kernels:
    prediction = models[kernel].predict(new_flower)[0]
    print(f"Kernel: {kernel}, Prediction: {prediction}")
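The question asks for a reduction from 4D to 2D with PCA before training, which the listing above does not apply. A minimal sketch of that step follows. It assumes scikit-learn's bundled iris data (`load_iris`) in place of the local CSV, and an RBF-kernel SVC as the classifier. Both are illustrative choices, not the original author's method.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled iris data (assumption: used here instead of C:\csv\iris.csv)
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize, then project the 4 measurements onto 2 principal components.
# Both transformers are fitted on the training data only.
scaler = StandardScaler()
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(scaler.fit_transform(X_train))
X_test_2d = pca.transform(scaler.transform(X_test))
print("Reduced training shape:", X_train_2d.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Train a classifier on the 2D representation
model = SVC(kernel='rbf')
model.fit(X_train_2d, y_train)
accuracy = model.score(X_test_2d, y_test)
print(f"Accuracy on 2D data: {accuracy:.4f}")

# A new flower must pass through the same scaler and PCA before prediction
new_flower = [[5.1, 3.5, 1.4, 0.2]]
new_flower_2d = pca.transform(scaler.transform(new_flower))
predicted_name = iris.target_names[model.predict(new_flower_2d)[0]]
print("Predicted flower type:", predicted_name)
```

Fitting the scaler and PCA on the training split only avoids leaking test-set information into the projection.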

Q.2. Use a K-means clustering model to classify the employees into various income groups
or clusters. Preprocess the data if required (i.e. drop missing or null values). Use the elbow
method and Silhouette Score to find the value of k.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Load dataset
data = pd.read_csv('C:\\csv\\employee_data.csv').dropna() # Drop missing values
print(data.head())
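
# Select features for clustering
# 'EmpID' is an identifier, not an income measure; replace it with the income column(s) of your CSV
X = data[['EmpID']] # Modify as needed

# Find optimal K using Elbow and Silhouette methods

inertia = []
silhouette_scores = []

for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(X, kmeans.labels_))

# Plot Elbow and Silhouette scores on separate axes, since their scales differ

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(range(2, 11), inertia, 'o-')
ax1.set_xlabel('Number of clusters (K)')
ax1.set_ylabel('Inertia')
ax1.set_title('Elbow Method')
ax2.plot(range(2, 11), silhouette_scores, 'o-')
ax2.set_xlabel('Number of clusters (K)')
ax2.set_ylabel('Silhouette Score')
ax2.set_title('Silhouette Analysis')
plt.tight_layout()
plt.show()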

# Optimal K and final clustering

optimal_k = 3 # Set based on elbow and silhouette analysis
kmeans = KMeans(n_clusters=optimal_k, random_state=42)
data['Cluster'] = kmeans.fit_predict(X)

# Display results
print(data.head())
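The listing above fixes `optimal_k = 3` by hand. As a hedged alternative, k can be picked programmatically as the value with the highest silhouette score. The sketch below assumes synthetic two-feature data from `make_blobs` as a stand-in for the employee table, whose income columns are not known here. The `n_init=10` setting is also an illustrative choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 300 points in 2 numeric features.
# With the real file, select the income-related columns instead.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# K-means uses Euclidean distance, so features are put on a common scale
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score for each candidate k (it is undefined for k = 1)
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# Pick the k with the highest silhouette score
best_k = max(scores, key=scores.get)
print("Silhouette scores:", {k: round(s, 3) for k, s in scores.items()})
print("Chosen k:", best_k)

# Final clustering with the chosen k
final_labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_scaled)
```

The silhouette maximum is a convenient default, but it is worth checking against the elbow plot, since the two criteria can disagree.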
