ML Solution
1. Use the Apriori algorithm on the groceries dataset to find which items are bought
together.
Use minimum support = 0.25
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt
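# Raw string so the backslashes in the Windows path are not treated as escapes
data = pd.read_csv(r'C:\csv\groceries.csv')
print(data)
# Turn each row into the list of items in that transaction
data = data.apply(lambda row: row.dropna().tolist(), axis=1)
# One-hot encode: one boolean column per item, one row per transaction
one_hot_data = pd.get_dummies(data.apply(pd.Series).stack()).groupby(level=0).sum().astype(bool)
frequent_itemsets = apriori(one_hot_data, min_support=0.25, use_colnames=True)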
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
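To make the effect of min_support = 0.25 concrete, here is a minimal pure-Python sketch of the support calculation that Apriori is built on. The five transactions are made-up illustration data (not rows from groceries.csv), and `support` is a hypothetical helper, not part of mlxtend.

```python
from itertools import combinations

# Made-up transactions for illustration only (not taken from groceries.csv)
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

min_support = 0.25
items = sorted(set().union(*transactions))

# Level 1: keep single items whose support reaches the threshold
frequent = {frozenset([i]): support({i}, transactions) for i in items}
frequent = {k: v for k, v in frequent.items() if v >= min_support}

# Level 2: candidate pairs are built only from frequent single items
# (this pruning step is the core idea of Apriori)
singles = sorted(i for k in frequent for i in k)
pairs = {}
for a, b in combinations(singles, 2):
    s = support({a, b}, transactions)
    if s >= min_support:
        pairs[frozenset([a, b])] = s

for itemset, s in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(s, 2))
```

A pair is reported only if it appears in at least 25% of the transactions, which is the same filter that `apriori(..., min_support=0.25)` applies.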
2. Write a Python program to prepare a scatter plot for the Iris dataset. Convert the
categorical values of the dataset to numeric format.
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Slip2
Q.1. Write a python program to implement simple Linear Regression for predicting
house price. First find all null values in a given dataset and remove them. [15 M]
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load dataset
df = pd.read_csv(r'C:\csv\house-prices.csv').dropna()
print(df)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
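The fragment above uses `X`, `y` and `model` without defining them. The following is a minimal end-to-end sketch under stated assumptions: the data is synthetic, and the column names `area` and `price` are hypothetical stand-ins for whatever house-prices.csv actually contains.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for house-prices.csv; 'area' and 'price' are assumed column names
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=100)
price = 50_000 + 120 * area + rng.normal(0, 5_000, size=100)
df = pd.DataFrame({"area": area, "price": price})
df.loc[[3, 17], "price"] = np.nan  # inject a few missing values

# Find null values per column, then remove the affected rows
print(df.isnull().sum())
df = df.dropna()

X = df[["area"]]  # one feature, so this is simple linear regression
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
print("R^2 on test data:", model.score(X_test, y_test))
```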
Q.2. The data set refers to clients of a wholesale distributor. It includes the annual
spending in monetary units on diverse product categories. Using the Wholesale
customers dataset, apply agglomerative clustering to group clients in the same
region by their annual spending.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
# Visualize clusters
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Fresh', y='Milk', hue='Cluster', palette='viridis', s=100)
plt.title('Agglomerative Clustering of Wholesale Customers')
plt.xlabel('Annual Spending on Fresh Products')
plt.ylabel('Annual Spending on Milk Products')
plt.legend(title='Cluster')
plt.grid(True)
plt.show()
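The plotting code above expects a `Cluster` column that is never created in the fragment. A minimal sketch of that missing step, assuming synthetic spending data in place of the Wholesale customers CSV and an assumed choice of two clusters with Ward linkage:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic groups of customers: low spenders and high spenders
low = rng.normal(loc=[2000, 1500], scale=300, size=(20, 2))
high = rng.normal(loc=[12000, 9000], scale=300, size=(20, 2))
df = pd.DataFrame(np.vstack([low, high]), columns=["Fresh", "Milk"])

# Scale first so that no single spending category dominates the distances
X_scaled = StandardScaler().fit_transform(df[["Fresh", "Milk"]])

# n_clusters=2 and linkage='ward' are assumptions made for this sketch
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
df["Cluster"] = agg.fit_predict(X_scaled)

print(df.groupby("Cluster")[["Fresh", "Milk"]].mean())
```

With the real dataset, the rows could first be filtered to one region before clustering; the name of the region column is not shown in the source, so it is left out here.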
Slip3
Q.1. Write a python program to implement multiple Linear Regression for a house
price dataset. Divide the dataset into training and testing data. [15 M]
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
# Display results
print("\nModel Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
Q.2. Use the dataset crash.csv, an accident survivors' dataset for the USA hosted
by data.gov. The dataset contains the passengers' age, the speed of the vehicle (mph) at the
time of impact, and the fate of the passengers (1 for survived and 0 for not survived) after a
crash. Use logistic regression to decide whether age and speed can predict the
survivability of the passengers.
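# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression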
df = pd.read_csv('crash.csv')
X = df[['age', 'speed']]
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
plt.figure(figsize=(10, 6))
plt.xlabel('Age')
plt.ylabel('Speed (mph)')
plt.grid(True)
plt.show()
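The fragment above does not show how `y`, `X_train` and `X_test` are produced. The sketch below fills in those steps on synthetic data; the target column name `survived` and the relationship used to generate it are assumptions, not facts about crash.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
age = rng.uniform(18, 80, size=n)
speed = rng.uniform(10, 100, size=n)
# Assumed relationship for the synthetic data: survival becomes less likely
# as age and speed increase
logit = 8 - 0.05 * age - 0.1 * speed
survived = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)
df = pd.DataFrame({"age": age, "speed": speed, "survived": survived})

X = df[["age", "speed"]]
y = df["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Coefficients (age, speed):", model.coef_[0])
```

The sign of each coefficient indicates the direction in which that feature moves the predicted survival probability, and the test accuracy indicates how well age and speed predict the outcome.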
Slip 4
Q.1. Write a python program to implement k-means algorithm on a
mall_customers dataset.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load dataset
df = pd.read_csv(r'C:\csv\mall_customers.csv')
# Select features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
# Plot clusters
plt.figure(figsize=(10, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=df['Cluster'], cmap='viridis', s=100)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
c='red', label='Centroids')
plt.title('K-means Clustering of Mall Customers')
plt.xlabel('Annual Income (scaled)')
plt.ylabel('Spending Score (scaled)')
plt.legend()
plt.grid(True)
plt.show()
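The plot above relies on `X_scaled`, `kmeans` and a `Cluster` column that the fragment never creates. A minimal sketch of those steps, using synthetic points in place of mall_customers.csv and assuming five clusters:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic customers scattered around five assumed (income, score) centres
centers = [(20, 20), (20, 80), (55, 50), (90, 20), (90, 80)]
points = np.vstack([rng.normal(c, 4, size=(30, 2)) for c in centers])
df = pd.DataFrame(points, columns=["Annual Income (k$)", "Spending Score (1-100)"])

X = df[["Annual Income (k$)", "Spending Score (1-100)"]]
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=5 is an assumption; the elbow method can be used to choose it
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df["Cluster"] = kmeans.fit_predict(X_scaled)

print(df["Cluster"].value_counts().sort_index())
print(kmeans.cluster_centers_)
```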
# Load dataset
df = pd.read_csv(r'C:\csv\house-prices.csv').dropna()
print(df)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
Slip 5
Q.1. Write a python program to implement Multiple Linear Regression for Fuel
Consumption dataset.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
# Display results
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Example prediction
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])
Slip6
Q.1. Write a python program to implement Polynomial Linear Regression for
Boston Housing Dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load the dataset (replace the path below with the location of your Boston Housing CSV file)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)
# Use 'RM' (average number of rooms per dwelling) as the feature and 'MEDV'
# (median house value) as the target (lowercase 'rm' and 'medv' in this CSV)
X = data[['rm']]
y = data['medv']
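The remaining steps (polynomial feature expansion, fitting, evaluation) are not shown above. Here is a minimal sketch on synthetic data; the curved relationship between rooms and value, and the choice of degree 2, are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
rm = rng.uniform(4, 9, size=200)
# Assumed curved relationship between rooms and median value (synthetic)
medv = 5 + 2 * (rm - 4) ** 2 + rng.normal(0, 1.5, size=200)
X = rm.reshape(-1, 1)
y = medv

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Expand the single feature into [1, rm, rm^2]; fit on train, reuse on test
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

poly_model = LinearRegression().fit(X_train_poly, y_train)
linear_model = LinearRegression().fit(X_train, y_train)

mse_poly = mean_squared_error(y_test, poly_model.predict(X_test_poly))
mse_linear = mean_squared_error(y_test, linear_model.predict(X_test))
print("Linear MSE:", mse_linear)
print("Polynomial (degree 2) MSE:", mse_poly)
```

Polynomial regression is still a linear model; only the input features are transformed, which is why `LinearRegression` is reused.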
Q.2. Use K-means clustering model and classify the employees into various
income groups or clusters. Preprocess data if required (i.e. drop missing or null
values).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load the dataset (replace the path below with your actual CSV file)
data = pd.read_csv('C:\\csv\\employee_data.csv')
# Display the first few rows and info about the dataset
print(data.head())
print(data.info())
Slip7
Q.1. Fit the simple linear regression model to Salary_positions.csv data. Predict the salaries of level 11
and level 12 employees.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Slip8
Q.1. Write a python program to categorize the given news text into one of the
available 20 categories of news groups, using multinomial Naïve Bayes machine
learning model. [15 M]
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# Make predictions
y_pred = model.predict(X_test_vectorized)
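The vectorizing and training steps are missing from the fragment above. The sketch below shows the same pipeline on a tiny made-up corpus with three invented labels, so it runs without downloading the 20 newsgroups data; with the real dataset, the texts and labels would come from `fetch_20newsgroups`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus standing in for newsgroup posts
train_texts = [
    "the team won the hockey game last night",
    "great goal scored in the final period of the game",
    "the new graphics card renders images very fast",
    "install the driver to fix the graphics display",
    "the rocket launch reached orbit around the moon",
    "nasa plans a new space mission to orbit mars",
]
train_labels = ["sport", "sport", "graphics", "graphics", "space", "space"]

# Bag-of-words counts are the input that multinomial Naive Bayes expects
vectorizer = CountVectorizer(stop_words="english")
X_train_vectorized = vectorizer.fit_transform(train_texts)

model = MultinomialNB()
model.fit(X_train_vectorized, train_labels)

# New text must be transformed with the SAME fitted vectorizer
new_texts = ["the hockey team scored a late goal", "a space rocket in orbit"]
X_test_vectorized = vectorizer.transform(new_texts)
y_pred = model.predict(X_test_vectorized)
print(list(y_pred))
```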
Q.2. Write a python program to implement Decision Tree whether or not to play
Tennis
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import matplotlib.pyplot as plt
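Only the imports survive for this question. A minimal sketch follows, assuming the widely used 14-row textbook play-tennis table because the source does not show its data. The column names and the entropy criterion are choices made for this example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Textbook play-tennis table (assumed; the source does not include its data)
data = {
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind": ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
             "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                   "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
}
df = pd.DataFrame(data)

# Decision trees in scikit-learn need numeric input, so one-hot encode the categories
X = pd.get_dummies(df.drop(columns="PlayTennis"))
y = df["PlayTennis"]

model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X, y)
print(export_text(model, feature_names=list(X.columns)))

# Encode a new day with the same columns as the training data
new_day = pd.DataFrame([{"Outlook": "Sunny", "Temperature": "Cool",
                         "Humidity": "High", "Wind": "Strong"}])
new_X = pd.get_dummies(new_day).reindex(columns=X.columns, fill_value=0)
print("Play tennis?", model.predict(new_X)[0])
```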
Slip 10
Q.1. Write a python program to transform data with Principal Component Analysis
(PCA). Use iris dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Apply PCA
pca = PCA(n_components=2) # Reduce to 2 dimensions
X_pca = pca.fit_transform(X_scaled)
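`X_scaled` is not defined in the fragment above. A minimal complete sketch, assuming the copy of the Iris data bundled with scikit-learn rather than a CSV file:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Standardize first: PCA is sensitive to the scale of each feature
X_scaled = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2)  # reduce 4 dimensions to 2
X_pca = pca.fit_transform(X_scaled)

print("Shape before:", iris.data.shape, "after:", X_pca.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio shows how much of the original variation each of the two components retains.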
Q.2. Write a Python program to prepare a scatter plot for the Iris dataset. Convert
categorical values into numeric format.
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
Slip11
Q.1. Write a python program to implement Polynomial Regression for Boston
Housing Dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Use 'RM' (average number of rooms) as the feature and 'MEDV' (median house
# value) as the target
X = data[['rm']] # Features
y = data['medv'] # Target variable
Q.2. Write a python program to Implement Decision Tree classifier model on Data
which is extracted from images that were taken from genuine and forged banknote-
like specimens.
(refer UCI dataset https://archive.ics.uci.edu/dataset/267/banknote+authentication)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
Slip12
Q.1. Write a python program to implement k-nearest Neighbors ML algorithm to
build prediction model (Use iris Dataset). [15 M]
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Example prediction
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])
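The fragment above uses `knn` without creating or training it. A minimal complete sketch follows; k = 3 and the 80/20 split are assumed choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the classifier; n_neighbors=3 is an assumed value
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

accuracy = accuracy_score(y_test, knn.predict(X_test))
print("Accuracy:", accuracy)

# Example prediction for one flower (sepal length, sepal width, petal length, petal width)
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])
```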
2. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given
data. Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)
plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
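The comparison between the two models and the level 11 and 12 predictions are not shown above. The sketch below uses made-up salaries for levels 1 to 10 (not the values in Salary_positions.csv) and an assumed polynomial degree of 4. It fits on all ten rows rather than a train/test split, since ten rows leave very little to hold out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Made-up salaries that grow faster than linearly with level (illustration only)
levels = np.arange(1, 11).reshape(-1, 1)
salaries = 30000 * 1.4 ** np.arange(1, 11)

# Simple linear regression
lin_model = LinearRegression().fit(levels, salaries)

# Polynomial regression: same linear model on expanded features
poly = PolynomialFeatures(degree=4)  # degree 4 is an assumed choice
levels_poly = poly.fit_transform(levels)
poly_model = LinearRegression().fit(levels_poly, salaries)

r2_lin = r2_score(salaries, lin_model.predict(levels))
r2_poly = r2_score(salaries, poly_model.predict(levels_poly))
print("Linear R^2:", r2_lin)
print("Polynomial R^2:", r2_poly)

# Predict salaries for level 11 and level 12
new_levels = np.array([[11], [12]])
pred_lin = lin_model.predict(new_levels)
pred_poly = poly_model.predict(poly.transform(new_levels))
print("Linear predictions:", pred_lin)
print("Polynomial predictions:", pred_poly)
```

The model with the higher R^2 (or lower mean squared error) fits the given data more closely. Predictions beyond level 10 are extrapolations and should be treated with caution for either model.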
Slip 13
Q.1. Create RNN model and analyze the Google stock price dataset. Find out
increasing or decreasing trends of stock price for the next day. [15 M]
Q.2. Write a python program to implement simple Linear Regression for predicting
house price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load dataset
df = pd.read_csv(r'C:\csv\house-prices.csv').dropna()
print(df)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
Slip 14
Q.1. Create a CNN model and train it on mnist handwritten digit dataset. Using
model find out the digit written by a hand in a given image.
Import mnist dataset from tensorflow.keras.datasets. [15 M]
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import image
# Example usage
# Save a test image (28x28 grayscale image of a digit) in your working directory
# with the name 'digit.png'
predict_digit('digit.png')
Q.2. Write a python program to find all null values in a given dataset and remove
them.
Create your own dataset.
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame(data)
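The `data` dictionary is not shown above. A minimal sketch with a small made-up dataset (the names and values are invented for illustration) that finds the null values and removes the affected rows:

```python
import numpy as np
import pandas as pd

# Made-up dataset with some missing entries
data = {
    "Name": ["Asha", "Ravi", None, "Meera", "John"],
    "Age": [25, np.nan, 31, 42, 38],
    "City": ["Pune", "Mumbai", "Delhi", None, "Nashik"],
}

# Create a DataFrame
df = pd.DataFrame(data)

# Find null values: a True/False mask, then a count per column
print(df.isnull())
print(df.isnull().sum())

# Remove every row that contains at least one null value
df_clean = df.dropna()
print(df_clean)
```

`dropna()` returns a new DataFrame and leaves `df` unchanged. Use `dropna(axis=1)` to drop columns instead of rows.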
Slip15
Q.1. Create an ANN and train it on a house price dataset to classify whether the house price is
above average or below average. [15 M]
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers
# Make predictions
y_pred = model.predict(X_test)
Slip 16
Q.1. Create a two layered neural network with relu and sigmoid activation
function.
import numpy as np
# Activation functions
def relu(x):
    return np.maximum(0, x)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Initialize parameters
np.random.seed(0)
weights_input_hidden = np.random.rand(3, 4) # 3 inputs, 4 hidden neurons
weights_hidden_output = np.random.rand(4, 1) # 4 hidden neurons, 1 output
# Forward pass
def forward_pass(X):
    hidden_output = relu(np.dot(X, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    return hidden_output, final_output
# Example input
X = np.array([[0.1, 0.2, 0.3]])
hidden_output, final_output = forward_pass(X)
print("Hidden Layer Output:", hidden_output)
print("Final Output:", final_output)
Q.2. Write a python program to implement Simple Linear Regression for Boston
housing dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Make predictions
y_pred = model.predict(X_test)
Slip17
Q.1. Implement Ensemble ML algorithm on Pima Indians Diabetes Database with
bagging (random forest), boosting, voting and Stacking methods and display
analysis accordingly. Compare result. [15 M]
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, VotingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Initialize models
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100,
                                                    random_state=42),
    'Voting Classifier': VotingClassifier(estimators=[
        ('rf', RandomForestClassifier(n_estimators=100)),
        ('gb', GradientBoostingClassifier(n_estimators=100))
    ], voting='soft'),
    'Stacking Classifier': StackingClassifier(estimators=[
        ('rf', RandomForestClassifier(n_estimators=100)),
        ('gb', GradientBoostingClassifier(n_estimators=100))
    ], final_estimator=LogisticRegression())
}
# Display results
print("Model Accuracies:")
for model_name, accuracy in results.items():
    print(f"{model_name}: {accuracy:.4f}")
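The loop that trains each model and fills `results` is not shown above. A minimal sketch of that step, using a synthetic classification problem from `make_classification` in place of the Pima Indians Diabetes data and smaller ensembles to keep it quick; `base_estimators` is a hypothetical helper introduced for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the diabetes dataset)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def base_estimators():
    """Fresh (name, estimator) pairs shared by the voting and stacking ensembles."""
    return [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ]

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
    "Voting Classifier": VotingClassifier(estimators=base_estimators(), voting="soft"),
    "Stacking Classifier": StackingClassifier(estimators=base_estimators(),
                                              final_estimator=LogisticRegression()),
}

# Train every model and record its test accuracy
results = {}
for model_name, model in models.items():
    model.fit(X_train, y_train)
    results[model_name] = accuracy_score(y_test, model.predict(X_test))

print("Model Accuracies:")
for model_name, accuracy in results.items():
    print(f"{model_name}: {accuracy:.4f}")
```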
Slip18
Q.1. Write a python program to implement k-means algorithm on a Diabetes
dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
Slip 19
Q.1. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given
data. Also predict the salaries of level 11 and level 12 employees
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)
plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Slip 20
Q.1. Implement Ridge Regression and Lasso Regression models using boston_houses.csv,
taking only 'RM' and 'Price' of the houses. Divide the data into training and testing
data. Fit a line using Ridge regression and find the price of a house that contains 5
rooms. Compare the results.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
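Only the imports are given for this question. A minimal sketch on synthetic data: the `RM` and `Price` columns are generated from an assumed linear relationship, and the `alpha` values are arbitrary choices rather than tuned settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for boston_houses.csv with only 'RM' and 'Price'
rng = np.random.default_rng(0)
rm = rng.uniform(4, 9, size=200)
price = -30 + 9 * rm + rng.normal(0, 3, size=200)
data = pd.DataFrame({"RM": rm, "Price": price})

X = data[["RM"]]
y = data["Price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# alpha controls the strength of the penalty (values assumed for this sketch)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

# Price of a house with 5 rooms under each model
five_rooms = pd.DataFrame({"RM": [5]})
ridge_price = ridge.predict(five_rooms)[0]
lasso_price = lasso.predict(five_rooms)[0]

print("Ridge: R^2 =", ridge.score(X_test, y_test), " price(5 rooms) =", ridge_price)
print("Lasso: R^2 =", lasso.score(X_test, y_test), " price(5 rooms) =", lasso_price)
```

Ridge adds an L2 penalty on the coefficients and Lasso an L1 penalty. With a single feature the two fits are usually close, so the comparison is mainly between their test scores and predicted prices.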
Q.2. Write a python program to implement Decision Tree whether or not to play Tennis.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import matplotlib.pyplot as plt
Slip21
Q.1. Create a multiple linear regression model for a house price dataset. Divide the dataset into
train and test data before giving it to the model, and predict house prices.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Predict prices
y_pred = model.predict(X_test)
Slip22
Q.1. Write a python program to implement simple Linear Regression for predicting house
price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load dataset
df = pd.read_csv(r'C:\csv\house-prices.csv').dropna()
print(df)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=0)
Q.2. Use the Apriori algorithm on the groceries dataset to find which items are bought together.
Use minimum support = 0.25
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt
Slip23
Q.1. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given
data. Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)
plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
Q.2. Write a python program to find all null values from a dataset and remove them
import pandas as pd
# Load dataset (replace with your dataset path)
data = pd.read_csv('C:\\csv\\iris.csv')
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
Slip25
Q.1. Write a python program to implement Polynomial Regression for house price dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Use 'RM' (average number of rooms) as the feature and 'MEDV' (median house
# value) as the target
X = data[['rm']] # Features
y = data['medv'] # Target variable
Q.2. Create a two layered neural network with relu and sigmoid activation function.
import numpy as np
# Activation functions
def relu(x):
    return np.maximum(0, x)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Initialize parameters
np.random.seed(0)
weights_input_hidden = np.random.rand(3, 4) # 3 inputs, 4 hidden neurons
weights_hidden_output = np.random.rand(4, 1) # 4 hidden neurons, 1 output
# Forward pass
def forward_pass(X):
    hidden_output = relu(np.dot(X, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    return hidden_output, final_output
# Example input
X = np.array([[0.1, 0.2, 0.3]])
hidden_output, final_output = forward_pass(X)
print("Hidden Layer Output:", hidden_output)
print("Final Output:", final_output)
Slip26
Q.1. Create KNN model on Indian diabetes patient’s database and predict whether a new
patient is diabetic (1) or not (0). Find optimal value of K.
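No code is given for this question. One common way to pick K is to compare cross-validated accuracy over a range of candidate values. The sketch below does that on synthetic data from `make_classification` (a stand-in for the diabetes database); the candidate range 1 to 21 and the 5-fold setting are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data: label 1 stands for diabetic, 0 for not diabetic
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)

# Try odd values of K and keep the mean 5-fold cross-validation accuracy of each
k_values = range(1, 22, 2)
cv_scores = {}
for k in k_values:
    # Scaling inside the pipeline keeps each fold's test part out of the scaler fit
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(pipe, X, y, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print("Best K:", best_k, "CV accuracy:", cv_scores[best_k])

# Refit on all data with the chosen K and classify a new patient
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X, y)
new_patient = X[:1]  # placeholder row standing in for a new patient's measurements
prediction = final_model.predict(new_patient)[0]
print("Predicted class (1 = diabetic, 0 = not):", prediction)
```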
Q.2. Use the Apriori algorithm on the groceries dataset to find which items are bought together.
Use minimum support = 0.25
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt
# Display results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
Slip 27
Q.1. Create a multiple linear regression model for a house price dataset. Divide the dataset into
train and test data before giving it to the model, and predict house prices.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
Q.2. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one is more accurately fitting to the given data.
Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)
Slip 28
Q.1. Write a python program to categorize the given news text into one of the available 20
categories of news groups, using multinomial Naïve Bayes machine learning model.
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
# Make predictions
y_pred = model.predict(X_test_vectorized)
Q.2. Classify the iris flowers dataset using SVM and find out the flower type depending on
the given input data like sepal length, sepal width, petal length and petal width. Find
accuracy of all SVM kernels.
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Example input for prediction (sepal length, sepal width, petal length, petal width)
new_flower = [[5.1, 3.5, 1.4, 0.2]] # Example input
predictions = {kernel: model.predict(new_flower)[0] for kernel in kernels}
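Note that the line above calls the same `model` for every kernel, so all four predictions would come from a single classifier. A minimal sketch that trains one SVC per kernel and keeps them in a dictionary (`models` and `accuracies` are names introduced for this example):

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Train one model per kernel and record its test accuracy
kernels = ["linear", "poly", "rbf", "sigmoid"]
models = {}
accuracies = {}
for kernel in kernels:
    models[kernel] = SVC(kernel=kernel).fit(X_train, y_train)
    accuracies[kernel] = accuracy_score(y_test, models[kernel].predict(X_test))

for kernel in kernels:
    print(f"{kernel}: accuracy = {accuracies[kernel]:.4f}")

# Example input for prediction (sepal length, sepal width, petal length, petal width)
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predictions = {
    kernel: iris.target_names[models[kernel].predict(new_flower)[0]]
    for kernel in kernels
}
print(predictions)
```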
Slip 29
Q.1. Take iris flower dataset and reduce 4D data to 2D data using PCA. Then train the
model and predict new flower with given measurements.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# SVM Kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
Q.2. Use K-means clustering model and classify the employees into various income groups
or clusters. Preprocess data if required (i.e. drop missing or null values). Use elbow
method and Silhouette Score to find value of k.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
# Load dataset
data = pd.read_csv('C:\\csv\\employee_data.csv').dropna() # Drop missing values
print(data.head())
# Select features for clustering
X = data[['EmpID']] # Modify as needed
# Display results
print(data.head())
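The elbow and silhouette steps are not shown above, and clustering on `EmpID` alone would not produce income groups. The sketch below runs both methods on synthetic incomes drawn around three assumed levels; with the real file, the income column (whose name is not shown in the source) would replace the synthetic array.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic incomes in three assumed groups (low, middle, high)
income = np.concatenate([
    rng.normal(30_000, 3_000, 50),
    rng.normal(60_000, 4_000, 50),
    rng.normal(110_000, 6_000, 50),
])
X = StandardScaler().fit_transform(income.reshape(-1, 1))

# Elbow method uses inertia (within-cluster sum of squares);
# silhouette score measures how well separated the clusters are
inertias = {}
silhouettes = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)
    print(f"k={k}: inertia={inertias[k]:.2f}, silhouette={silhouettes[k]:.3f}")

# Pick the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)
print("Best k by silhouette score:", best_k)
```

For the elbow method, plot `inertias` against k and look for the point where the curve stops dropping sharply. The silhouette score gives a numeric check on the same choice.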