
ML Solution

The document contains multiple Python programming tasks related to data analysis and machine learning, including the implementation of algorithms like Apriori, Linear Regression, Logistic Regression, K-means, and K-Nearest Neighbors. It covers various datasets such as groceries, Iris, house prices, wholesale customers, crash data, mall customers, fuel consumption, Boston housing, and employee income data. Each task includes code snippets for loading data, preprocessing, model training, predictions, and visualizations.

Uploaded by

Faizal Shaikh
Copyright © All Rights Reserved

Slip 1

Q.1. Use the Apriori algorithm on the groceries dataset to find which items are bought together. Use minimum support = 0.25.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Raw string so the Windows backslashes are not treated as escape sequences
data = pd.read_csv(r'C:\csv\groceries.csv')
print(data)

# Turn each row into a list of the items it contains (ignoring empty cells)
data = data.apply(lambda row: row.dropna().tolist(), axis=1)

# One-hot encode: one boolean column per item, True if the item is in the basket
one_hot_data = (pd.get_dummies(data.apply(pd.Series).stack())
                .groupby(level=0).max()
                .astype(bool))

frequent_itemsets = apriori(one_hot_data, min_support=0.25, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
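To make the role of min_support concrete, here is a small pure-Python sketch of the level-wise Apriori idea on a hypothetical list of five baskets (the items are made up for illustration). An itemset is kept only if it appears in at least 25% of the baskets, and larger candidates are built only from itemsets that survived the previous level. This is a teaching sketch, not mlxtend's implementation.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = sorted({item for b in baskets for item in b})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = [c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))]
        current = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

baskets = [  # hypothetical example data
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["milk", "bread", "eggs"],
]
for itemset, sup in sorted(apriori_sketch(baskets, 0.25).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```

Here {milk, butter} (support 0.2) is dropped, so no 3-item candidate containing it is ever counted; that pruning is what keeps Apriori tractable.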

Q.2. Write a Python program to prepare a scatter plot for the Iris dataset. Convert categorical values into numeric format.
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset from a CSV file
# (the column names below follow scikit-learn's naming; adjust them to match your CSV)
iris_df = pd.read_csv('iris.csv')

# Convert the categorical 'species' column to integer codes (0, 1, 2)
iris_df['species'] = pd.Categorical(iris_df['species']).codes

# Create a scatter plot coloured by the numeric species code
plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)',
                hue='species', palette='viridis', s=100)
plt.title('Iris Dataset Scatter Plot with Numeric Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.grid(True)
plt.show()
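pd.Categorical(...).codes assigns codes in the sorted order of the distinct labels, so it is worth checking the mapping before interpreting the numbers. A tiny illustration with hand-written labels (not read from the CSV):

```python
import pandas as pd

labels = pd.Series(["setosa", "virginica", "versicolor", "setosa"])
cat = pd.Categorical(labels)

# Categories are sorted alphabetically; each label becomes its position in that list
print(list(cat.categories))  # ['setosa', 'versicolor', 'virginica']
print(cat.codes.tolist())    # [0, 2, 1, 0]

# The mapping can be kept to translate codes back into names later
mapping = dict(enumerate(cat.categories))
print(mapping)
```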

Slip 2
Q.1. Write a Python program to implement simple linear regression for predicting house price. First find all null values in the given dataset and remove them. [15 M]
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset (raw string keeps the Windows backslashes literal)
df = pd.read_csv(r'C:\csv\house-prices.csv')

# Find the null values in each column, then remove the rows that contain them
print("Null values per column:\n", df.isnull().sum())
df = df.dropna()
print(df)

# Define feature and target variable
X = df[['Bedrooms']]  # Adjust column names as necessary
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Train the model
model = LinearRegression().fit(X_train, y_train)

# Predict and visualize
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, model.predict(X_test), color='red', label='Predicted Prices')
plt.title('House Price Prediction')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Price of House')
plt.legend()
plt.show()

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
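For a single feature, the least-squares line that LinearRegression fits has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A small NumPy check on synthetic data (not the house-price file) suggests the two agree:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x = rng.uniform(1, 6, size=40)                        # synthetic "bedrooms" values
y = 50_000 * x + 20_000 + rng.normal(0, 5_000, 40)    # synthetic prices

# Closed-form ordinary least squares for one feature
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

model = LinearRegression().fit(x.reshape(-1, 1), y)
print(slope, intercept)
print(model.coef_[0], model.intercept_)
```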

Q.2. The dataset refers to clients of a wholesale distributor and includes their annual spending, in monetary units, on diverse product categories. Using the Wholesale customers dataset, compute agglomerative clustering to find clients with similar annual spending in the same region.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv(r'C:\csv\Wholesale customers data.csv').dropna()

# Prepare features: drop the categorical Channel and Region codes,
# keeping only the annual spending columns
features = df.drop(['Channel', 'Region'], axis=1)

# Standardize the data
features_scaled = StandardScaler().fit_transform(features)

# Perform agglomerative clustering (Ward linkage is scikit-learn's default)
clusters = AgglomerativeClustering(n_clusters=3).fit_predict(features_scaled)

# Add cluster labels to the DataFrame
df['Cluster'] = clusters

# Visualize clusters
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='Fresh', y='Milk', hue='Cluster', palette='viridis', s=100)
plt.title('Agglomerative Clustering of Wholesale Customers')
plt.xlabel('Annual Spending on Fresh Products')
plt.ylabel('Annual Spending on Milk Products')
plt.legend(title='Cluster')
plt.grid(True)
plt.show()

# Display mean spending per cluster, and how the clusters spread across regions
print("\nCluster Means:")
print(df.groupby('Cluster').mean())
print("\nClusters per Region:")
print(pd.crosstab(df['Region'], df['Cluster']))
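Agglomerative clustering starts with every point in its own cluster and repeatedly merges the closest pair until n_clusters remain. A tiny hand-made 2-D example (the points are illustrative, not from the wholesale data) shows the expected grouping when two groups are well separated:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two obvious groups: low spenders near (1, 1) and high spenders near (10, 10)
points = np.array([[1.0, 1.0], [1.5, 1.2], [0.8, 1.4],
                   [10.0, 10.0], [10.5, 9.7], [9.8, 10.3]])

labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(points)
print(labels)  # cluster ids are arbitrary, but the first three points share one id
```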

Slip 3
Q.1. Write a python program to implement multiple Linear Regression for a house
price dataset. Divide the dataset into training and testing data. [15 M]
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
# Replace the path below with the actual path to your dataset
df = pd.read_csv(r'C:\csv\house-prices.csv')

# Display the first few rows
print("Dataset Preview:")
print(df.head())

# Define features (independent variables) and target (dependent variable)
# Here 'SqFt', 'Bedrooms' and 'Bathrooms' are the features and 'Price' is the target
X = df[['SqFt', 'Bedrooms', 'Bathrooms']]  # Adjust column names as necessary
y = df['Price']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Display results
print("\nModel Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))
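Multiple linear regression estimates one coefficient per feature plus an intercept by least squares. Appending a column of ones to the feature matrix and calling np.linalg.lstsq reproduces what LinearRegression computes; the data here is synthetic with known coefficients (2, 3, -1 and intercept 5), chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # three synthetic features
true_coef = np.array([2.0, 3.0, -1.0])
y = X @ true_coef + 5.0                   # noise-free target with intercept 5

# Add a column of ones so the intercept is estimated as an extra coefficient
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = beta[0], beta[1:]

model = LinearRegression().fit(X, y)
print(intercept, coef)
print(model.intercept_, model.coef_)
```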

Q.2. The dataset crash.csv comes from an accident survivors' data portal for the USA hosted by data.gov. It contains each passenger's age, the speed of the vehicle (mph) at the time of impact, and the fate of the passenger after the crash (1 for survived, 0 for not survived). Use logistic regression to decide whether age and speed can predict the survivability of the passengers.
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset
df = pd.read_csv('crash.csv')

# Display the first few rows of the dataset
print("Dataset Preview:\n", df.head())

# Define features and target
X = df[['age', 'speed']]
y = df['fate']  # Assuming 'fate' is 1 for survived and 0 for not survived

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Display accuracy and classification report
print("\nAccuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Coefficients show how age and speed change the log-odds of survival
print("Coefficients (age, speed):", model.coef_[0], "Intercept:", model.intercept_[0])

# Plot the observed outcomes
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='age', y='speed', hue='fate', palette='viridis', s=100)
plt.title('Observed Passenger Survivability by Age and Speed')
plt.xlabel('Age')
plt.ylabel('Speed (mph)')
plt.legend(title='Fate (0 = Not Survived, 1 = Survived)')
plt.grid(True)
plt.show()
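Logistic regression turns a linear score into a probability with the sigmoid function, p = 1 / (1 + exp(-(w·x + b))). The sketch below uses synthetic age/speed data generated from a made-up rule (not crash.csv) and checks that applying the sigmoid to coef_ and intercept_ by hand gives the same survival probabilities as predict_proba:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
age = rng.uniform(18, 80, 200)
speed = rng.uniform(20, 90, 200)
# Hypothetical rule: survival is less likely at higher speeds and ages
logit = 6 - 0.05 * age - 0.06 * speed
fate = (rng.uniform(size=200) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([age, speed])
model = LogisticRegression(max_iter=1000).fit(X, fate)

# Manual probability of class 1 (survived): sigmoid of the linear score
score = X @ model.coef_[0] + model.intercept_[0]
p_manual = 1 / (1 + np.exp(-score))
print(p_manual[:5])
print(model.predict_proba(X)[:5, 1])
```

Negative coefficients on age and speed would indicate that survival odds fall as either grows.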

Slip 4
Q.1. Write a Python program to implement the k-means algorithm on the mall_customers dataset.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load dataset
df = pd.read_csv(r'C:\csv\mall_customers.csv')

# Select features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

# Standardize the data
X_scaled = StandardScaler().fit_transform(X)

# Apply K-means (n_init set explicitly because its default changed across scikit-learn versions)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df['Cluster'] = kmeans.fit_predict(X_scaled)

# Plot clusters
plt.figure(figsize=(10, 6))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=df['Cluster'], cmap='viridis', s=100)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='red', label='Centroids')
plt.title('K-means Clustering of Mall Customers')
plt.xlabel('Annual Income (scaled)')
plt.ylabel('Spending Score (scaled)')
plt.legend()
plt.grid(True)
plt.show()

# Display cluster counts
print("\nNumber of customers in each cluster:")
print(df['Cluster'].value_counts())
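KMeans alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points (Lloyd's algorithm). The plain NumPy sketch below shows those two steps on hypothetical (income, spending score) pairs; scikit-learn adds k-means++ initialisation and multiple restarts on top of this loop.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):          # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical (income, spending score) pairs forming two obvious groups
X = np.array([[15, 80], [16, 85], [18, 78], [17, 82],
              [90, 10], [88, 15], [95, 12], [92, 8]], dtype=float)
labels, centroids = kmeans_sketch(X, k=2)
print(labels)
print(centroids)
```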

Q.2. Write a Python program to implement simple linear regression for predicting house price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset and drop rows with missing values
df = pd.read_csv(r'C:\csv\house-prices.csv').dropna()
print(df)

# Define feature and target variable
X = df[['Bedrooms']]  # Adjust column names as necessary
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Train the model
model = LinearRegression().fit(X_train, y_train)

# Predict and visualize
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, model.predict(X_test), color='red', label='Predicted Prices')
plt.title('House Price Prediction')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Price of House')
plt.legend()
plt.show()

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")

Slip 5
Q.1. Write a Python program to implement multiple linear regression for the Fuel Consumption dataset.
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
# Replace 'fuel_consumption.csv' with the actual path to your dataset
df = pd.read_csv('fuel_consumption.csv')

# Display the first few rows of the dataset
print("Dataset Preview:\n", df.head())

# Define features (independent variables) and target (dependent variable)
X = df[['engine_size', 'horsepower', 'weight']]  # Adjust column names to match your dataset
y = df['fuel_consumption']

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Display results
print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))

# Plot actual vs predicted values
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
lims = [y_test.min(), y_test.max()]
plt.plot(lims, lims, color='red', linewidth=2)  # Ideal line (predicted = actual)
plt.title('Actual vs Predicted Fuel Consumption')
plt.xlabel('Actual Fuel Consumption')
plt.ylabel('Predicted Fuel Consumption')
plt.grid(True)
plt.show()
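The R² score printed above is 1 - SS_res / SS_tot, the share of the target's variance that the model explains. A worked example with three illustrative values (not taken from the fuel data) shows the arithmetic and the match with r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0])   # illustrative actual values
y_pred = np.array([2.5, 5.0, 7.5])   # illustrative predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares = 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares = 8.0
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))       # both 0.9375
```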

Q.2. Write a Python program to implement the k-nearest neighbors ML algorithm to build a prediction model (use the Iris dataset).
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train K-NN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions and evaluation
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Example prediction
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])
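Under the hood, k-NN classifies a sample by measuring its distance to every training point and taking a majority vote among the k closest. A from-scratch sketch on a tiny hypothetical training set (two made-up petal measurements per flower) shows the idea:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify one sample by majority vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Tiny hypothetical training set: (petal length, petal width) -> class id
X_train = np.array([[1.4, 0.2], [1.3, 0.2], [1.5, 0.3],
                    [4.5, 1.5], [4.7, 1.4], [4.4, 1.3]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.45, 0.25])))  # near class 0
print(knn_predict(X_train, y_train, np.array([4.6, 1.45])))   # near class 1
```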

Slip 6
Q.1. Write a python program to implement Polynomial Linear Regression for
Boston Housing Dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset (replace the path with your actual CSV file)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)

# Use 'rm' (average number of rooms per dwelling) as the feature and
# 'medv' (median house value, in $1000s) as the target
X = data[['rm']]
y = data['medv']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Transform the feature into polynomial terms (degree 2: 1, rm, rm^2)
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train the Polynomial Regression model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on the test set
y_pred = model.predict(X_test_poly)

# Calculate and print the Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Plot the data and the fitted curve over a smooth grid of room counts
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
X_plot = pd.DataFrame({'rm': np.linspace(X['rm'].min(), X['rm'].max(), 100)})
y_plot = model.predict(poly.transform(X_plot))
plt.plot(X_plot['rm'], y_plot, color='red', label='Polynomial Regression Fit')
plt.xlabel('Average number of rooms (RM)')
plt.ylabel('Median value of homes (MEDV)')
plt.title('Polynomial Linear Regression on Boston Housing Dataset')
plt.legend()
plt.show()

Q.2. Use a K-means clustering model to classify the employees into various income groups (clusters). Preprocess the data if required (i.e. drop missing or null values).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the dataset (replace with your actual CSV file name)
data = pd.read_csv('C:\\csv\\employee_data.csv')

# Display the first few rows and info about the dataset
print(data.head())
data.info()

# Preprocess: Drop missing values
data = data.dropna()

# Select the income column for clustering.
# 'Income' is an assumed column name; change it to match your dataset.
# (Clustering on an ID column such as 'EmpID' would not produce income groups.)
X = data[['Income']]

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply K-means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # Adjust n_clusters as needed
data['Cluster'] = kmeans.fit_predict(X_scaled)

# Display the cluster assignments
print(data[['EmpID', 'Income', 'Cluster']])

# Visualize the clusters along the income axis
plt.figure(figsize=(10, 6))
plt.scatter(data['Income'], np.zeros_like(data['Income']), c=data['Cluster'],
            cmap='viridis', marker='o')
plt.title('K-means Clustering of Employees by Income')
plt.xlabel('Income')
plt.yticks([])  # Hide y-axis ticks
plt.grid(True)
plt.show()
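K-means cluster ids are arbitrary, so cluster 0 is not necessarily the lowest-income group. One way to turn them into meaningful income groups is to rank the clusters by their centroid. The sketch below uses hypothetical incomes (in thousands) and hypothetical group names "Low", "Medium" and "High":

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical incomes (in thousands) forming three rough groups
incomes = pd.DataFrame({"Income": [20, 22, 25, 50, 52, 55, 90, 95, 100]})
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(incomes)

# Rank clusters by their centroid so the labels carry meaning
order = np.argsort(km.cluster_centers_[:, 0])            # cluster ids, lowest income first
names = {int(cid): name for cid, name in zip(order, ["Low", "Medium", "High"])}
incomes["Group"] = [names[int(c)] for c in km.labels_]
print(incomes)
```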

Slip 7
Q.1. Fit the simple linear regression model to the Salary_positions.csv data. Predict the salary of level 11 and level 12 employees.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the dataset (this solution reads Position_Salaries.csv)
df = pd.read_csv(r'C:\csv\Position_Salaries.csv')
X = df[['Level']]
y = df['Salary']

# Train the model
model = LinearRegression()
model.fit(X, y)

# Predict salaries for levels 11 and 12
# (a DataFrame with the same column name avoids a feature-name warning)
new_levels = pd.DataFrame({'Level': [11, 12]})
predicted_salaries = model.predict(new_levels)
print(f"Predicted salary for level 11: ${predicted_salaries[0]:.2f}")
print(f"Predicted salary for level 12: ${predicted_salaries[1]:.2f}")

# Plot data and predictions
plt.scatter(X, y, color='blue', label='Actual Salaries')
plt.plot(X, model.predict(X), color='red', label='Regression Line')
plt.scatter([11, 12], predicted_salaries, color='green', marker='x', s=100, label='Predictions')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.legend()
plt.show()

Q.2. Write a Python program to implement Naive Bayes on the weather forecast dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Load the dataset
data = pd.read_csv('C:\\csv\\weather_forecast.csv')

# Display the first few rows
print(data.head())

# Preprocess: encode every non-numeric (text) column as integers
label_encoders = {}
for column in data.columns:
    if not pd.api.types.is_numeric_dtype(data[column]):
        le = LabelEncoder()
        data[column] = le.fit_transform(data[column])
        label_encoders[column] = le

# Separate features and target variable
# 'weather' is assumed to be the target column; change it to match your CSV
X = data.drop('weather', axis=1)
y = data['weather']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Naive Bayes classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict on the test data
y_pred = nb_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
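GaussianNB treats the label-encoded categories as continuous numbers; for purely categorical weather features, the counting form of Naive Bayes (which scikit-learn offers as CategoricalNB) is the more natural fit. The sketch below computes that form by hand: prior P(class) times the product of P(feature value | class), with Laplace smoothing. The four observations are hypothetical.

```python
from collections import Counter, defaultdict

def naive_bayes_predict(rows, labels, query, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing; returns (best class, scores)."""
    class_counts = Counter(labels)
    n = len(labels)
    # value_counts[(feature_index, class)][value] = times that value appears with that class
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    # Number of distinct values per feature (used in the smoothing denominator)
    n_values = [len({row[i] for row in rows}) for i in range(len(rows[0]))]

    scores = {}
    for c, count_c in class_counts.items():
        score = count_c / n                      # prior P(class)
        for i, value in enumerate(query):        # times P(feature value | class)
            score *= (value_counts[(i, c)][value] + alpha) / (count_c + alpha * n_values[i])
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical (outlook, temperature) observations and outcomes
rows = [("sunny", "hot"), ("sunny", "hot"), ("rain", "cool"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
print(naive_bayes_predict(rows, labels, ("rain", "cool")))
```

For ("rain", "cool") the unnormalised scores are 0.5 × 0.75 × 0.75 = 0.28125 for "yes" and 0.5 × 0.25 × 0.25 = 0.03125 for "no".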

Slip 8

Q.1. Write a Python program to categorize the given news text into one of the 20 available newsgroup categories, using a multinomial Naïve Bayes machine learning model. [15 M]
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the 20 Newsgroups dataset
newsgroups = fetch_20newsgroups(subset='all')

# Prepare the data
X = newsgroups.data    # News text
y = newsgroups.target  # Corresponding categories

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Convert text data to feature vectors using Count Vectorization
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X_train_vectorized, y_train)

# Make predictions
y_pred = model.predict(X_test_vectorized)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
print('Classification Report:')
print(classification_report(y_test, y_pred,
                            target_names=newsgroups.target_names))

# Example of predicting a new article
new_article = ["This is a new article about sports and fitness."]
new_article_vectorized = vectorizer.transform(new_article)
predicted_category = model.predict(new_article_vectorized)
print(f'Predicted Category: {newsgroups.target_names[predicted_category[0]]}')

Q.2. Write a Python program to implement a decision tree that decides whether or not to play tennis.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
data = pd.read_csv('C:\\csv\\play_tennis.csv')
print(data)

# Separate features and target ('play' is assumed to hold the Yes/No labels)
X = data.drop('play', axis=1)  # Features
y = data['play']               # Target variable

# Convert categorical variables to numerical using one-hot encoding
X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train the Decision Tree model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate (zero_division avoids warnings on the very small test set)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_report(y_test, y_pred, zero_division=0))

# Display the decision tree rules
tree_rules = export_text(model, feature_names=list(X.columns))
print("Decision Tree Rules:\n", tree_rules)

# Visualize the decision tree (class names taken from the fitted model)
plt.figure(figsize=(12, 8))
plot_tree(model, feature_names=list(X.columns),
          class_names=[str(c) for c in model.classes_], filled=True)
plt.title("Decision Tree for Play Tennis")
plt.show()
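Classic ID3-style trees choose each split by information gain: the parent node's entropy minus the size-weighted entropy of the child nodes. scikit-learn uses Gini impurity by default and entropy when DecisionTreeClassifier(criterion='entropy') is set. The sketch below assumes the counts of the well-known 14-day Play Tennis table (9 Yes, 5 No, split by Outlook); check them against your CSV.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted

# Assumed counts: 9 "Yes" and 5 "No", split by Outlook into
# Sunny (2 Yes, 3 No), Overcast (4, 0) and Rain (3, 2)
print(round(entropy([9, 5]), 3))
print(round(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))
```

With these counts the parent entropy is about 0.940 bits and the gain for Outlook is about 0.247 bits.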
Slip 9
Q.1. Implement Ridge regression and Lasso regression models using boston_houses.csv, taking only ‘RM’ and ‘Price’ of the houses. Divide the data into training and testing data. Fit a line using Ridge regression, find the price of a house that has 5 rooms, and compare the results.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso

# Load the dataset
data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)

# Select relevant columns: 'rm' (rooms) as the feature, 'medv' (price in $1000s) as the target
X = data[['rm']]
y = data['medv']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit Ridge Regression
ridge_model = Ridge(alpha=1.0)  # You can adjust the alpha value as needed
ridge_model.fit(X_train, y_train)

# Fit Lasso Regression
lasso_model = Lasso(alpha=1.0)  # You can adjust the alpha value as needed
lasso_model.fit(X_train, y_train)

# Compare the fitted lines and their R² on the test set
for name, m in [('Ridge', ridge_model), ('Lasso', lasso_model)]:
    print(f"{name}: slope={m.coef_[0]:.3f}, intercept={m.intercept_:.3f}, "
          f"test R²={m.score(X_test, y_test):.3f}")

# Predict price for a house with 5 rooms
rooms = pd.DataFrame({'rm': [5]})
ridge_prediction = ridge_model.predict(rooms)
lasso_prediction = lasso_model.predict(rooms)

# Display the results (medv is in thousands of dollars)
print(f"Ridge Regression Prediction for 5 rooms: {ridge_prediction[0]:.2f} (x $1000)")
print(f"Lasso Regression Prediction for 5 rooms: {lasso_prediction[0]:.2f} (x $1000)")
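Ridge regression adds alpha times the squared coefficients to the least-squares loss, which gives the closed form w = (XcᵀXc + alpha·I)⁻¹ Xcᵀyc on centred data, with the intercept left unpenalised. The NumPy sketch below, on synthetic room/price data (an assumption, not the Boston file), checks that this formula reproduces scikit-learn's Ridge:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(4, 9, size=(50, 1))                     # synthetic "rooms" values
y = 9.0 * X[:, 0] - 35.0 + rng.normal(0, 2, size=50)    # synthetic prices
alpha = 1.0

# Centre X and y so the intercept is not penalised
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w

ridge = Ridge(alpha=alpha).fit(X, y)
print(w, b)
print(ridge.coef_, ridge.intercept_)
```

Lasso uses an absolute-value penalty instead, which has no closed form and can shrink a coefficient exactly to zero; that is one reason its 5-room prediction can differ noticeably from Ridge's at the same alpha.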

Q.2. Write a Python program to implement a linear SVM using UniversalBank.csv.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score

# Load the dataset
data = pd.read_csv('C:\\csv\\UniversalBank.csv')

# Display the first few rows of the dataset
print(data.head())

# Preprocess the data
# 'Personal Loan' is the target variable; 'ID' is an identifier, not a feature
X = data.drop(['Personal Loan', 'ID'], axis=1)
y = data['Personal Loan']

# Convert categorical variables to dummy variables (if any)
X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values (fit on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear SVM model
model = SVC(kernel='linear', random_state=42)
model.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
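With kernel='linear', the trained SVC exposes its hyperplane as coef_ (w) and intercept_ (b), and the class is the sign of w·x + b. A toy check on hypothetical, linearly separable 2-D points:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-D data
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear").fit(X, y)
w, b = model.coef_[0], model.intercept_[0]   # coef_ is only available for the linear kernel

# The decision function is the signed score w·x + b; positive means class 1
scores = X @ w + b
print(scores)
print(model.predict([[2, 2], [6, 6]]))
```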

Slip 10
Q.1. Write a Python program to transform data with Principal Component Analysis (PCA). Use the Iris dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
data = pd.read_csv('C:\\csv\\iris.csv')  # Ensure your CSV is named correctly
print(data)

# Separate features and target variable
# (if your CSV also has an ID column, drop it here as well)
X = data.drop('Species', axis=1)  # Drop the target column
y = data['Species']               # Target variable

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2)  # Reduce to 2 dimensions
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Create a DataFrame with the PCA results
pca_df = pd.DataFrame(data=X_pca, columns=['PC1', 'PC2'])
pca_df['species'] = y

# Plot the PCA results, one colour per species
plt.figure(figsize=(8, 6))
for species in np.unique(y):
    subset = pca_df[pca_df['species'] == species]
    plt.scatter(subset['PC1'], subset['PC2'], label=species)

plt.title('PCA of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid()
plt.show()
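PCA finds the eigenvectors of the features' covariance matrix; the eigenvalues give the variance along each component. The sketch below (using scikit-learn's bundled copy of Iris rather than the CSV) computes this directly with NumPy and compares it with the PCA class. The projections match up to the sign of each component, which is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# Eigen-decomposition of the covariance matrix of the standardized features
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]               # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

ratio_manual = eigvals[:2] / eigvals.sum()
pca = PCA(n_components=2).fit(X)
print(ratio_manual)
print(pca.explained_variance_ratio_)

# Project onto the top two eigenvectors
proj_manual = X @ eigvecs[:, :2]
print(np.allclose(np.abs(proj_manual), np.abs(pca.transform(X))))
```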

Q.2. Write a Python program to prepare a scatter plot for the Iris dataset. Convert categorical values into numeric format.
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load and prepare the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Convert the categorical species names to numeric codes (setosa=0, versicolor=1, virginica=2)
iris_df['species_code'] = iris_df['species'].cat.codes
print(iris_df[['species', 'species_code']].drop_duplicates())

# Create a scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)',
                hue='species', style='species', s=100)
plt.title('Iris Dataset Scatter Plot')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.grid(True)
plt.show()
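Integer codes imply an order (virginica "greater than" setosa) that the categories do not really have. When that matters for a model, one-hot encoding is the usual alternative: one 0/1 column per category. A minimal example with pd.get_dummies on hand-written labels:

```python
import pandas as pd

species = pd.Series(["setosa", "versicolor", "setosa", "virginica"])
one_hot = pd.get_dummies(species, dtype=int)   # one 0/1 column per category
print(one_hot)
```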

Slip 11
Q.1. Write a python program to implement Polynomial Regression for Boston
Housing Dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset (replace with the correct path to your dataset)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Use 'rm' (average number of rooms) as the feature and
# 'medv' (median house value) as the target
X = data[['rm']]  # Features
y = data['medv']  # Target variable

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Transform the feature into polynomial terms (degree 2)
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train the Polynomial Regression model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on the test set
y_pred = model.predict(X_test_poly)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Plot the data and the fitted curve over a smooth grid of room counts
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
X_plot = pd.DataFrame({'rm': np.linspace(X['rm'].min(), X['rm'].max(), 100)})
y_plot = model.predict(poly.transform(X_plot))
plt.plot(X_plot['rm'], y_plot, color='red', label='Polynomial Regression Fit')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('Median Value of Homes (MEDV)')
plt.title('Polynomial Regression on Boston Housing Dataset')
plt.legend()
plt.grid()
plt.show()
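The separate fit_transform/transform calls above can also be bundled into a scikit-learn Pipeline, so the polynomial expansion is applied automatically in both fit and predict. A minimal sketch on synthetic, noise-free quadratic data (y = 1 + 2x + 3x², chosen for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free quadratic data: y = 1 + 2x + 3x^2
x = np.linspace(0, 5, 20).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2

# The pipeline expands x into [1, x, x^2] before the linear fit, and again at predict time
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[6.0]]))   # 1 + 12 + 108 = 121 (up to floating-point error)
```

Keeping the transformer inside the pipeline also prevents accidentally predicting on raw, unexpanded features.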

Q.2. Write a Python program to implement a Decision Tree classifier model on data extracted from images that were taken from genuine and forged banknote-like specimens.
(refer UCI dataset https://archive.ics.uci.edu/dataset/267/banknote+authentication)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset from the UCI repository
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/00267/"
       "data_banknote_authentication.txt")
column_names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
data = pd.read_csv(url, header=None, names=column_names)

# Separate features and target variable
X = data.drop('Class', axis=1)
y = data['Class']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Decision Tree Classifier
classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

# Plot the Decision Tree
plt.figure(figsize=(20, 10))
plot_tree(classifier, feature_names=list(X.columns),
          class_names=['Not Authenticated', 'Authenticated'], filled=True, rounded=True)
plt.show()
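DecisionTreeClassifier splits on Gini impurity by default: 1 - Σ p_k² for a node, and the size-weighted impurity of the children for a candidate split. The tree picks the split that lowers this the most. A small sketch with hypothetical counts of genuine/forged notes:

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_impurity(children):
    """Size-weighted Gini impurity of the child nodes produced by a split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)

print(gini([50, 50]))     # 0.5: maximally mixed for two classes
print(gini([100, 0]))     # 0.0: pure node
# Hypothetical split of 100 notes into two children of 50
print(split_impurity([[45, 5], [5, 45]]))
```

Here the split reduces impurity from 0.5 in the parent to about 0.18 in the children.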

Slip 12
Q.1. Write a Python program to implement the k-nearest neighbors ML algorithm to build a prediction model (use the Iris dataset). [15 M]
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train K-NN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predictions and evaluation
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Example prediction
sample = [[5.1, 3.5, 1.4, 0.2]]
pred_class = knn.predict(sample)
print("Predicted Class:", iris.target_names[pred_class[0]])
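The solution fixes n_neighbors=3. A common way to choose k instead is to compare cross-validated accuracy over several candidates. A sketch of that check on the bundled Iris data (the candidate range 1 to 15 is an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate k (odd values reduce voting ties)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 16, 2)}
best_k = max(scores, key=scores.get)
print(scores)
print("Best k:", best_k, "accuracy:", round(scores[best_k], 3))
```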

Q.2. Fit the simple linear regression and polynomial linear regression models to the Salary_positions.csv data. Find which one fits the given data more accurately. Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_splitfrom sklearn.metrics import
mean_squared_error

# Load the dataset


data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable


X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Fit Simple Linear Regression


simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression


poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X_train_poly, X_test_poly, y_train, y_test = train_test_split(X_poly, y,
test_size=0.2, random_state=42)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models


mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)

print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')


print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')

# Predict salaries for levels 11 and 12


level_11_salary = poly_model.predict(poly.transform([[11]]))[0]
level_12_salary = poly_model.predict(poly.transform([[12]]))[0]
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')

# Plotting the results


plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red', label='Simple Regression
Predictions', alpha=0.5)plt.scatter(X_test, y_pred_poly, color='green',
label='Polynomial Regression Predictions', alpha=0.5)

# Plot Polynomial Regression Fit


X_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot, y_plot, color='orange', label='Polynomial Regression Fit')

plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
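PolynomialFeatures(degree=2) turns each level x into the columns [1, x, x²], so the "polynomial" model is still ordinary linear regression, just on expanded features. A minimal NumPy sketch of the same idea, assuming made-up levels and salaries generated from salary = 2000·level² + 1000 (not the real Position_Salaries data); poly_expand is a hypothetical helper:

```python
import numpy as np

def poly_expand(x, degree):
    """Return the design matrix [1, x, x**2, ..., x**degree] for a 1-D input."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.hstack([x ** d for d in range(degree + 1)])

# Hypothetical data generated from salary = 2000 * level**2 + 1000
levels = np.arange(1, 11)
salaries = 2000 * levels ** 2 + 1000

# Ordinary least squares on the expanded features
X_design = poly_expand(levels, degree=2)
coef, *_ = np.linalg.lstsq(X_design, salaries, rcond=None)
print(np.round(coef, 2))  # approximately [1000, 0, 2000]

# Extrapolate to levels 11 and 12
print(poly_expand([11, 12], degree=2) @ coef)  # approximately [243000, 289000]
```

Because levels 11 and 12 lie outside the training range, such predictions are extrapolations and grow quickly with the polynomial degree.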
Slip 13
Q.1. Create RNN model and analyze the Google stock price dataset. Find out
increasing or decreasing trends of stock price for the next day. [15 M]
Q.2. Write a python program to implement simple Linear Regression for predicting
house price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset (rows with missing values are dropped)
df = pd.read_csv('C:\\csv\\house-prices.csv').dropna()
print(df)

# Define feature and target variable (adjust column names as necessary)
X = df[['Bedrooms']]
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Train the model
model = LinearRegression().fit(X_train, y_train)

# Predict and visualize
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, model.predict(X_test), color='red', label='Predicted Prices')
plt.title('House Price Prediction')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Price of House')
plt.legend()
plt.show()

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")
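For a single feature, the coefficient and intercept that LinearRegression prints have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A short sketch, assuming made-up bedroom counts and prices purely for illustration; simple_linear_fit is a hypothetical helper:

```python
import numpy as np

def simple_linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by variance of x
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # The fitted line always passes through (mean(x), mean(y))
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data: price rises by exactly 20000 per bedroom on top of 50000
bedrooms = [1, 2, 3, 4, 5]
prices = [70000, 90000, 110000, 130000, 150000]
slope, intercept = simple_linear_fit(bedrooms, prices)
print(slope, intercept)  # → 20000.0 50000.0
```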

Slip 14
Q.1. Create a CNN model and train it on the MNIST handwritten digit dataset. Using
the model, find out the digit written by hand in a given image.
Import the MNIST dataset from tensorflow.keras.datasets. [15 M]
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import image

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess the data: add a channel axis and scale pixels to [0, 1]
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255

# One-hot encode target labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

# Function to predict the digit in a new image
def predict_digit(img_path):
    # MNIST digits are white on a black background; if your image shows a
    # dark digit on a light background, invert it first (img = 1 - img)
    img = image.load_img(img_path, target_size=(28, 28), color_mode='grayscale')
    img = image.img_to_array(img).astype('float32') / 255
    img = np.expand_dims(img, axis=0)  # Add batch axis: shape (1, 28, 28, 1)
    prediction = model.predict(img)
    digit = np.argmax(prediction)
    print(f"Predicted Digit: {digit}")
    plt.imshow(img.reshape(28, 28), cmap='gray')
    plt.title(f"Predicted Digit: {digit}")
    plt.show()

# Example usage: save a 28x28 grayscale image of a digit in your working
# directory with the name 'digit.png'
predict_digit('digit.png')
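The tensor sizes flowing through this CNN follow from simple arithmetic: a 3×3 convolution without padding and with stride 1 shrinks each side by 2, and 2×2 max-pooling (stride equal to the pool size, no padding, as in Keras' defaults) halves each side, rounding down. A small sketch tracing the shapes of the model above; the helper names conv_out and pool_out are hypothetical:

```python
def conv_out(size, kernel=3, stride=1):
    """Output side length of a 'valid' (no padding) convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output side length of max-pooling with stride equal to the pool size."""
    return size // pool

side = 28
side = conv_out(side)    # Conv2D(32, (3, 3))   -> 26
side = pool_out(side)    # MaxPooling2D((2, 2)) -> 13
side = conv_out(side)    # Conv2D(64, (3, 3))   -> 11
side = pool_out(side)    # MaxPooling2D((2, 2)) -> 5
flat = side * side * 64  # Flatten: 5 * 5 * 64 values
print(side, flat)        # → 5 1600

# Trainable parameters of Dense(128) after Flatten: weights + biases
print(flat * 128 + 128)  # → 204928
```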
Q.2. Write a python program to find all null values in a given dataset and remove
them.
Create your own dataset.
import pandas as pd
import numpy as np

# Create a sample dataset with some null values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', np.nan, 'Eve'],
    'Age': [25, 30, np.nan, 28, 22],
    'City': ['New York', 'Los Angeles', np.nan, 'Chicago', 'Houston']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Check for null values
null_values = df.isnull().sum()
print("\nNull Values in Each Column:")
print(null_values)

# Remove rows with null values
df_cleaned = df.dropna()

# Display the cleaned DataFrame
print("\nDataFrame after removing null values:")
print(df_cleaned)
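isnull().sum() reports how many nulls each column contains; to see exactly which rows and cells hold them before dropping, boolean masks can be used. A short sketch on the same toy DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', np.nan, 'Eve'],
    'Age': [25, 30, np.nan, 28, 22],
    'City': ['New York', 'Los Angeles', np.nan, 'Chicago', 'Houston'],
})

# Rows that contain at least one null value
rows_with_nulls = df[df.isnull().any(axis=1)]
print(rows_with_nulls.index.tolist())  # → [2, 3]

# (row label, column name) of every individual null cell
rows, cols = np.where(df.isnull())
null_cells = [(int(df.index[r]), df.columns[c]) for r, c in zip(rows, cols)]
print(null_cells)  # → [(2, 'Age'), (2, 'City'), (3, 'Name')]

# dropna() removes exactly the rows found above
print(len(df.dropna()))  # → 3
```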

Slip15
Q.1. Create an ANN and train it on the house price dataset to classify whether the house
price is above or below average. [15 M]
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

# Load the dataset
data = pd.read_csv('C:\\csv\\house-prices.csv')

# Create binary labels: 1 if the price is above the average price, else 0
data['Price_Label'] = (data['Price'] > data['Price'].mean()).astype(int)

# Features and target
X = data[['SqFt', 'Bedrooms', 'Bathrooms']]
y = data['Price_Label']

# Split the data, then fit the scaler on the training set only and apply
# the same scaling to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Build the ANN model
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# Compile and train the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=10,
                    validation_split=0.2)

# Evaluate the model
accuracy = model.evaluate(X_test_scaled, y_test)[1]
print(f'Accuracy: {accuracy:.4f}')

# Plot training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
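The model above is compiled with loss='binary_crossentropy', which for a sigmoid output p and a 0/1 label y averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples. A NumPy sketch of that formula on made-up labels and probabilities, followed by the usual 0.5 threshold that turns probabilities into above/below-average labels. The eps clipping guards against log(0) and is an assumption, not Keras' exact implementation:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logs stay finite
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.2, 0.6, 0.4])  # hypothetical sigmoid outputs

print(round(binary_cross_entropy(y_true, p_pred), 4))

# Threshold the sigmoid outputs at 0.5 to get class labels
labels = (p_pred > 0.5).astype(int)
print(labels, (labels == y_true).mean())  # → [1 0 1 0] 1.0
```

Confident correct predictions (p close to the label) give a loss near 0, while confident wrong ones are penalised heavily, which is why the loss keeps falling even after accuracy stops changing.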
Q.2. Write a python program to implement multiple Linear Regression for a house
price dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the house price dataset
data = pd.read_csv('C:\\csv\\house-prices.csv')
print(data)

# Features and target variable
X = data[['SqFt', 'Bedrooms', 'Bathrooms']]  # Independent variables
y = data['Price']  # Dependent variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create and fit the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display the coefficients and evaluation metrics
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")

# Visualize the results
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title('Actual vs Predicted Prices')
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=2)  # Line for perfect predictions
plt.show()
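Multiple linear regression finds one coefficient per feature plus an intercept by least squares. A sketch using NumPy's lstsq on synthetic data, assuming made-up SqFt/Bedrooms/Bathrooms values generated from known coefficients so the recovered values can be checked:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
sqft = rng.uniform(1500, 2500, n)
bedrooms = rng.integers(2, 6, n)
bathrooms = rng.integers(1, 4, n)

# Hypothetical "true" relationship used to generate noiseless prices
price = 10000 + 50 * sqft + 8000 * bedrooms + 12000 * bathrooms

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), sqft, bedrooms, bathrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print(np.round(coef, 2))  # approximately [10000, 50, 8000, 12000]
```

With real, noisy prices the recovered values are only estimates, which is what model.intercept_ and model.coef_ report in the program above.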

Slip 16
Q.1. Create a two-layered neural network with ReLU and sigmoid activation
functions.
import numpy as np

# Activation functions
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Initialize parameters
np.random.seed(0)
weights_input_hidden = np.random.rand(3, 4)  # 3 inputs, 4 hidden neurons
weights_hidden_output = np.random.rand(4, 1)  # 4 hidden neurons, 1 output

# Forward pass: ReLU hidden layer followed by a sigmoid output layer
def forward_pass(X):
    hidden_output = relu(np.dot(X, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    return hidden_output, final_output

# Example input
X = np.array([[0.1, 0.2, 0.3]])
hidden_output, final_output = forward_pass(X)
print("Hidden Layer Output:", hidden_output)
print("Final Output:", final_output)

Q.2. Write a python program to implement Simple Linear Regression for Boston
housing dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load the dataset from a CSV file (replace with your file path)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Display the first few rows of the dataset (optional)
print(data.head())

# Simple linear regression uses a single feature: 'rm' (average number of
# rooms) to predict the target 'medv' (median house value)
X = data[['rm']]
y = data['medv']

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
print(f"Slope: {model.coef_[0]:.3f}, Intercept: {model.intercept_:.3f}")

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

# Plotting the true values vs predicted values
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', edgecolor='k', alpha=0.7)
plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', lw=2)  # Line for perfect predictions
plt.title('True Values vs Predicted Values')
plt.xlabel('True Values (MEDV)')
plt.ylabel('Predicted Values (MEDV)')
plt.grid(True)
plt.show()
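r2_score compares the model's squared error with that of always predicting the mean of the true values: R² = 1 - SS_res / SS_tot. A minimal sketch of that formula on small made-up values; r_squared is a hypothetical helper:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [3, 5, 7, 9]
y_pred = [4, 5, 6, 9]
print(r_squared(y_true, y_pred))        # → 0.9
print(r_squared(y_true, [6, 6, 6, 6]))  # → 0.0 (always predicting the mean)
```

An R² of 0 means the model does no better than the mean, and it can go negative on test data when the model does worse.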

Slip17
Q.1. Implement Ensemble ML algorithm on Pima Indians Diabetes Database with
bagging (random forest), boosting, voting and Stacking methods and display
analysis accordingly. Compare results. [15 M]
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              VotingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load the Pima Indians Diabetes dataset
data = pd.read_csv('C:\\csv\\diabetes.csv')
X = data.drop('Outcome', axis=1)  # Features
y = data['Outcome']  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Standardize the features (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize models: bagging (random forest), boosting, voting and stacking
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100,
                                                    random_state=42),
    'Voting Classifier': VotingClassifier(estimators=[
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
    ], voting='soft'),
    'Stacking Classifier': StackingClassifier(estimators=[
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
    ], final_estimator=LogisticRegression())
}

# Train models and evaluate accuracy
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)

# Display results
print("Model Accuracies:")
for model_name, accuracy in results.items():
    print(f"{model_name}: {accuracy:.4f}")

# Plotting the results
plt.bar(list(results.keys()), list(results.values()),
        color=['blue', 'orange', 'green', 'red'])
plt.ylabel('Accuracy')
plt.title('Ensemble Learning Model Comparison')
plt.ylim([0, 1])
plt.xticks(rotation=45)
plt.show()
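With voting='soft', VotingClassifier averages the class probabilities from each estimator's predict_proba and picks the class with the highest average; hard voting instead takes the majority of the predicted labels. With two models, hard voting can tie; scikit-learn's documentation states that ties are resolved by ascending class order, which the argmax of vote counts below mimics. A NumPy sketch with made-up probabilities for two models on three samples:

```python
import numpy as np

# Hypothetical predict_proba outputs (columns: class 0, class 1)
proba_rf = np.array([[0.70, 0.30], [0.40, 0.60], [0.55, 0.45]])
proba_gb = np.array([[0.60, 0.40], [0.20, 0.80], [0.30, 0.70]])

# Soft voting: average the probabilities, then take the most likely class
avg_proba = (proba_rf + proba_gb) / 2
soft_pred = avg_proba.argmax(axis=1)
print(soft_pred)  # → [0 1 1]

# Hard voting: count the predicted labels per sample, argmax breaks ties low
labels = np.vstack([proba_rf.argmax(axis=1), proba_gb.argmax(axis=1)])
hard_pred = np.apply_along_axis(lambda col: np.bincount(col, minlength=2).argmax(), 0, labels)
print(hard_pred)  # → [0 1 0]
```

The third sample shows the difference: the gradient-boosting model is more confident, so soft voting follows it, while hard voting sees a 1-1 tie.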
Q.2. Write a python program to implement Multiple Linear Regression for a house
price dataset.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset from CSV (replace with your dataset file path)
data = pd.read_csv('C:\\csv\\house-prices.csv')
print(data.head())

# Separate features and target variable
X = data[['SqFt', 'Bedrooms', 'Bathrooms']]  # Replace with relevant features
y = data['Price']  # Target variable

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")

# Plot actual vs. predicted prices
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue')
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=2)  # Line for perfect predictions
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()

Slip18
Q.1. Write a python program to implement k-means algorithm on a Diabetes
dataset. [15 M]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Pima Indians Diabetes dataset
data = pd.read_csv('C:\\csv\\diabetes.csv')
X = data.drop('Outcome', axis=1)  # Features (the label is not used for clustering)

# Standardizing the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Determine the optimal number of clusters using the elbow method
wcss = []
for i in range(2, 11):  # Testing cluster sizes from 2 to 10
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)  # Within-cluster sum of squares

# Plotting the elbow method
plt.plot(range(2, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()

# Choose the number of clusters at the "elbow" of the plot
optimal_clusters = 3  # Adjust this based on the plot

# Fit the K-Means model with the chosen number of clusters
kmeans_final = KMeans(n_clusters=optimal_clusters, n_init=10, random_state=42)
y_kmeans = kmeans_final.fit_predict(X_scaled)

# Add the cluster labels to the original dataset
data['Cluster'] = y_kmeans

# Visualize the clusters on the first two standardized features
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.scatter(kmeans_final.cluster_centers_[:, 0], kmeans_final.cluster_centers_[:, 1],
            s=200, c='red', label='Centroids')
plt.title('K-Means Clustering of Diabetes Dataset')
plt.xlabel(f'{X.columns[0]} (Standardized)')
plt.ylabel(f'{X.columns[1]} (Standardized)')
plt.legend()
plt.show()
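KMeans is based on Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. inertia_, the WCSS plotted in the elbow method, is the sum of squared distances from each point to its assigned centroid. A from-scratch NumPy sketch on made-up 2-D points, assuming the first k points as initial centroids (scikit-learn uses k-means++ initialisation and several restarts instead); the function name kmeans here is a hypothetical helper, not scikit-learn's class:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns labels, centroids and WCSS (inertia)."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()  # naive initialisation: first k points
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and within-cluster sum of squares
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    wcss = dists[np.arange(len(X)), labels].sum()
    return labels, centroids, wcss

X_demo = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
labels, centroids, wcss = kmeans(X_demo, k=2)
print(labels)           # → [0 1 0 1 0 1]
print(round(wcss, 4))   # → 2.6667
```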

Q.2. Write a python program to implement Polynomial Linear Regression for
salary_positions dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Load the dataset from a local CSV file (it must have 'Level' and 'Salary' columns)
data = pd.read_csv('C:\\csv\\Position_Salaries.csv')

# Define the features and target variable
X = data[['Level']]  # Features (Level)
y = data['Salary']  # Target variable (Salary)

# Polynomial Feature Transformation
poly_degree = 4  # Degree of polynomial features
poly_features = PolynomialFeatures(degree=poly_degree)
X_poly = poly_features.fit_transform(X)

# Fit Polynomial Regression
poly_model = LinearRegression()
poly_model.fit(X_poly, y)

# Predict using the Polynomial Model (on the training data)
y_pred = poly_model.predict(X_poly)

# Calculate Mean Squared Error for evaluation (training error)
mse = mean_squared_error(y, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Plotting the Polynomial Regression Results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Actual Salaries')  # Scatter plot of actual salaries
plt.plot(X, y_pred, color='red', label=f'Polynomial Regression (Degree {poly_degree})')  # Polynomial fit line
plt.title('Polynomial Regression for Salary vs Position Level')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.legend()
plt.grid(True)
plt.show()

Slip 19
Q.1. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one fits the given data more accurately.
Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset
data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable
X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit Simple Linear Regression
simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression (degree 2) on the same train/test split
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models (lower is better)
mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')
print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')
better = 'Polynomial' if mse_poly < mse_simple else 'Simple'
print(f'{better} Linear Regression fits the test data more accurately')

# Predict salaries for levels 11 and 12 with the polynomial model
new_levels = pd.DataFrame({'Level': [11, 12]})
level_11_salary, level_12_salary = poly_model.predict(poly.transform(new_levels))
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')

# Plotting the results
plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red', label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green', label='Polynomial Regression Predictions', alpha=0.5)

# Plot the Polynomial Regression Fit as a smooth curve
X_plot = pd.DataFrame({'Level': np.linspace(X['Level'].min(), X['Level'].max(), 100)})
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot['Level'], y_plot, color='orange', label='Polynomial Regression Fit')

plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()

Q.2. Write a python program to implement Naive Bayes on weather forecast dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Load the dataset
data = pd.read_csv('C:\\csv\\weather_forecast.csv')

# Display the first few rows
print(data.head())

# Preprocess: encode categorical (text) columns as integers
label_encoders = {}
for column in data.columns:
    if data[column].dtype == 'object':
        le = LabelEncoder()
        data[column] = le.fit_transform(data[column])
        label_encoders[column] = le

# Separate features and target variable ('weather' is the target here)
X = data.drop('weather', axis=1)
y = data['weather']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Naive Bayes classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict on the test data
y_pred = nb_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
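GaussianNB treats every encoded column as a continuous, normally distributed feature, which is only a rough fit for label-encoded categories; scikit-learn also provides CategoricalNB for that case. The underlying rule is the same: choose the class c that maximises P(c) × Π P(feature_i | c). A counting-based sketch with Laplace (add-one) smoothing, using a tiny made-up weather table; the column values and the helper name naive_bayes_predict are hypothetical:

```python
from collections import Counter

def naive_bayes_predict(rows, labels, query, alpha=1.0):
    """Return the class with the highest smoothed naive Bayes score, plus all scores."""
    n_features = len(query)
    class_counts = Counter(labels)
    # Number of distinct values per feature, used for add-one smoothing
    n_values = [len({row[i] for row in rows}) for i in range(n_features)]
    scores = {}
    for c, c_count in class_counts.items():
        score = c_count / len(labels)  # prior P(c)
        for i, value in enumerate(query):
            # Count training rows of class c whose feature i equals value
            matches = sum(1 for row, lab in zip(rows, labels)
                          if lab == c and row[i] == value)
            score *= (matches + alpha) / (c_count + alpha * n_values[i])
        scores[c] = score
    return max(scores, key=scores.get), scores

rows = [('Sunny', 'Hot'), ('Sunny', 'Mild'), ('Rainy', 'Mild'),
        ('Rainy', 'Cool'), ('Overcast', 'Hot'), ('Sunny', 'Cool')]
labels = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No']

pred, scores = naive_bayes_predict(rows, labels, ('Rainy', 'Mild'))
print(pred)  # → Yes
```

The scores are unnormalised; dividing each by their sum would give the posterior probabilities.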

Slip 20
Q.1. Implement Ridge Regression and Lasso Regression models using boston_houses.csv,
taking only 'RM' and 'Price' of the houses. Divide the data into training and testing
data, fit a line using Ridge regression, find the price of a house that contains 5
rooms, and compare the results.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso

# Load the dataset ('rm' = average number of rooms, 'medv' = price in $1000s)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')
print(data)

# Select relevant features
X = data[['rm']]
y = data['medv']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit Ridge Regression
ridge_model = Ridge(alpha=1.0)  # You can adjust the alpha value as needed
ridge_model.fit(X_train, y_train)

# Fit Lasso Regression
lasso_model = Lasso(alpha=1.0)  # You can adjust the alpha value as needed
lasso_model.fit(X_train, y_train)

# Compare the models on the test set (R^2, higher is better)
print(f"Ridge: slope={ridge_model.coef_[0]:.3f}, test R^2={ridge_model.score(X_test, y_test):.3f}")
print(f"Lasso: slope={lasso_model.coef_[0]:.3f}, test R^2={lasso_model.score(X_test, y_test):.3f}")

# Predict price for a house with 5 rooms
rooms = pd.DataFrame({'rm': [5]})
ridge_prediction = ridge_model.predict(rooms)
lasso_prediction = lasso_model.predict(rooms)

# Display the results (medv is in thousands of dollars)
print(f"Ridge Regression Prediction for 5 rooms: ${ridge_prediction[0]:.2f}k")
print(f"Lasso Regression Prediction for 5 rooms: ${lasso_prediction[0]:.2f}k")
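With one feature and an unpenalised intercept (scikit-learn's default for Ridge, whose objective is ||y - Xw||² + alpha·||w||²), the ridge slope has a closed form on centred data: slope = Σ(x - x̄)(y - ȳ) / (Σ(x - x̄)² + alpha), intercept = ȳ - slope·x̄. Larger alpha shrinks the slope toward zero and pulls predictions toward the mean price. Lasso uses an absolute-value penalty and is fitted iteratively by scikit-learn, so only Ridge is sketched here, on made-up room/price values; ridge_1d is a hypothetical helper:

```python
import numpy as np

def ridge_1d(x, y, alpha):
    """Closed-form ridge fit for one feature with an unpenalised intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()  # centre both variables
    slope = (xc * yc).sum() / ((xc ** 2).sum() + alpha)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

rooms = [4, 5, 6, 7, 8]
price = [10, 16, 22, 28, 34]  # made-up values on the line price = 6 * rooms - 14

for alpha in [0.0, 1.0, 10.0]:
    slope, intercept = ridge_1d(rooms, price, alpha)
    # Print alpha, the shrunken slope and the prediction for a 5-room house
    print(alpha, round(slope, 3), round(intercept + slope * 5, 3))
```

With alpha=0 this reduces to ordinary least squares; as alpha grows, the 5-room prediction moves from 16 toward the mean price of 22.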

Q.2. Write a python program to implement a Decision Tree to decide whether or not to play tennis.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('C:\\csv\\play_tennis.csv')
print(data)

# Preprocess the data
X = data.drop('play', axis=1)  # Features
y = data['play']  # Target variable

# Convert categorical variables to numerical using one-hot encoding
X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Decision Tree model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", classification_report(y_test, y_pred))

# Display the decision tree rules
tree_rules = export_text(model, feature_names=list(X.columns))
print("Decision Tree Rules:\n", tree_rules)

# Visualize the decision tree (class names taken from the fitted model)
plt.figure(figsize=(12, 8))
tree.plot_tree(model, feature_names=list(X.columns),
               class_names=[str(c) for c in model.classes_], filled=True)
plt.title("Decision Tree for Play Tennis")
plt.show()
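DecisionTreeClassifier splits on Gini impurity by default, while the Play Tennis example is traditionally explained with entropy and information gain (criterion='entropy' selects that measure in scikit-learn). A small sketch of both quantities, assuming the file follows the widely used 14-day table with 9 Yes / 5 No and the Outlook split Sunny (2 Yes, 3 No), Overcast (4 Yes), Rain (3 Yes, 2 No):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

play = ['Yes'] * 9 + ['No'] * 5
# Outlook split of the same 14 days
sunny = ['Yes'] * 2 + ['No'] * 3
overcast = ['Yes'] * 4
rain = ['Yes'] * 3 + ['No'] * 2

print(round(entropy(play), 3))                                    # → 0.94
print(round(information_gain(play, [sunny, overcast, rain]), 3))  # → 0.247
```

An ID3-style tree would compute this gain for every attribute and split on the largest one first.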

Slip21
Q.1. Create a multiple linear regression model for a house price dataset: divide the dataset
into train and test data before giving it to the model, and predict house prices.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset from a local CSV file
data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Split data into features and target
X = data.drop('medv', axis=1)  # Features
y = data['medv']  # Target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict prices
y_pred = model.predict(X_test)

# Calculate and display mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Display predictions for the first 5 houses
print("Predicted Prices for the first 5 houses:", y_pred[:5])

# Plotting the predicted prices vs actual prices
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', lw=2)  # Diagonal line
plt.title('Predicted Prices vs Actual Prices')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.grid(True)
plt.show()

Q.2. Write a python program to implement Linear SVM using UniversalBank.csv.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score

# Load the dataset
data = pd.read_csv('C:\\csv\\UniversalBank.csv')

# Display the first few rows of the dataset
print(data.head())

# Preprocess the data: 'Personal Loan' is the target variable and the
# remaining columns (except the ID column) are features
X = data.drop(['Personal Loan', 'ID'], axis=1)  # Drop target and non-feature columns
y = data['Personal Loan']  # Target variable

# Convert categorical variables to dummy variables (if any)
X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear SVM model
model = SVC(kernel='linear', random_state=42)
model.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
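A linear SVM classifies a scaled sample x by the sign of w·x + b, and training trades off the hinge loss max(0, 1 - y(w·x + b)) against the size of ||w||, with labels y in {-1, +1}. After fitting, SVC(kernel='linear') exposes w and b as coef_ and intercept_. A NumPy sketch of the decision rule and the hinge loss, using hypothetical weights and samples rather than values learned from UniversalBank.csv:

```python
import numpy as np

def decision_function(X, w, b):
    """Signed score w.x + b for each row of X."""
    return np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b

def hinge_loss(X, y, w, b):
    """Mean hinge loss for labels y in {-1, +1}."""
    margins = np.asarray(y) * decision_function(X, w, b)
    return np.maximum(0.0, 1.0 - margins).mean()

# Hypothetical weights and three standardised samples
w = np.array([1.5, -0.5])
b = -0.25
X = np.array([[1.0, 0.0], [-1.0, 1.0], [0.5, 0.0]])
y = np.array([1, -1, 1])

scores = decision_function(X, w, b)
labels = np.where(scores > 0, 1, 0)  # map the sign back to 0/1 class labels
print(labels)                        # → [1 0 1]
print(hinge_loss(X, y, w, b))
```

The third sample is correctly classified but lies inside the margin (score 0.5 < 1), so it still contributes to the hinge loss; samples like this become support vectors.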

Slip22
Q.1. Write a python program to implement simple Linear Regression for predicting house
price.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load dataset (rows with missing values are dropped)
df = pd.read_csv('C:\\csv\\house-prices.csv').dropna()
print(df)

# Define feature and target variable (adjust column names as necessary)
X = df[['Bedrooms']]
y = df['Price']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Train the model
model = LinearRegression().fit(X_train, y_train)

# Predict and visualize
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, model.predict(X_test), color='red', label='Predicted Prices')
plt.title('House Price Prediction')
plt.xlabel('Number of Bedrooms')
plt.ylabel('Price of House')
plt.legend()
plt.show()

# Print coefficients
print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}")

Q.2. Use the Apriori algorithm on the groceries dataset to find which items are bought together.
Use minimum support = 0.25.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt

# Load the dataset (one transaction per row, one item per cell)
data = pd.read_csv('C:\\csv\\groceries.csv')
print(data)

# Convert each transaction into a one-hot encoded DataFrame:
# each column represents an item, and each row represents a transaction
data = data.apply(lambda row: row.dropna().tolist(), axis=1)
one_hot_data = (pd.get_dummies(data.apply(pd.Series).stack())
                .groupby(level=0).max().astype(bool))

# Apply Apriori algorithm with a minimum support of 0.25
frequent_itemsets = apriori(one_hot_data, min_support=0.25, use_colnames=True)

# Generate association rules with lift metric
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Display results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
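Apriori keeps the itemsets whose support (the fraction of transactions containing all their items) reaches min_support. association_rules then scores a rule A → B by confidence = support(A ∪ B) / support(A) and lift = confidence / support(B); lift > 1, the threshold used above, means A and B occur together more often than they would if independent. A plain-Python sketch on a handful of made-up transactions:

```python
transactions = [
    {'milk', 'bread', 'butter'},
    {'milk', 'bread'},
    {'bread', 'eggs'},
    {'milk', 'eggs'},
    {'milk', 'bread', 'eggs'},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

print(round(support({'milk', 'bread'}), 4))       # → 0.6
print(round(confidence({'milk'}, {'bread'}), 4))  # → 0.75
print(round(lift({'milk'}, {'bread'}), 4))        # → 0.9375
```

Here {milk, bread} easily passes a 0.25 support threshold, yet its lift is slightly below 1, so the rule milk → bread would be filtered out by min_threshold=1.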

Slip23
Q.1. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one fits the given data more accurately.
Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset
data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable
X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit Simple Linear Regression
simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression (degree 2) on the same train/test split
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models (lower is better)
mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')
print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')
better = 'Polynomial' if mse_poly < mse_simple else 'Simple'
print(f'{better} Linear Regression fits the test data more accurately')

# Predict salaries for levels 11 and 12 with the polynomial model
new_levels = pd.DataFrame({'Level': [11, 12]})
level_11_salary, level_12_salary = poly_model.predict(poly.transform(new_levels))
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')

# Plotting the results
plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red', label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green', label='Polynomial Regression Predictions', alpha=0.5)

# Plot the Polynomial Regression Fit as a smooth curve
X_plot = pd.DataFrame({'Level': np.linspace(X['Level'].min(), X['Level'].max(), 100)})
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot['Level'], y_plot, color='orange', label='Polynomial Regression Fit')

plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
Q.2. Write a python program to find all null values in a dataset and remove them.
import pandas as pd

# Load dataset (replace with your dataset path)
data = pd.read_csv('C:\\csv\\iris.csv')

# Display initial null value counts
print("Null values before removal:")
print(data.isnull().sum())

# Remove rows with any null values
cleaned_data = data.dropna()

# Display null value counts after removal
print("\nNull values after removal:")
print(cleaned_data.isnull().sum())

# Optionally, save the cleaned dataset under a new file name
cleaned_data.to_csv('iris_cleaned.csv', index=False)

Slip24
Q.1. Write a python program to Implement Decision Tree classifier model on Data which is
extracted from images that were taken from genuine and forged banknote-like
specimens.
(refer UCI dataset https://archive.ics.uci.edu/dataset/267/banknote+authentication)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset from the UCI repository (the file has no header row)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt"
column_names = ['Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class']
data = pd.read_csv(url, header=None, names=column_names)

# Separate features and target variable
X = data.drop('Class', axis=1)
y = data['Class']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train the Decision Tree Classifier
classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

# Plot the Decision Tree
plt.figure(figsize=(20, 10))
plot_tree(classifier, feature_names=list(X.columns), class_names=['Not Authenticated', 'Authenticated'],
          filled=True, rounded=True)
plt.show()
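DecisionTreeClassifier uses Gini impurity as its default split criterion. To show what a single split decision involves, here is a minimal pure-Python sketch that scores every threshold on one feature and keeps the one with the lowest weighted Gini impurity. It illustrates the idea only and is not scikit-learn's implementation; the toy feature values and labels are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up 'Variance'-like feature: low values labelled 1, high values labelled 0
variance = [-3.0, -2.5, -1.0, 1.5, 2.0, 3.5]
labels = [1, 1, 1, 0, 0, 0]
print(gini(labels))                      # impurity before splitting
print(best_threshold(variance, labels))  # threshold between -1.0 and 1.5
```

A real tree repeats this search over every feature at every node and recurses on the two halves until a stopping rule (for example max_depth) is met.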

Q.2. Write a python program to implement linear SVM using UniversalBank.csv.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, accuracy_score

# Load the dataset
data = pd.read_csv('C:\\csv\\UniversalBank.csv')

# Display the first few rows of the dataset
print(data.head())

# Preprocess the data:
# 'Personal Loan' is the target variable; 'ID' is an identifier, not a feature
X = data.drop(['Personal Loan', 'ID'], axis=1)  # Drop target and non-feature columns
y = data['Personal Loan']  # Target variable

# Convert categorical variables to dummy variables (if any)
X = pd.get_dummies(X, drop_first=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the feature values (fit on the training data only, to avoid leakage)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Linear SVM model
model = SVC(kernel='linear', random_state=42)
model.fit(X_train_scaled, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_scaled)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
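SVC(kernel='linear') solves the SVM problem with the libsvm solver. As a rough illustration of the kind of objective it optimizes, the sketch below minimizes the regularized hinge loss (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by plain full-batch subgradient descent, with labels in {-1, +1}. The solver, learning rate, regularization strength and toy points are all assumptions chosen for the example, not what scikit-learn does internally.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors, y: labels in {-1, +1}.
    Returns (w, b) for the decision function w.x + b.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin: its hinge term is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Made-up, already standardized 2D points with labels +1 / -1
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```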

Slip25
Q.1. Write a python program to implement Polynomial Regression for the house price dataset.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset (replace with the correct path to your dataset)
data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Use 'rm' (average number of rooms) as the feature and 'medv' (median house value) as the target
X = data[['rm']]  # Features
y = data['medv']  # Target variable

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Transform the feature into polynomial terms (degree 2): [1, rm, rm^2]
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Train the Polynomial Regression model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predict on the test set
y_pred = model.predict(X_test_poly)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Plot the data and the fitted curve over the observed range of 'rm'
plt.figure(figsize=(10, 6))
plt.scatter(X['rm'], y, color='blue', label='Data Points')
X_plot = pd.DataFrame({'rm': np.linspace(X['rm'].min(), X['rm'].max(), 100)})
y_plot = model.predict(poly.transform(X_plot))
plt.plot(X_plot['rm'], y_plot, color='red', label='Polynomial Regression Fit')
plt.xlabel('Average Number of Rooms (RM)')
plt.ylabel('Median Value of Homes (MEDV)')
plt.title('Polynomial Regression on Boston Housing Dataset')
plt.legend()
plt.grid()
plt.show()
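With degree=2 and a single feature, PolynomialFeatures turns each value of rm into the columns [1, rm, rm^2], so the model stays linear in its coefficients. For two inputs [a, b] the scikit-learn documentation lists the degree-2 output as [1, a, b, a^2, ab, b^2]. A small pure-Python sketch of the same expansion (an illustration, not scikit-learn's code; the input values are made up):

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_features(row, degree=2):
    """Expand one sample into all monomials up to `degree`, bias term first."""
    features = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))  # prod of nothing is 1: the bias
    return features

print(polynomial_features([6.5]))       # one feature: [1, rm, rm^2]
print(polynomial_features([2.0, 3.0]))  # two features: [1, a, b, a^2, ab, b^2]
```

The number of columns grows quickly with the degree and the number of inputs, which is one reason the program above expands only the single rm feature.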

Q.2. Create a two-layer neural network with relu and sigmoid activation functions.
import numpy as np

# Activation functions
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Initialize parameters (this example uses no bias terms)
np.random.seed(0)
weights_input_hidden = np.random.rand(3, 4)  # 3 inputs, 4 hidden neurons
weights_hidden_output = np.random.rand(4, 1)  # 4 hidden neurons, 1 output

# Forward pass: ReLU in the hidden layer, sigmoid at the output
def forward_pass(X):
    hidden_output = relu(np.dot(X, weights_input_hidden))
    final_output = sigmoid(np.dot(hidden_output, weights_hidden_output))
    return hidden_output, final_output

# Example input
X = np.array([[0.1, 0.2, 0.3]])
hidden_output, final_output = forward_pass(X)
print("Hidden Layer Output:", hidden_output)
print("Final Output:", final_output)
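The program above only runs a forward pass. To show how such a network would learn, here is a minimal pure-Python sketch of one gradient-descent step for a 2-input, 2-hidden-unit, 1-output network (ReLU hidden layer, sigmoid output) trained with binary cross-entropy. The weights, biases, learning rate and target are made-up values chosen for the example. With a sigmoid output and cross-entropy loss, the error at the output pre-activation simplifies to p - t.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Hidden pre-activations: z1[j] = sum_i x[i] * W1[i][j] + b1[j]
    z1 = [sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    h = [relu(z) for z in z1]
    z2 = sum(hj * w for hj, w in zip(h, W2)) + b2
    return z1, h, sigmoid(z2)

def bce(p, t):
    """Binary cross-entropy for one prediction p and target t in {0, 1}."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def train_step(x, t, W1, b1, W2, b2, lr=0.1):
    z1, h, p = forward(x, W1, b1, W2, b2)
    dz2 = p - t                                   # dL/dz2 for sigmoid + cross-entropy
    dW2 = [dz2 * hj for hj in h]
    dh = [dz2 * w for w in W2]
    dz1 = [dh[j] * (1.0 if z1[j] > 0 else 0.0) for j in range(len(z1))]  # ReLU derivative
    dW1 = [[x[i] * dz1[j] for j in range(len(dz1))] for i in range(len(x))]
    W1 = [[W1[i][j] - lr * dW1[i][j] for j in range(len(dz1))] for i in range(len(x))]
    b1 = [b1[j] - lr * dz1[j] for j in range(len(dz1))]
    W2 = [W2[j] - lr * dW2[j] for j in range(len(W2))]
    b2 = b2 - lr * dz2
    return W1, b1, W2, b2

x, t = [1.0, 2.0], 1
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [1.0, 3.0], -2.0

loss_before = bce(forward(x, W1, b1, W2, b2)[2], t)
W1, b1, W2, b2 = train_step(x, t, W1, b1, W2, b2)
loss_after = bce(forward(x, W1, b1, W2, b2)[2], t)
print(loss_before, loss_after)
```

Here the second hidden unit has pre-activation 0, so the ReLU passes no gradient to its incoming weights; only the active path is updated.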

Slip26
Q.1. Create a KNN model on the Indian diabetes patients' database and predict whether a new
patient is diabetic (1) or not (0). Find the optimal value of K.
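No program is given for this question. The core of a KNN classifier and a simple search for K can be sketched in pure Python: each validation point is classified by majority vote among its K nearest training points (Euclidean distance), and the K with the best validation accuracy is kept. With scikit-learn one would typically loop KNeighborsClassifier(n_neighbors=k) over candidate values and score each with cross_val_score; since KNN is distance-based, features on very different scales should be standardized first. The toy records below are made up and only stand in for the diabetes data.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest (features, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def find_best_k(train, validation, k_values):
    """Return (best_k, accuracy); ties go to the smallest k tried first."""
    best_k, best_acc = None, -1.0
    for k in k_values:
        correct = sum(knn_predict(train, x, k) == label for x, label in validation)
        acc = correct / len(validation)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Made-up 2-feature records: label 1 = diabetic, 0 = not diabetic
train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
         ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
validation = [((0.5, 0.5), 0), ((5.5, 5.5), 1)]

# Odd k values avoid tied votes between two classes
best_k, acc = find_best_k(train, validation, k_values=[1, 3, 5])
print(best_k, acc)
print(knn_predict(train, (4.8, 5.2), best_k))  # a new patient
```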

Q.2. Use the Apriori algorithm on the groceries dataset to find which items are bought together.
Use minimum support = 0.25.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset (each row is one transaction; unused cells are empty)
data = pd.read_csv('C:\\csv\\groceries.csv')
print(data)

# Convert each transaction into a one-hot encoded DataFrame:
# each column represents an item and each row represents a transaction
data = data.apply(lambda row: row.dropna().tolist(), axis=1)
one_hot_data = pd.get_dummies(data.apply(pd.Series).stack()).groupby(level=0).max().astype(bool)

# Apply Apriori algorithm with a minimum support of 0.25
frequent_itemsets = apriori(one_hot_data, min_support=0.25, use_colnames=True)

# Generate association rules with lift metric
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Display results
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules)
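To make the support, confidence and lift columns in the output easier to interpret, here is a minimal pure-Python sketch of the level-wise Apriori idea: keep the itemsets whose support (fraction of transactions containing them) reaches min_support, and build size-k candidates only from frequent (k-1)-itemsets, discarding any candidate that has an infrequent subset. It is a toy illustration with made-up transactions, not mlxtend's implementation.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent

def rule_metrics(frequent, antecedent, consequent):
    """Confidence and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    confidence = frequent[a | c] / frequent[a]
    lift = confidence / frequent[c]
    return confidence, lift

transactions = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread"}, {"milk", "eggs"}]
frequent = apriori_sketch([frozenset(t) for t in transactions], min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
print(rule_metrics(frequent, {"bread"}, {"milk"}))  # (confidence, lift)
```

A lift below 1, as for bread -> milk here, means the two items appear together less often than independence would predict; the program above keeps only rules with lift of at least 1.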

Slip 27
Q.1. Create a multiple linear regression model for the house price dataset. Divide the dataset into
train and test data, fit the model on the training data, and predict house prices.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset from a local CSV file
data = pd.read_csv('C:\\csv\\BostonHousing.csv')

# Split data into features (all other columns) and target ('medv')
X = data.drop('medv', axis=1)  # Features
y = data['medv']  # Target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict prices
y_pred = model.predict(X_test)

# Calculate and display mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Display predictions for the first 5 test houses
print("Predicted Prices for the first 5 houses:", y_pred[:5])

# Plot predicted prices against actual prices
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', alpha=0.6)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', lw=2)  # Diagonal line (perfect predictions)
plt.title('Predicted Prices vs Actual Prices')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.grid(True)
plt.show()
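The mean squared error printed above is the average of the squared differences between actual and predicted prices. A companion measure is the coefficient of determination R^2 (available as sklearn.metrics.r2_score, and returned by model.score(X_test, y_test) for a regressor), which compares the remaining squared error with that of always predicting the mean. A short pure-Python sketch of both formulas, using made-up numbers:

```python
def mse(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot uses the mean of y_true as the baseline."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mse(actual, predicted))  # 1 squared error spread over 3 points
print(r2(actual, predicted))   # 0.5
```

An R^2 of 1 means perfect predictions, while 0 means the model does no better than predicting the mean price.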

Q.2. Fit the simple linear regression and polynomial linear regression models to
Salary_positions.csv data. Find which one fits the given data more accurately.
Also predict the salaries of level 11 and level 12 employees.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset
data = pd.read_csv('C:\\csv\\Position_salaries.csv')

# Features and target variable
X = data[['Level']]
y = data['Salary']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit Simple Linear Regression
simple_model = LinearRegression()
simple_model.fit(X_train, y_train)
y_pred_simple = simple_model.predict(X_test)

# Fit Polynomial Linear Regression (degree 2) on the same train/test rows
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

poly_model = LinearRegression()
poly_model.fit(X_train_poly, y_train)
y_pred_poly = poly_model.predict(X_test_poly)

# Calculate Mean Squared Error for both models (lower is better).
# With a small dataset the 20% test split holds only a few rows, so treat this comparison as rough.
mse_simple = mean_squared_error(y_test, y_pred_simple)
mse_poly = mean_squared_error(y_test, y_pred_poly)
print(f'MSE of Simple Linear Regression: {mse_simple:.2f}')
print(f'MSE of Polynomial Linear Regression: {mse_poly:.2f}')

# Predict salaries for levels 11 and 12
new_levels = pd.DataFrame({'Level': [11, 12]})
level_11_salary, level_12_salary = poly_model.predict(poly.transform(new_levels))
print(f'Predicted salary for level 11: ${level_11_salary:.2f}')
print(f'Predicted salary for level 12: ${level_12_salary:.2f}')

# Plotting the results
plt.scatter(X, y, color='blue', label='Actual Data')
plt.scatter(X_test, y_pred_simple, color='red', label='Simple Regression Predictions', alpha=0.5)
plt.scatter(X_test, y_pred_poly, color='green', label='Polynomial Regression Predictions', alpha=0.5)

# Plot Polynomial Regression Fit
X_plot = pd.DataFrame({'Level': np.linspace(X['Level'].min(), X['Level'].max(), 100)})
y_plot = poly_model.predict(poly.transform(X_plot))
plt.plot(X_plot['Level'], y_plot, color='orange', label='Polynomial Regression Fit')
plt.title('Salary Prediction')
plt.xlabel('Employee Level')
plt.ylabel('Salary')
plt.legend()
plt.grid()
plt.show()
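To see how the two models compare on data that grows faster than linearly, the sketch below fits degree-1 and degree-2 polynomials by least squares in pure Python (solving the normal equations with Gaussian elimination) and compares their mean squared errors. The salary figures are made up (they grow with the square of the level) and are not the contents of the CSV file; scikit-learn's LinearRegression solves the same least-squares problem with a different numerical method.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    V = [[x ** j for j in range(degree + 1)] for x in xs]  # V[i][j] = xs[i] ** j
    VtV = [[sum(row[i] * row[j] for row in V) for j in range(degree + 1)]
           for i in range(degree + 1)]
    Vty = [sum(row[i] * y for row, y in zip(V, ys)) for i in range(degree + 1)]
    return solve(VtV, Vty)

def predict(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))

def mse(xs, ys, coeffs):
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

levels = [1, 2, 3, 4, 5]
salaries = [1000 * level ** 2 for level in levels]  # made-up quadratic salaries

linear = polyfit(levels, salaries, 1)
quadratic = polyfit(levels, salaries, 2)
print("Linear MSE:", mse(levels, salaries, linear))
print("Quadratic MSE:", mse(levels, salaries, quadratic))
print("Quadratic prediction for level 6:", predict(quadratic, 6))
```

Training MSE can only stay the same or drop as the degree rises, so on real data an honest comparison should use held-out rows or cross-validation, as the program above attempts with its test split.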

Slip 28
Q.1. Write a python program to categorize the given news text into one of the available 20
categories of news groups, using multinomial Naïve Bayes machine learning model.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the 20 Newsgroups dataset (downloaded on first use)
newsgroups = fetch_20newsgroups(subset='all')

# Prepare the data
X = newsgroups.data  # News text
y = newsgroups.target  # Corresponding categories

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert text data to word-count vectors (vocabulary learned from the training text only)
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X_train_vectorized, y_train)

# Make predictions
y_pred = model.predict(X_test_vectorized)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.4f}')
print('Classification Report:')
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))

# Example of predicting a new article
new_article = ["This is a new article about sports and fitness."]
new_article_vectorized = vectorizer.transform(new_article)
predicted_category = model.predict(new_article_vectorized)
print(f'Predicted Category: {newsgroups.target_names[predicted_category[0]]}')
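Multinomial Naive Bayes scores each class by its log prior plus the sum of the log probabilities of the document's words, where word probabilities are estimated from counts with additive (Laplace) smoothing; MultinomialNB uses alpha=1.0 by default. A pure-Python sketch of that scoring, assuming plain whitespace tokenization (much simpler than CountVectorizer) and made-up training sentences:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    vocab = {word for doc in docs for word in doc.lower().split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(model, doc):
    log_prior, log_lik, vocab = model
    words = [w for w in doc.lower().split() if w in vocab]  # unseen words are ignored
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in words) for c in log_prior}
    return max(scores, key=scores.get)

docs = ["ball game team win", "team score goal", "vote election party", "party policy vote"]
labels = ["sports", "sports", "politics", "politics"]
model = train_nb(docs, labels)
print(predict_nb(model, "the team scored a goal"))
print(predict_nb(model, "election vote"))
```

Working with log probabilities avoids numerical underflow when a document contains many words.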

Q.2. Classify the iris flowers dataset using SVM and find out the flower type depending on
the given input data like sepal length, sepal width, petal length and petal width. Find the
accuracy of all SVM kernels.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
data = pd.read_csv('C:\\csv\\iris.csv')
features = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
X = data[features]  # Features
y = data['Species']  # Target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# List of SVM kernels to evaluate
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
accuracies = {}
models = {}

# Train and evaluate SVM with different kernels, keeping each trained model
for kernel in kernels:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies[kernel] = accuracy
    models[kernel] = model
    print(f"Accuracy with {kernel} kernel: {accuracy:.4f}")

# Example input for prediction (sepal length, sepal width, petal length, petal width)
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=features)
predictions = {kernel: models[kernel].predict(new_flower)[0] for kernel in kernels}

# Display predictions for the new flower
print("\nPredicted flower types for the new input:")
for kernel, flower_type in predictions.items():
    print(f"Kernel: {kernel}, Predicted: {flower_type}")
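The four kernels compared above differ only in how they measure similarity between two samples x and z. Following the formulas in the scikit-learn SVM documentation (linear: <x, z>; poly: (gamma*<x, z> + coef0)^degree with defaults degree=3 and coef0=0; rbf: exp(-gamma*||x - z||^2); sigmoid: tanh(gamma*<x, z> + coef0)), here is a pure-Python sketch of each. The explicit gamma=1.0 is an assumption for readability; SVC defaults to gamma='scale', which is 1 / (n_features * X.var()), sketched as scale_gamma.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def poly_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)

def scale_gamma(X):
    """gamma='scale': 1 / (n_features * population variance of all values in X)."""
    values = [v for row in X for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(X[0]) * var)

x = [5.1, 3.5, 1.4, 0.2]  # the new flower from the program above
z = [6.7, 3.0, 5.2, 2.3]  # a made-up second flower
for name, kernel in [("linear", linear_kernel), ("poly", poly_kernel),
                     ("rbf", rbf_kernel), ("sigmoid", sigmoid_kernel)]:
    print(name, kernel(x, z))
```

With unscaled iris measurements and gamma=1.0 the sigmoid kernel saturates near 1, which is one reason that kernel often scores worst on this data unless the features are standardized.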

Slip 29
Q.1. Take the iris flower dataset and reduce the 4D data to 2D using PCA. Then train the
model and predict a new flower with the given measurements.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset from CSV
data = pd.read_csv('C:\\csv\\iris.csv')
print(data)
features = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
X = data[features]  # Features (4D)
y = data['Species']  # Target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reduce 4D data to 2D with PCA (fit on the training data only)
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# SVM Kernels
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
models = {}

# Train and evaluate models on the 2D data
for kernel in kernels:
    model = SVC(kernel=kernel)
    model.fit(X_train_2d, y_train)
    models[kernel] = model
    accuracy = accuracy_score(y_test, model.predict(X_test_2d))
    print(f"Accuracy with {kernel} kernel: {accuracy:.4f}")

# Prediction for a new sample: project it with the same PCA first
new_flower = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=features)
new_flower_2d = pca.transform(new_flower)
print("\nPredictions for the new flower input:")
for kernel in kernels:
    prediction = models[kernel].predict(new_flower_2d)[0]
    print(f"Kernel: {kernel}, Predicted: {prediction}")
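PCA(n_components=2) keeps the two directions of largest variance in the centered training data; scikit-learn computes them with a singular value decomposition. As an illustration of the same idea, the pure-Python sketch below finds the top eigenvectors of the covariance matrix by power iteration with deflation and projects the samples onto them. Power iteration is a swapped-in technique chosen for brevity, the fixed start vector and iteration count are assumptions, the toy 3D points are made up, and component signs may differ from scikit-learn's.

```python
import math

def pca_sketch(X, n_components=2, iterations=200):
    """Return (components, projected) using power iteration on the covariance matrix."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (divided by n - 1)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(d)] * d  # fixed start vector (a random one is more robust)
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]
    return components, projected

# Made-up 3D points: most spread along x, less along y, none along z
points = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
components, projected = pca_sketch(points, n_components=2)
print(components)  # roughly the x axis, then the y axis
print(projected)   # each 3D point reduced to 2 coordinates
```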

Q.2. Use K-means clustering model and classify the employees into various income groups
or clusters. Preprocess data if required (i.e. drop missing or null values). Use the elbow
method and Silhouette Score to find the value of k.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Load dataset and drop rows with missing values
data = pd.read_csv('C:\\csv\\employee_data.csv').dropna()
print(data.head())

# Select features for clustering. 'EmpID' is only a placeholder: replace it with the
# income/salary column(s) of your file, since clustering on an ID is not meaningful.
X = data[['EmpID']]

# Find optimal K using Elbow and Silhouette methods
k_values = range(2, 11)
inertia = []
silhouette_scores = []

for k in k_values:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)
    silhouette_scores.append(silhouette_score(X, kmeans.labels_))

# Plot Elbow and Silhouette scores side by side (they are on different scales)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(k_values, inertia, 'o-')
ax1.set_title('Elbow Method')
ax1.set_xlabel('Number of clusters (K)')
ax1.set_ylabel('Inertia')
ax2.plot(k_values, silhouette_scores, 'o-')
ax2.set_title('Silhouette Score')
ax2.set_xlabel('Number of clusters (K)')
ax2.set_ylabel('Silhouette Score')
plt.tight_layout()
plt.show()

# Optimal K and final clustering
optimal_k = 3  # Set based on elbow and silhouette analysis
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
data['Cluster'] = kmeans.fit_predict(X)

# Display results
print(data.head())
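The elbow method plots the inertia (sum of squared distances from each point to its centroid) for each K, and the silhouette score averages (b - a) / max(a, b) over all points, where a is a point's mean distance to its own cluster and b its mean distance to the nearest other cluster. A pure-Python sketch for a single income column, using Lloyd's algorithm with a deterministic initialization from evenly spaced sorted values (an assumption; KMeans uses k-means++ by default) and made-up incomes:

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on 1D data; returns (centroids, labels). Assumes k >= 2."""
    ordered = sorted(values)
    # Deterministic init: evenly spaced order statistics
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        new_centroids = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, labels

def inertia(values, centroids, labels):
    return sum((v - centroids[label]) ** 2 for v, label in zip(values, labels))

def silhouette(values, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for j, w in enumerate(values) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(abs(v - w) for j, w in enumerate(values) if labels[j] == c)
            / sum(1 for label in labels if label == c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

incomes = [10, 11, 12, 50, 51, 52, 100, 101, 102]  # made-up incomes (thousands)
for k in range(2, 5):
    centroids, labels = kmeans_1d(incomes, k)
    print(k, round(inertia(incomes, centroids, labels), 2), round(silhouette(incomes, labels), 3))
```

On this toy data the inertia drops sharply up to k = 3 and the silhouette score peaks there, which is the pattern both methods look for.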
