

MACHINE LEARNING LAB MANUAL

Experiment - 1 : Write a Python program to compute central tendency
measures: Mean, Median, Mode; Measures of Dispersion: Variance,
Standard Deviation

import pandas as pd

data = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'Chanchal', 'Gasper', 'Naviya', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                         4.10, 3.65])
}

df = pd.DataFrame(data)
print(df)

# Calculating mean, median, and mode for Age and Rating

age_mean = df['Age'].mean()
age_median = df['Age'].median()
age_mode = df['Age'].mode()
rating_mean = df['Rating'].mean()
rating_median = df['Rating'].median()
rating_mode = df['Rating'].mode()

print("Mean Age:", age_mean)


print("Median Age:", age_median)
print("Mode Age:", age_mode)
print("Mean Ra ng:", ra ng_mean)
print("Median Ra ng:", ra ng_median)
print("Mode Ra ng:", ra ng_mode)

# Calculating variance and standard deviation

age_variance = df['Age'].var()
age_standard_deviation = df['Age'].std()
rating_variance = df['Rating'].var()
rating_standard_deviation = df['Rating'].std()

print("Variance...Age:", age_variance)
print("Standard deviation...Age:", age_standard_deviation)
print("Variance...Rating:", rating_variance)
print("Standard deviation...Rating:", rating_standard_deviation)

Experiment - 2 : Study of basic Python libraries such as Statistics, Math,
NumPy and SciPy.

1. Statistics Library
• The statistics module provides functions for statistical computations like mean,
median, mode, and standard deviation.
• It is built into Python, so no extra installation is required.
• Useful for simple data analysis and numerical summaries.
• Supports working with lists, tuples, and other iterable data structures.
• Ideal for beginners to perform basic statistical calculations.

Example Program:
import statistics
data = [1, 2, 3, 4, 5]
print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
print("Mode:", statistics.mode(data))

2. Math Library
• The math module provides mathematical functions such as square root, power,
trigonometry, and logarithms.
• It is built-in and does not require installation.
• Contains constants like pi and e.
• Helps perform complex mathematical operations efficiently.
• Ideal for engineering, physics, and general numerical computations.

Example Program:
import math
a = 16
b = 4
print("a+b =", a + b)
print("a-b =", a - b)
print("a*b =", a * b)
print("a/b =", a / b)
print("a%b =", a % b)
print("Square root of a:", math.sqrt(a))

3. NumPy Library
• NumPy (Numerical Python) is used for array manipulations and numerical computations.
• Provides multi-dimensional array objects (ndarray) with fast operations.
• Supports mathematical functions like linear algebra and statistics.
• Requires installation using pip install numpy.
• Widely used in scientific computing and machine learning.

Example Program:
import numpy as np
data = np.array([1, 2, 3, 4, 5])
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Dimensions:", data.ndim)
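
Since the notes mention linear algebra, here is a minimal sketch using np.linalg:

import numpy as np

A = np.array([[2, 1], [1, 3]])
b = np.array([3, 5])

# Solve the linear system A @ x = b
x = np.linalg.solve(A, b)
print("Solution x:", x)
print("Determinant of A:", np.linalg.det(A))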

4. SciPy Library
• SciPy (Scientific Python) is built on NumPy for advanced mathematical operations.
• Contains modules for optimization, integration, interpolation, and statistics.
• Useful for scientific and engineering applications.
• Requires installation using pip install scipy.
• Provides specialized functions like signal processing and image manipulation.

Example Program:
from scipy import integrate
# Numerically integrate x^2 from 0 to 1 (exact answer: 1/3)
result, error = integrate.quad(lambda x: x**2, 0, 1)
print("Integral of x^2 from 0 to 1:", result)

Experiment - 3 : Study of basic Python libraries for ML applications such as
Pandas and Matplotlib.

Pandas :
Pandas is a powerful Python library for data manipulation and analysis.
It provides data structures like Series (1D) and DataFrame (2D) for handling
structured data.
Pandas supports data cleaning, filtering, aggregation, and visualization with built-in
functions.
It efficiently handles CSV, Excel, SQL, JSON, and other file formats.
Pandas is widely used in data science, finance, and machine learning for preprocessing
data.
Example Program :
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
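
Since the description mentions filtering and aggregation, a minimal sketch building
on the same DataFrame:

# Filter rows where passings is greater than 2
print(myvar[myvar['passings'] > 2])

# Simple aggregation on the passings column
print("Total passings:", myvar['passings'].sum())
print("Mean passings:", myvar['passings'].mean())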

Matplotlib :
Matplotlib is a Python library for creating static, animated, and interactive visualizations.
It provides the pyplot module, which offers a MATLAB-like interface for easy plotting.
Matplotlib supports line plots, bar charts, histograms, scatter plots, and more.
It allows customization of axes, labels, legends, colors, and styles for detailed
visualization.
Widely used in data science, machine learning, and engineering for data representation.

Example Program :
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(500, 4),
columns =['a', 'b', 'c', 'd'])
df.plot.scatter(x ='a', y ='b')
plt.show()

Output :
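
The description above also lists line plots; a minimal pyplot sketch of a labeled
line plot:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), label='sin(x)', color='green')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Simple Line Plot')
plt.legend()
plt.show()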

4. Write a Python program to implement Simple Linear Regression

import numpy as np
import matplotlib.pyplot as plt

# Function to implement simple linear regression
def simple_linear_regression(X, y):
    # Add a column of ones to X for the bias term (intercept)
    X = np.c_[np.ones(X.shape[0]), X]

    # Normal equation: theta = (X^T X)^(-1) X^T y
    theta = np.linalg.inv(X.T @ X) @ X.T @ y

    return theta

# Function to predict using the learned model
def predict(X, theta):
    # Add a column of ones to X for the bias term (intercept)
    X = np.c_[np.ones(X.shape[0]), X]

    return X @ theta

# Generating some example data (linear relationship)

np.random.seed(0)
X = 2 * np.random.rand(100, 1)  # 100 data points with 1 feature
y = 4 + 3 * X + np.random.randn(100, 1)  # Linear equation: y = 4 + 3*X + noise

# Applying simple linear regression to find theta (parameters)


theta = simple_linear_regression(X, y)

print(f"Learned Parameters (theta): \n{theta}")


# Predict using the learned model
y_pred = predict(X, theta)

# Plotting the results
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred, color='red', label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()

Output :
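
As a quick sanity check (a sketch, not part of the original experiment), the same
parameters can be recovered with np.linalg.lstsq, which solves the least-squares
problem without explicitly inverting X^T X and is numerically more stable:

# Reuses X and y from above; theta_lstsq should be close to [4, 3]
X_b = np.c_[np.ones(X.shape[0]), X]
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(X_b, y, rcond=None)
print(f"Parameters via lstsq:\n{theta_lstsq}")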
5. Implementation of Multiple Linear Regression for House Price
Prediction using sklearn

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Example: Hypothetical dataset (Replace this with actual data)


# Create a pandas DataFrame with house features and prices
data = {
'Square_Feet': [1500, 1800, 2400, 3000, 3500],
'Num_Bedrooms': [3, 4, 3, 5, 4],
'Num_Bathrooms': [2, 3, 2, 3, 3],
'Age_of_House': [10, 15, 20, 5, 8],
'Price': [400000, 500000, 600000, 650000, 700000] # Target variable
}

# Create a pandas DataFrame from the data


df = pd.DataFrame(data)

# Features (X) and target variable (y)


X = df[['Square_Feet', 'Num_Bedrooms', 'Num_Bathrooms', 'Age_of_House']]
# Independent variables
y = df['Price'] # Dependent variable (house price)

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Initialize the Multiple Linear Regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model performance


mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

print(f"Mean Squared Error: {mse}")


print(f"Root Mean Squared Error: {rmse}")
print(f"Model Coe cients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

# Visualize the true vs predicted prices (works well for smaller datasets)
plt.scatter(y_test, y_pred)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red',
         linestyle='--')
plt.xlabel('True Prices')
plt.ylabel('Predicted Prices')
plt.title('True vs Predicted House Prices')
plt.show()
Output :
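
A small optional addition (a sketch reusing the variables above): mean absolute
error, which complements MSE/RMSE and stays well-defined even for the tiny test
split used here:

from sklearn.metrics import mean_absolute_error

# MAE is in the same units as the target (price), so it is easy to interpret
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae}")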
6. Implementation of Decision tree using sklearn and its parameter tuning

from sklearn.datasets import load_iris


from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# 1. Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# 2. Split into training and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# 3. Create a Decision Tree classifier
dt = DecisionTreeClassifier(random_state=42)

# 4. Define parameter grid for tuning


param_grid = {
'criterion': ['gini', 'entropy'], # or 'log_loss' for newer versions
'max_depth': [None, 2, 3, 4, 5],
'min_samples_split': [2, 3, 4],
'min_samples_leaf': [1, 2, 3]
}
# 5. Use GridSearchCV to find the best parameters
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5,
                           scoring='accuracy')
grid_search.fit(X_train, y_train)

# 6. Print the best parameters


print("Best Parameters:", grid_search.best_params_)

# 7. Evaluate the best model
best_dt = grid_search.best_estimator_
y_pred = best_dt.predict(X_test)

# 8. Print evaluation metrics
print("\nAccuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# 9. Visualize the decision tree


plt.figure(figsize=(12, 8))
plot_tree(best_dt,
          feature_names=iris.feature_names,
          class_names=iris.target_names,
          filled=True,
          rounded=True)
plt.title("Decision Tree Visualization (Best Estimator)")
plt.show()
Output
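
For larger grids, RandomizedSearchCV samples a fixed number of parameter
combinations instead of trying them all; a minimal sketch reusing the estimator
and grid above (the n_iter value is illustrative):

from sklearn.model_selection import RandomizedSearchCV

# Sample 10 random combinations from param_grid instead of the full grid
random_search = RandomizedSearchCV(estimator=dt, param_distributions=param_grid,
                                   n_iter=10, cv=5, scoring='accuracy',
                                   random_state=42)
random_search.fit(X_train, y_train)
print("Best Parameters (randomized):", random_search.best_params_)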
7. Implementation of KNN using sklearn

from sklearn.datasets import load_iris


from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the Iris dataset


iris = load_iris()
X = iris.data
y = iris.target

# 2. Split into training and test data


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# 3. Create and train the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)  # Using k=3
knn.fit(X_train, y_train)

# 4. Predict the test set results


y_pred = knn.predict(X_test)

# 5. Evaluate the model


print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classi ca on Report:\n", classi ca on_report(y_test, y_pred))
Output
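
The choice k=3 above is arbitrary; a minimal sketch (reusing the same split) that
compares test accuracy across several values of k:

# Try several neighborhood sizes and report test accuracy for each
for k in [1, 3, 5, 7, 9]:
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train, y_train)
    acc = accuracy_score(y_test, knn_k.predict(X_test))
    print(f"k={k}: accuracy={acc:.3f}")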
8. Implementation of Logistic Regression using sklearn

# Step 1: Import necessary libraries


from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Step 2: Load dataset


iris = load_iris()
X = iris.data
y = iris.target

# For binary classification, use only two classes


X = X[y != 2]
y = y[y != 2]

# Step 3: Split into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Step 4: Create logistic regression model
model = LogisticRegression()

# Step 5: Train the model


model.fit(X_train, y_train)

# Step 6: Predict on test set


y_pred = model.predict(X_test)

# Step 7: Evaluate the model


print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classi ca on Report:\n", classi ca on_report(y_test, y_pred))
Output :
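
Logistic regression also exposes class probabilities, which are often more useful
than hard labels; a minimal sketch using the trained model above:

# Predicted probability of each class for the first few test samples;
# each row sums to 1 and column order follows model.classes_
probabilities = model.predict_proba(X_test[:5])
print("Class order:", model.classes_)
print("Class probabilities:\n", probabilities)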
9. Implementation of K-Means Clustering

# Step 1: Import necessary libraries


import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Step 2: Create synthetic dataset


X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60,
random_state=0)

# Step 3: Apply KMeans


kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# Step 4: Plot the results


plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
Output :
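
Here the number of clusters is known in advance; when it is not, the elbow method
is a common heuristic. A minimal sketch plotting inertia (within-cluster sum of
squares) against k:

# Inertia drops sharply up to a good k, then flattens (the "elbow")
inertias = []
k_values = range(1, 8)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=0)
    km.fit(X)
    inertias.append(km.inertia_)

plt.plot(k_values, inertias, marker='o')
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()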
10. Performance analysis of Classification Algorithms on a specific dataset
(Mini Project)

Note : Download “Credit card fraud detection” Dataset from Kaggle website
using this link : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

Program :
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, precision_recall_curve, auc)

# Load the dataset


data = pd.read_csv('creditcard.csv')

# Separate features and target


X = data.drop('Class', axis=1)
y = data['Class']

# Scale the 'Amount' and 'Time' features


scaler = StandardScaler()
X['Amount'] = scaler.fit_transform(X['Amount'].values.reshape(-1, 1))
X['Time'] = scaler.fit_transform(X['Time'].values.reshape(-1, 1))

# Split the data


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42, stratify=y)

# Initialize classifiers
lr_model = LogisticRegression(max_iter=1000, class_weight='balanced',
                              random_state=42)
rf_model = RandomForestClassifier(n_estimators=100,
                                  class_weight='balanced', random_state=42)

# Train classifiers
lr_model.fit(X_train, y_train)
rf_model.fit(X_train, y_train)

# Predict and evaluate


models = {'Logistic Regression': lr_model, 'Random Forest': rf_model}

for name, model in models.items():
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    print(f"\n{name} Evaluation")
    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print("ROC AUC Score:", roc_auc_score(y_test, y_score))
    precision, recall, _ = precision_recall_curve(y_test, y_score)
    pr_auc = auc(recall, precision)
    print("Precision-Recall AUC:", pr_auc)

# Note: Add model saving, further tuning, or use of other classifiers like
# XGBoost if required.
Output :
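
Following the note in the program above, a minimal sketch of saving and reloading
a trained model with joblib (the file name is illustrative):

import joblib

# Persist the trained random forest to disk and load it back
joblib.dump(rf_model, 'rf_fraud_model.joblib')
loaded_model = joblib.load('rf_fraud_model.joblib')
print("Reloaded model:", loaded_model)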
