ML Lab Manual 4-8
5. Implementation of Multiple Linear Regression Using sklearn
Objective
To implement a Multiple Linear Regression model using the sklearn library to predict house prices
based on various features of a given housing dataset.
Theory
Multiple Linear Regression is a supervised learning algorithm that models the relationship between
a dependent variable y and multiple independent variables x1 , x2 , . . . , xn . The model’s goal is to find a
linear relationship in the form:
y = b0 + b1 x1 + b2 x2 + · · · + bn xn
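In sklearn, the intercept b0 and the weights b1, . . . , bn of this equation are exposed after fitting as intercept_ and coef_. A minimal sketch on a toy dataset (the numbers below are made up purely for illustration):
# Toy illustration: recovering b0 and b1, b2 from a fitted LinearRegression
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 2 + 3*x1 - 1*x2 (no noise)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]

model = LinearRegression().fit(X, y)
print(model.intercept_)  # b0, approximately 2
print(model.coef_)       # [b1, b2], approximately [3, -1]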
Dataset
The California Housing Dataset contains information about California housing, including median
house prices and features like median income, housing median age, etc. The dataset is accessed using
fetch_california_housing() from sklearn.datasets.
Program Code
# Required Libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# Load Dataset
house_data = fetch_california_housing()
X = pd.DataFrame(house_data.data, columns=house_data.feature_names)
y = pd.Series(house_data.target)
# Split into Training and Testing Sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and Train the Model
model = LinearRegression().fit(X_train, y_train)
# Make Predictions
y_pred = model.predict(X_test)
# Model Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")
# Visualization: Actual vs. Predicted House Prices
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', edgecolor='k', alpha=0.7)
plt.xlabel('Actual House Price')
plt.ylabel('Predicted House Price')
plt.title('Multiple Linear Regression: Actual vs. Predicted Prices')
plt.show()
Conclusion
This program implements multiple linear regression to predict house prices from the California Housing dataset imported from sklearn.
6. Implementation of Decision Tree using sklearn and Parameter
Tuning
Objective
To implement a Decision Tree classifier on the Iris dataset, understand its structure, and enhance the
model’s performance using parameter tuning techniques with sklearn.
Theory
Decision Tree Classifier: A decision tree is a supervised learning algorithm used for both classification
and regression tasks. It splits the dataset into branches based on feature values, creating a structure
resembling a tree to classify or predict the target variable. For classification tasks, the decision tree uses
different criteria, such as Gini impurity or Entropy, to determine the best split for data at each node.
Decision trees can easily overfit; therefore, parameters like max_depth and min_samples_split are tuned to prevent excessive branching.
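For reference, for a node whose samples have class proportions p1, . . . , pk, the two standard split criteria are Gini impurity G = 1 − (p1^2 + · · · + pk^2) and Entropy H = −(p1 log2 p1 + · · · + pk log2 pk); both equal 0 for a pure node and are largest when the classes are evenly mixed.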
Parameter Tuning: By adjusting parameters in the decision tree model, we can improve its accuracy
and avoid issues such as overfitting (where the model learns noise instead of patterns) or underfitting
(where the model is too simple). For a Decision Tree, the key parameters to tune include:
criterion: Measures the quality of the split, using either gini or entropy.
max_depth: Limits the maximum depth of the tree, controlling how complex it can grow.
min_samples_split: Sets the minimum number of samples required to split an internal node.
Dataset
Iris Dataset: The Iris dataset is a commonly used dataset for classification. It contains 150 instances,
with each instance described by four features (sepal length, sepal width, petal length, petal width) and
classified into one of three species (Setosa, Versicolor, Virginica).
Features: sepal length, sepal width, petal length, petal width (all measured in cm).
Evaluation Parameters
To assess the model’s performance, the following metrics are used:
Accuracy: Measures the proportion of correct predictions out of total predictions.
Cross-Validation Score: During parameter tuning, cross-validation helps verify how well the
model generalizes across different data splits.
Code
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn import tree
# Loading the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Splitting the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
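The excerpt ends at the train/test split; a minimal sketch of the training and tuning step discussed in the Analysis below, where the parameter grid values are illustrative assumptions rather than prescribed settings:
# Hypothetical parameter grid; the exact values are illustrative
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 5, 10],
}
# Grid search with 5-fold cross-validation over the decision tree parameters
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                           param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print("Best parameters:", grid_search.best_params_)
print("Best cross-validation accuracy:", grid_search.best_score_)
# Evaluate the tuned tree on the held-out test set
best_tree = grid_search.best_estimator_
y_pred = best_tree.predict(X_test)
print("Test accuracy:", metrics.accuracy_score(y_test, y_pred))
# Visualize the tuned tree
plt.figure(figsize=(12, 8))
tree.plot_tree(best_tree, feature_names=iris.feature_names,
               class_names=list(iris.target_names), filled=True)
plt.show()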
Analysis
Effect of Parameter Tuning: Parameter tuning has a significant effect on model performance.
By optimizing max depth and min samples split, we reduce overfitting and make the tree more
generalized.
Cross-Validation and Best Parameters: The GridSearchCV technique helps identify the best combination of parameters using cross-validation, which verifies the model's stability across different training data subsets.
Model Accuracy: The accuracy improved after tuning, demonstrating the effectiveness of selecting optimal parameters. This parameter tuning process is essential for achieving balanced model performance.
Conclusion
This program demonstrates how to implement a Decision Tree classifier using sklearn, visualize it, and enhance its performance through parameter tuning. Proper tuning can help achieve a better accuracy score and an optimized decision tree structure that generalizes well on test data.
7. Implementation of K-Nearest Neighbors (KNN) Using sklearn
Objective
To implement the K-Nearest Neighbors (KNN) algorithm using the sklearn library to classify the Iris
dataset, exploring the effect of different values of k (number of neighbors).
Theory
K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm used for classification
and regression tasks. For a given data point, it finds the k-closest points in the training dataset and
classifies the point based on the majority class of these neighbors.
Distance Metric: The algorithm typically uses Euclidean distance to measure the proximity of data
points. The choice of k (number of neighbors) influences the algorithm’s bias and variance.
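Since the objective is to explore different values of k, the following self-contained sketch (the k values chosen are illustrative) compares test accuracy across several settings of k:
# Compare KNN test accuracy for several values of k (illustrative range)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, :2]   # first two features, as in this experiment
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for k in [1, 3, 5, 7, 9, 15]:
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    print(f"k = {k:2d}  test accuracy = {clf.score(X_test, y_test):.3f}")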
Dataset
Iris Dataset: This dataset contains measurements of sepal length, sepal width, petal length, and petal
width for 150 iris flowers, divided into three classes: Iris Setosa, Iris Versicolour, and Iris Virginica.
Features Used: For simplicity, only the first two features (sepal length and sepal width) are used
in this implementation.
Evaluation Parameters
Accuracy: Measures the percentage of correctly classified samples in the test set.
Confusion Matrix: Displays the counts of actual vs. predicted classifications, helping to evaluate
the model’s accuracy per class.
Code
# Required Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
# Load Dataset (first two features only: sepal length and sepal width)
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
# Split into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and Train the KNN Classifier (k=5, the sklearn default; adjust as needed)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Predict on test set
y_pred = knn.predict(X_test)
# Evaluation Criteria
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:")
print(conf_matrix)
# Visualization: test points coloured by predicted class
plt.figure(figsize=(8, 6))
for label, name in enumerate(iris.target_names):
    mask = (y_pred == label)
    plt.scatter(X_test[mask, 0], X_test[mask, 1], label=name, edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('K-Nearest Neighbors Visualization (Iris Dataset)')
plt.legend()
plt.show()
Conclusion
The K-Nearest Neighbors algorithm is sensitive to the choice of k, which impacts classification accuracy.
This program implements the KNN algorithm on the Iris dataset, uses proximity to classify data points, and reports the accuracy metrics.
8. Implementation of Logistic Regression Using sklearn
Objective
To implement a Logistic Regression model using the sklearn library for binary classification using the
Iris dataset and evaluate its performance based on accuracy, confusion matrix, and classification report.
Theory
Logistic Regression is a supervised learning algorithm used for binary classification problems. It
models the probability of a binary outcome based on one or more predictor variables.
The model uses the sigmoid function to convert linear predictions into probabilities:
σ(z) = 1 / (1 + e^(−z)), where z = w^T x + b.
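A small sketch showing how the sigmoid maps a linear score z to a probability and how the usual 0.5 threshold yields a class label (the weights below are toy values, not fitted ones):
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy weights, bias, and input (illustrative values only)
w = np.array([1.5, -0.8])
b = 0.2
x = np.array([2.0, 1.0])

z = w @ x + b            # z = w^T x + b
p = sigmoid(z)           # predicted probability of class 1
label = int(p >= 0.5)    # apply the 0.5 decision threshold
print(f"z = {z:.2f}, p = {p:.3f}, predicted class = {label}")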
Binary Classification: For binary outcomes, logistic regression predicts either 0 or 1 based on a
threshold, typically 0.5.
Evaluation Metrics: Common metrics include accuracy, precision, recall, F1-score, and the
confusion matrix.
Dataset
Iris Dataset: This dataset contains measurements of various features for three types of Iris flowers.
For simplicity, only two of the three classes are used in this implementation (binary classification).
Features Used: Only the first two features (sepal length and sepal width) are used to simplify
visualization.
Evaluation Parameters
Accuracy: The proportion of correctly classified samples out of the total samples.
Confusion Matrix: A table layout that allows visualization of the performance of an algorithm,
showing actual vs. predicted classes.
Classification Report: Provides precision, recall, F1-score, and support for each class, which
helps understand model performance in a more detailed manner.
Code
# Required Libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load Dataset (binary classification: first two Iris classes, first two features)
iris = load_iris()
X = iris.data[iris.target != 2, :2]
y = iris.target[iris.target != 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initializing and training the model
model = LogisticRegression().fit(X_train, y_train)
# Making predictions
y_pred = model.predict(X_test)
# Model Evaluation
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
Conclusion
This program demonstrates the implementation of Logistic Regression for binary classification on the
Iris dataset:
Accuracy: The model’s accuracy shows the proportion of correctly classified instances in the test
data.
Confusion Matrix: The confusion matrix allows analysis of misclassifications and correctly classified instances for each class.
Classification Report: Shows detailed metrics (precision, recall, F1-score) for each class, helping
in evaluating the model’s effectiveness.
9. Implementation of K-Means Clustering Using sklearn
Objective
To implement the K-Means clustering algorithm using the sklearn library for unsupervised clustering
on the Iris dataset and to evaluate clustering performance using metrics such as inertia, adjusted Rand
index, and silhouette score.
Theory
K-Means Clustering is an unsupervised learning algorithm that partitions a dataset into K distinct,
non-overlapping clusters. Each cluster is defined by its centroid, which is the mean of the points in that
cluster.
Steps of K-Means:
1. Choose K initial centroids.
2. Assign each data point to the nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2 and 3 until the assignments no longer change.
Distance Metric: The clusters are assigned using Euclidean distance, which measures the
straight-line distance between data points and centroids.
Hyperparameter K: The number of clusters K must be set manually.
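Because K must be chosen by hand, a common aid is the elbow method: plot the inertia for a range of candidate K values and look for the point where the decrease levels off. A minimal sketch, with the candidate range as an illustrative assumption:
# Elbow method: inertia for K = 1..8 (range is an illustrative choice)
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]   # first two features, as in this experiment

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    inertias.append(km.inertia_)

plt.plot(range(1, 9), inertias, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing K')
plt.show()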
Dataset
Iris Dataset: Contains measurements of features for three Iris flower species. Only the first two
features are used for 2D visualization.
Features Used: Sepal length and sepal width.
Evaluation Parameters
Cluster Centroids: Mean points of each cluster.
Inertia: Sum of squared distances between each point and its cluster centroid.
Adjusted Rand Index (ARI): Measures similarity between true labels and clustering labels.
ARI is 1 for perfect clustering and 0 for random.
Silhouette Score: Measures how similar each point is to its own cluster compared to other
clusters. Scores range from -1 (poor) to +1 (ideal clustering).
Code
# Required Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
# Load Dataset (first two features for 2D visualization)
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
plt.figure(figsize=(12, 5))
# Original Data
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', edgecolor='k', s=100)
plt.title('Original Iris Dataset')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid()
# K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
predictions = kmeans.predict(X)
# Evaluation Metrics
inertia = kmeans.inertia_
ari = adjusted_rand_score(y, predictions)
silhouette_avg = silhouette_score(X, predictions)
print(f"Inertia: {inertia:.2f}")
print(f"Adjusted Rand Index (ARI): {ari:.2f}")
print(f"Silhouette Score: {silhouette_avg:.2f}")
# Clustered Data
plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=predictions, cmap='viridis', marker='o', edgecolor='k', s=100)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='X', s=200, label='Centroids')
plt.title('K-Means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()
Conclusion
This experiment demonstrates the implementation of K-Means clustering and includes evaluation metrics
for a deeper understanding:
Centroids: The centroids serve as the centers of each cluster.
Euclidean Distance for Assignment: Data points are assigned to clusters based on the minimum Euclidean distance to the cluster centroids.
Inertia: Measures compactness within clusters.
Adjusted Rand Index (ARI): Shows similarity between true labels and predicted clusters.
Silhouette Score: Measures the quality of clustering; higher values indicate well-defined clusters.
This experiment provides insights into unsupervised clustering and evaluates clustering quality through
multiple metrics.