Presentation - Final Thesis

The document presents a comparative study of machine learning algorithms for early cost estimation of building projects in Nepal. It discusses traditional cost estimation methods and the potential of machine learning models to provide more accurate predictions. Various machine learning algorithms are implemented and compared, including linear regression, decision trees, random forests, neural networks and others.


A COMPARATIVE STUDY OF MACHINE LEARNING ALGORITHMS FOR EARLY COST ESTIMATION OF BUILDING PROJECTS IN NEPAL

FINAL THESIS PRESENTATION
M.Sc. in Construction Management
Department of Civil Engineering, Pulchowk Campus, IOE, TU, Nepal
21st November, 2023

PRESENTED BY
ANJULI SAPKOTA (078/MSCOM/002)

SUPERVISED BY
Er. Samrakshya Karki
Building Design Authority Pvt. Ltd.
OUTLINE
Introduction
Problem Statement
Objectives
Literature Review
Research Methodology
Results
Conclusion
INTRODUCTION
 Predicting construction expenses is crucial in the early phases of a building project [2].
 Cost is seen as a standard indicator of the resources used on a project [3].
 Quantity Rate Analysis is the primary conventional method for estimating costs that is commonly utilized [4].
 Despite the growing popularity of machine learning in various industries, its application to the construction sector in Nepal has been relatively limited.
 Traditional cost estimation methods might not fully account for the complexities of construction projects, making machine learning an appealing option for developing more accurate cost estimation models.
 Accurate cost estimation can lead to potential cost savings and improved time efficiency during project execution, making it an attractive proposition for the construction industry in Nepal.

PROBLEM STATEMENT
 Construction cost estimation relies on the knowledge of human experts and engineers, whose experience is frequently not verified or documented.
 Incorrect cost estimation causes a variety of issues, including modification orders and delays in the construction process [5].
 Traditional technologies are unable to process and evaluate the vast volume of data generated by the construction sector, resulting in the loss of a significant amount of data [7].
OBJECTIVES
Main Objective:
To compare the performance of different machine learning algorithms in estimating the preliminary costs of building construction projects in Nepal.

Specific Objectives:
To identify the most significant features/inputs for cost prediction models of buildings.
To develop and compare the performance of different machine learning models.
LITERATURE REVIEW
 Sonmez (2004) found that regression models typically require fewer model parameters than neural networks, which can lead to better prediction performance when the relationships between the variables are well specified.

 Cho (2013) showed that the artificial neural network model had a lower error rate than the multiple regression model for projected building costs.

 Kim G. H. (2013) used 197 cases for model construction and validation and the remaining 20 instances for testing, and discovered that the NN model provided more accurate estimation results than the RA and SVM models.

 Badawy (2020) conducted research using statistics from 174 actual residential projects in Egypt. The hybrid model's mean absolute percentage error was 10.64%, lower than that of the ANN model and the regression models.

More details in the report.
RESEARCH METHODOLOGY
Topic Selection

The topic "A Comparative Study of Machine Learning Algorithms for Early Cost Estimation of Building Projects in Nepal" was selected for the thesis because of its profound relevance and practicality within the context of the Nepalese construction industry.

By learning from data on past projects, computers can make fast and accurate estimates, whereas relying on software tools and human expertise alone is tedious and time-consuming.
Expert Opinion

The input factors were gathered from the literature review.

The questionnaire was filled out by 5 experts, including both contractors and consultants.

The following criteria were used to qualify as an expert:

 More than 12 years of experience in the construction field.
 A relevant educational background.
 Currently working as a consultant or contractor.
Pilot Testing

 A pilot test was conducted with 3 respondents to check the clarity and comprehensibility of the questionnaire.
 The respondents easily understood the questionnaire, so there was no difficulty in filling it out.
 The minimum time taken to fill out the questionnaire was around 10 minutes, and the maximum was almost 20 minutes.
 The summary of the respondents of the pilot test is given in the table
below:

Data Collection

 Building projects' structural data, along with the final cost of each project, were gathered from various construction firms and consultancies.
 Data were collected from the Department of Urban Development and Building Construction (DUDBC), consultancies, and contractors.
 The data collection process was very tough, as the bidding amount is confidential for contractors.
Data Preprocessing
 Read the Excel file into a DataFrame (df).
 Separated numerical and categorical features (excluding the total cost of the project).
 Counted the number of numerical features.
 Plotted individual scatter plots for numerical features to visualize the data and identify outliers.
 Plotted scatter plots between numerical features and the total cost of the project to analyze their relationship.
 Plotted a normal distribution graph for numerical features to analyze the distribution of the data.
 Calculated the mean, median, mode, and variance for individual numeric features.
 Replaced missing values in numerical features with the mean, median, and mode in turn, saving each result in a separate Excel sheet.
 Plotted a normal distribution graph for each imputed version of the data.
 Based on the analysis of variance, replaced missing values with the mean, as the data show less variance when imputed with the mean value.
 Counted the number of missing categorical features.
 Plotted histograms for categorical features to find whether values are unique or in some order.
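The mean-versus-median imputation comparison described above can be sketched in pandas. The column names and values here are illustrative, not the actual thesis dataset:

```python
import pandas as pd

# Toy numeric columns with missing values (illustrative only).
df = pd.DataFrame({
    "Floor Area(sqm)": [120.0, 150.0, None, 200.0, 135.0],
    "Number of Floors": [2.0, 3.0, 4.0, None, 2.0],
})

# Impute one copy with the column means and another with the medians.
mean_imputed = df.fillna(df.mean())
median_imputed = df.fillna(df.median())

# Mean imputation leaves the column mean unchanged and adds zero-deviation
# points, so its post-imputation variance is never higher than the median's.
print(mean_imputed.var())
print(median_imputed.var())
```

Comparing the two `var()` outputs per column reproduces the decision rule used in the study: pick the imputation that leaves the data less dispersed.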

 Replaced missing values in categorical features with the mode (i.e., the most frequent value) and plotted the histograms again.
 Encoded the categorical features using one-hot encoding, since the categories represent distinct names without any inherent order.
 One-hot encoding turns all categorical values into numeric features, making the data suitable for further analysis.
 There are no missing categorical data now.
 The final dataset was then split into training and testing sets in an 80:20 ratio for use in the models.
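A minimal sketch of the encoding and splitting steps, with made-up column names and values standing in for the real dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the cleaned dataset (illustrative values).
df = pd.DataFrame({
    "Type of Building": ["Residential", "Hospital", "Official", "Hotel"] * 5,
    "Floor Area(sqm)": [120, 450, 300, 500] * 5,
    "Total Cost": [1.2e7, 9.5e7, 4.0e7, 8.0e7] * 5,
})

# One-hot encode the categorical feature: categories are distinct names
# with no inherent order, so each becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["Type of Building"])

X = encoded.drop(columns="Total Cost")
y = encoded["Total Cost"]

# 80:20 train/test split, as used in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```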

Models Implementation
 In this work, models such as Linear Regression, the Decision Tree method, the Random Forest method, Artificial Neural Networks, Support Vector Machine, the XGBoost method, the Extra Trees method, Voting Regression, and the Stacking method are implemented.
 Linear Regression (LR) is a basic regression model used to establish a linear relationship between independent and dependent variables.
 Decision Tree Regressor (DT) partitions data into subsets based on features for predictions.
 Random Forest Regressor (RF), an ensemble method, combines predictions from multiple decision trees.
 The Neural Network (NN) comprises several dense layers with varying activations, trained using the 'Adam' optimizer for 100 epochs to minimize mean squared error.
 XGBoost Regressor (XGB) is a gradient-boosting algorithm that combines weak learners to boost predictive performance.
 Support Vector Machine (SVM) with a linear kernel is used for regression.
 Extra Trees Regressor (ET) is similar to Random Forest but employs random thresholds for feature splitting.
 Voting Regressor (Voting) combines the LR, DT, and RF models.
 Stacking Regressor (Stacking) combines the LR, DT, RF, ET, and Gradient Boosting models via a meta-regressor.
 Each model showcases a unique methodology and predictive strengths tailored to the task at hand.
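The scikit-learn members of this model zoo can be instantiated as a sketch like the following (defaults everywhere except the linear SVM kernel, which the slides state explicitly; the XGBoost regressor is left out of the sketch because it lives in the separate xgboost package):

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor)
from sklearn.svm import SVR

# One instance per model family described above.
models = {
    "LR": LinearRegression(),
    "DT": DecisionTreeRegressor(random_state=42),
    "RF": RandomForestRegressor(random_state=42),
    "ET": ExtraTreesRegressor(random_state=42),
    "GB": GradientBoostingRegressor(random_state=42),
    "SVM": SVR(kernel="linear"),  # linear kernel, as stated in the slides
}
```

Each entry exposes the same `fit`/`predict` interface, so the whole dictionary can be looped over for training and evaluation.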

Libraries used:
 Linear Regression, Decision Tree, Random Forest, Extra Trees, Voting Regressor, Support Vector Machine, Gradient Boosting: scikit-learn
 Neural Network: Keras with TensorFlow backend
 XGBoost: xgboost library

Model Architectures
 Linear Regression (LR): Utilizes a simple linear model to establish a linear relationship between input features and the target variable. No hidden layers are involved.
 Decision Tree (DT) Regressor: Employs a decision-tree-based model to make predictions using a tree-like graph consisting of nodes representing features, branches, and leaf nodes containing the predicted values.
 Random Forest (RF) Regressor: Comprises an ensemble of decision trees that enhances prediction accuracy by averaging the outputs of multiple decision trees.
 Neural Network (NN) Regressor: Implements a feedforward neural network with four layers: an input layer with one neuron per feature, followed by three hidden layers of 2048, 256, and 64 neurons respectively, and an output layer with one neuron for prediction.
Input shape:

 The input shape is determined by the number of features in the training data. It is specified as (X_train.shape[1],), which indicates the number of columns, or features, in the input data.

Model compilation:

 The model is compiled using the 'adam' optimizer with the loss function set to 'mean_squared_error'.

Training:

 The model is trained using the fit method on the X_train and y_train data.
 The training is performed for 100 epochs with a batch size of 32.
 The verbose parameter is set to 0, so no output is printed during training.
 Validation data (X_test, y_test) is used to validate the model's performance after each epoch.

Predictions:

 After training, the model is used to make predictions on the test data (X_test), and the predictions are stored in nn_predictions.
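Putting these pieces together, a minimal Keras sketch of the network could read as follows; the random stand-in data and the reduced epoch count are assumptions made to keep the sketch small, while the layer sizes, optimizer, loss, batch size, and verbosity follow the description above:

```python
import numpy as np
from tensorflow import keras

# Random stand-in data; the real X_train/X_test come from the 80:20 split.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(64, 10)), rng.normal(size=64)
X_test, y_test = rng.normal(size=(16, 10)), rng.normal(size=16)

# Architecture as described: hidden layers of 2048, 256, and 64 neurons,
# then a single output neuron for the predicted cost.
model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(2048, activation="relu"),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# The study trains for 100 epochs; only 2 here to keep the sketch fast.
model.fit(X_train, y_train, epochs=2, batch_size=32, verbose=0,
          validation_data=(X_test, y_test))
nn_predictions = model.predict(X_test, verbose=0)
```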
 XGBoost Regressor: Deploys an XGBoost-based ensemble model using gradient boosting, which sequentially builds multiple decision trees to predict the target variable.
 Support Vector Machine (SVM) Regressor: Uses a support vector machine algorithm to find the hyperplane that best fits the data points within a margin of tolerance.
 Extra Trees (ET) Regressor: Functions as an ensemble model using extremely randomized trees, which are an extension of Random Forests.
 Voting Regressor: Creates an ensemble by combining the predictions from multiple base estimators (LR, DT, RF) and generates a final prediction based on the aggregated results.
 Stacking Regressor: Combines predictions from multiple base estimators (LR, DT, RF, ET, GB) using a meta-estimator (LR) to produce final predictions.
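The two ensemble wrappers can be sketched with scikit-learn as below; the synthetic regression problem and the reduced tree counts are assumptions to keep the sketch fast, while the base-estimator composition follows the description above:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (VotingRegressor, StackingRegressor,
                              RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Small synthetic regression problem in place of the building dataset.
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Voting: averages the predictions of LR, DT, and RF.
voting = VotingRegressor([
    ("lr", LinearRegression()),
    ("dt", DecisionTreeRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Stacking: LR, DT, RF, ET, and GB feed a linear meta-estimator.
stacking = StackingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("dt", DecisionTreeRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("et", ExtraTreesRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),
)

voting.fit(X, y)
stacking.fit(X, y)
```

The design difference: voting simply averages base predictions, while stacking learns how to weight them by fitting the meta-estimator on cross-validated base predictions.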

Performance Metrics for the Models:

 Root Mean Square Error (RMSE): A measure of the average magnitude of the errors between predicted and actual values.
 Mean Absolute Error (MAE): The average absolute difference between predicted and actual values in the dataset.
 Mean Squared Error (MSE): The average squared difference between predicted and actual values in the dataset.
 Coefficient of Determination (R²): Quantifies the proportion of variance in the target variable that is predictable from the independent variables.
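All four metrics are one-liners with scikit-learn; the actual/predicted values below are illustrative:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative actual vs. predicted costs (arbitrary units).
y_true = np.array([10.0, 12.0, 15.0, 20.0, 18.0])
y_pred = np.array([11.0, 11.5, 14.0, 21.0, 17.0])

mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # back in the target's units
mae = mean_absolute_error(y_true, y_pred)  # average absolute error
r2 = r2_score(y_true, y_pred)              # share of variance explained
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```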

TOOLS AND EXPERIMENT SETUP

 Google Colaboratory platform
 Python libraries:
• scikit-learn, Keras (TensorFlow backend), and xgboost for machine learning
• Matplotlib for data visualization (to plot graphs)
RESULTS AND DISCUSSION
Filtering the Input Factors from Expert Opinion
 Aspects with a high numeric value for "Yes" were taken into consideration, and aspects with a high value for "No" were eliminated.
 Additional factors were also suggested by the experts.
 After eliminating and adding some factors, a new questionnaire was made.
 The building attributes finally considered for further processing are listed below.
 The attributes include the name of the project, location of the building, type of building, construction completion year, site/geographic conditions, access to the site, site area, type of foundation, plinth area, floor area, floor height, number of floors, number of columns, number of rooms, number of bathrooms, number of kitchens, number of lifts/elevators, number of basements, use of building code, type of window, type of door, type of flooring works, external painting, internal finishing, HVAC work, sanitary works, electrical works, landscaping, and road works/river training works.
Scatter Plots
 Scatter plots can reveal trends or patterns in the data. For example, if the points form an upward or downward slope, this indicates a positive or negative linear trend between the variables.
 Outliers, which are data points that significantly deviate from the main cluster of points, are easily identified in scatter plots.
 Scatter plots can also reveal non-linear relationships between variables: if the points form a curve or some other non-linear shape, this suggests a non-linear relationship. This can help in identifying complex patterns and interactions in the data.
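A small Matplotlib sketch of such a plot, with made-up feature/cost data containing an upward trend and one deliberate outlier:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Illustrative feature-vs-cost data (not the thesis dataset).
rng = np.random.default_rng(1)
floor_area = rng.uniform(100, 500, size=40)
cost = 0.05 * floor_area + rng.normal(scale=2.0, size=40)
floor_area = np.append(floor_area, 450.0)  # outlier: large area but
cost = np.append(cost, 2.0)                # an implausibly low cost

fig, ax = plt.subplots()
ax.scatter(floor_area, cost)
ax.set_xlabel("Floor Area (sqm)")
ax.set_ylabel("Cost (arbitrary units)")
ax.set_title("Upward trend with one outlier")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render to memory instead of a file
```

The lone point far below the cluster is exactly the kind of outlier these plots make visible at a glance.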

Normal Distribution Graph
 A normal distribution graph describes how data are distributed when many independent, random factors contribute to an outcome.
 Deviations from the normal distribution can indicate outliers or anomalies in the data, and detecting outliers is crucial in data analysis.
 This distribution appears with a tail stretching towards the right side of the curve, indicating that the data have a longer right tail than left. In such cases, the mean tends to be larger than the median, and most data points cluster towards the left side.
 A tail on the right side of a distribution indicates a right-skewed pattern, and outliers situated near this tail represent unusually high values that might have a significant impact on statistical measures and require thorough examination during data analysis.
Data Preprocessing Results

 The data had missing values, so preprocessing had to be done.
 The mean, median, mode, and variance were calculated for individual numeric features.
 Missing values in numerical features were replaced with the mean, median, and mode in turn.
 Taking variance as the measure for deciding between mean and median imputation, imputation with the mean showed a lower variance than imputation with the median.
 Lower variance signifies less dispersion of data points from the mean value, implying that the dataset tends to be more tightly clustered around the mean.
 Comparing the variance values across attributes between the raw data, the mean-imputed data, and the median-imputed data, mean imputation tends to preserve the original variability of the dataset better than median imputation.
 The decision is also justified by the table.

 Histograms were plotted for the categorical features to find whether values are unique or in some order.

Figure 5.7.1: Histogram for Categorical Features

 Replacing missing values in categorical features with the mode (the most frequent value) is a common approach and often a reasonable strategy, especially when dealing with categorical data.
 Imputing missing categorical values with the mode preserves the overall distribution of the categories and minimizes the potential impact of missing data on the analysis.
 The values in the "Total final cost of the project including VAT" column were replaced with their natural logarithms, as shown in Figure 5.10.
 Taking the logarithm can normalize the distribution and reduce the impact of extreme values, making the data more suitable for analysis.
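The log transform is a one-liner with NumPy; the cost values below are illustrative, with one extreme project to show the effect on skewness:

```python
import numpy as np
import pandas as pd

# Right-skewed costs: four typical projects and one extreme value.
col = "Total final cost of the project including VAT"
df = pd.DataFrame({col: [1.2e7, 1.5e7, 2.0e7, 3.5e7, 9.0e8]})

# Replace the target with its natural logarithm to tame the long right tail.
df["log_cost"] = np.log(df[col])

# The log transform shrinks the skewness of the distribution.
print(df[[col, "log_cost"]].skew())
```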

Calculating the correlation matrix for the DataFrame:
 The correlation matrix shows how each numerical column in the DataFrame is related to every other numerical column by calculating Pearson correlation coefficients.
 The Pearson correlation coefficient ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
 Positive values indicate a positive correlation, while negative values indicate a negative correlation.
 It is useful for feature selection and for understanding patterns in the data.

Dropping Plinth Area (sqm) and Number of Bathrooms:
 These two features are removed because they exhibit a high correlation with other variables ('Floor Area(sqm)' and 'Number of Rooms', respectively) beyond a predefined threshold of 0.70.
 Due to their strong correlation with other variables, it is assumed that they might not provide additional significant information for the analysis or modeling and could potentially lead to multicollinearity issues.

Dropping Construction Year:
 This feature is dropped because its correlation with the target variable ('Total final cost of the project including VAT') is lower than a specified threshold of 0.70, specifically 0.091.
 A correlation below this threshold suggests a weak linear relationship between 'Construction Year' and the target variable, which might not significantly contribute to explaining the variability in the target.
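This correlation-based filtering can be sketched with pandas; the toy data below is constructed to mimic the correlations described (plinth area tracks floor area, bathrooms track rooms, and construction year is unrelated noise), and is not the thesis dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50
floor_area = rng.uniform(100, 1000, n)
rooms = rng.integers(2, 30, n).astype(float)
df = pd.DataFrame({
    "Floor Area(sqm)": floor_area,
    "Plinth Area(sqm)": 0.9 * floor_area + rng.normal(scale=10, size=n),
    "Number of Rooms": rooms,
    "Number of Bathrooms": 0.5 * rooms + rng.normal(scale=0.5, size=n),
    "Construction Year": rng.integers(2005, 2023, n).astype(float),
    "Total final cost of the project including VAT":
        5e4 * floor_area + 1e5 * rooms + rng.normal(scale=1e5, size=n),
})

corr = df.corr()
target = "Total final cost of the project including VAT"
to_drop = []

# Drop a feature when it is correlated above 0.70 with a kept feature.
pairs = {"Plinth Area(sqm)": "Floor Area(sqm)",
         "Number of Bathrooms": "Number of Rooms"}
for feature, partner in pairs.items():
    if abs(corr.loc[feature, partner]) > 0.70:
        to_drop.append(feature)

# Drop a feature when its correlation with the target is below 0.70.
if abs(corr.loc["Construction Year", target]) < 0.70:
    to_drop.append("Construction Year")

reduced = df.drop(columns=to_drop)
```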
 Cleaning the 'Location of Building' column, since the dataset is limited to certain places only: each record was categorized as either inside or outside Kathmandu.
 Since 'Location of Building' was cleaned into inside/outside the valley and 'Type of foundation' was cleaned into individual foundation types, these two original columns can be dropped.
 The features were encoded using one-hot encoding, as shown in Figure 5.13.
 With one-hot encoding, the categorical variables have been transformed into a numerical format.
 All the variables that are either highly correlated with each other or weakly correlated with the target variable, the 'Total final cost of the project including VAT', were dropped.

RESULTS OF MODEL IMPLEMENTATION
 Comparing the models, the Decision Tree, Random Forest, Extra Trees, Voting, and Stacking models exhibit relatively better performance in terms of MSE, MAE, RMSE, and R².
 Among these, the Decision Tree, Extra Trees, and Voting models demonstrate particularly strong performance across multiple metrics.
 The Decision Tree or Extra Trees model is considered the best choice based on the reported metrics, as they show lower errors and higher R² values than the other models.

CONCLUSION
 The dataset comprised 12 educational, 3 commercial, 6 hospital, 18 residential, 13 public, 17 official, and 3 hotel buildings, with 0 to 2 basements and costs above 1 crore.
 The input features were taken from the literature review and validated by expert opinion.
 After pilot testing, a survey questionnaire was distributed among contractors and consultants.
 Data preprocessing was used to clean the data.
 Missing values were substituted with the mean for numeric features and the mode for categorical features.
 By analyzing the correlation heat map, the unwanted features were dropped.
 The final dataset was divided into train and test sets in an 80:20 ratio.
 Nine models were implemented, and the mean absolute error, mean squared error, and R² value were recorded for evaluation.
 The Decision Tree, Random Forest, Extra Trees, Voting, and Stacking models exhibit relatively better performance in terms of MSE, MAE, RMSE, and R².
 Among these, the Decision Tree, Extra Trees, and Voting models demonstrate particularly strong performance across multiple metrics.
 The Decision Tree or Extra Trees model is considered the best choice based on the reported metrics, as they show lower errors and higher R² values than the other models.

REFERENCES
Akalya, K. R. (2018). Minimizing the cost of construction materials through optimization techniques. IOSR Journal of Engineering.

Atapattu, C. N. (2022, November). Statistical cost modelling for preliminary stage cost estimation of infrastructure projects. Earth and Environmental Science, 1101(5), 052031.

Badawy, M. (2020). A hybrid approach for a cost estimate of residential buildings in Egypt at the early stage. Asian Journal of Civil Engineering, 21(5), 763-774.

Badra, I. B. (n.d.). Conceptual cost estimate of buildings using regression analysis in Egypt. 17(5).

Beltman, J. F. (2021). Predicting construction costs in the program phase of the construction process: a machine learning approach. Bachelor's thesis, University of Twente.

(more in report...)
THANK YOU!
