
DataAnalyticsforCivilEngineers_Module1

data analytics

Uploaded by

faisaloffice2020
Copyright
© © All Rights Reserved

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/377415936

DATA ANALYTICS FOR CIVIL ENGINEERS_ Module 1

Presentation · January 2024


DOI: 10.13140/RG.2.2.24118.88644


1 author:

Jayaram M.A
RASTA - Center for Road Technology VOLVO Construction Equipment Campus
190 PUBLICATIONS 1,485 CITATIONS


All content following this page was uploaded by Jayaram M.A on 16 January 2024.



RASTA
Center for Road Technology
VOLVO-CE Campus
Peenya Industry, Bengaluru, India

Course Title

DATA ANALYTICS FOR


CIVIL ENGINEERS
Course Contributor

Dr.M.A.Jayaram
Professor
RASTA
Module 1
Data and Knowledge
Assessment of Knowledge
Statistics, Descriptive and
Inferential
Definition of Data Analytics
Data Analytic Process
Models: KDD, SEMMA,
CRISP-DM
Methods
Tasks
Tools
RASTA

Data Analytics for Engineers


Course Code : 22CHT335

Module 1: Introduction
Dr.M.A.JAYARAM
Professor
RASTA-Center for Road Technology
VOLVO-CE Campus, Peenya Industrial Area
Bengaluru, India
INTRODUCTION
Data analytics involves the process of converting raw data into actionable insights. It
includes a range of tools, technologies, and processes used to find trends and solve
problems by using data. Data analytics can shape business processes, improve decision-
making, and foster business growth.
Motivation: the availability of huge volumes of data from sensors, the Internet, search engines,
public-domain data sets, archives, etc.
We are drowning in information, but starving for knowledge.

As a consequence, a new research area has been developed, which has become known under
the name of data mining. The goal of this area was to meet the challenge to develop tools that
can help humans find potentially useful patterns in their data and solve the problems they are
facing by making better use of their data.

RASTA, Highway Technology, Elective Course, Data Analytics for Civil Engineers_22CHT335
INTRODUCTION: DATA Vs KNOWLEDGE
DATA
The quantities, characters, or symbols on which operations are performed
by a computer, which may be stored and transmitted in the form of electrical
signals and recorded on magnetic, optical, or mechanical recording media.
Examples:
Specific gravity of cement (Gc): 3 – 3.15
Specific gravity of bitumen (Gb): 0.95 – 1.10
Strength of concrete (Sc): 30 – 40 MPa
Traffic volume on a particular road (Tv): 200 vehicles/hour
Plasticity index of soil (PI): 15%
Grade of concrete (CG): M40
Absolute viscosity of VG20 bitumen (Vi): 1600 – 2200 poise at 60 °C
Rutting in a particular road stretch (Rd): 9 – 14 mm
Optimum bitumen content (Obc): 5.4%
CBR = 19%
Grade of steel: Fe415
Etc.
In summary: DATA
• Refer to single instances
(single objects, people, events, points in time, materials, etc.)
• Describe individual properties
(compressive strength, OMC, coefficient of thermal expansion, stiffness, viscosity, tenacity)
• Are often available in large amounts
(databases, archives, sensors, measuring instruments, continuous observations, etc.)
• Are often easy to collect or obtain
(e.g., public databases, publications, lab experiments)
• Have utility limited to a particular domain of application
• Do not, by themselves, allow us to make predictions or forecasts

In summary: KNOWLEDGE
• Refers to classes of instances
(sets of objects, people, events, points in time, etc.)
• Describes general patterns, structures, laws, principles, etc.
• Consists of as few statements as possible
(this is an explicit goal)
• Is often difficult and time-consuming to find or to obtain
(e.g., natural laws, education, experience, observation, lab tests, etc.)
• Allows us to make predictions and forecasts
• Is much more valuable than (raw) data
• Its generality and the possibility of making predictions about the properties of new
cases are the main reasons for this superiority
KNOWLEDGE
Awareness of or familiarity with a fact or situation, gained through experience or education.
Knowledge is also defined as justified true belief (JTB). This definition identifies three essential features:
it is (i) a belief that is (ii) true and (iii) justified. Truth is a widely accepted feature of
knowledge.

Knowledge is expressed through statements (verbal or written).

Examples:
• With specific gravity, we can calculate the bearing capacity of soil.
• The specific gravity of straight-run and cut-back bitumen is essential for calculating rates of spread, asphaltic
concrete mix properties, etc.
• Keeping other parameters constant, a lower W/C ratio (0.2 – 0.4) corresponds to a strength range of 40 – 60 MPa.
• The lower the W/C ratio, the higher the strength.
• As per IS 1206, the absolute viscosity of VG 30 paving grade bitumen is 2400 – 3600 poise (at 60 °C) and the kinematic
viscosity is 350 mm²/s (cSt).
• The maximum dry unit weight and the optimum moisture content of the sand are 18.698 kN/m³ and 10.271%,
respectively.
• Bitumen has two origins: it can be produced in a petroleum refinery, or it can be extracted from mines as
gilsonite.
• Tar can be obtained through the destructive distillation of organic materials such as coal, wood, peat or petroleum.
Assessment of the Knowledge
• Not all kinds of knowledge are equally valuable. Not all
general statements are equally important, equally substantial, equally
significant, or equally useful.

• Therefore, knowledge has to be assessed.

• Assessment ensures the relevance of the knowledge and eliminates
irrelevant knowledge.

Criteria for assessment

i. Correctness: through probability, success rate, factual observation, and field
verification.
Examples: assessed by conducting experiments

The density of bitumen is in the range 1.01 – 1.05 g/cm³ in the temperature
range of 25 °C to 85 °C.

In this temperature range, the density of tar is higher than that of bitumen, at 1.15
to 1.4 g/cm³.

The asphalt density is 2.2 to 2.4 g/cm³. The higher the density of the asphalt,
the more durable it is.

The pavement condition index (PCI) is a numerical index between 0 and
100, which is used to indicate the general condition of a pavement section.
85 – 100: Good, 70 – 85: Satisfactory, < 40: Not passable (through
observation of various kinds of distress)
ii.Generality (domain and conditions of validity)
One of the drawbacks of a rigid pavement is that its
installation is very costly, although the cost of maintenance is
reasonable.
Flexible pavements are applied in layers. The weakest materials are laid at
the very bottom whereas the more durable materials are laid at the very
top to ensure the structural integrity and adaptability of the entire
structure.
iii.Usefulness (relevance, predictive power)
Fiber-optic cables embedded in the road detect wear and tear, and
communication between vehicles and roads can improve traffic management.

In a smart road system, data can be obtained from various sources, such as a large number
of sensors, smart cards, satellite systems, cameras, and social networks.
iv. Comprehensibility (simplicity, clarity, parsimony)
• The crushing test is carried out to assess the strength of coarse aggregate when a
compressive load is gradually applied.
• Aggregate crushing value = (W2 / W1) × 100, where W1 = total weight of the dry
sample and W2 = weight of material passing through the 2.36 mm IS sieve.
• It should not exceed 35% for CC pavements and 45% for wearing surfaces.
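
The crushing-value formula above can be sketched in Python as follows. This is only an illustration: the function names, the dictionary of limits and the sample weights are assumptions made here, and only the formula and the 35% / 45% limits come from the slide.

```python
def aggregate_crushing_value(w1_total, w2_passing):
    """Return the aggregate crushing value in percent: (W2 / W1) x 100."""
    if w1_total <= 0:
        raise ValueError("W1 (total dry sample weight) must be positive")
    if not 0 <= w2_passing <= w1_total:
        raise ValueError("W2 must lie between 0 and W1")
    return w2_passing / w1_total * 100.0


# Upper limits in percent, as quoted on the slide above.
ACV_LIMITS = {"CC pavement": 35.0, "wearing surface": 45.0}


def is_acceptable(acv, use):
    """Check a crushing value against the limit for the given use."""
    return acv <= ACV_LIMITS[use]


# Hypothetical test weights in grams, for illustration only.
acv = aggregate_crushing_value(w1_total=3000.0, w2_passing=780.0)
print(f"ACV = {acv:.1f}%")
for use in ACV_LIMITS:
    print(use, "->", "acceptable" if is_acceptable(acv, use) else "rejected")
```

Keeping the limits in one dictionary makes it easy to substitute the values of whichever specification applies to a project.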

v. Novelty (previously unknown, unexpected)

The solar pavement has become one of the most researched new highway transportation
infrastructures, with the goal of transforming the road system from an energy consumer into an
energy provider and of eliminating or alleviating pollution at the energy source.
Among the data emerging in the field of ITS, visual data are among the most
voluminous kind. Computer Vision studies enable the analysis of both images
and videos and provide detailed information about the traffic situation.

In the domain of science and technology, the focus is on correctness, generality, and
simplicity (parsimony): one way of characterizing science and
technology is to say that it is the search for a minimal correct description of the world.

In the economy and the construction industry, however, the emphasis is placed on usefulness,
comprehensibility, and novelty: the main goal is to gain a competitive edge and thus to
increase revenues.

Nevertheless, neither of the two areas can afford to neglect the other criteria.

Data Analytics??
Data analytics converts raw data into actionable insights. It includes a range of tools,
technologies, and processes used to find trends and solve problems by using data. Data
analytics can shape business processes, improve decision-making, and foster business
growth.
By analyzing factors such as weather conditions, traffic patterns, and regulatory
requirements, project managers can make more accurate project schedules and resource
allocations.
Transportation data analytics offer highly granular datasets suitable for complex modelling,
including route information, trip speed, length, and duration, travel mode (e.g. driving
vs. cycling), O-D patterns, and more.
There are two approaches to maintaining roads—either address the damage that has already occurred thus
being reactive, or try and prevent the damage from occurring in the first place as a proactive measure.
Advances in analytics, cloud-based mobile technologies, and the Internet of Things (IoT) are making it possible
for authorities to adopt the latter approach. It not only helps them avoid accidents caused by poor road
conditions but also creates more cost savings that can be diverted toward road maintenance.
Data Analytics is thought of as a kind of statistics.

Statistics has a long history and originated from collecting and analyzing
data about the population and the state.

Statistics
Descriptive Statistics

Inferential Statistics

Descriptive Statistics
Describe states and processes based on observed data.

The main tools to tackle this task are the computation of characteristic measures,
tabular and graphical representations.

Characteristic measures: central tendency, dispersion measures, and distribution measures

Tabular representations: correlation matrix, covariance matrix, etc.

Graphical representations: histograms, pie charts, box plots, mosaic charts,
scatter diagrams, star plots, spider plots, parallel coordinates, etc.
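
As a small illustration of characteristic measures, the sketch below computes central tendency and dispersion with Python's standard `statistics` module. The rutting values are the five readings from the sample table that appears later in this module under the KDD Selection stage; the variable names are assumptions made here.

```python
import statistics

# Rutting depths for samples HTB_20 to HTB_24, taken from the
# sample table in the KDD "Selection" stage of this module.
rutting_mm = [5.0, 6.1, 12.4, 10.2, 4.1]

summary = {
    "mean": statistics.mean(rutting_mm),      # central tendency
    "median": statistics.median(rutting_mm),  # central tendency
    "stdev": statistics.stdev(rutting_mm),    # dispersion (sample std. deviation)
    "range": max(rutting_mm) - min(rutting_mm),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```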
[Figures: commuters' satisfaction over public transport; evolution of transportation]
Inferential Statistics
Inferential statistics is a branch of statistics that makes use of various analytical tools
to draw inferences about the population from sample data.

It helps in making generalizations about the population by using various analytical tests
and tools.

Inferential statistics can be classified into hypothesis testing and regression analysis.
Hypothesis testing also includes the use of confidence intervals to test the parameters of a
population.
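
To make the idea of a confidence interval concrete, here is a minimal sketch that estimates a 95% interval for the mean rutting depth from the five sample readings in the KDD table of this module. It assumes the readings are a random sample from a roughly normal population, and it hard-codes the two-sided 95% Student-t critical value for 4 degrees of freedom (2.776) instead of calling a statistics library. The function name is hypothetical.

```python
import math
import statistics

# Five rutting readings from the sample table in this module.
rutting_mm = [5.0, 6.1, 12.4, 10.2, 4.1]

# Two-sided 95% Student-t critical value for n - 1 = 4 degrees of freedom.
T_CRIT_95_DF4 = 2.776


def mean_confidence_interval(sample, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(sample)
    centre = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    margin = t_crit * std_error
    return centre, centre - margin, centre + margin


centre, lower, upper = mean_confidence_interval(rutting_mm, T_CRIT_95_DF4)
print(f"mean = {centre:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With only five readings the interval is wide, which is exactly the kind of statement about a population that inferential statistics is meant to provide.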

Examples

The study of the statistical significance of the predictor variables (e.g., “age group,”
“manufacturer,” “season,” and “color”) on the luminous intensities of traffic signals

The study on the pattern of crashes involving young drivers.

Travel Time reliability in urban areas/ metropolitan cities

Other methods
Exploratory Data Analysis: is concerned with generating hypotheses from the collected data. There are
no, or at least considerably weaker, model assumptions about the data-generating process in exploratory
data analysis. The methods are mostly universal ones designed to achieve a certain goal, but they are not based
on a rigorous model as in inferential statistics.

Data Mining: is concerned with finding patterns in huge databases. For this, suitable data mining
algorithms and tools are applied. The aim is to extract the desired knowledge from
a given database automatically, with little or no human intervention.

Knowledge Discovery in Databases (KDD): As every project is unique, we need specific approaches.
KDD is a process of identifying valid, novel, potentially useful, and ultimately understandable patterns
in data.

Data Analytics: Coined by David Hand (1997). Data analytics is the collection, transformation, and
organization of data in order to draw conclusions, make predictions, and drive informed decision
making.
Data analytics is a multidisciplinary field that employs a wide range of analysis techniques,
including math, statistics, and computer science, to draw insights from data sets. Data
analytics is a broad term that includes everything from simply analyzing data to theorizing
ways of collecting data and creating the frameworks needed to store it.

Data analytics is important across many industries, as many business leaders use data to
make informed decisions. An expert in smart health monitoring of structures might look
at sensor data to determine which components of a structure can be kept in service with
simple repairs and which should be dismantled. An environmental engineer may look at
ambient atmospheric data gathered through devices to determine the degree of
pollution and hence to design or devise possible remedial methods.

Examples: health care analytics, business analytics, inventory analytics, material behavior
analytics, visual data analytics, structural health data analytics, pavement
deterioration data analytics, etc.
Data analytics requires a wide range of skills to be performed effectively:
Structured Query Language (SQL): a programming language commonly used for databases

Statistical programming languages: such as R and Python, commonly used to create advanced
data analysis programs

Machine learning: a branch of artificial intelligence that involves using algorithms to spot
patterns in data and enabling computers/devices to learn from the given data

Probability and statistics: in order to better analyze and interpret data trends

Data management: the practices around collecting, organizing and storing data

Data visualization: the ability to use charts and graphs to tell a story with data

Econometrics: the ability to use data trends to create mathematical models that forecast
future trends
DATA ANALYTICS PROCESS MODELS
Process models are meant to establish standards for analytic processes, methods,
and tasks, both for academics and researchers and for people in industry.

The models are centered on the attempt to formulate a general framework for data analysis.

Three models are popular:

Knowledge Discovery in Databases (KDD)

Sample, Explore, Modify, Model, Assess (SEMMA)

CRoss-Industry Standard Process for Data Mining (CRISP-DM)
The KDD Process Model
KDD, as presented in (1996), is the process of using data mining methods to extract
what is deemed knowledge, using a database along with any required preprocessing,
subsampling, and transformation of the database.

Stage 1: Selection – This stage consists of creating a target data set, or focusing on a subset of
variables or data samples, on which discovery is to be performed.
Sample Key   Pots    Rutting   Raveling   Alligator Cracks   PCI
HTB_20       2       5.0       1          2.0                50
HTB_21       3       6.1       3          3.5                65
HTB_22       1       12.4      4          4.6                40
HTB_23       0       10.2      2          7.2                48
HTB_24       5       4.1       5          3.2                70
-----        -----   -----     -----      -----              -----
-----        -----   -----     -----      -----              -----
Stage 2: Preprocessing – This stage consists of cleaning and preprocessing the target data in
order to obtain consistent data.
Stage 3: Transformation – This stage consists of transforming the data using
dimensionality reduction or transformation methods.
Stage 4: Data Mining – This stage consists of searching for patterns of interest in a
particular representational form, depending on the data mining objective (usually, prediction).
Stage 5: Interpretation/Evaluation – This stage consists of interpreting and evaluating
the mined patterns.

The KDD process must be preceded by the development of an understanding of the application
domain, the relevant prior knowledge and the goals of the end-user. It must also be followed
by knowledge consolidation, in which the discovered knowledge is incorporated into the system.
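
The five KDD stages can be sketched as a chain of small Python functions. This is one possible reading, not the reference KDD implementation: the function names, the choice of min-max scaling for the transformation stage and the use of Pearson correlation as the "pattern" are all assumptions made for illustration. The rows are the five complete rows of the sample table under Stage 1.

```python
import math

# Rows copied from the sample table under Stage 1 (Selection).
TABLE = [
    {"key": "HTB_20", "pots": 2, "rutting": 5.0, "raveling": 1, "alligator_cracks": 2.0, "pci": 50},
    {"key": "HTB_21", "pots": 3, "rutting": 6.1, "raveling": 3, "alligator_cracks": 3.5, "pci": 65},
    {"key": "HTB_22", "pots": 1, "rutting": 12.4, "raveling": 4, "alligator_cracks": 4.6, "pci": 40},
    {"key": "HTB_23", "pots": 0, "rutting": 10.2, "raveling": 2, "alligator_cracks": 7.2, "pci": 48},
    {"key": "HTB_24", "pots": 5, "rutting": 4.1, "raveling": 5, "alligator_cracks": 3.2, "pci": 70},
]


def select(rows, columns):
    """Stage 1: focus on a subset of variables."""
    return [{c: row[c] for c in columns} for row in rows]


def preprocess(rows):
    """Stage 2: drop rows with missing values to obtain consistent data."""
    return [row for row in rows if all(v is not None for v in row.values())]


def transform(rows, columns):
    """Stage 3: min-max scale the chosen columns to the range [0, 1]."""
    scaled = [dict(row) for row in rows]
    for c in columns:
        lo = min(row[c] for row in rows)
        span = (max(row[c] for row in rows) - lo) or 1.0
        for row in scaled:
            row[c] = (row[c] - lo) / span
    return scaled


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mine(rows, target, predictors):
    """Stage 4: search for a simple pattern, here correlation with the target."""
    ys = [row[target] for row in rows]
    return {p: pearson([row[p] for row in rows], ys) for p in predictors}


def evaluate(patterns):
    """Stage 5: rank the mined patterns by absolute strength."""
    return sorted(patterns.items(), key=lambda item: abs(item[1]), reverse=True)


predictors = ["rutting", "alligator_cracks"]
rows = select(TABLE, ["key"] + predictors + ["pci"])
rows = preprocess(rows)
rows = transform(rows, predictors)
ranking = evaluate(mine(rows, "pci", predictors))
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

Five rows are far too few to support any engineering conclusion; the point is only to show how each stage hands its output to the next.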

KDD FLOW DIAGRAM

SEMMA PROCESS
SEMMA was developed and standardized by the SAS Institute and considers a cycle with five stages for the
process. SAS Institute is an American multinational developer of analytics software based in
Cary, North Carolina. SAS develops and markets a suite of analytics software, which helps
access, manage, analyze and report on data to aid in decision-making.

Stage 1: Sample – This stage consists of sampling the data by extracting a portion of a large
data set big enough to contain the significant information, yet small enough to manipulate
quickly. This stage is pointed out as being optional.

Stage 2: Explore – This stage consists of exploring the data by searching for
unanticipated trends and anomalies in order to gain understanding and ideas.

Stage 3: Modify – This stage consists of modifying the data by creating, selecting,
and transforming the variables to focus the model selection process.

Stage 4: Model – This stage consists of modeling the data by allowing
the software to search automatically for a combination of data that
reliably predicts a desired outcome.

Stage 5: Assess – This stage consists of assessing the data by
evaluating the usefulness and reliability of the findings from the data
mining process and estimating how well it performs. Here, model
performance is evaluated against the test data.

SEMMA offers an easy-to-understand process, allowing an organized
and adequate development and maintenance of data mining
projects. It thus confers a structure on their conception, creation and
evolution, helping to present solutions.
CRISP-DM Model
The CRISP-DM process was developed through the effort of a consortium initially
composed of DaimlerChrysler, SPSS (Statistical Package for the Social Sciences, IBM) and
NCR. CRISP-DM stands for CRoss-Industry Standard Process for Data Mining. It consists of a
cycle that comprises six stages.
Stage 1. Domain / Business Understanding
The domain understanding phase focuses on understanding the objectives and requirements of the project. Most of the
tasks in this phase are foundational project management activities that are universal to most projects:
Determine objectives: You should first “thoroughly understand, from a business perspective, what the project
engineer really wants to accomplish” and then define project success criteria.
Assess situation: Determine resource availability and project requirements, assess risks and contingencies, and
conduct a cost-benefit analysis.
Determine data mining goals: In addition to defining the business objectives, you should also define what success
looks like from a technical data mining perspective.
Outcome of this stage: project plan. Select technologies and tools and define detailed plans for each project
phase. While many teams hurry through this phase, establishing a strong domain understanding is like building the
foundation of a structure.
CRISP-DM Process Model Flow Diagram
Stage 2. Data Understanding

The focus is to identify, collect, and analyze the data sets that can help you accomplish the
project goals. This phase also has four tasks:

Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool.
Describe data: Examine the data and document its surface properties like data format, number
of records, or field identities.
Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among
the data.
Verify data quality: How clean/dirty is the data? Document any quality issues.

If the data are not understood properly, domain understanding has to be revisited.
Stage 3. Data Preparation

A common observation is that 80% of the project is data preparation.

This phase, which is often referred to as “data munging”, prepares the final data set(s) for
modeling. It has five tasks:
Select data: Determine which data sets will be used and document reasons for
inclusion/exclusion.
Clean data: Often this is the lengthiest task. Without it, the result is garbage-in, garbage-out. A
common practice during this task is to correct, impute, or remove erroneous values.
Construct data: Derive new attributes that will be helpful, for example angular ratio, PI, etc.
Integrate data: Create new data sets by combining data from multiple sources.
Format data: Re-format data as necessary. For example, you might convert string values that
store numbers to numeric values so that you can perform mathematical operations.
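
The clean, construct and format tasks above can be sketched as follows. The soil records are hypothetical values invented for illustration, as are the function and field names. The derived attribute uses the standard definition of plasticity index, PI = liquid limit − plastic limit, and missing values are imputed with the column mean, which is only one of several possible choices.

```python
from statistics import mean

# Hypothetical raw records: numbers stored as strings, one value missing.
raw_records = [
    {"sample": "S1", "liquid_limit": "42", "plastic_limit": "24"},
    {"sample": "S2", "liquid_limit": "38.5", "plastic_limit": ""},
    {"sample": "S3", "liquid_limit": "45", "plastic_limit": "28"},
]


def to_number(text):
    """Format data: convert a numeric string to float, None if empty/invalid."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def prepare(records):
    numeric_fields = ("liquid_limit", "plastic_limit")
    rows = [
        {"sample": r["sample"], **{f: to_number(r[f]) for f in numeric_fields}}
        for r in records
    ]
    # Clean data: impute missing values with the mean of the known values.
    for field in numeric_fields:
        fill = mean(row[field] for row in rows if row[field] is not None)
        for row in rows:
            if row[field] is None:
                row[field] = fill
    # Construct data: derive the plasticity index from the two limits.
    for row in rows:
        row["plasticity_index"] = row["liquid_limit"] - row["plastic_limit"]
    return rows


prepared = prepare(raw_records)
for row in prepared:
    print(row)
```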

Stage 4 : Modeling

This stage covers the development and assessment of various models based on several different
modeling techniques. This phase has four tasks:
Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net).
Generate test design: Depending on your modeling approach, you might need to split the data into
training, test, and validation sets.
Build model: Build the model(s) with the selected techniques; the developed model is the output of this task.
Assess model: Generally, multiple models are competing against each other, and the data
scientist needs to interpret the model results based on domain knowledge, the pre-defined
success criteria, and the test design.

If the model is not acceptable, data preparation has to be revisited.
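
A test design of the kind mentioned under "Generate test design" might be sketched like this. The 60/20/20 proportions, the fixed seed and the function name are assumptions chosen for illustration, and the sample keys beyond HTB_24 are hypothetical.

```python
import random


def split_records(records, train=0.6, validation=0.2, seed=42):
    """Shuffle the records reproducibly and cut them into three sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                 # used to fit the model
        shuffled[n_train:n_train + n_val],  # used to tune and compare models
        shuffled[n_train + n_val:],         # held back for the final assessment
    )


# Hypothetical sample keys standing in for full data records.
sample_keys = [f"HTB_{i}" for i in range(20, 30)]
train_set, val_set, test_set = split_records(sample_keys)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the split reproducible, so competing models are assessed on exactly the same held-back records.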

Stage 5 : Evaluation

The Evaluation phase looks more broadly at which model best meets the business needs and what
to do next. This phase has three tasks:

Evaluate results: Do the models meet the business success criteria? Which one(s) should we
approve for the business?

Review process: Review the work accomplished. Was anything overlooked? Were all steps
properly executed? Summarize findings and correct anything if needed.

Determine next steps: Based on the previous tasks, determine whether to proceed to
deployment, iterate further, or initiate new projects.

If the model fails, proves to be erroneous, or does not work for new data, the process has
to be repeated from domain understanding.
Stage 6 : Deployment

Depending on the requirements, the deployment phase can be as simple as generating a
report or as complex as implementing a software system suited to the enterprise/project. This phase has four tasks:

Plan deployment: Develop and document a plan for deploying the model.

Plan monitoring and maintenance: Develop a thorough monitoring and maintenance plan to
avoid issues during the operational phase (or post-project phase) of a model.

Produce final report: The project team documents a summary of the project which might
include a final presentation of data mining results.

Review project: Conduct a project retrospective about what went well, what could have been
better, and how to improve in the future.

Comparison of KDD, SEMMA and CRISP-DM

Methods, Tasks, and Tools
Every data analysis problem is different.

To avoid the effort of inventing a completely new solution for each problem, it is helpful to
think of different problem categories and consider them as building blocks from which a
solution may be composed.

Methods
Classification
Regression
Clustering/Segmentation
Association
Deviation Analysis
Classification
Predict the outcome of an experiment with a finite number of possible results
Predict a class label for an entity/sample/material/structure
Examples: Yes/No, Class 1/Class 2/Class 3, Not distressed/Distressed/Severely distressed,
Palatable/Not palatable/Neutral, Moderately durable/Durable/Highly durable,
High/Moderate/Low (angularity, workability, strength)

We may be interested in a prediction because the true result will emerge only in the future
or because it is expensive, difficult, or cumbersome to determine it.

Is this material worthy of consideration/selection?
Is the distress level of this stretch of highway acceptable?
Does the material/component have adequate service life?
Does the material possess the required property (angularity, workability, tensile strength, etc.)?
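
One very simple way to predict a class label is a 1-nearest-neighbour rule, sketched below. The feature values (rutting, alligator cracks) are taken from the sample table in the KDD section, but the class labels attached to them are hypothetical and were assigned here purely for illustration. Real classifiers would be trained on far more labelled data.

```python
import math

# (rutting, alligator cracks) from the KDD sample table;
# the labels are hypothetical, assigned for illustration only.
training = [
    ((5.0, 2.0), "Not distressed"),
    ((6.1, 3.5), "Not distressed"),
    ((12.4, 4.6), "Distressed"),
    ((10.2, 7.2), "Distressed"),
    ((4.1, 3.2), "Not distressed"),
]


def classify(features, training_set):
    """Return the label of the closest training sample (1-nearest neighbour)."""
    nearest = min(training_set, key=lambda item: math.dist(features, item[0]))
    return nearest[1]


# Two hypothetical new road stretches.
print(classify((11.0, 5.0), training))
print(classify((4.5, 2.5), training))
```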
Regression
Regression is, just like classification, also a prediction task, but here the value of interest
is numerical in nature. Regression equations may be developed.

How will the strength of concrete develop?

What might it cost to build a given piece of infrastructure five years from now?

What will be the degree of distress/ deterioration of a structure 5 years from now?

What will be the optimum bitumen content, given other parameter values?

What is the maximum duration of a project given the values of other parameters?
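
A regression equation of the simplest kind, a straight line fitted by ordinary least squares, can be sketched as follows. The example relates PCI to rutting using the five rows of the sample table in the KDD section; the choice of these two variables, the function name and the 8.0 rutting value used for the prediction are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Rutting and PCI columns of the KDD sample table.
rutting = [5.0, 6.1, 12.4, 10.2, 4.1]
pci = [50, 65, 40, 48, 70]

slope, intercept = fit_line(rutting, pci)
print(f"PCI = {intercept:.2f} + ({slope:.2f}) * rutting")

# Prediction for a hypothetical new section with a rutting value of 8.0.
predicted_pci = intercept + slope * 8.0
print(f"predicted PCI: {predicted_pci:.1f}")
```

The negative slope says that, in this tiny sample, deeper rutting goes with a lower PCI. A usable model would need many more observations and more predictors.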

Clustering / Segmentation
Summarize the data to get a better overview by forming groups of similar cases (called
clusters or segments).

Instead of examining a large number of similar data, we need to inspect the group
summary only.

We may also obtain some insight into the structure of the whole data set. Cases that do
not belong to any group may be considered abnormal or outliers.
Do the data on viscosity, shear strength, and flowability of bitumen divide into different
groups?
Can fly ashes be grouped into three classes based on their chemical, physical and morphological
features?
What are the group-specific properties of the materials?
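
Forming groups of similar cases can be illustrated with a one-dimensional k-means (Lloyd's algorithm), sketched below on the rutting column of the sample table from the KDD section. The choice of two clusters and of the minimum and maximum as starting centroids are assumptions made to keep the example deterministic; other clustering methods could be used equally well.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm in one dimension; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            updated.append(sum(members) / len(members) if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


keys = ["HTB_20", "HTB_21", "HTB_22", "HTB_23", "HTB_24"]
rutting = [5.0, 6.1, 12.4, 10.2, 4.1]

centroids, labels = kmeans_1d(rutting, [min(rutting), max(rutting)])
for key, label in zip(keys, labels):
    print(key, "-> cluster", label)
```

Instead of inspecting every reading, one can then look at the two group summaries (the centroids).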
Association Analysis
Find any correlations or associations to better understand or describe the
interdependencies of all the attributes

The focus is on relationships between all attributes rather than focusing on a single
target variable or the cases

Which two attributes act together in increasing the strength of
concrete/bituminous mix/asphalt?

How do superplasticizers and chemical admixtures influence the properties of SCC?

What are the significant attributes that contribute to the good quality of material xx?
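
One common way to quantify an association between attributes is through the support and confidence of a rule such as "high rutting goes with low PCI". The sketch below uses the rutting and PCI columns of the sample table in the KDD section; the two thresholds and the function name are hypothetical and were chosen only to make the example concrete.

```python
# Rutting and PCI columns of the KDD sample table.
records = [
    {"key": "HTB_20", "rutting": 5.0, "pci": 50},
    {"key": "HTB_21", "rutting": 6.1, "pci": 65},
    {"key": "HTB_22", "rutting": 12.4, "pci": 40},
    {"key": "HTB_23", "rutting": 10.2, "pci": 48},
    {"key": "HTB_24", "rutting": 4.1, "pci": 70},
]

HIGH_RUTTING = 10.0  # hypothetical threshold
LOW_PCI = 50         # hypothetical threshold


def support_and_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule 'antecedent => consequent'."""
    both = sum(1 for r in rows if antecedent(r) and consequent(r))
    ante = sum(1 for r in rows if antecedent(r))
    support = both / len(rows)
    confidence = both / ante if ante else 0.0
    return support, confidence


support, confidence = support_and_confidence(
    records,
    antecedent=lambda r: r["rutting"] > HIGH_RUTTING,
    consequent=lambda r: r["pci"] < LOW_PCI,
)
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```

Support is the share of all cases in which both conditions hold; confidence is the share of high-rutting cases that also have a low PCI.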

Deviation Analysis
Knowing already the major trends/structures in a material/process/cost, find any
exceptional subgroup that behaves differently with respect to some target attribute.

Under which circumstances does the material/component of a structure/pavement behave
differently?

Which properties are shared by the materials that do not follow the pattern?
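
A minimal sketch of deviation analysis: take a known major trend (here assumed to be a straight-line relation between rutting and PCI, fitted to the sample table from the KDD section) and look for the case that departs from it most. The choice of trend, the helper function and the variable names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (rutting, PCI) pairs from the KDD sample table.
samples = {
    "HTB_20": (5.0, 50),
    "HTB_21": (6.1, 65),
    "HTB_22": (12.4, 40),
    "HTB_23": (10.2, 48),
    "HTB_24": (4.1, 70),
}

slope, intercept = fit_line(
    [x for x, _ in samples.values()],
    [y for _, y in samples.values()],
)

# Residual = observed PCI minus the PCI expected from the trend.
residuals = {
    key: y - (intercept + slope * x) for key, (x, y) in samples.items()
}
most_deviant = max(residuals, key=lambda key: abs(residuals[key]))

for key, res in residuals.items():
    print(f"{key}: residual = {res:+.2f}")
print("largest deviation from the trend:", most_deviant)
```

The flagged sample is a candidate for closer inspection by a domain expert, not automatically an error.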
The most frequent categories in transportation data analytics are classification and
regression, because decision making always becomes much easier if reliable predictions of
the near future are available.
When a completely new area or domain (like ITS, IoT, Smart City) is explored, cluster analysis
and association analysis may help to identify relationships among attributes or instances.
Once the major relationships are understood (e.g., by a domain expert), a deviation analysis
can help to focus on exceptional situations that deviate from regularity.
Major Tasks
Pattern Finding
If the domain (and therefore the data) is new to us or if we expect to find interesting
relationships, we explore the data for new, previously unknown patterns.
For this, we may apply methods from, for instance, segmentation, clustering,
association analysis, or deviation analysis.

Techniques available:
Statistical Techniques
Structural Techniques
Template Matching
Neural Network Approach
Fuzzy Model
Hybrid Models

Examples of patterns:
Crack length, width, and number over a unit area
Variation in rutting over a stretch of highway
Prediction of serviceability aspects of pavement
Clustering of highway distress and traffic flow intensities based on related attributes
Finding Explanations
We have a special interest in some target variable and wonder why and how it varies from case
to case.
The primary goal is to gain new insights (knowledge) that may influence our decision
making, but we do not necessarily intend automation.
We may apply methods from classification, regression, association analysis, or deviation
analysis.

Examples:
The variation of vehicular registration
The variation of Cement Production
Variation of Strength of concrete/ rigid pavement in relation to several influencing
parameters.

Finding Predictors
We have a special interest in the prediction of some target variable, but it (possibly)
represents only one building block of our full problem, so we do not really care about the
how and why but are just interested in the best-possible prediction.
We may apply methods from classification or regression.
Prediction of 28 day strength of concrete, using huge data.

Prediction of optimal bitumen content in bituminous mixes.

Prediction of infrastructure (road, industrial, railway, metro) expenditure in India for the
next 5 years.

Available tools (several of them open source):
RStudio, Weka, Neuro Intelligence, Tableau, RapidMiner, KNIME, Pentaho

Data Analysis with Open Source Tools
by Philipp K. Janert, 2010, O'Reilly Media, Inc.