DataAnalyticsforCivilEngineers_Module1
net/publication/377415936
1 author:
Jayaram M.A
RASTA - Center for Road Technology VOLVO Construction Equipment Campus
All content following this page was uploaded by Jayaram M.A on 16 January 2024.
Course Title
Dr.M.A.Jayaram
Professor
RASTA
Module 1
Data and Knowledge
Assessment of Knowledge
Statistics, Descriptive and
Inferential
Definition of Data Analytics
Data Analytic Process
Models: KDD, SEMMA,
CRISP-DM
Methods
Tasks
Tools
RASTA
Module 1: Introduction
Dr.M.A.JAYARAM
Professor
RASTA - Center for Road Technology
VOLVO-CE Campus, Peenya Industrial Area
Bengaluru, India
INTRODUCTION
Data analytics involves the process of converting raw data into actionable insights. It
includes a range of tools, technologies, and processes used to find trends and solve
problems by using data. Data analytics can shape business processes, improve decision-
making, and foster business growth.
Motivation: Availability of huge volumes of data from sensors, the Internet, search engines,
public-domain data, archives, etc.
We are drowning in information, but starving for knowledge.
As a consequence, a new research area has been developed, which has become known under
the name of data mining. The goal of this area was to meet the challenge to develop tools that
can help humans find potentially useful patterns in their data and solve the problems they are
facing by making better use of their data.
Tar density at this temperature is higher than that of bitumen and is in the range 1.15
to 1.4 g/cm³.
The asphalt density is 2.2 to 2.4 g/cm³. The higher the density of the asphalt,
the more durable it is.
In a smart road system, data can be obtained from various sources, such as a large number
of sensors, smart cards, satellite systems, cameras, and social networks.
RASTA, Highway Technology, Elective Course ,Data Analytics for
Civil Engineers_22CHT335
iv. Comprehensibility (simplicity, clarity, parsimony)
• Crushing test is carried out to assess the strength of coarse aggregate when a
compressive load is gradually applied.
• Aggregate crushing value = (W2/W1) × 100, where W1 = total weight of the dry
sample and W2 = weight of material passing through the 2.36 mm IS sieve.
• Should not exceed 35% for CC pavements and 45% for wearing surfaces.
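The crushing value formula above can be sketched in code as follows. This is only a minimal illustration: the function names `aggregate_crushing_value` and `within_limit`, and the sample weights, are assumptions introduced here and are not part of the course material.

```python
def aggregate_crushing_value(w1_total_g: float, w2_passing_g: float) -> float:
    """Return the aggregate crushing value (ACV) in percent.

    w1_total_g   : total weight of the dry sample (W1)
    w2_passing_g : weight of material passing the 2.36 mm IS sieve (W2)
    """
    if w1_total_g <= 0:
        raise ValueError("W1 must be positive")
    if not 0 <= w2_passing_g <= w1_total_g:
        raise ValueError("W2 must lie between 0 and W1")
    return w2_passing_g / w1_total_g * 100.0


def within_limit(acv_percent: float, limit_percent: float) -> bool:
    """Check an ACV against a specified upper limit (also in percent)."""
    return acv_percent <= limit_percent


# Hypothetical weights, chosen only to illustrate the arithmetic.
acv = aggregate_crushing_value(w1_total_g=3000.0, w2_passing_g=750.0)
print(f"ACV = {acv:.1f}%")
print("Within a 35% limit:", within_limit(acv, 35.0))
```

The limit is passed as an argument rather than hard-coded, so the same check can be reused for whichever specification applies.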
In the domain of science and technology, the focus is on correctness, generality, and
simplicity (parsimony): one way of characterizing science and technology is to say
that it is the search for a minimal correct description of the world.
Nevertheless, neither of the two areas can afford to neglect the other criteria.
Statistics has a long history and originated from collecting and analyzing
data about the population and the state.
Statistics
Descriptive Statistics
Inferential Statistics
The main tools to tackle this task are the computation of characteristic measures,
tabular and graphical representations.
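As a rough sketch of these descriptive tools, the snippet below computes a few characteristic measures and prints them in tabular form, followed by a crude text chart. The rutting values are illustrative assumptions, and only Python's standard `statistics` module is used.

```python
import statistics

# Illustrative rutting depths for a few pavement sections (hypothetical values).
rutting = [5.0, 6.1, 12.4, 10.2, 4.1]

summary = {
    "count": len(rutting),
    "mean": statistics.mean(rutting),
    "median": statistics.median(rutting),
    "std_dev": statistics.stdev(rutting),  # sample standard deviation
    "min": min(rutting),
    "max": max(rutting),
}

# Tabular representation of the characteristic measures.
for name, value in summary.items():
    print(f"{name:<8} {value:8.2f}")

# Crude text "bar chart" as a stand-in for a graphical representation.
for i, depth in enumerate(rutting, start=1):
    print(f"section {i}: {'#' * round(depth)}")
```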
It helps in making generalizations about the population by using various analytical tests
and tools.
Example: studying the statistical significance of predictor variables (e.g., “age group,”
“manufacturer,” “season,” and “color”) on the luminous intensities of traffic signals.
Data Mining: Concerned with finding patterns in huge databases. For this, certain data mining
algorithms and tools are applied. The expectation is that any kind of desired knowledge can be
extracted from a given database automatically, with little or no human intervention.
Knowledge Discovery in Databases (KDD): As every project is unique, we need specific approaches.
KDD is a process of identifying valid, novel, potentially useful, and ultimately understandable patterns
in data.
Data Analytics: Coined by David Hand (1997). Data analytics is the collection, transformation, and
organization of data in order to draw conclusions, make predictions, and drive informed decision
making.
Data analytics is a multidisciplinary field that employs a wide range of analysis techniques,
including math, statistics, and computer science, to draw insights from data sets. Data
analytics is a broad term that includes everything from simply analyzing data to theorizing
ways of collecting data and creating the frameworks needed to store it.
Data analytics is important across many industries, as many business leaders use data to
make informed decisions. An expert in smart health monitoring of structures might look
at sensor data to determine which components of a structure can remain in service with
simple repairs and which must be dismantled. An environmental engineer may look at
ambient atmospheric data gathered through devices to determine the degree of
pollution and hence to design or devise possible remedial methods.
Ex. Health care analytics, business analytics, inventory analytics, material behavior
analytics, visual data analytics, structural health care data analytics, pavement
deterioration data analytics, etc.
Data analytics requires a wide range of skills to be performed effectively:
Structured Query Language (SQL): a programming language commonly used for databases
Statistical programming languages: such as R and Python, commonly used to create advanced
data analysis programs
Machine learning: a branch of artificial intelligence that involves using algorithms to spot data
patterns and enabling computers/devices to learn from the given data
Probability and statistics: in order to better analyze and interpret data trends
Data management: or the practices around collecting, organizing and storing data
Data visualization: or the ability to use charts and graphs to tell a story with data
The models (KDD, SEMMA, and CRISP-DM) each attempt to formulate a general framework for data analysis.
Stage 1: Selection – This stage consists of creating a target data set, or focusing on a subset of
variables or data samples, on which discovery is to be performed.
Sample Key   Pots    Rutting   Raveling   Alligator Cracks   PCI
HTB_20       2       5.0       1          2.0                50
HTB_21       3       6.1       3          3.5                65
HTB_22       1       12.4      4          4.6                40
HTB_23       0       10.2      2          7.2                48
HTB_24       5       4.1       5          3.2                70
-----        -----   -----     -----      -----              -----
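The Selection stage can be sketched with the sample table above. The helper `select`, the shortened column names, and the PCI threshold of 60 are assumptions made for illustration only; they are not prescribed by the KDD model.

```python
# Rows taken from the sample table above; column names are shortened for code.
columns = ["sample_key", "pots", "rutting", "raveling", "alligator_cracks", "pci"]
rows = [
    ("HTB_20", 2, 5.0, 1, 2.0, 50),
    ("HTB_21", 3, 6.1, 3, 3.5, 65),
    ("HTB_22", 1, 12.4, 4, 4.6, 40),
    ("HTB_23", 0, 10.2, 2, 7.2, 48),
    ("HTB_24", 5, 4.1, 5, 3.2, 70),
]
records = [dict(zip(columns, row)) for row in rows]


def select(records, variables, condition=lambda r: True):
    """Build a target data set: keep chosen variables for rows meeting a condition."""
    return [{v: r[v] for v in variables} for r in records if condition(r)]


# Hypothetical selection: study rutting and PCI only, for sections with PCI below 60.
target = select(records, ["sample_key", "rutting", "pci"], lambda r: r["pci"] < 60)
for r in target:
    print(r)
```

Both kinds of selection named in Stage 1 appear here: a subset of variables (columns) and a subset of data samples (rows).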
Stage 2: Preprocessing – This stage consists of cleaning and preprocessing the target data in
order to obtain consistent data.
Stage 3: Transformation – This stage consists of transforming the data using
dimensionality reduction or transformation methods.
Stage 4: Data Mining – This stage consists of searching for patterns of interest in a
particular representational form, depending on the data mining objective (usually, prediction).
Stage 5: Interpretation/Evaluation – This stage consists of the interpretation and evaluation of
the mined patterns.
Stage 1: Sample – This stage consists of sampling the data by extracting a portion of a large
data set big enough to contain the significant information, yet small enough to manipulate
quickly. This stage is regarded as optional.
Stage 2: Explore – This stage consists of exploring the data by searching for
unanticipated trends and anomalies in order to gain understanding and ideas.
Stage 3: Modify – This stage consists of modifying the data by creating, selecting,
and transforming the variables to focus the model selection process.
The focus is to identify, collect, and analyze the data sets that can help you accomplish the
project goals. This phase also has four tasks:
Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis tool.
Describe data: Examine the data and document its surface properties like data format, number
of records, or field identities.
Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships among
the data.
Verify data quality: How clean/dirty is the data? Document any quality issues.
If the data is not understood properly, the domain-understanding phase must be
revisited.
This phase, which is often referred to as “data munging”, prepares the final data set(s) for
modeling. It has five tasks:
Select data: Determine which data sets will be used and document reasons for
inclusion/exclusion.
Clean data: Often this is the lengthiest task. Without it, it will be garbage-in, garbage-out. A
common practice during this task is to correct, impute, or remove erroneous values.
Construct data: Derive new attributes that will be helpful, for example, angular ratio, PI, etc.
Integrate data: Create new data sets by combining data from multiple sources.
Format data: Re-format data as necessary. For example, you might convert string values that
store numbers to numeric values so that you can perform mathematical operations.
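The "Format data" and "Clean data" tasks above can be sketched as follows. The raw records, the helper `to_float`, the rule that negative depths are erroneous, and the choice of median imputation are all assumptions made for this illustration.

```python
import statistics

# Hypothetical raw records: numbers stored as strings, one missing, one invalid.
raw = [
    {"sample_key": "HTB_20", "rutting": "5.0"},
    {"sample_key": "HTB_21", "rutting": "6.1"},
    {"sample_key": "HTB_22", "rutting": ""},        # missing value
    {"sample_key": "HTB_23", "rutting": "10.2"},
    {"sample_key": "HTB_24", "rutting": "-4.1"},    # negative depth: erroneous
]


def to_float(text):
    """Format data: convert a numeric string to float, or None if not parseable."""
    try:
        return float(text)
    except ValueError:
        return None


# Format, then treat erroneous (negative) values as missing.
values = [to_float(r["rutting"]) for r in raw]
values = [v if v is not None and v >= 0 else None for v in values]

# Clean data: impute missing entries with the median of the valid ones.
valid = [v for v in values if v is not None]
fill = statistics.median(valid)
cleaned = [
    {"sample_key": r["sample_key"], "rutting": fill if v is None else v}
    for r, v in zip(raw, values)
]
print(cleaned)
```

Imputing with the median is only one option; removing the affected records or correcting them from the source would be equally valid under the task description.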
This stage involves developing and assessing various models based on several different
modeling techniques. This phase has four tasks:
Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net).
Generate test design: Depending on your modeling approach, you might need to split the data into
training, test, and validation sets.
Build model: Build the model(s) with the selected techniques; the developed model is the output of this task.
Assess model: Generally, multiple models are competing against each other, and the data
scientist needs to interpret the model results based on domain knowledge, the pre-defined
success criteria, and the test design.
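The "Generate test design" task can be sketched as a simple random split. The function name `split_data`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration, not requirements of CRISP-DM.

```python
import random


def split_data(records, train=0.6, validation=0.2, seed=42):
    """Shuffle records and split them into training, validation, and test sets.

    The test set receives whatever remains after the first two fractions.
    """
    if train <= 0 or validation < 0 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed for repeatable splits
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical data set of 20 sample keys.
samples = [f"HTB_{i}" for i in range(20, 40)]
train_set, val_set, test_set = split_data(samples)
print(len(train_set), len(val_set), len(test_set))
```

Competing models would be fitted on the training set, compared on the validation set, and the chosen model checked once on the test set.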
The Evaluation phase looks more broadly at which model best meets the business needs and what
to do next. This phase has three tasks:
Evaluate results: Do the models meet the business success criteria? Which one(s) should we
approve for the business?
Review process: Review the work accomplished. Was anything overlooked? Were all steps
properly executed? Summarize findings and correct anything if needed.
Determine next steps: Based on the previous two tasks, determine whether to proceed to
deployment, iterate further, or initiate new projects.
If the model fails, proves to be erroneous, or does not work for new data, the process must
be repeated starting from domain understanding.
Stage 6 : Deployment
Plan deployment: Develop and document a plan for deploying the model.
Plan monitoring and maintenance: Develop a thorough monitoring and maintenance plan to
avoid issues during the operational phase (or post-project phase) of a model.
Produce final report: The project team documents a summary of the project which might
include a final presentation of data mining results.
Review project: Conduct a project retrospective about what went well, what could have been
better, and how to improve in the future.
To avoid the effort of inventing a completely new solution for each problem, it is helpful to
think of different problem categories and consider them as building blocks from which a
solution may be composed.
Methods
Classification
Regression
Clustering/Segmentation
Association
Deviation Analysis
Classification
Predict the outcome of an experiment with a finite number of possible results
Predict a class label for an entity/sample/material/structure
Examples: Yes/No; Class 1/Class 2/Class 3; Distressed/Not distressed/Severely distressed;
Palatable/Not palatable/Neutral; Durable/Highly durable/Moderately durable;
High/Low/Moderate (angularity, workability, strength)
We may be interested in a prediction because the true result will emerge in the future
or because it is expensive, difficult, or cumbersome to determine it.
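A minimal sketch of class-label prediction is given below, using a nearest-neighbour rule. The choice of k-nearest neighbours, the two features, and the labelled training sections are assumptions made for illustration; the course material does not prescribe a particular classifier.

```python
import math
from collections import Counter

# Hypothetical labelled sections: (rutting, alligator cracks) -> distress class.
training = [
    ((4.1, 3.2), "Not distressed"),
    ((5.0, 2.0), "Not distressed"),
    ((6.1, 3.5), "Distressed"),
    ((10.2, 7.2), "Severely distressed"),
    ((12.4, 4.6), "Severely distressed"),
]


def classify(features, training, k=1):
    """Predict a class label by majority vote among the k nearest neighbours."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# A new, unlabelled section described by the same two features.
print(classify((11.0, 6.0), training))
```

The output is always one of a finite set of labels, which is what distinguishes classification from predicting a numerical value.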
Regression
How much might it cost to build a certain infrastructure 5 years from now?
What will be the degree of distress/deterioration of a structure 5 years from now?
What will be the optimum bitumen content, given other parameter values?
What is the maximum duration of a project, given the values of other parameters?
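Questions of this kind ask for a numerical value, which can be sketched with a one-variable least-squares fit. The rutting and PCI values are the five rows of the pavement sample table earlier in this module; treating PCI as a linear function of rutting alone is an assumption made purely for illustration, as are the names `fit_line` and `predict_pci`.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rutting = [5.0, 6.1, 12.4, 10.2, 4.1]
pci = [50, 65, 40, 48, 70]

intercept, slope = fit_line(rutting, pci)


def predict_pci(rutting_value):
    """Predict PCI for a given rutting value from the fitted line."""
    return intercept + slope * rutting_value


print(f"PCI = {intercept:.2f} + ({slope:.2f}) * rutting")
print(f"Predicted PCI at rutting 8.0: {predict_pci(8.0):.1f}")
```

With only five samples the fitted line is not a reliable model; the point is the shape of the task, a numerical target predicted from other parameter values.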
Clustering/Segmentation
Instead of examining a large number of similar records, we need to inspect only the group
summary.
We may also obtain some insight into the structure of the whole data set. Cases that do
not belong to any group may be considered abnormal or outliers.
Do the data on viscosity, shear strength, and flowability of bitumen divide into different
groups?
Can fly ashes be grouped into three classes based on their chemical, physical, and morphological
features?
What are the group-specific properties of materials?
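Grouping questions like these can be sketched with a plain k-means procedure. The algorithm choice, the deterministic initialization from the first k points, and the two-feature data points are assumptions made for illustration; the values are dimensionless placeholders, not measured bitumen properties.

```python
import math


def k_means(points, k, iterations=20):
    """Plain k-means; returns (centroids, labels). Initial centroids = first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical (feature 1, feature 2) measurements for six samples.
points = [(1.0, 1.2), (5.0, 5.1), (1.1, 0.9), (0.9, 1.0), (5.2, 4.9), (4.8, 5.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```

Each centroid acts as the group summary: instead of inspecting all six samples, one inspects two representative points.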
Association Analysis
Find any correlations or associations to better understand or describe the
interdependencies of all the attributes
The focus is on relationships between all attributes rather than focusing on a single
target variable or the cases
What are the two attributes that will go together in increasing the strength of
Concrete/Bituminous mix/ asphalt?
What are the significant attributes that contribute to good quality of material xx?
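One simple way to look at relationships between all attributes, rather than a single target, is a pairwise correlation table. The sketch below uses the Pearson coefficient, which is only one possible association measure and is an assumption here; the data are the five rows of the pavement sample table earlier in this module, with shortened column names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


data = {
    "pots": [2, 3, 1, 0, 5],
    "rutting": [5.0, 6.1, 12.4, 10.2, 4.1],
    "raveling": [1, 3, 4, 2, 5],
    "alligator_cracks": [2.0, 3.5, 4.6, 7.2, 3.2],
    "pci": [50, 65, 40, 48, 70],
}

# Correlation for every pair of attributes, strongest relationships first.
pairs = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}
for (a, b), r in sorted(pairs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{a:>17} ~ {b:<17} r = {r:+.2f}")
```

Five samples are far too few to draw conclusions, so the coefficients here only illustrate the mechanics of scanning all attribute pairs.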
Deviation Analysis
Under which circumstances does the material/component of a structure/pavement behave
differently?
Which properties are shared by those materials that do not follow the pattern?
The most frequent categories in transportation data analytics are classification and
regression, because decision making always becomes much easier if reliable predictions of
the near future are available.
When a completely new area or domain (like ITS, IoT, or Smart City) is explored, cluster analysis
and association analysis may help to identify relationships among attributes or instances.
Once the major relationships are understood (e.g., by a domain expert), a deviation analysis
can help to focus on exceptional situations that deviate from regularity.
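A very simple form of deviation analysis is to flag values that lie far from the bulk of the data. The sketch below uses a z-score rule; the function name `flag_deviations`, the threshold of 2.0, and the rutting values (including the planted anomaly) are assumptions made for illustration.

```python
import statistics


def flag_deviations(values, threshold=2.0):
    """Return indices of values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)   # population standard deviation
    if spread == 0:
        return []                        # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


# Hypothetical rutting depths along a stretch; the 19.5 is a planted anomaly.
rutting = [5.0, 5.4, 4.8, 5.1, 19.5, 5.3, 4.9, 5.2]
exceptions = flag_deviations(rutting)
print("Exceptional sections (by index):", exceptions)
```

In practice the "regularity" would come from the relationships identified earlier (for example, residuals from a fitted model), with the same idea of flagging cases that fall outside the expected range.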
Major Tasks
Pattern Finding
If the domain (and therefore the data) is new to us or if we expect to find interesting
relationships, we explore the data for new, previously unknown patterns.
For this, we may apply methods such as segmentation, clustering,
association analysis, or deviation analysis.
Techniques available:
Statistical Techniques
Structural Techniques
Template Matching
Neural Network Approach
Fuzzy Model
Hybrid Models
Examples of patterns:
Crack length, width, and number over a unit area
Variation in rutting over a stretch of highway
Prediction of serviceability aspects of pavement
Clustering of highway distress and traffic flow intensities based on related attributes
Finding Explanations
We have a special interest in some target variable and wonder why and how it varies from case
to case.
The primary goal is to gain new insights (knowledge) that may influence our decision
making, but we do not necessarily intend automation.
We may apply methods from classification, regression, association analysis, or deviation
analysis.
Examples:
The variation of vehicular registration
The variation of Cement Production
Variation of Strength of concrete/ rigid pavement in relation to several influencing
parameters.