
Report IRA Group

This document describes a study that developed probabilistic models to predict when track geometry defects tagged as yellow will deteriorate into defects tagged as red, violating safety standards. Survival analysis was used to analyze defect data and model the probability of defects transitioning between states over time. Specifically, Weibull hazard models were developed with the scale parameter expressed as a function of explanatory defect variables, allowing the models to account for factors affecting deterioration. The developed models can predict whether yellow tag defects will deteriorate within a given time interval based on the defect-specific survival functions.

Uploaded by

RaghavendraS
Copyright
© All Rights Reserved

2015 RAS Problem Solving Competition Report

Track Geometry Analytics

Negin Alemazkoor
Graduate Research Assistant
Department of Civil and Environmental Engineering
University of Illinois at Urbana-Champaign
Newmark Lab, 205 N. Mathews Ave., Urbana, IL 61801

Conrad Ruppert, Jr.


Sr. Research Engineer / Associate Director for Research
Rail Transportation and Engineering Center - RailTEC
Department of Civil and Environmental Engineering
University of Illinois at Urbana-Champaign
Newmark Lab, 205 N. Mathews Ave., Urbana, IL 61801

Hadi Meidani
Assistant Professor
Department of Civil and Environmental Engineering
University of Illinois at Urbana-Champaign
1211 Newmark Lab, 205 N. Mathews Ave., Urbana, IL 61801
Abstract
Track geometry defects can have critical implications for the safety of rail transportation. To make optimal maintenance decisions and keep railroads safe and efficient, it is necessary to analyze track geometry defects and develop reliable defect deterioration models. This report supports our answer to the 2015 RAS competition's problem of predicting the deterioration of yellow tag geometry defects into conditions violating the Federal Railroad Administration (FRA) track safety standards. In general, infrastructure deterioration is unpredictable, since many factors that affect the deterioration process cannot be captured by the available data. Hence, probabilistic deterioration models, which account for the stochastic nature of deterioration, are superior to deterministic curve-fitting models, and they are the subject of our study. The training dataset provided by the 2015 RAS competition's organizing committee is used to develop probabilistic models for the lifetime of yellow tag defects. The developed models are then used to predict whether the yellow tag defects in the test data will deteriorate into the red tag state within a given time interval.

Introduction
Like any other infrastructure system, railroads need maintenance. As railroads age, maintenance actions become essential to ensure safe rides and minimize the probability of derailments. There are two types of maintenance actions: (a) preventive maintenance actions and (b) corrective maintenance actions. Individual defects with amplitudes greater than the thresholds specified by the FRA track safety standards (red tag defects) must be treated by corrective maintenance actions, as determined by those standards. Yellow tag defects are defects whose amplitudes exceed a particular railroad's own standards but remain below the FRA limits. Understanding how yellow tag defects deteriorate into red tag defects allows better planning of preventive maintenance actions, thereby reducing maintenance costs.

Different approaches can be used to analyze geometry defects, one of which is survival analysis: the analysis of the waiting time until a specific, well-defined event happens. Survival analysis has many applications in fields such as biomedicine, engineering, and economics, where the system, equipment, or organ of interest has two states: a functioning state and a failed state (1, 2). For a component, survival analysis aims to find the probability distribution of its lifetime (the time spent in the functioning state before failing) and its survival function (the probability that failure has not yet occurred by a given time).

Track geometry defects can be analyzed with survival analysis because they have two states (yellow and red), and we seek to determine when a yellow tag becomes a red tag. The time a defect spends in the yellow state can be considered its lifetime, and turning into a red tag can be considered the failure event. The training dataset is used to estimate the survival functions of yellow tag defects, which are then used to compute the probability of failure for the yellow tag defects in the test dataset.

Methodology
Let us denote the yellow state by Y and the red state by R. The sequence of observed states for each defect has the form Y, Y, Y, ..., R, ..., R. When a failure happens, i.e., a defect exceeds the FRA safety limits and goes from the yellow state to the red state, the defect remains in the red state until a corrective maintenance action is performed. The time a defect stays in the yellow state depends on a number of explanatory variables, such as the defect's amplitude, length, and track code. Our goal is to describe this dependence mathematically. We treat the lifetime, T, as a continuous random variable with probability density function f(t) and cumulative distribution function F(t) = P(T < t), so F(t) gives the probability that failure has happened by time t. The probability that failure has not happened by time t, i.e., the survival probability, is 1 − F(t). We therefore define the survival function as the complement of the cumulative distribution function:

$$S(t) = P(T > t) = 1 - \int_0^t f(t')\,dt' = \int_t^\infty f(t')\,dt' \tag{1}$$

Using conditional probability, we can calculate the probability that failure happens between time t and t + Δt, given that it has not happened by time t. We denote this probability by h(t, Δt):

$$h(t, \Delta t) = P(t < T < t + \Delta t \mid T > t) = \frac{F(t + \Delta t) - F(t)}{S(t)} \tag{2}$$

The average failure rate per unit time over this interval is h(t, Δt)/Δt. The instantaneous failure rate, or hazard rate, denoted by z(t), is obtained by letting Δt approach zero:

$$z(t) = \lim_{\Delta t \to 0} \frac{P(t < T < t + \Delta t \mid T > t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t\, S(t)} = \frac{f(t)}{S(t)} = -\frac{d \ln S(t)}{dt} \tag{3}$$

The hazard rate function gives fundamental information about the nature of the event being modeled. A monotonically increasing hazard rate means that the instantaneous probability of failure grows with the time already spent without failure. Conversely, a monotonically decreasing hazard rate means that this probability shrinks as survival time accumulates.
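As a numerical illustration of Equations 1 to 3, the sketch below takes a survival function chosen only for illustration, S(t) = exp(−t²), whose hazard is exactly z(t) = 2t. It shows that h(t, Δt)/Δt approaches f(t)/S(t) as Δt shrinks. The distribution and all function names are assumptions for this example and are not part of the report's models.

```python
import math

# Illustrative lifetime distribution (an assumption, not from the report):
# S(t) = exp(-t^2), so F(t) = 1 - S(t), f(t) = 2t exp(-t^2) and z(t) = 2t.
def survival(t):
    return math.exp(-t * t)

def cdf(t):
    return 1.0 - survival(t)

def density(t):
    return 2.0 * t * math.exp(-t * t)

def conditional_failure_prob(t, dt):
    """Eq. (2): P(t < T < t + dt | T > t) = (F(t + dt) - F(t)) / S(t)."""
    return (cdf(t + dt) - cdf(t)) / survival(t)

def hazard_limit(t, dt):
    """Average failure rate over [t, t + dt]; tends to z(t) as dt -> 0, Eq. (3)."""
    return conditional_failure_prob(t, dt) / dt

t = 1.0
for dt in (0.5, 0.1, 0.01, 0.001):
    print(f"dt={dt:<6} h(t, dt)/dt = {hazard_limit(t, dt):.4f}")
print("f(t)/S(t) =", density(t) / survival(t))  # exact hazard z(1) = 2
```

The printed ratios move toward 2 as Δt decreases, which is the limit taken in Equation 3.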

Weibull hazard function

The simplest hazard function is a constant hazard function, which means that the failure rate does not change over time. A constant hazard function implies that the lifetime has an exponential density function:

$$z(t) = \lambda = -\frac{d \ln S(t)}{dt} \tag{4}$$

$$S(t) = \exp(-\lambda t) = \int_t^\infty f(t')\,dt' \tag{5}$$

$$f(t) = \lambda \exp(-\lambda t) \tag{6}$$

The exponential distribution has only one parameter, the scale parameter. The Weibull distribution is a generalized form of the exponential distribution with an additional parameter, the shape parameter, which makes it very flexible and widely applicable for modeling lifetimes and hazard occurrence (3, 4). The probability density function, survival function, and hazard function of the Weibull distribution are:

$$f(t) = \lambda^p p\, t^{p-1} \exp\!\left(-(\lambda t)^p\right) \tag{7}$$

$$S(t) = \exp\!\left(-(\lambda t)^p\right) \tag{8}$$

$$z(t) = \lambda^p p\, t^{p-1} \tag{9}$$

where p and λ are the shape and scale parameters, respectively. When p = 1, the Weibull distribution reduces to the exponential distribution and the hazard rate takes the constant value λ. If p > 1, the hazard function z(t) is monotonically increasing. If 0 < p < 1, it is monotonically decreasing. Figure 1 shows Weibull probability densities for λ = 1 and different shape parameters, illustrating the flexibility of the Weibull distribution (4).

Figure 1: Weibull probability densities for λ = 1 and different shape parameters
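The three Weibull formulas translate directly into code. This minimal Python sketch evaluates Equations 7 to 9 and shows how the shape parameter p controls the trend of the hazard rate when λ = 1, as in Figure 1. The function names are ours and were chosen for illustration.

```python
import math

def weibull_pdf(t, lam, p):
    """Eq. (7): f(t) = lam^p * p * t^(p-1) * exp(-(lam*t)^p)."""
    return lam ** p * p * t ** (p - 1) * math.exp(-((lam * t) ** p))

def weibull_survival(t, lam, p):
    """Eq. (8): S(t) = exp(-(lam*t)^p)."""
    return math.exp(-((lam * t) ** p))

def weibull_hazard(t, lam, p):
    """Eq. (9): z(t) = lam^p * p * t^(p-1)."""
    return lam ** p * p * t ** (p - 1)

# With lam = 1, the shape p decides whether the hazard falls (p < 1),
# stays constant (p = 1, the exponential case) or rises (p > 1).
lam = 1.0
for p in (0.5, 1.0, 2.0):
    rates = [round(weibull_hazard(t, lam, p), 3) for t in (0.5, 1.0, 2.0)]
    print(f"p={p}: z(0.5), z(1), z(2) = {rates}")
```

Because f(t) = z(t)S(t) (Equation 3), the density can also be obtained as the product of the other two functions.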

We want the track geometry deterioration models to depend on the explanatory variables. However, the basic Weibull hazard function does not account for their impact. To overcome this limitation, we use an extension of the Weibull distribution in which the scale parameter λ is itself a function of exogenous variables (5, 6):

$$\lambda = \exp(-\beta X) \tag{10}$$

$$z(t) = \exp(-p\beta X)\, p\, t^{p-1} \tag{11}$$

$$S(t) = \exp\!\left(-\left(\exp(-\beta X)\, t\right)^p\right) \tag{12}$$

where X is the column vector of explanatory variables and β is the row vector of coefficients to be estimated.
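To make the role of β concrete, the sketch below evaluates Equations 10 to 12 for a covariate vector whose first entry is fixed at 1. This makes the first element of β act as the constant term, matching the "Constant" rows of Tables 2 to 5. The coefficient values and the choice of amplitude as the second covariate are hypothetical. A negative coefficient raises λ and therefore lowers survival.

```python
import math

def scale_from_covariates(beta, x):
    """Eq. (10): lam = exp(-beta . x); x[0] = 1 multiplies the constant term."""
    return math.exp(-sum(b * xi for b, xi in zip(beta, x)))

def hazard(t, beta, x, p):
    """Eq. (11): z(t) = exp(-p * beta . x) * p * t^(p-1)."""
    return scale_from_covariates(beta, x) ** p * p * t ** (p - 1)

def survival(t, beta, x, p):
    """Eq. (12): S(t) = exp(-(exp(-beta . x) * t)^p)."""
    return math.exp(-((scale_from_covariates(beta, x) * t) ** p))

# Hypothetical coefficients: a constant and one covariate (e.g. amplitude)
# with a negative sign, so larger values shorten the expected lifetime.
beta = [5.0, -2.0]
p = 1.2
for amp in (0.5, 1.0):
    x = [1.0, amp]
    print(f"amp={amp}: lam={scale_from_covariates(beta, x):.4f}, "
          f"S(30)={survival(30.0, beta, x, p):.3f}")
```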

Parameter Estimation

As discussed, the Weibull distribution is widely used to model the lifetime of components. In this work, we use it to model probabilistically the time a defect spends in the yellow state before exceeding the safety limits. We use the maximum likelihood method to estimate the parameters p and β in Equations 11 and 12. For two consecutive reports of a repeated defect, we assume that the lifetime of the yellow defect starts at the first report. This assumption is reasonable because we include the defect's amplitude as an explanatory variable. In other words, we want to know the lifetime of a defect after its amplitude has reached a certain value (its amplitude in the first report). If the defect has not turned into a red tag defect by the second report, we treat the record as right-censored: we do not know the defect's lifetime, only that the defect survived until the second report. Even for uncensored records, where the second inspection indicates a red tag, we do not know the exact failure time. In this work, we assume that the failure happened midway between the two inspections.
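The censoring and midpoint rules above can be sketched as a small function that turns two consecutive reports of one repeated defect into a (time, event) pair. This is a minimal sketch under stated assumptions: time is measured in days, and the `Report` class and its "yellow"/"red" tag strings are hypothetical stand-ins for however the dataset encodes inspections.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Report:
    inspected: date
    tag: str  # "yellow" or "red" (hypothetical encoding)

def survival_record(first: Report, second: Report):
    """Return (time_in_days, event) for two consecutive reports of one defect.

    The lifetime starts at the first (yellow) report. A red second report is a
    failure assumed to occur midway between the inspections (event = 1); a
    yellow second report is right-censored at the second inspection (event = 0).
    """
    gap = (second.inspected - first.inspected).days
    if second.tag == "red":
        return gap / 2, 1
    return gap, 0

print(survival_record(Report(date(2014, 1, 10), "yellow"),
                      Report(date(2014, 2, 9), "yellow")))  # (30, 0)
print(survival_record(Report(date(2014, 1, 10), "yellow"),
                      Report(date(2014, 3, 11), "red")))    # (30.0, 1)
```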

Given m records, of which n are uncensored with failure times t_i and m − n are censored with survival times t_i, the likelihood function can be written as follows:

$$L = \prod_{i=1}^{n} f(\beta, p, X_i, t_i) \prod_{i=n+1}^{m} S(\beta, p, X_i, t_i) \tag{13}$$

$$\{p^*, \beta^*\} = \arg\max_{p,\beta} \left[\sum_{i=1}^{n} \ln f(\beta, p, X_i, t_i) + \sum_{i=n+1}^{m} \ln S(\beta, p, X_i, t_i)\right] \tag{14}$$

Since Equation 14 is a nonlinear function of p and β, iterative methods are usually used to find the values p* and β* that maximize the log-likelihood (7, 8). Statistical packages such as R can be used to find p* and β*.
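The sketch below writes the censored log-likelihood of Equation 14 in plain Python and maximizes it for the special case in which X contains only the constant term. In that case, for any fixed p the optimal λ has the closed form λ^p = (number of failures)/Σ t_i^p, so only a one-dimensional search over p is needed. This is not the report's R procedure. The data are synthetic, and `fit_constant_only` is a hypothetical helper. With covariates, the same `log_likelihood` function would be handed to a general-purpose optimizer. As far as we can tell, R's `survival::survreg` with `dist = "weibull"` uses the same λ = exp(−βX) parameterization and reports 1/p as its `Scale`, which would be consistent with the 1/p rows in Tables 2 to 5.

```python
import math
import random

def log_likelihood(p, beta, records):
    """Eq. (14): sum of ln f over failures plus sum of ln S over censored records.

    records: list of (t, event, x) with x[0] = 1 for the constant term.
    """
    ll = 0.0
    for t, event, x in records:
        lam = math.exp(-sum(b * xi for b, xi in zip(beta, x)))   # Eq. (10)
        log_s = -((lam * t) ** p)                                 # ln S(t), Eq. (12)
        if event:
            # ln f = ln z + ln S, with z(t) = lam^p * p * t^(p-1), Eq. (11)
            ll += p * math.log(lam) + math.log(p) + (p - 1) * math.log(t) + log_s
        else:
            ll += log_s
    return ll

def fit_constant_only(records, p_lo=0.1, p_hi=10.0, iters=80):
    """Maximise Eq. (14) when x = [1] only, using a profile likelihood in p."""
    n_fail = sum(event for _, event, _ in records)

    def best_lam(p):
        # For fixed p, d(lnL)/d(lam) = 0 gives lam^p = n_fail / sum(t^p).
        return (n_fail / sum(t ** p for t, _, _ in records)) ** (1.0 / p)

    def profile(p):
        return log_likelihood(p, [-math.log(best_lam(p))], records)

    # Golden-section search on ln p for the maximum of the profile likelihood.
    lo, hi = math.log(p_lo), math.log(p_hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if profile(math.exp(a)) < profile(math.exp(b)):
            lo = a  # maximum lies in [a, hi]
        else:
            hi = b  # maximum lies in [lo, b]
    p_hat = math.exp((lo + hi) / 2.0)
    return p_hat, -math.log(best_lam(p_hat))  # (p*, beta0*), lam = exp(-beta0)

# Synthetic check with made-up parameters: Weibull lifetimes right-censored
# at a random time until the next inspection.
rng = random.Random(0)
true_p, true_lam = 1.5, 0.02
records = []
for _ in range(2000):
    life = rng.weibullvariate(1.0 / true_lam, true_p)  # scale = 1/lam, shape = p
    gap = rng.uniform(10.0, 120.0)
    records.append((life, 1, [1.0]) if life <= gap else (gap, 0, [1.0]))

p_hat, b0_hat = fit_constant_only(records)
print(f"p* = {p_hat:.3f}, lam* = {math.exp(-b0_hat):.4f}")
```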

Data Analysis and Results


Data Processing

In this work, we used the statistical software R to estimate the parameters in Equations 11 and 12. To do so, the input dataset must have the format shown in Table 1. The first column is the time. The second column indicates whether the failure occurred within the time duration given in the first column: one means the event occurred, and zero means it did not. The remaining columns contain the explanatory variables, which can be binary, real, or integer.

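Table 1: Input Data Format for Survival Analysis in R

| Time | Event | Explanatory Variable 1 | Explanatory Variable 2 | Explanatory Variable 3 |
|------|-------|------------------------|------------------------|------------------------|
| 10 | 0 | 1 | 2.05 | 7 |
| 30 | 1 | 0 | 3.04 | 8 |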

We need to generate a table in the format of Table 1 for each type of defect (surface, DIP, and crosslevel) using the training dataset. The data processing steps used to generate these tables are described below.

Step 1: Dividing the training dataset into three datasets by defect type

There are three different types of defects in the dataset, which may have very different deterioration processes. Therefore, we developed a separate survival model for each type of defect.

Step 2: Multiple inspections within a day

In the training dataset, several defects are reported multiple times within a single day. For each such defect, we calculate the average of the reported amplitudes and lengths. We then replace all of that day's reports with a single report containing the average values.
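Step 2 amounts to a group-by-and-average. In this sketch, the tuple layout and the `defect_key` field are hypothetical, because the report does not say how reports of "the same defect" on one day are matched.

```python
from collections import defaultdict
from datetime import date

def collapse_same_day(reports):
    """Replace multiple same-day reports of one defect by one averaged report.

    reports: iterable of (defect_key, inspection_date, amplitude, length), where
    defect_key is a hypothetical identifier for "the same defect".
    """
    groups = defaultdict(list)
    for key, day, amplitude, length in reports:
        groups[(key, day)].append((amplitude, length))
    collapsed = []
    for (key, day), values in sorted(groups.items()):
        n = len(values)
        mean_amp = sum(a for a, _ in values) / n
        mean_len = sum(w for _, w in values) / n
        collapsed.append((key, day, mean_amp, mean_len))
    return collapsed

reports = [
    ("A", date(2014, 5, 1), 1.0, 20.0),
    ("A", date(2014, 5, 1), 1.2, 30.0),  # second report of A on the same day
    ("A", date(2014, 6, 1), 1.3, 25.0),
]
print(collapse_same_day(reports))
```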

Step 3: Repeated defects

The next step is identifying repeated defects. By definition, a defect of the same type found within 100 feet on either side of a previous defect is considered a repeated defect. We used this definition to find the repeated defects and sorted each defect's reports by inspection date, from the earliest to the latest inspection.
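One possible implementation of the 100-foot rule is sketched below. The record layout (track identifier, defect type, location in feet, inspection date) is an assumption. A report joins the first chain whose most recent report matches the track and type, lies within 100 feet, and comes from an earlier inspection; the report does not specify how ties between several candidate chains should be resolved, so "first match" is a simplification.

```python
from datetime import date

def group_repeated(reports, radius_ft=100.0):
    """Chain defect reports into repeated defects, each chain in date order.

    reports: iterable of (track_id, defect_type, location_ft, inspection_date);
    this layout is hypothetical.
    """
    chains = []
    for rep in sorted(reports, key=lambda r: r[3]):  # earliest inspection first
        track, dtype, loc, day = rep
        for chain in chains:
            last = chain[-1]
            if (last[0] == track and last[1] == dtype
                    and abs(last[2] - loc) <= radius_ft and last[3] < day):
                chain.append(rep)
                break
        else:
            chains.append([rep])  # no match: start a new defect history
    return chains

reports = [
    ("T1", "DIP", 1000.0, date(2014, 1, 5)),
    ("T1", "DIP", 1060.0, date(2014, 2, 5)),      # 60 ft away: repeated defect
    ("T1", "DIP", 1500.0, date(2014, 2, 5)),      # too far: new defect
    ("T1", "SURFACE", 1010.0, date(2014, 2, 5)),  # different type: new defect
]
chains = group_repeated(reports)
print([len(c) for c in chains])
```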

Step 4: Identifying maintenance actions

Although we have no information about when and where maintenance actions were performed, we are provided with information about inspection runs, which we use to make reasonable guesses about maintenance actions. If an inspection run between two consecutive reports of a defect reported no defect, we assume that a maintenance action was performed. Consequently, we do not treat the two reports as a repeated defect. Additionally, if the absolute amplitude of the defect decreased from one inspection to the next, or if the sign of the amplitude changed, we assume that a maintenance action was performed between the two inspections. We recognize that such a change might be a measurement error rather than evidence of maintenance. However, we are unable to distinguish measurement errors from actual maintenance actions. We therefore take a conservative approach and do not treat cases where the amplitude decreased or changed sign as repeated defects.
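The three maintenance rules in Step 4 can be sketched as a function that splits one repeated-defect chain into independent histories. The inputs are assumptions: a chain of (date, signed amplitude) pairs sorted by date, and the dates of inspection runs that covered the defect's location. Integers stand in for dates in the example, but `datetime.date` values compare the same way.

```python
def split_at_maintenance(chain, run_dates):
    """Split one repeated-defect chain wherever maintenance is assumed.

    chain: list of (inspection_date, signed_amplitude) sorted by date.
    run_dates: dates of inspection runs over this location (hypothetical input).
    Maintenance is assumed between consecutive reports if a run in between
    reported nothing, |amplitude| decreased, or the amplitude changed sign.
    """
    segments = [[chain[0]]]
    for (d0, a0), (d1, a1) in zip(chain, chain[1:]):
        silent_run = any(d0 < d < d1 for d in run_dates)
        shrank = abs(a1) < abs(a0)
        flipped = a0 * a1 < 0
        if silent_run or shrank or flipped:
            segments.append([(d1, a1)])  # start a new, independent history
        else:
            segments[-1].append((d1, a1))
    return segments

chain = [(0, -1.0), (30, -1.2), (60, -0.9), (90, -1.1), (120, 1.3)]
run_dates = [0, 30, 60, 90, 120]
print(split_at_maintenance(chain, run_dates))
```

Here the drop from 1.2 to 0.9 and the sign change at day 120 each start a new history.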

Step 5: Making the input table for survival analysis in R

Once we have all the repeated defects sorted from the first to the last inspection, we can build a table in the form of Table 1. The first column is the time between two consecutive inspections reporting a repeated defect if the second inspection indicates a yellow tag. Otherwise, it is half of the time between the two inspections, i.e., we assume that the failure happened midway between them. The second column is zero if the second inspection indicates a yellow tag and one if it indicates a red tag, meaning that the failure has occurred. We then include the initial absolute values of the defect's amplitude and length, the track code, the class of track, the operating speeds, and the tonnage as explanatory variables. The track code is a binary variable: it is either tangent or curve (the dataset does not include the spiral case), and we assign zero to tangent and one to curve. To indicate the class of track, we define binary variables class5, class4, and class3, whose values are one when the track belongs to class 5, 4, or 3, respectively. When all three variables are zero, the track belongs to class 2, as the dataset does not include class 1 track.
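The variable coding in Step 5 can be sketched as follows. The dictionary keys mirror the variable names used in Table 2. The units of speed and tonnage are whatever the dataset uses, and the numbers in the usage line are made up for illustration.

```python
def encode_covariates(amplitude, length, track_code, track_class,
                      freight_speed, passenger_speed, monthly_tonnage):
    """Build one row of explanatory variables following Step 5.

    Amplitude and length enter as absolute values. TrackCode: tangent -> 0,
    curve -> 1 (no spirals in the data). Class 2 is the reference level, so
    Class5, Class4 and Class3 are all zero for class 2 track.
    """
    if track_code not in ("tangent", "curve"):
        raise ValueError(f"unexpected track code: {track_code!r}")
    if track_class not in (2, 3, 4, 5):
        raise ValueError(f"unexpected track class: {track_class!r}")
    return {
        "Amplitude": abs(amplitude),
        "Length": abs(length),
        "TrackCode": 1 if track_code == "curve" else 0,
        "Class5": int(track_class == 5),
        "Class4": int(track_class == 4),
        "Class3": int(track_class == 3),
        "OperatingSpeed-freight": freight_speed,
        "OperatingSpeed-passenger": passenger_speed,
        "Average-Monthly-Tonnage": monthly_tonnage,
    }

row = encode_covariates(-1.25, 40.0, "curve", 4, 50, 79, 2.3)
print(row)
```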

We would like to include both traffic and tonnage as explanatory variables in the model. However, we suspect that the traffic data suffer from substantial errors, because the majority of records imply implausibly small or large ratios of cars to trains, i.e., fewer than 10 or more than 150 cars per train. Therefore, we discard the traffic data and include only the average tonnage over the section in which the defect is located as an explanatory variable.

Results

DIP defects

The processed training dataset for DIP defects includes 540 records, of which 363 are censored. Table 2 shows the estimated parameters for the DIP defect lifetime model. As discussed above, TrackCode is a binary variable, with zero and one denoting tangent and curve, respectively. The processed DIP training data include only tracks from classes 5, 4, and 3. Therefore, two binary variables, class5 and class4, are included in the model. When both are zero, the track is class 3.

Table 2: Parameter Estimations for DIP defects lifetime model (including all variables)

| Variable | Coefficient | z-score |
|----------|-------------|---------|
| Constant | 22.69 | 9.57 |
| Length | 0.02 | 1.04 |
| Amplitude | -8.57 | -7.48 |
| TrackCode | -0.12 | -0.49 |
| Class5 | -4.69 | -3.17 |
| Class4 | -2.84 | -2.83 |
| OperatingSpeed-freight | -0.02 | -0.62 |
| OperatingSpeed-passenger | -0.002 | -0.75 |
| Average-Monthly-Tonnage | -0.04 | -0.77 |
| 1/p | 0.85 | |

Table 2 gives an estimated 1/p of 0.85, so the shape parameter p of the Weibull distribution is about 1.18, which is greater than one. This means that the hazard rate is monotonically increasing: the probability of failure increases with the time spent without failure. The signs of the coefficients are intuitively correct. Because λ = exp(−βX), an increase in a variable with a negative coefficient increases λ and therefore makes a shorter lifetime more likely.

We want to investigate whether the coefficients are significantly different from zero. The z-score is a measure of statistical significance that indicates whether to reject the null hypothesis that the corresponding coefficient is zero. A z-score above 1.96 or below −1.96 means that the null hypothesis can be rejected at the 5% significance level. In other words, if the coefficient were truly zero, a z-score this extreme would occur with less than 5% probability. As Table 2 shows, only four parameters (including the constant) are significant at the 5% level. Removing the insignificant variables from the model gives the parameter estimates shown in Table 3.
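The significance screen can be reproduced directly from the z-scores in Table 2. The sketch below also converts each z-score into a two-sided p-value using the standard normal identity P(|Z| ≥ |z|) = erfc(|z|/√2). The variable names are ours.

```python
import math

# z-scores copied from Table 2 (DIP defects, model with all variables).
z_scores = {
    "Constant": 9.57, "Length": 1.04, "Amplitude": -7.48, "TrackCode": -0.49,
    "Class5": -3.17, "Class4": -2.83, "OperatingSpeed-freight": -0.62,
    "OperatingSpeed-passenger": -0.75, "Average-Monthly-Tonnage": -0.77,
}
CRITICAL_5PCT = 1.96  # two-sided 5% critical value of the standard normal

# Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_values = {name: math.erfc(abs(z) / math.sqrt(2.0)) for name, z in z_scores.items()}
significant = [name for name, z in z_scores.items() if abs(z) > CRITICAL_5PCT]
print(significant)  # ['Constant', 'Amplitude', 'Class5', 'Class4']
```

The four retained terms match the variables kept in Table 3.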

Table 3: Parameter Estimations for DIP defects lifetime model

| Variable | Coefficient | z-score |
|----------|-------------|---------|
| Constant | 21.13 | 10.20 |
| Amplitude | -8.08 | -8.08 |
| Class5 | -5.31 | -6.61 |
| Class4 | -3.10 | -2.83 |
| 1/p | 0.86 | |

We designed a test to determine how discarding the insignificant variables affects the model's prediction quality. We used half of the training data as a learning subset and the other half as a validation subset. We used the learning subset to estimate the parameters of the survival function. We then predicted failures for the validation subset by using the estimated parameters in Equation 15 to compute the failure probability:

$$P(\text{failure}) = 1 - S(t) = 1 - \exp\!\left(-\left(\exp(-\beta X)\, t\right)^p\right) \tag{15}$$

If the failure probability is greater than or equal to 0.5, we predict a failure. Otherwise, we predict that the yellow tag defect will not turn into a red tag defect. The model including all variables makes 71.5% correct predictions, and the model including only the significant variables makes 71.1% correct predictions. Since including all the variables does not improve the predictions substantially, we keep the model concise and discard the insignificant variables.
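The validation rule can be sketched as follows: compute Equation 15 for each held-out record, predict a failure when the probability is at least 0.5, and report the share of matches. The coefficients and records below are hypothetical. We also assume that t is the full interval between the two inspections rather than the midpoint time used for fitting, because the report does not say which interval it used. A useful consequence of the 0.5 threshold is that it is equivalent to predicting failure exactly when t is at least the median lifetime, (ln 2)^(1/p)/λ.

```python
import math

def failure_probability(t, beta, x, p):
    """Eq. (15): P(failure by t) = 1 - exp(-(exp(-beta . x) * t)^p)."""
    lam = math.exp(-sum(b * xi for b, xi in zip(beta, x)))
    return 1.0 - math.exp(-((lam * t) ** p))

def validation_accuracy(records, beta, p, threshold=0.5):
    """Share of validation records whose observed outcome matches the prediction.

    records: list of (t, event, x); event is 1 if the defect turned red and
    t is the interval between the two inspections (an assumption).
    """
    hits = 0
    for t, event, x in records:
        predicted = 1 if failure_probability(t, beta, x, p) >= threshold else 0
        hits += predicted == event
    return hits / len(records)

# Hypothetical fitted values and validation records.
beta, p = [5.0, -2.0], 1.2
records = [(30.0, 1, [1.0, 1.0]), (30.0, 0, [1.0, 0.5]),
           (60.0, 0, [1.0, 0.5]), (60.0, 1, [1.0, 1.0])]
print(validation_accuracy(records, beta, p))
```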

Crosslevel defects

The processed training dataset for crosslevel defects has 994 records, including 540 uncensored and 454 censored records. Table 4 shows the estimated parameters for the crosslevel defect lifetime model. Only the parameters that are significant at the 0.05 level are included in the model. The crosslevel dataset includes very few tracks from classes 3 or 2. Since the test data contain no crosslevel defects on class 2 or 3 track, we deleted the class 2 and 3 data from the training dataset to get a more accurate model for classes 5 and 4. Therefore, only one binary variable, class5, is included in the model; a zero value for class5 means the track is class 4. The estimated 1/p is 0.86, so p is greater than one, indicating a monotonically increasing hazard rate. The signs of the coefficients are also intuitively correct.

Table 4: Parameter Estimations for crosslevel defect lifetime model

| Variable | Coefficient | z-score |
|----------|-------------|---------|
| Constant | 22.57 | 12.00 |
| Amplitude | -11.77 | -10.49 |
| Class5 | -1.72 | -4.32 |
| OperatingSpeed-freight | -0.08 | -3.52 |
| 1/p | 0.86 | |

Surface defects

The processed training dataset for surface defects includes 960 records, of which 275 are uncensored. Table 5 shows the estimated parameters for the surface defect lifetime model. Apart from the constant, amplitude is the only parameter significant at the 0.05 level. The surface defect dataset consists mostly of class 5 track, with very few class 4 tracks. Since all surface defects in the test dataset are on class 5 track, we deleted the class 4 data from the training dataset to get a more accurate model for class 5. The value of p is estimated to be 1.25 (1/p = 0.80), which indicates an intuitive, monotonically increasing hazard rate.

Table 5: Parameter Estimations for surface defect lifetime model

| Variable | Coefficient | z-score |
|----------|-------------|---------|
| Constant | 14.52 | 16.55 |
| Amplitude | -18.70 | -10.49 |
| 1/p | 0.80 | |

Test data predictions

The parameters estimated in Tables 3, 4, and 5 are used in Equation 15 to find the probability that each yellow tag defect in the test dataset turns into a red tag defect within the given interval. If the failure probability is greater than or equal to 0.5, we predict that failure will occur. If it is less than 0.5, we predict that failure will not happen and the yellow tag defect will not turn into a red tag defect.

Several defects in the test dataset have already exceeded the FRA safety limits and are therefore red tag defects. These defects remain in the red tag state unless they undergo corrective maintenance. We assumed that no maintenance action is performed during the given intervals for the test data, so these red tag defects remain in the red tag state at the end of the interval.
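Putting the pieces together, the sketch below applies the coefficients and 1/p values from Tables 3, 4, and 5 through Equation 15 with the 0.5 threshold, and keeps already-red defects red. The names `MODELS` and `predict_red` are hypothetical. The report does not state the units of amplitude or time, so the inputs must use whatever units the models were fitted in. The example values are illustrative only and are not taken from the test set.

```python
import math

# Coefficients and 1/p copied from Tables 3, 4 and 5.
MODELS = {
    "DIP": {"beta": {"Constant": 21.13, "Amplitude": -8.08,
                     "Class5": -5.31, "Class4": -3.10}, "inv_p": 0.86},
    "crosslevel": {"beta": {"Constant": 22.57, "Amplitude": -11.77,
                            "Class5": -1.72, "OperatingSpeed-freight": -0.08},
                   "inv_p": 0.86},
    "surface": {"beta": {"Constant": 14.52, "Amplitude": -18.70}, "inv_p": 0.80},
}

def predict_red(defect_type, covariates, interval, already_red=False):
    """Return (probability, predicted_red) for one test-set defect.

    covariates: dict of explanatory variables (the constant is added here).
    interval: prediction horizon, in the same time unit used for fitting.
    Defects that are already red stay red, since no maintenance is assumed.
    """
    if already_red:
        return 1.0, True
    model = MODELS[defect_type]
    p = 1.0 / model["inv_p"]
    values = {"Constant": 1.0, **covariates}
    beta_x = sum(coef * values[name] for name, coef in model["beta"].items())
    prob = 1.0 - math.exp(-((math.exp(-beta_x) * interval) ** p))  # Eq. (15)
    return prob, prob >= 0.5

# Illustrative inputs only (units assumed to match the fitted models).
print(predict_red("surface", {"Amplitude": 0.6}, 10.0))
print(predict_red("surface", {"Amplitude": 0.7}, 10.0))
```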

Conclusion
This report describes our approach to the 2015 RAS competition's problem of predicting the deterioration of yellow tag geometry defects into red tag defects. Survival analysis is used to model probabilistically the time that defects spend in the yellow state before turning into the red state. The developed models are used to predict whether the yellow tag defects in the test data will deteriorate into the red tag state within a given time interval. An Excel file containing the predictions, together with the MATLAB code used to process the data, was sent to the competition organizing committee along with this report.

References
1. van Noortwijk, J. M., & Frangopol, D. M. (2004). Deterioration and maintenance models for insuring safety of civil infrastructures at lowest life-cycle cost. Life-Cycle Performance of Deteriorating Structures: Assessment, Design and Management. ASCE, Reston, Virginia, 384-391.
2. Cleves, M. (2008). An introduction to survival analysis using Stata. Stata Press.
3. Pinder III, J. E., Wiener, J. G., & Smith, M. H. (1978). The Weibull distribution: a new method of summarizing survivorship data. Ecology, 175-179.
4. Nelson, W. B. (2005). Applied life data analysis (Vol. 577). John Wiley & Sons.
5. Kleinbaum, D. G., & Klein, M. (2012). Parametric survival models. In Survival analysis (pp. 289-361). Springer New York.
6. Mishalani, R. G., & Madanat, S. M. (2002). Computation of infrastructure transition probabilities using stochastic duration models. Journal of Infrastructure Systems, 8(4), 139-148.
7. Mauch, M., & Madanat, S. (2001). Semiparametric hazard rate models of reinforced concrete bridge deck deterioration. Journal of Infrastructure Systems.
8. Aitkin, M., & Clayton, D. (1980). The fitting of exponential, Weibull and extreme value distributions to complex censored survival data using GLIM. Applied Statistics, 156-163.
