Deep Learning For DIC
Sma-RTy SAS, Aubière, France

Keywords: Convolutional Neural Network, Deep learning, GPU, Digital Image Correlation, Error Quantification, Photomechanics, Speckles
1. Introduction
Digital Image Correlation (DIC) is a full-field displacement and strain measurement technique which has rapidly spread in the experimental mechanics community. The main reason is that this technique achieves a good compromise between versatility, ease of use, and metrological performance [1]. Many recent papers illustrate the use of DIC in various situations [2]. Some others discuss how to characterize or improve its metrological performance, as [3] written within the framework of the DIC Challenge [4]. Despite all its advantages, DIC suffers from some drawbacks. For instance, DIC is by essence an iterative procedure, which automatically leads to mobilizing significant computational resources. Consequently, its use is limited when dense (i.e. pixelwise-defined) displacement or strain distributions are to be measured. Another drawback is the fact that DIC acts as a low-pass filter, which causes the retrieved displacement and strain fields to be blurred. Indeed, it is shown in [5, 6] that the displacement field rendered by a DIC system is not the actual one but, regardless of noise, the actual one convolved by a Savitzky-Golay filter. This makes DIC unable to correctly render displacement or strain fields featuring high spatial frequencies. Overcoming these limitations seems however difficult without introducing a paradigm shift in image processing itself. For instance, the minimization of the optical residual that DIC performs iteratively in the spatial domain can be switched to the Fourier domain [7]. The benefit is to considerably reduce the computing time and to allow the use of optimal patterns in terms of camera sensor noise propagation [8, 9, 10]. The drawback is that such patterns are periodic, and depositing them on specimens remains challenging as long as no simple transferring technique is commercially available. In the present study, we propose to use random speckle patterns as in DIC and to investigate to what extent a Convolutional Neural Network (CNN) can be used to retrieve displacement and strain fields from a pair of reference and deformed images of a speckle pattern. To the best of the authors' knowledge, this route has never been explored so far to retrieve dense subpixel displacement fields. CNNs have however been widely used in computer vision in the recent past, but mainly in order to perform image classification or recognition [11], or to estimate rigid-body displacements of solids [12]. The problem addressed here is somewhat different because deformation occurs, and because the resulting displacement is generally much lower in amplitude than in the aforementioned cases of rigid-body movements. Consequently, we mainly focus here on the estimation of subpixel displacements.
The present paper is organized as follows. The basics of CNNs and deep learning are first briefly given in Section 2. We examine in Section 3 how a problem similar to ours, namely optical flow determination, has been tackled with CNNs in the literature. CNNs must be trained, and since no dataset suited to the problem at hand is available in the literature, we explain in Section 4 how a first dataset containing speckle images has been generated. Then we test in Section 5 how four pre-existing networks are fine-tuned with this dataset, and examine whether these networks are able to determine displacement fields from different speckle images artificially deformed. The network giving the best results is then selected and improved in several ways to give displacement fields of better quality. The different improvements and the corresponding results are given in Section 6. Finally, we use the network resulting from all the suggested improvements, named here "StrainNet", to process some pairs of speckle images from the DIC Challenge and from a previous study.
The numerical experiments proposed in this paper can be reproduced with Matlab and PyTorch codes, as well as with datasets, available at the following URL: https://github.com/DreamIP/StrainNet.
2. Basics of CNNs and deep learning

Data-driven approaches have revolutionized several scientific domains in the last decade. This is probably due to a combination of several factors, in particular the dramatic improvement of computing and information storage capacity. For example, it is now quite easy to have large datasets gathering many examples for a task of interest, such as millions of labeled images for an image classification task. However, the most crucial point is probably the rise of new machine learning techniques built on artificial neural networks. While neural networks were introduced in the 1950s and have been continuously improved since then, a major breakthrough occurred a few years ago with the demonstration that deep neural networks [13] give excellent results in many signal processing applications. The most iconic breakthrough is probably the famous 2012 paper [11], which demonstrates a deep learning based system outperforming the competing approaches in the ImageNet classification challenge.
The basic elements of neural networks are neurons connected together. In feedforward neural networks, neurons are organized in layers. The input layer encodes input data (an image in image classification tasks, two images in the present problem) and the output layer encodes the associated label (the posterior probability of a class in classification, a displacement field here). The intermediate layers are the hidden layers. They are made of neurons which output a signal. Each neuron of the hidden and output layers is connected to neurons from the preceding layer. The output of such a neuron is a weighted sum of the outputs of the connected neurons, modulated by a continuous non-decreasing function called activation function, except for the output layer in a regression problem as here, where no activation is involved. The most popular activation function is the so-called Rectified Linear Unit (ReLU) [13].
Deep neural networks are called "deep" because they may be made of several tens of layers. As we have seen, neural connections between layers involve weights. Note that "weights" are also commonly referred to as the "network parameters" in the literature. The term "weight" is used here to avoid confusion with the other parameters defined in the paper.
Computer vision applications typically call for a subclass of deep neural networks known as convolutional neural networks (CNNs), where the number of weights is mitigated by imposing that the weighted sums correspond to convolutions, which turn out to be the basic operators of signal processing. Neurons from a convolutional layer are represented as the output of the implemented convolutions, hence the blue parallelepipeds in Figure 1. Another ingredient used in CNNs is down-sampling, which reduces the width of the layers involved. In the present work, down-sampling by a factor (the so-called stride) of two is obtained by computing convolutions shifted by two units instead of one unit as in a standard convolution. This explains the narrower and narrower layers in the feature extraction part in Figure 1. Note that in this figure, up-sampling is performed through the so-called transposed convolutions when predicting the displacement field.
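To fix ideas, here is a minimal PyTorch sketch of these two building blocks: a stride-2 convolution followed by a ReLU for down-sampling, and a transposed convolution for up-sampling. It is only an illustration of the mechanism, not the actual StrainNet code; all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# One "level" of the feature extraction part: a convolution with a stride
# of two halves the spatial size (down-sampling), followed by a ReLU.
down = nn.Sequential(
    nn.Conv2d(in_channels=2, out_channels=64, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
)

# One "level" of the prediction part: a transposed convolution with a
# stride of two doubles the spatial size (up-sampling).
up = nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=4, stride=2, padding=1)

pair = torch.randn(1, 2, 256, 256)  # reference and deformed images stacked channel-wise
features = down(pair)               # -> (1, 64, 128, 128), down-sampled by the stride of 2
flow = up(features)                 # -> (1, 2, 256, 256), two channels for u and v
```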
However, deep CNNs are still likely to require several millions of weights. As in any supervised learning task, the weights are set by training the CNN over a dataset made of observations, that is, pairs of inputs and corresponding expected outputs. More precisely, training, called deep learning in this context, consists in minimizing with respect to the weights the cost of the errors between the expected outputs and the CNN outputs obtained from the inputs and the current weights. The error cost is measured through the so-called loss function. The optimization method implementing the training process is most of the time a variation of the mini-batch stochastic gradient descent (SGD) algorithm: small sets of observations are iteratively and randomly drawn from the whole dataset, giving a so-called mini-batch, and the weights of the CNN are then slightly updated so that the sum of the losses over each mini-batch decreases. At each iteration, the outputs of the CNN computed from the mini-batch inputs are thus supposed to get closer to the expected outputs. The magnitude of the weight update is proportional to the step size of the SGD algorithm, called learning rate in machine learning applications and denoted λ in this paper. This parameter has to be set carefully. Each complete pass over the dataset is called an epoch. Many epochs are needed to perform the full training of a CNN.
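The following toy PyTorch training loop illustrates this vocabulary (mini-batch, loss, learning rate, epoch); the model, loss function, and data are mere placeholders, not those used in this study.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for (image pair, displacement field) observations.
inputs = torch.randn(1000, 2, 64, 64)    # hypothetical image pairs
targets = torch.randn(1000, 2, 64, 64)   # hypothetical ground-truth displacement fields
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8, shuffle=True)

model = torch.nn.Conv2d(2, 2, kernel_size=3, padding=1)   # placeholder network
loss_fn = torch.nn.MSELoss()                              # placeholder loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # lr is the learning rate λ

for epoch in range(5):                   # one epoch = one complete pass over the dataset
    for batch_in, batch_gt in loader:    # mini-batches drawn randomly from the dataset
        optimizer.zero_grad()
        loss = loss_fn(model(batch_in), batch_gt)
        loss.backward()                  # gradient of the loss w.r.t. the weights
        optimizer.step()                 # weight update, scaled by the learning rate
```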
However, training a deep CNN requires a large dataset and heavy computational resources, which is simply not possible in many situations. A popular alternative approach consists in using freely available pre-trained networks, that is, CNNs that have been trained on standard datasets, and in adjusting their weights to the problem of interest. Adjustment can be performed by fine tuning or transfer learning. The former consists in marginally changing the weights in order to adapt them to the problem of interest and its dataset. The latter consists in changing a part of the architecture of the pre-trained network and in learning the corresponding weights from the problem of interest, the remaining weights being kept constant. Let us now examine how such networks have been used in the literature for solving a classic computer vision problem similar to ours, namely optical flow estimation.
3. CNN-based optical flow estimation

Optical flow is the apparent displacement field obtained from two views of a scene. It is caused by the relative motion of the observer and objects in the scene, which may move or deform. Recent algorithms of optical flow estimation based on CNNs provide a promising alternative to the methods classically used to resolve this problem in computer vision after the seminal papers by Horn and Schunck [14] or by Lucas and Kanade [15]. As in a typical machine learning setup, AI-based optical flow estimation algorithms can be divided into three categories: unsupervised, semi-supervised, or supervised. Unsupervised [16, 17, 18] and semi-supervised [19, 20] methods are reviewed in the literature to address the problem of limited training data in optical flow estimation. However, these methods do not yet reach the accuracy of their supervised counterparts. Supervised methods are thus the predominant way of learning, as described in the preceding section, and generally provide good performance. However, they require a large amount of accurate ground-truth optical flow measurements for training. Most accurate models use CNNs as one of the components of their system, as in DCFlow [21], MRFlow [22], DeepFlow [23], and Flow Fields [24]. None of these previous approaches provides end-to-end trainable models or real-time processing performance. The most efficient algorithms recently proposed in the literature for optical flow estimation are reviewed below. In general, they share the same architecture (a schematic view is represented in Figure 1): the first part of the network extracts the features of the images, and the optical flow is predicted through an up-sampling process in the second part of the network.
Dosovitskiy et al. [12] presented two CNNs called FlowNetS and FlowNetC to learn the optical flow from a synthetic dataset. These two CNNs are constructed based on the U-Net architecture [25]. FlowNetS is an end-to-end CNN-based network. It concatenates the reference and the current images together and feeds them to the network in order to extract the optical flow. In contrast, FlowNetC creates two separate processing pipelines for each of these two images. Then, it combines them with a correlation layer that performs multiplicative patch comparisons between the two generated feature maps. The resolution of the output optical flow in both networks is equal to 1/4 of the image resolution. Indeed, the authors explain that adding other layers to reach full resolution is computationally expensive, and does not really improve the accuracy. A consequence is that with this network, the optical flow at full resolution is obtained by performing a bilinear interpolation with an upscale factor of 4.
• Convolutions + ReLUs
• Transposed Convolution + ReLU + Convolution
Figure 1: Schematic view of a convolutional neural network. Anticipating the results discussed in Section 6.1, we present here a schematic view of StrainNet-f. The feature extraction part consists of several convolution layers followed by a ReLU (Rectified Linear Unit [13]). The last layer has a stride of two to perform down-sampling. Such a stack is called a level. The output of these levels can thus be represented by narrower and narrower blocks, in blue here. In the displacement prediction part, the levels are made of transposed convolution layers (for up-sampling) and convolution layers. The input of each level of this part is the output of the preceding level, concatenated to the output of the levels from the feature extraction part, as represented by the black arrows.
This point is discussed further below, as we do not use exactly the same approach to solve the problem addressed in this paper.
Hu et al. [26] proposed a recurrent spatial pyramid network that inputs the full-resolution images and generates an initial optical flow at 1/8 of the full resolution. The initial flow is then upscaled and refined recurrently based on an energy function. The initial generated optical flow is converted to reach the full-resolution flow by performing this recurrent operation on the spatial pyramid. This network achieves comparable performance to FlowNetS with 90 times fewer weights. It is however twice as slow as FlowNetS because of the recurrent operations performed on the spatial pyramid. Note that when CNNs are compared, it is important to use the same image size and the same GPU for all of them. In addition, the computing time, or inference time, is generally considered as equal to the mean time value over a certain number of pairs of images.
FlowNet 2.0 [27] is the extension of FlowNet [12]. It stacks multiple FlowNetS and FlowNetC networks in order to obtain a more accurate model. FlowNet 2.0 reduces the estimation error by more than 50% compared to FlowNet. However, it has three limitations. First, the model size is 4 times larger than that of the original FlowNet (over 160 million weights). Second, FlowNet 2.0 is 4 times slower than FlowNet. Third, the sub-networks need to be trained sequentially to reduce the overfitting problem. Overfitting is a non-desirable phenomenon, which occurs in deep learning when the cardinality of the dataset is too small with respect to the number of weights to be trained. This leads to a network which fails to correctly predict the optical flow from images which are too different from those contained in the training dataset.
Ranjan et al. [28] developed a compact spatial pyramid network, called SpyNet. It combines deep learning with classical optical flow estimation principles, such as image warping at each pyramid level, in order to reduce the model size. In SpyNet, the pyramid levels are trained independently. Then, each level uses the previous one as an initialization to warp the second image toward the first through the spatial pyramid network. It achieves similar performance to FlowNetC on the same benchmark but is not as accurate as FlowNetS and FlowNet 2.0. In contrast to FlowNet [12], the SpyNet model is about 26 times smaller but almost similar in terms of running speed. In addition, it is trained level by level, which means that it is more difficult to train than FlowNet.
LiteFlowNet [29] is composed of two compact sub-networks specialized in pyramidal feature extraction and optical flow estimation, respectively named NetC and NetE. NetC transforms the two input
images into two pyramids of multi-scale high-dimensional features. NetE consists of cascaded "flow-inference" and "regularization" modules that estimate the optical flow fields. LiteFlowNet performs better than SpyNet and FlowNet, but it is outperformed by FlowNet 2.0 in terms of accuracy. It has 30 times fewer weights than FlowNet 2.0 and is 1.36 times faster.
Sun et al. [30] proposed PWC-Net. This network is based on simple and well-established principles such as pyramidal processing, warping, and cost volume. Similarly to FlowNet 2.0, PWC-Net shows the potential of combining deep learning and optical flow knowledge. Compared to FlowNet 2.0, the model size of PWC-Net is about 17 times smaller and the inference time is divided by two. This network is also easier to train than SpyNet [28]. PWC-Net outperforms all the previous optical flow methods on the MPI Sintel [31] and KITTI 2015 [32] benchmarks.
All these studies demonstrate the usability of CNNs for estimating the optical flow. This motivated the development of similar models in order to tackle analogous problems arising in other research fields. This is typically the case in [33], where the authors used FlowNetS to extract dense velocity fields from particle images of fluid flows. The method introduced in this reference provides promising results, but the experiments indicate that the deep learning model does not outperform classical methods in terms of accuracy. Cai et al. [34] developed a network based on LiteFlowNet [29] to extract dense velocity fields from a pair of particle images. The original LiteFlowNet network is enhanced by adding more layers in order to improve the performance. This leads to more accurate results than those given in [33], at the price of extra computing time.
In the following sections, we propose to employ a deep convolutional neural network to measure full-field displacement fields that occur on the surface of deformed solids. As for 2D DIC, we assume here that the deformation is plane and that the surface is patterned with a random speckle. The main difference with the examples from the literature discussed above is twofold: first, a deformation occurs, and second, the resulting displacements are generally small, which means that subpixel resolution must be achieved. In addition, the idea is to use this measurement in the context of metrology, meaning that the errors affecting these measurement fields must be thoroughly estimated. A specific dataset and a dedicated network christened "StrainNet" are proposed herein to reach these goals, as described in the following sections.
4. Dataset
Almost all existing datasets are limited to large rigid-body motions or quasi-rigid motions and deal with natural images. Hence, an appropriate dataset consisting of deformed speckle images had to be developed in the present study. This dataset should be made of pairs of images mimicking
typical reference and deformed speckles, as well as their associated deformation fields. It has to be representative of real images and deformations so that the resulting CNN has a chance to perform well when inferring the deformation field from a new pair of images that are not in the training dataset. Speckle images have a smaller variability than the natural images processed by computer vision applications. However, we shall see that we cannot use smaller datasets than the one of Table 1 because we seek tiny displacements.
Generating reference speckle frames. In real experiments, the first step is to prepare the specimen, often by spray-painting the surface in order to deposit black droplets on a white surface. This process was mimicked here by using the speckle generator proposed in [39]. This generator consists in randomly rendering small black disks superimposed in an image, in order to get synthetic speckled patterns that are similar to actual ones. Reference frames of size 256 x 256 pixels were rendered with this tool. This frame size influences the quality of the results, as discussed in Section 6.1. These reference frames were obtained here by mixing the different settings or parameters which are listed in Table 2 (see [39] for more details).
Table 2: Parameters used to render the images for Speckle dataset 1.0 and 2.0 (second column), and the images deformed through the Star displacement of Section 5.3 (third column).
The quality of these reference frames was visually estimated, and those featuring very large spots or poor contrast were eliminated. Only 363 frames were kept for the next step. Figure 2-a shows a typical speckle obtained with our generator and used to build Speckle dataset 1.0. It can be visually checked that the aspect is similar to a real speckle used for DIC in experimental mechanics, see typical examples in [1].
1 Reference speckle frames used in speckle dataset generation and Matlab code are available at github.com/DreamIP/StrainNet
Figure 2: Typical synthetic speckle images and random displacements (in pixels) along x and y used in Speckle dataset 1.0. (a) Reference speckle image. (b) Random displacements along the x axis. (c) Random displacements along the y axis. All dimensions are in pixels.
Defining the displacements. The following strategy was used in order to generate a rich set of displacement fields covering the widest possible types of displacement fields. The first step consisted in splitting each reference frame into adjacent square regions of size 8 x 8 pixels. The pixels located at the corners of these regions were assigned a random displacement. The second step consisted in linearly interpolating the displacements between all these pixels to get the displacement at the remaining pixels of each square region.
Since we were interested here in estimating subpixel displacements, the imposed random displacements lay between -1 and +1 pixel. Furthermore, the displacements at pixels located along the boundary were set to zero to limit the error due to the missing information in these regions. Figures 2-b and -c show typical random displacements used to deform reference speckle images to obtain their deformed counterparts.
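For illustration, here is a minimal numpy sketch of this strategy for one displacement component; the frame size, region size, and names are ours, not taken from the paper's code.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

H = W = 256   # frame size
step = 8      # size of the square regions

# Random subpixel displacements (between -1 and +1 pixel) assigned to the
# corners of the 8 x 8 regions, with zero displacement on the boundary.
nodes = np.arange(0, H + 1, step)
rng = np.random.default_rng(seed=0)
u_nodes = rng.uniform(-1.0, 1.0, size=(nodes.size, nodes.size))
u_nodes[0, :] = u_nodes[-1, :] = 0.0
u_nodes[:, 0] = u_nodes[:, -1] = 0.0

# Linear interpolation between the corner values gives the displacement
# at every pixel of the frame.
interp = RegularGridInterpolator((nodes, nodes), u_nodes, method="linear")
ys, xs = np.mgrid[0:H, 0:W]
u = interp(np.column_stack([ys.ravel(), xs.ravel()])).reshape(H, W)
```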
Generating the deformed frames. Each of the 363 reference frames was deformed 101 times, each time by generating a new random displacement field. 363 x 100 = 36,300 pairs of frames were used to form the dataset on which the network was trained. The 363 remaining pairs were used for validation purposes, in order to assess the ability of the network to render faithful displacement fields after training. As in other studies dealing with optical flow estimation, this ability is quantified by calculating the so-called Average Endpoint Error (AEE). The AEE is the Euclidean distance between the predicted flow vector and the ground truth, averaged over all the pixels. Thus
\[
\mathrm{AEE} = \frac{1}{KL}\sum_{i=1}^{K}\sum_{j=1}^{L}\sqrt{\left(u_e(i,j)-u_g(i,j)\right)^2+\left(v_e(i,j)-v_g(i,j)\right)^2} \tag{1}
\]
where (u_e, v_e) and (u_g, v_g) are the estimated optical flow and the ground truth, respectively, defined at each pixel coordinate (i, j). K and L are the dimensions of the zone over which the AEE value is calculated. This quantity is expressed in pixels, the displacements being expressed in pixels.
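A direct numpy translation of Equation (1), given here for convenience (the function name is ours):

```python
import numpy as np

def average_endpoint_error(u_e, v_e, u_g, v_g):
    """AEE of Equation (1): Euclidean distance between the estimated flow
    (u_e, v_e) and the ground truth (u_g, v_g), averaged over the K x L
    pixels of the evaluated zone. All arrays share the same shape (K, L);
    the result is in pixels."""
    return np.mean(np.sqrt((u_e - u_g) ** 2 + (v_e - v_g) ** 2))
```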
The main interest of the speckle generator described in [39] is that it does not rely on interpolation when deforming the artificial speckled patterns, which prevents additional biases caused by interpolation. However, since it relies on a Monte Carlo integration scheme, a long computing time is required. This generator is therefore only suitable for the generation of a limited number of pairs of synthetic reference and deformed images. Such an approach is not adequate here, since several thousands of images had to be generated. The solution adopted here was to use bi-cubic interpolation to deform the reference images through the randomly-defined displacement fields, considering that it yielded a reasonable trade-off between computing time and quality of the results. Hence the speckle generator described in [39] was only employed to generate the reference frames mimicking the real speckles used in experimental mechanics.
5. Fine-tuning networks of the literature
In general, CNNs are developed for specific applications and trained with datasets suited to the problem to be solved. The question is therefore to know whether the networks reviewed in Section 3 above, which have good performance for estimating large displacements, can be fully or partially used to address the problem of the present paper, namely resolving subpixel (~0.01 pixel) displacements. Transfer learning and fine tuning are possibilities to respond to this question, as explained in Section 2.
When classification is the ultimate goal, transfer learning is carried out by replacing only the last fully-connected layer and training the model by updating the weights of the replaced layer, while maintaining the same weights for the convolutional layers. The idea behind this is that if the dataset at hand is similar to the training dataset, the output of the convolutional layers remains relevant for the problem of interest. Since our dataset is completely different from the datasets used for training the CNNs described in the studies reviewed above, we proceeded to a fine tuning by updating the weights of all the layers and not only the last one. The weights found in the cases discussed above were considered as starting points for the fine-tuning performed on our own dataset.
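In PyTorch terms, the difference between these strategies boils down to which parameters are left trainable. A minimal sketch, with a hypothetical two-layer network standing in for a pre-trained model:

```python
import torch.nn as nn

# Hypothetical pre-trained network: a feature extractor and a final layer.
pretrained = nn.Sequential(
    nn.Conv2d(2, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 2, kernel_size=3, padding=1),
)
# pretrained.load_state_dict(torch.load("weights.pth"))  # hypothetical weight file

# Fine tuning: all weights stay trainable and are marginally updated
# from their pre-trained values on the new dataset.
for p in pretrained.parameters():
    p.requires_grad = True

# Transfer learning: keep the pre-trained part constant and learn only
# the weights of a replaced part of the architecture.
for p in pretrained[:2].parameters():                        # freeze the first layers
    p.requires_grad = False
pretrained[2] = nn.Conv2d(64, 2, kernel_size=3, padding=1)   # new layer, trained from scratch
```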
In the following, we discuss the fine tuning of the networks considered as the most interesting ones in terms of accuracy and computational efficiency. We did not choose to fine-tune FlowNet 2.0 because of its high computational cost, as discussed in Section 3. Instead, we mainly relied on the FlowNet, PWC-Net, and LiteFlowNet models [12, 30, 29] because they exhibit better computational efficiency during both fine tuning and inference. In practice, we used the PyTorch implementations of these models provided in [40], [41], and [42], respectively.
Figure 3: Average AEE value for four networks trained on Speckle dataset 1.0 (values legible in the figure: PWC-Net, AEE = 0.356; LiteFlowNet, AEE ≈ 0.35).
2 Star frames used here are available at github.com/DreamIP/StrainNet
Figure 4: (a) Reference image corresponding to (b) the Star displacement. The green rectangle is used in this work to evaluate the results. All dimensions are in pixels.
dealing with the metrological performance of full-field measurement systems [10, 46], the shift between two consecutive subsets is equal to one pixel, which puts us in a position where DIC is at its best performance level (regardless of computing time). The results obtained in these different cases are given in Figure 5.
The worst results are obtained with FlowNetC. Those obtained with LiteFlowNet and PWC-Net are better, but it is clear that FlowNetS provides the best results, with a displacement map which is similar to that obtained with DIC and 2M + 1 = 11 pixels. The poor accuracy obtained by the first three networks (FlowNetC, LiteFlowNet, and PWC-Net) may be due to the fact that these networks make use of predefined blocks such as correlation or regularization layers, and that they were originally developed for the determination of large displacements instead of subpixel ones.
As in previous studies dealing with the quality assessment of full-field measurement techniques, such as [7, 46, 4] for instance, the displacement along the horizontal axis of symmetry, denoted here by Δ, is plotted in Figure 6 in order to clearly see the attenuation of the signal when going to the highest spatial frequencies of the Star displacement (thus when going to the left of the displacement map), as well as the high-frequency fluctuations affecting the results. It is worth remembering that the reference value of the displacement along Δ is equal to 0.5 pixel. A blue horizontal line of ordinate 0.5 pixel is therefore superimposed on these curves to visualize this reference value. The closer the curves to this horizontal line, the better the results. Displacements obtained with DIC are also superimposed for comparison purposes. The mean absolute error obtained column-wise is also given.
It is worth noting that at high spatial frequencies (left-hand part of the graphs), FlowNetS provides results which are similar to those given by DIC with a subset of size 2M + 1 = 11 pixels. At low spatial frequencies (right-hand part of the graphs), DIC provides smoother and more accurate results than FlowNetS. Calculating in each case the mean absolute error for the vertical displacement gives an estimation of the global error over the displacement map. This quantity is calculated over the green rectangle plotted in Figure 4b to remove boundary effects. This quantity, denoted by MAE in the following, is defined as the mean value of the absolute error throughout the field of interest. Thus
Figure 5: Star displacement obtained (a)-(b) with DIC and (c)-(f) with four selected CNNs. In each case, the retrieved displacement field, the difference with the reference displacement field and the histogram of this difference are given in turn in the different sub-figures. All dimensions are in pixels. Panel titles: (a) DIC, subset size 2M + 1 = 21 pixels; (b) DIC, subset size 2M + 1 = 11 pixels.
\[
\mathrm{MAE} = \frac{1}{KL}\sum_{i=1}^{K}\sum_{j=1}^{L}\left| v_e(i,j)-v_g(i,j)\right| \tag{2}
\]
where v_e and v_g are the estimated displacement and the ground truth, respectively. K and L are here the dimensions of the green rectangle over which the results are calculated. MAE is equal to the AEE defined in Equation 1, in which the horizontal displacement has been nullified. It is introduced here in order to focus on the error made along the vertical direction only, the relevant information being along this direction for the displacement maps extracted from the Star images. Table 3 gives this quantity calculated for DIC and FlowNetS-ft.
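Equation (2) restricted to a region of interest reads, in numpy (the function name and the example slices are ours):

```python
import numpy as np

def mean_absolute_error(v_e, v_g, rows, cols):
    """MAE of Equation (2): mean absolute error on the vertical displacement,
    computed over a rectangular region of interest (here, the green rectangle
    of Figure 4b, to remove boundary effects)."""
    return np.mean(np.abs(v_e[rows, cols] - v_g[rows, cols]))

# Hypothetical region of interest:
# mae = mean_absolute_error(v_est, v_ref, slice(50, 350), slice(100, 1900))
```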
It is clear that the MAE value is the lowest for DIC with 2M + 1 = 11 pixels. It is followed by
Figure 6: Comparison between FlowNetS-ft and DIC. Top: displacement along the horizontal axis of symmetry of the Star displacement field. The horizontal blue line corresponds to the reference displacement along Δ. This reference displacement is equal to 0.5 pixels. Bottom: mean absolute error obtained column-wise. Close-up views of the rightmost part of the graphs are also inserted. All dimensions are in pixels.
Table 3: MAE (in pixels) for DIC (subset size 2M + 1 = 11 and 21 pixels) and FlowNetS-ft.
FlowNetS-ft. These first results are promising, but they come from predefined networks which are therefore not specifically developed to resolve the problem at hand. The best candidate, namely FlowNetS-ft, was therefore selected for further improvement. This is the aim of the next section.

6. Improving the network and the dataset

Two types of modifications were proposed to enhance the accuracy of the results. The first one concerns the network, the second the dataset.
It was therefore proposed to improve the network to directly output a higher-resolution optical flow, without interpolation.
In general, the deeper the network, the better the results. The dataset shall however be enlarged accordingly to avoid overfitting (see the definition in Section 3 above). Two approaches were examined in this study in order to enhance the accuracy of the predicted optical flow. The first one consists in adding some levels to the network (thus increasing the number of layers), the second in changing the architecture.
"
Loss = L AiCi (3)
i: l
where n is the number of levels forming the network, λ_i is a weighting coefficient corresponding to the i-th level, and ε_i is the AEE between the output of the i-th level and the ground-truth displacement sampled at the same resolution. This loss function was adapted to the proposed networks by keeping the same λ_i coefficients as in [12] for the levels corresponding to the FlowNetS levels, and assigning a coefficient of 0.003 to each new level. Compared to the strategy defined in Section 5.1, only the loss function is modified. Two training scenarios were used here. In the first one, the new networks were fine-tuned by updating all the weights of the networks. In the second scenario, only the weights of the new levels were updated. The remaining weights were the same as those obtained after applying the fine-tuning process described in the previous section.
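A possible PyTorch implementation of this multiscale loss is sketched below; downsampling the ground truth with bilinear interpolation is our assumption of how the sampling at each resolution can be done, not necessarily how the original code does it.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(predictions, flow_gt, weights):
    """Loss of Equation (3): weighted sum of the endpoint errors computed at
    the n levels. `predictions` is the list of flows output by the levels,
    `weights` the corresponding lambda_i coefficients (0.003 for each new
    level in this study)."""
    total = 0.0
    for flow_i, w_i in zip(predictions, weights):
        # Ground truth sampled at the resolution of level i (assumed bilinear).
        gt_i = F.interpolate(flow_gt, size=flow_i.shape[-2:],
                             mode="bilinear", align_corners=False)
        epe_i = torch.norm(flow_i - gt_i, p=2, dim=1).mean()  # AEE at this level
        total = total + w_i * epe_i
    return total
```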
By applying the first scenario for training the new network with one additional level only, the average AEE value defined in Section 5.2 increases from 0.107 to 0.141, which means that the results are worse. The same trend is observed with the MAE value deduced from the displacement map obtained with the Star images (MAE = 0.3334). The displacement map obtained in this case is shown in Figure 7. It can be seen that the error made is significant. Adding one more level and retraining all the network weights does not improve the quality of the results, which means that the first scenario is not a good option for training the networks.
On the contrary, the second scenario really improves the accuracy of the results when considering the random displacement fields used to generate the 363 deformed speckle images considered as test dataset, with a reduction of the average AEE of more than 50% compared to FlowNetS-ft, as reported in Table 4. The average AEE concerns the 363 displacement fields retrieved from the 363 pairs of images of the test dataset. When considering now the Star displacement, we can see that the MAE reported in Table 4 for each of these two new networks is nearly the same as the MAE obtained with FlowNetS-ft. This is confirmed by visually comparing the Star displacements reported in Figure 8, which are obtained with these two new networks, and the reference Star displacement depicted in Figure 4b. Besides, it can be seen in Figure 8 that almost the same accuracy as FlowNetS-ft is obtained when considering the Star displacement field of Figure 4b.
The displacements along the horizontal axis of symmetry Δ of the Star displacements which are obtained with these two networks, FlowNetS-ft, and DIC (2M + 1 = 11 and 2M + 1 = 21 pixels) are depicted in Figure 9 to more easily compare the different techniques. No real improvement is clearly observed with these curves. The mean absolute error estimated column-wise is lower for the proposed networks than for DIC (2M + 1 = 11 pixels) for medium-range spatial frequencies, between x = 200 and x = 400, but this error becomes higher for x > 600. The obtained results show that the networks
Figure 7: Star displacement obtained by adding one level only to FlowNetS, performing interpolation to reach full resolution and updating all the weights (MAE = 0.3331). All dimensions are in pixels.
Metric      | FlowNetS-ft | DIC, 2M + 1 = 11 pixels | One additional level | Two additional levels
Average AEE | 0.1070      | -                       | 0.0560               | 0.0450
MAE         | 0.0437      | 0.0365                  | 0.0445               | 0.0471

Table 4: Comparison between i- FlowNetS-ft, ii- DIC with 2M + 1 = 11 pixels and iii- FlowNetS after adding one or two levels and updating only the weights of the new levels. Average AEE calculated over the whole images of the test dataset and MAE calculated with the Star displacement.
(a) Option 1: adding one level to FlowNetS, MAE = 0.0445. (b) Option 2: adding two levels to FlowNetS, MAE = 0.0471.
Figure 8: Star displacement obtained by adding one or two levels to FlowNetS and updating only the weights of the new levels. All dimensions are in pixels.
proposed here enhance the learning accuracy on Speckle dataset 1.0 compared to FlowNetS-ft, but no improvement is observed with the Star displacement.
Figure 9: Comparison between i- FlowNetS-ft, ii- FlowNetS after adding one or two levels and updating only the weights of the new levels, and iii- DIC with 2M + 1 = 11 and 2M + 1 = 21 pixels. Displacements along Δ and mean absolute error per column.
with this second approach. The first one, named StrainNet-f, is a full-resolution network obtained by applying 4 down-samplings followed by 4 up-samplings. The second network, named StrainNet-h, is a half-resolution network obtained by applying 5 down-samplings followed by 4 up-samplings. The same loss function as in FlowNetS is used in both cases. These two networks are trained by using the weights of the corresponding levels of FlowNetS-ft as starting values, and then by fine-tuning all the network weights. The same training strategy as that described in Section 5.1 was adopted. The average AEE and MAE values reported in Table 5 clearly show that these two proposed networks outperform the previous ones, even though no real difference is visible to the naked eye between the Star displacements reported in Figure 10 and those reported in Figure 8. This is clearer when observing the displacements along Δ reported in Figure 11 and comparing them with their counterparts in Figure 9. Indeed, the sharp fluctuations visible in the closeup view embedded in Figure 11 are much smaller and smoother than those shown in Figure 9. In addition, the MAE per column is lower in the latter than in the former at high frequencies (for about 200 < x < 400). Finally, the main conclusion of the different metrics given in Tables 4 and 5, and of the maps and curves shown in Figures 10 and 11, is that the last two networks perform better than DIC (2M + 1 = 11 and 21 pixels, first order) at high spatial frequencies, and that they provide comparable results at low frequencies. Let us now examine how to further improve the results by changing the dataset used to train the networks.
Table 5: Comparison between i- FlowNetS-ft, ii- DIC with 2M + 1 = 11 pixels, and iii- FlowNetS after changing the architecture with two options: half-resolution (StrainNet-h) and full resolution (StrainNet-f), and updating all the weights. Average AEE and MAE (in pixels), calculated over the whole images of the test dataset.
Different methods were investigated in this section in order to improve the network. Table 6 gathers all these methods and the corresponding results in order to help the reader compare them and easily find the results obtained in each case. The results can also be improved by changing the dataset. This is the aim of the next section.
(a) First network: half-resolution network StrainNet-h, MAE = 0.0350. (b) Second network: full-resolution network StrainNet-f, MAE = 0.0361.
Figure 10: Star displacement obtained after changing the architecture of FlowNetS with two options: (a) half-resolution and (b) full resolution, and updating all the weights. All dimensions are in pixels. The abscissa of the vertical red line is such that the period of the sine wave is equal to 16 pixels, thus twice the size of the regions used to define the displacement maps gathered in Speckle dataset 1.0. The red square at the top left is the zone considered for plotting the closeup view in Figure 12.
Figure 11: Comparison between i- FlowNetS-ft, ii- DIC with 2M + 1 = 11 and 21 pixels, and iii- FlowNetS after changing the architecture with two options: half-resolution and full resolution, and updating all the weights. Displacements along Δ and mean absolute error per column. All dimensions are in pixels.
Approach                                | Option          | Training scenario         | Results
1- Adding levels                        | one level       | Updating all the weights  | Fig. 7
1- Adding levels                        | one level       | Updating new weights only | Table 4, Fig. 8a
1- Adding levels                        | two levels      | Updating new weights only | Table 4, Fig. 8b
2- Changing architecture (StrainNet-h)  | Half-resolution | Updating all the weights  | Table 5, Fig. 10a, Fig. 11
2- Changing architecture (StrainNet-f)  | Full-resolution | Updating all the weights  | Table 5, Fig. 10b, Fig. 11

Table 6: Methods and options investigated to improve the network in this study.
Observing the bias at high frequencies in the Star displacement map. Before improving the dataset, let us examine in detail the results obtained in the preceding section. For instance, the error maps depicted in Figure 10 show that the error suddenly increases for the highest frequencies (those on the left). Interestingly, the location of this sudden change in the response of the network substantially corresponds to the zone of the displacement map for which the spatial period of the vertical sine wave is equal to twice the size of the regions used in Speckle dataset 1.0, namely 8 x 2 = 16 pixels. This is confirmed by superimposing on these maps a vertical red line at an abscissa which corresponds to this period of 16 pixels. The explanation is that the displacement fields considered in Speckle dataset 1.0 are linear interpolations of random values 8 pixels apart. They are therefore increasing or decreasing over an 8-pixel interval, and cannot correctly represent sine waves of period lower than 16 pixels.
In the same way, let us now enlarge the displacement field obtained with StrainNet-h trained on Speckle dataset 1.0. The zone under consideration is a small square portion of the displacement field (see its precise location with the red square in Figures 10a and 13a). The result is given in Figure 12a. It can be observed that the network is impacted by the fact that the dataset used for training purposes is generated from 8 x 8 independent regions. Indeed, the network predicts the displacement at one point per region and then interpolates the results to obtain the full optical flow. This phenomenon is confirmed by down-sampling a predicted displacement by a factor of 8 and then up-sampling the result by the same factor: the resulting displacement is practically the same as the one given by StrainNet-h trained with Speckle dataset 1.0. The main conclusion is that the network cannot correctly predict the displacements on the left of the Star displacement because they occur at higher spatial frequencies than those used in Speckle dataset 1.0.
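This down/up-sampling check can be reproduced in a few lines of PyTorch; the interpolation mode is our assumption:

```python
import torch
import torch.nn.functional as F

# If the network effectively predicts one value per 8 x 8 region and
# interpolates in between, the round trip below should leave its output
# almost unchanged.
flow = torch.randn(1, 2, 256, 256)  # stands for a predicted displacement field
coarse = F.interpolate(flow, scale_factor=1 / 8, mode="bilinear", align_corners=False)
roundtrip = F.interpolate(coarse, scale_factor=8, mode="bilinear", align_corners=False)
print(torch.mean(torch.abs(roundtrip - flow)))  # small for StrainNet-h on dataset 1.0
```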
Figure 12: Results obtained with StrainNet-h trained on Speckle datasets 1.0 and 2.0: closeup view of the high spatial frequencies area of the Star displacement (see precise location in Figures 10 and 13). (a) StrainNet-h trained on Speckle dataset 1.0. (b) StrainNet-h trained on Speckle dataset 2.0. (c) Reference displacement. All dimensions are in pixels.
A consequence of the remarks above is that square regions smaller in size than 8 pixels should also be included in the speckle dataset to be able to retrieve displacement fields involving spatial frequencies higher than 1/8 pixel⁻¹. A second and more suitable dataset called Speckle dataset 2.0 was therefore generated, as explained below.
Generating Speckle dataset 2.0. Speckle dataset 2.0 was generated with the same principle as Speckle dataset 1.0, but by changing three design rules. First, regions of various sizes instead of a uniform size of 8 x 8 pixels were considered to define the random ground-truth displacement fields. On the one hand, the preceding remark motivates the use of smaller regions. On the other hand, the less accurate estimation than with DIC for low-frequency displacements (see for instance Figure 11) motivates the use of larger regions. We therefore considered regions of size equal to 4 x 4, 8 x 8, 16 x 16, 32 x 32, 64 x 64 and 128 x 128 pixels.
Second, the bilinear interpolation used in Speckle dataset 1.0 was replaced by a bicubic one to limit potential interpolation bias. Third, noise was added to all the images of the dataset in order to simulate the effect of sensor noise, which always corrupts digital images, while only noiseless images were considered in Speckle dataset 1.0. This noise was heteroscedastic to mimic the typical sensor noise of actual linear cameras [47, 43]. With this model, the variance v of the camera sensor noise is an affine function of the brightness s, so v = a x s + b. We chose here a = 0.0342 and b = 0.2679 to be consistent with the values used to generate the noisy images used in the DIC Challenge 2.0 [4].
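A minimal numpy sketch of this heteroscedastic noise model (the clipping to 8-bit gray levels is our assumption):

```python
import numpy as np

def add_heteroscedastic_noise(img, a=0.0342, b=0.2679, rng=None):
    """Add heteroscedastic sensor noise: at each pixel, the noise variance
    is an affine function of the brightness s, namely v = a * s + b, with
    the a and b values used for Speckle dataset 2.0."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(a * img + b)                 # pixelwise standard deviation
    noisy = img + rng.normal(size=img.shape) * std
    return np.clip(noisy, 0, 255)              # assumed 8-bit gray levels
```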
The number of frames was the same for each region size. It was equal to 363 x 10 = 3,630 (363 reference images deformed 10 times each). Since six different region sizes were considered, Speckle dataset 2.0 contains 6 x 3,630 = 21,780 different pairs of images.
Figure 13: Star displacement obtained with StrainNet-f and StrainNet-h trained on Speckle dataset 2.0. The red square at the top left is the zone considered for plotting the closeup view in Figure 12. All dimensions are in pixels.
The results obtained at the right-hand side of the displacement map of Figure 13 are rather smooth. In addition, bearing in mind that the colorbars used in Figures 10 and 13 are the same, it can be seen by comparing the error maps that the high spatial frequencies are rendered in a more accurate way with the networks trained on Speckle dataset 2.0, in particular the high-frequency components in the left-hand side of the displacement maps.
Figure 14 compares the results obtained by StrainNet-h and StrainNet-f trained in turn on Speckle datasets 1.0 and 2.0. The results obtained by these networks trained on Speckle dataset 2.0 are also compared in Figure 15 with DIC (subset size 2M + 1 = 11 and 21 pixels). It is clear that the results obtained after training on Speckle dataset 2.0 are better and exhibit smaller fluctuations of the error. Furthermore, the results shown in Figure 15 show that StrainNet-h and StrainNet-f have an accuracy similar to that of DIC at low frequencies and a better one at high frequencies.
Finally, it can be concluded that training the network with Speckle dataset 2.0 instead of Speckle dataset 1.0 leads to better results with the Star images. Let us now examine the influence of image noise on the results.
Figure 14: Comparison between the networks trained on Speckle dataset 1.0 and Speckle dataset 2.0.
Results obtained with noisy images. The previous results were obtained with noiseless images. We now consider noisy images to evaluate the dependency of the maps provided by StrainNet-f and -h on image noise. The Star images used in this case were obtained by adding a heteroscedastic noise similar to the one discussed above to the noiseless Star images.
Figure 16 shows the maps obtained with StrainNet-h and StrainNet-f with the Star images, as well as those obtained with DIC for comparison purposes (2M + 1 = 11 and 21 pixels). It is worth noting that StrainNet-h and StrainNet-f outperform DIC with 2M + 1 = 11 pixels at both high and low spatial frequencies. Indeed, the bias is lower for the high spatial frequencies, and a smoother displacement distribution (thus a lower noise) is obtained.
Figure 15: Comparison between the networks trained on Speckle dataset 2.0 and DIC (2M + 1 = 11 and 21 pixels). Top: displacements along Δ. Bottom: mean absolute error per column. All dimensions are in pixels.
sine displacement. This attenuation is generally set to 10% = 0.1. In the present case, it means that the spatial resolution is the period of the vertical sine displacement for which the amplitude is equal to 0.5 - 0.5 x 0.1 = 0.45 pixel. The value of d must be as small as possible to reflect a small spatial resolution, thus the ability of the measuring technique of interest to distinguish close features in displacement and strain maps, and to return a value of the displacements and strains in these regions with a small bias. In certain cases, the displacement resolution can be predicted theoretically from the transfer function of the filter associated with the technique (Savitzky-Golay filter for DIC [5, 6], Gaussian filter for the Localized Spectrum Analysis [48]).
In the present case of a CNN however, no transfer function has been identified so far, so d can only be determined numerically or graphically, by seeking the intersection between the curve representing the error obtained along Δ with noiseless images on the one hand, and a horizontal line of equation y = 10% on the other hand, see Figure 17. Note that the curves were smoothed with a rectangular filter to remove the local fluctuations of the bias that could potentially disturb a proper estimation of this quantity. The spatial resolution found in each case is directly reported in each subfigure. We also considered here second-order subset shape functions, this case being more favorable for DIC [49]. Only the case 2M + 1 = 21 pixels is reported here, DIC diverging at some points with 2M + 1 = 11 pixels.
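A numerical counterpart of this graphical procedure could look as follows (the filter length and function name are ours):

```python
import numpy as np

def spatial_resolution(wavelengths, bias_percent, threshold=10.0, win=5):
    """Estimate the spatial resolution d: smallest spatial period of the sine
    for which the smoothed bias stays below the threshold (10% of the 0.5
    pixel amplitude, i.e. a retrieved amplitude of at least 0.45 pixel).
    `wavelengths` must be sorted in increasing order."""
    smoothed = np.convolve(bias_percent, np.ones(win) / win, mode="same")  # rectangular filter
    above = np.where(smoothed > threshold)[0]
    if above.size == 0:
        return wavelengths[0]          # bias always below the threshold
    if above[-1] + 1 >= len(wavelengths):
        return np.nan                  # bias never falls below the threshold
    return wavelengths[above[-1] + 1]  # first period past the last crossing
```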
The value of d is smaller with both StrainNet-f and -h than with DIC used with first-order subset shape functions, even for 2M + 1 = 11 pixels, while d is nearly the same as with DIC used with second-order subset shape functions. Indeed, Figure 18 (top) shows that the displacement along Δ is similar between StrainNet-f or -h on the one hand, and DIC with second-order subset shape functions on the other hand. Figure 19, where closeup views of the error map for the highest spatial frequencies are depicted, shows however that the way d is estimated is more favorable for DIC with second-order subset shape functions than for StrainNet-f and -h. Indeed, an additional bias occurs with DIC when going vertically away from the axis of symmetry Δ, as clearly illustrated in Figure 19a. Figures 19b-c show that this is not the case for StrainNet-f and -h. This phenomenon is not taken into account when estimating d, since only the loss of amplitude along Δ is considered. Consequently, when considering the mean absolute error per column as in Figure 18 (bottom), it can be observed that this error is
Figure 16: (a) DIC, 2M + 1 = 21 pixels, MAE = 0.0834. (b) DIC, 2M + 1 = 11 pixels, MAE = 0.0417. (c) Displacement along Δ and mean absolute error per column. All dimensions are in pixels.
lower for StrainNet-f and -h than for DIC with second-order subset shape functions.
Figure 17: Seeking the spatial resolution of each technique. The bias given here is a percentage of the displacement amplitude, which is equal to 0.5 pixel. (a) DIC, 2M + 1 = 21 pixels, first-order subset shape functions; (b) DIC, 2M + 1 = 11 pixels, first-order subset shape functions; (c) DIC, 2M + 1 = 21 pixels, second-order subset shape functions; (d) StrainNet-h; (e) StrainNet-f. In each subfigure, the bias (%) is plotted against the spatial wavelength (pixels), without filter and after median filter.
Figure 18: Comparison between StrainNet and DIC (2M + 1 = 21 pixels) with second-order subset shape functions (noisy images). Top: displacements along Δ. Bottom: mean absolute error per column. All dimensions are in pixels.
Figure 19: Close-up view of the error map in pixels (for the high spatial frequencies) obtained with StrainNet and DIC with second-order subset shape functions. (a) DIC 21×21; (b) StrainNet-h; (c) StrainNet-f. Each case shows the displacement map and the corresponding error map.
which minimizes the same residual, but in the spatial domain. Hence estimating the product between
d and σ is a handy way to compare measurement systems for a given bias, but independently from
any other parameter chosen by the user such as the subset size for DIC. This product, denoted by α
and named "metrological performance indicator" in [7, 46], has been calculated here for the different
algorithms. The value of σ is merely estimated by calculating the standard deviation of the difference
between the displacement fields found by processing noisy and noiseless Star images. The value of α
found for each algorithm is reported in Figure 20.

This indicator is nearly the same for DIC used with 2M + 1 = 11 pixels and 21 pixels (1st order),
which is consistent with the conclusions of [7]. It is also almost identical for StrainNet-f and -h. Both
lie between DIC used with first- and second-order subset shape functions. Since the spatial resolution
estimated with d is nearly the same with DIC used with second-order subset shape functions and
StrainNet, it means that the noise level is higher in the latter case. This can be observed in the noise
maps depicted in Figure 21. In particular, the shape of the wave can be guessed in Figure 21b-c. A
higher difference can also be observed on the left, thus for the highest spatial frequencies. It means that
a slight bias is induced by noise in these two cases. Further investigations should be undertaken to see
how to eliminate this phenomenon, by changing the dataset and/or the layers of the network itself.
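As a minimal sketch of how α can be obtained, assuming the noisy and noiseless displacement fields are available as NumPy arrays and that d has already been read off Figure 17 (all names are illustrative):

    import numpy as np

    def metrological_indicator(u_noisy, u_noiseless, d):
        """alpha = d * sigma, where sigma (the displacement resolution) is
        taken as the standard deviation of the difference between the
        displacement fields retrieved from noisy and noiseless Star images,
        and d is the spatial resolution estimated graphically."""
        sigma = np.std(u_noisy - u_noiseless)  # displacement resolution, in pixels
        return d * sigma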
Figure 20: Metrological efficiency indicator α for DIC (1st- and 2nd-order subset shape functions), StrainNet-h and StrainNet-f. For StrainNet-f, α = 0.49.
Figure 21: Difference between displacement fields obtained with noisy and noiseless speckle images (in pixels); panel (b) shows StrainNet-h. All dimensions are in pixels.
retrieving the 50 corresponding displacement fields, and plotting again the mean distribution along
Δ. With the first experiment, the random fluctuations due to sensor noise are averaged out (or at
least decrease in amplitude by averaging effect). However, PIB is constant over the dataset since the
displacement is the same and the speckles are similar from one image to another, the only difference
between them being due to noise. In the second experiment, all the speckles are different and they are
noisy, so both the random fluctuations due to sensor noise on the one hand, and the random
fluctuations caused by PIB on the other hand, are averaged out. Comparing the curves obtained in
each of these two cases enables us to numerically illustrate the effect of PIB on the displacement field,
and to study its properties.
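A minimal sketch of these two averaging experiments, assuming the 50 retrieved fields are stacked in NumPy arrays and that `row` selects the profile along Δ (all names are illustrative):

    import numpy as np

    def averaged_profiles(u_one_pattern, u_many_patterns, row):
        """u_one_pattern: (50, H, W) fields from one pattern with 50 noise
        realizations; u_many_patterns: (50, H, W) fields from 50 different
        noisy patterns. Averaging cancels sensor noise in both cases, but
        the pattern-induced bias only in the second one."""
        return u_one_pattern.mean(axis=0)[row], u_many_patterns.mean(axis=0)[row]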
Figure 22 shows on the left and on the right the curves obtained on average in the first and second
cases, respectively. They are plotted in red. The curves obtained for the 50 different pairs of images
are superimposed in each case to give an idea of the fluctuations which are observed with each pair
of images. The results obtained with DIC (2M + 1 = 11 pixels, 1st order and 2M + 1 = 21 pixels,
2nd order) are also shown for comparison purposes. It is worth remembering that exactly the same
sets of images are processed here by DIC and StrainNet. The main remark is that PIB also affects the
results given by StrainNet, but averaging the results provided by 50 different patterns improves the
results less than for DIC. The effect of PIB therefore seems to be less pronounced for StrainNet than
for DIC, other phenomena causing these fluctuations. It is worth remembering that StrainNet-f and
-h were trained on Speckle dataset 2.0, and that the deformed images contained in this dataset were
obtained by interpolation. It would therefore be interesting to train StrainNet-f and -h with images
obtained with the speckle generator described in [39]. Indeed, this latter generator does not perform
any interpolation, so we could see if the errors observed in Figure 22-f are due to the pattern alone or
to both the pattern and the bias due to the interpolation used when generating the deformed images.
A larger dataset with another random displacement generation scheme as in Speckle dataset 2.0 could
also help smoothing out this bias.
Figure 22: Pattern-induced bias. Comparison between results obtained with DIC (2M + 1 = 11 pixels, 1st order and 2M + 1 = 21 pixels, 2nd order) and StrainNet-f. (a) DIC, 2M + 1 = 11 pixels, 1st-order subset shape functions, 1 pattern with 50 noises; (b) DIC, 2M + 1 = 11 pixels, 1st-order subset shape functions, 50 patterns each with a different noise; (c) DIC, 2M + 1 = 21 pixels, 2nd-order subset shape functions, 1 pattern with 50 noises; (d) DIC, 2M + 1 = 21 pixels, 2nd-order subset shape functions, 50 patterns each with a different noise; (e) StrainNet-f, 1 pattern with 50 noises; (f) StrainNet-f, 50 patterns each with a different noise. In each subfigure, the displacement (pixels) is plotted against x (pixels).
8. Assessing the generalization capability

In deep learning, a key point is to validate any CNN with images different from those used to
train the network, in order to ensure good generalization capability. In the preceding sections, we took
care to use StrainNet on speckle images deformed with the Star displacement while this network was
trained with Speckle dataset 2.0, which does not contain any image deformed in a similar way. This
is, however, not sufficient because both the reference images in Speckle dataset 2.0 and the reference
Star image were generated with the same speckle generator [39]. Two other examples were therefore
considered in this study. Both involve speckles which are different from those obtained with the speckle
generator [39], which was used to generate the reference frames in the speckle datasets as well as both
the reference and deformed Star images. The first example concerns images of synthetic speckles from
Sample 14 of the DIC Challenge 1.0 [4]; the second concerns real speckle images taken during a
compression test performed on a wood specimen, as described in [44].
(Figures: close-up views of (a) the images of Sample 14 of the DIC Challenge 1.0, with a subset used in DIC superimposed (size: 2M + 1 = 21 pixels), and (b) one of the images of Speckle dataset 2.0 used to train the network; followed by Figure 23, the u_y displacement fields obtained for Sample 14 with DIC (2M + 1 = 11 pixels) and StrainNet. All dimensions are in pixels.)

Figure 24: Strain map ε_yy deduced from the displacement fields depicted in Figure 23. All dimensions are in pixels.
stiffness spatially changes, and so does the strain distribution if the rings are perpendicular to the
loading force, which is the case here. We considered a typical pair of frames and applied StrainNet-f
to determine the displacement field. Results obtained with DIC with 2M + 1 = 21 pixels (1st-order
subset shape functions) are shown for comparison purposes. A convolution with a Gaussian derivative
filter is then applied in all cases to deduce the vertical strain map ε_yy.
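Such a derivation step can be sketched as follows, for instance with SciPy's Gaussian derivative filter; the value of sigma is an assumption, not the one used here:

    from scipy.ndimage import gaussian_filter1d

    def strain_eyy(v, sigma=5.0):
        """Vertical strain map from the v displacement field, obtained by
        convolution with the derivative of a Gaussian along y (axis 0)."""
        return gaussian_filter1d(v, sigma=sigma, axis=0, order=1)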
It can be seen that similar maps are obtained but again, a more refined analysis should be performed
to discuss the possible damping of the actual details in the strain map, as in [44], but this is out of the
scope of the present paper. The main point here is that StrainNet is able to extract displacement and
strain fields featuring rather high spatial frequencies from images different from those of
Speckle dataset 2.0, since a real speckle pattern is used here. It must be noted that the displacement
is greater than one pixel over most of the front face of the specimen. This displacement was therefore
split into two quantities. The first one is the displacement with a quite rough pixel resolution. The
second one is the subpixel displacement. The images are therefore processed piecewise, in such a way
that the round value of the displacement is the same throughout each of the elements of the mesh. A
mesh of 11 horizontal bands is considered here. This round value of the displacement can easily be
found by cross-correlation for instance. This point is, however, not really challenging, and we applied
here a rough DIC to get this integer value for the sake of simplicity, only the subpixel determination
of the displacement being of interest in the present study. A consequence is the fact that on close
inspection, slight straight lines can be seen in the strain maps, along the border between some of the
elements. Nothing can be visually detected at the same place in the displacement maps.
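A minimal sketch of this two-step scheme is given below, with a plain FFT-based cross-correlation standing in for the rough DIC used here, and a callable `subpixel_net` standing in for the trained network (both are assumptions):

    import numpy as np

    def integer_shift(ref, defo):
        """Integer displacement of `defo` with respect to `ref`, located at
        the peak of an FFT-based cross-correlation (rough registration)."""
        xcorr = np.fft.ifft2(np.conj(np.fft.fft2(ref)) * np.fft.fft2(defo)).real
        dy, dx = np.unravel_index(np.argmax(xcorr), xcorr.shape)
        h, w = ref.shape
        return (dy - h if dy > h // 2 else dy), (dx - w if dx > w // 2 else dx)

    def piecewise_subpixel(ref, defo, subpixel_net, n_bands=11):
        """Remove the integer part of the displacement band by band, estimate
        the subpixel remainder with the network, then add the integer part
        back. `subpixel_net` is assumed to return an (h, w, 2) field."""
        u = np.zeros(ref.shape + (2,))
        for rows in np.array_split(np.arange(ref.shape[0]), n_bands):
            r, d = ref[rows], defo[rows]
            dy, dx = integer_shift(r, d)
            d_aligned = np.roll(d, (-dy, -dx), axis=(0, 1))  # realign the band
            u[rows] = subpixel_net(r, d_aligned) + np.array([dy, dx])
        return u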
29
y
X
1--....... ---a------====- --+
(a) Specimcu lwfore spray- pai11li11g, aSlcr (44] ( h} Speckled sur fr.u.;c of Lhc specimcu after sp ray paiut.iug,
afLcr [44]
...
100 100
" 200
...,.
..,
300 JOO ....
.O.Oli
_
. _ __ .... "
,00
....,
''"
.........,
IIOO
I
,00
-=--
700 . . . . . . . . . ,,.
100 200 !XC1 400 !I(» 11C10 700 llOO ~ 1()(1:) nlXI 100 ,ao XIO 4 ~ et'Q 700 100 000 1000 1100
,00
"
IOO
,,"
200
.,, ,.,
.,0
"
500
~
.,0
..,,.
-<10,
, r,i ,00
.., 1DCIO .... tOO lOO 303 MJO !iDO tQl 100 800 ((ID 1000 1100
Figure 25: Result,,; o btairlL'<l l,y processing real images. LcfL: v <lis placcm euL ficl<l iu pixels. Hight: •·yy slra.iu map. All
di rncnsions arc iu pixels.
30
9. Computing time

Finally, we give some information on the computing time needed to perform the calculations. CNNs
are well suited to being run on Graphics Processing Units (GPUs), which was the case here. This is
even necessary to achieve training in a reasonable amount of time, as mentioned in Section 5.1. Once
StrainNet was trained, we used it on the same GPU. The computing time needed to estimate the
subpixel displacement field is reported in Table 7. Two typical examples are considered here. The first
one corresponds to one pair of Star images, the second to Sample 14 of the DIC Challenge 1.0. The
number of pixels of the frames and the computing time are also reported in this table. The number of
Points Of Interest per second (POI/s) is also given in each case. This quantity represents the number
of points per second at which a measurement is provided by the measuring system. It is used in [53] in
order to measure the calculating speed of GPU-based DIC. In our case, using this quantity is a handy
way to normalize the results obtained with different techniques and different frame sizes, and to fairly
compare them.
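In other words (the figures below are hypothetical, not those of Table 7):

    def poi_per_second(n_points, elapsed_s):
        """Points Of Interest per second: the number of points at which a
        measurement is delivered, divided by the processing time. A handy
        way to compare techniques across different frame sizes."""
        return n_points / elapsed_s

    # Hypothetical example: a 2000 x 501 full-frame estimate computed in
    # 0.1 s corresponds to about 1.0e7 POI/s.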
Table 7: Computing time and Points Of Interest per second (POI/s) for Examples 1 and 2.
For StrainNet-f and -h, the value of POI/s is about ten times lower for the "h" version than for
the "f" one, while the resolution before final interpolation to reach full resolution is 4 times lower.
Interestingly, Ref. [53] reports a POI/s equal to 1.66 × 10^5 and 1.13 × 10^5 for a parallel DIC code
implemented on a GPU, which is nearly two orders of magnitude below. These values are given for
information only: the GPU used in [53] (NVIDIA GTX 760, 2.3 TFLOPS) is indeed less powerful than
the GPU used in the present study (NVIDIA TESLA V100, 114 TFLOPS). In addition, the reader
should bear in mind that CNNs must be trained with a suitable dataset, which generally represents
heavy calculations. Further work should therefore be undertaken to fairly compare StrainNet and a
GPU-based DIC in terms of computing time. The conclusion is, however, that StrainNet provides
pixel-wise displacement maps (and thus strain maps by convolution with suitable derivative filters) at
a rate compatible with real-time measurements.
10. Conclusion

This paper presented a CNN dedicated to the measurement of displacement and strain fields. As
for DIC, the surface to be investigated was marked with a speckle pattern. Various strategies deduced
from the similar problem of optical flow determination were presented, and the best one has been
adapted to give a network named StrainNet. This network was trained with two versions of a specific
speckle dataset. The main result was to demonstrate, through some relevant examples, the feasibility
of this type of approach for accurate pixelwise subpixel measurement over full displacement fields. As
for other problems tackled with CNNs in engineering, the main benefit here is the very short computing
time to obtain the sought quantities, here the displacement fields. Further studies remain necessary
to investigate various problems, which are still open after this preliminary work. For instance, the
dataset used to train the network directly influences the quality of the final results. The dataset
developed here led to valuable results for speckles different from those of the images forming the dataset,
in particular experimental ones. This observation should however be consolidated by considering a
wider panel of speckles and thus by trying to reduce noise and bias in the final displacement maps.
The generator free from any interpolation, which was used here only for generating the deformed Star
images for a matter of time, could also be employed for the images of the dataset despite the computing
cost. The networks discussed here were obtained by enhancing FlowNetS. A complete redesign should
also be undertaken, in particular in order to simplify the network. This would certainly reduce both
the training and the processing times. Finally, a model able to deal with displacements larger than one
pixel while still giving accurate subpixel estimation should also be investigated further, for instance by
training the network on a dataset containing deformed images involving displacements greater than
one pixel.
This work has been sponsored by the French government research program "Investissements d'Avenir"
through the IDEX-ISITE initiative 16-IDEX-0001 (CAP 20-25) and the IMobS3 Laboratory of Excellence
(ANR-10-LABX-16-01).
References

[1] M. Sutton, J.-J. Orteu, and H. Schreier. Image Correlation for Shape, Motion and Deformation Measurements. Basic Concepts, Theory and Applications. Springer, 2009.

[2] B. Pan, K. Qian, H. Xie, and A. Asundi. Two-dimensional digital image correlation for in-plane displacement and strain measurement: a review. Measurement Science and Technology, 20(6):062001, 2009.

[5] H. W. Schreier and M. A. Sutton. Systematic errors in digital image correlation due to undermatched subset shape functions. Experimental Mechanics, 42(3):303-310, 2002.

[6] F. Sur, B. Blaysat, and M. Grédiac. On biases in displacement estimation for image registration, with a focus on photomechanics. Submitted for publication, 2020.

[7] M. Grédiac, B. Blaysat, and F. Sur. A critical comparison of some metrological parameters characterizing local digital image correlation and grid method. Experimental Mechanics, 57(6):871-903, 2017.

[8] G. F. Bomarito, J. D. Hochhalter, T. J. Ruggles, and A. H. Cannon. Increasing accuracy and precision of digital image correlation through pattern optimization. Optics and Lasers in Engineering, 91:73-85, 2017.

[9] M. Grédiac, B. Blaysat, and F. Sur. Extracting displacement and strain fields from checkerboard images with the localized spectrum analysis. Experimental Mechanics, 59(2):207-218, 2019.

[10] M. Grédiac, B. Blaysat, and F. Sur. On the optimal pattern for displacement field measurement: random speckle and DIC, or checkerboard and LSA? Experimental Mechanics, 60(4):509-534, 2020.

[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1097-1105, 2012.

[12] A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazirbas, V. Golkov, P. van der Smagt, D. Cremers, and T. Brox. FlowNet: Learning optical flow with convolutional networks. In IEEE International Conference on Computer Vision (ICCV), pages 2758-2766, 2015.
[13] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, Cambridge, MA, USA, 2016.

[14] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17(1-3):185-203, 1981.

[15] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, volume 2, pages 674-679, 1981.

[16] S. Guan, H. Li, and W. Zheng. Unsupervised learning for optical flow estimation using pyramid convolution LSTM. In IEEE International Conference on Multimedia and Expo (ICME), pages 181-186, 2019.

[17] A. Ahmadi and I. Patras. Unsupervised convolutional neural networks for motion estimation. In IEEE International Conference on Image Processing (ICIP), pages 1629-1633, 2016.

[18] Y. Wang, Y. Yang, Z. Yang, L. Zhao, P. Wang, and W. Xu. Occlusion aware unsupervised learning of optical flow. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4884-4893, 2018.

[19] W.-S. Lai, J.-B. Huang, and M.-H. Yang. Semi-supervised learning for optical flow with generative adversarial networks. In Neural Information Processing Systems (NIPS), 2017.

[20] Y. Yang and S. Soatto. Conditional prior networks for optical flow. In European Conference on Computer Vision (ECCV), pages 282-298, 2018.

[21] J. Xu, R. Ranftl, and V. Koltun. Accurate optical flow via direct cost volume processing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5807-5815, 2017.

[22] J. Wulff, L. Sevilla-Lara, and M. J. Black. Optical flow in mostly rigid scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6911-6920, 2017.

[23] P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid. DeepFlow: Large displacement optical flow with deep matching. In IEEE International Conference on Computer Vision, pages 1385-1392, 2013.

[24] C. Bailer, B. Taetz, and D. Stricker. Flow fields: Dense correspondence fields for highly accurate large displacement optical flow estimation. In IEEE International Conference on Computer Vision (ICCV), pages 4015-4023, 2015.

[25] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 234-241, 2015.

[26] P. Hu, G. Wang, and Y. Tan. Recurrent spatial pyramid CNN for optical flow estimation. IEEE Transactions on Multimedia, 20(10):2814-2823, 2018.

[27] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1647-1655, 2017.

[28] A. Ranjan and M. J. Black. Optical flow estimation using a spatial pyramid network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2720-2729, 2017.

[29] T.-W. Hui, X. Tang, and C. C. Loy. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8981-8989, 2018.
[30] D. Sun, X. Yang, M. Liu, and J. Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8934-8943, 2018.

[31] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In European Conference on Computer Vision (ECCV), pages 611-625, 2012.

[32] M. Menze and A. Geiger. Object scene flow for autonomous vehicles. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3061-3070, 2015.

[33] S. Cai, S. Zhou, C. Xu, and Q. Gao. Dense motion estimation of particle images via a convolutional neural network. Experiments in Fluids, 60(4):73, 2019.

[34] S. Cai, J. Liang, Q. Gao, C. Xu, and R. Wei. Particle image velocimetry based on a deep learning motion estimator. IEEE Transactions on Instrumentation and Measurement, 2019.

[35] N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4040-4048, 2016.

[36] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1):1-31, 2011.

[37] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3354-3361, 2012.

[38] N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4040-4048, 2016.

[39] F. Sur, B. Blaysat, and M. Grédiac. Rendering deformed speckle images with a Boolean model. Journal of Mathematical Imaging and Vision, 60(5):634-650, 2018.

[41] S. Niklaus. A reimplementation of PWC-Net using PyTorch. https://github.com/sniklaus/pytorch-pwc, 2018.

[43] M. Grédiac and F. Sur. Effect of sensor noise on the resolution and spatial resolution of the displacement and strain maps obtained with the grid method. Strain, 50(1):1-27, 2014.

[44] M. Grédiac, B. Blaysat, and F. Sur. A robust-to-noise deconvolution algorithm to enhance displacement and strain maps obtained with local DIC and LSA. Experimental Mechanics, 59(2):219-243, 2019.

[45] E. M. C. Jones and M. A. Iadicola (Eds). A Good Practices Guide for Digital Image Correlation. International Digital Image Correlation Society, 2018. DOI: 10.32720/idics/gpg.ed1.

[46] B. Blaysat, J. Neggers, M. Grédiac, and F. Sur. Towards criteria characterizing the metrological performance of full-field measurement techniques. Application to the comparison between local and global versions of DIC. Experimental Mechanics, 60(3):393-407, 2020.
[47] A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian. Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10):1737-1754, 2008.

[48] F. Sur and M. Grédiac. Towards deconvolution to enhance the grid method for in-plane strain measurement. Inverse Problems and Imaging, 8(1):259-291, 2014.

[49] L. Wittevrongel, P. Lava, S. V. Lomov, and D. Debruyne. A self adaptive global digital image correlation algorithm. Experimental Mechanics, 55(2):361-378, 2015.

[50] R. B. Lehoucq, P. L. Reu, and D. Z. Turner. The effect of the ill-posed problem on quantitative error assessment in digital image correlation. Experimental Mechanics, 2017. In press.

[51] S. S. Fayad, D. T. Seidl, and P. L. Reu. Spatial DIC errors due to pattern-induced bias and grey level discretization. Experimental Mechanics, 60(2):249-263, 2020.

[52] J.-J. Orteu, D. Garcia, L. Robert, and F. Bugarin. A speckle texture image generator. Proceedings SPIE: Speckle06: speckles, from grains to flowers, 6341:63410H, 2006.

[53] L. Zhang, T. Wang, Z. Jiang, Q. Kemao, Y. Liu, Z. Liu, L. Tang, and S. Dong. High accuracy digital image correlation powered by GPU-based parallel computing. Optics and Lasers in Engineering, 69:7-12, 2015.