MC-Calib: A Generic and Robust Calibration Toolbox For Multi-Camera Systems (Preprint)
ABSTRACT
In this paper, we present MC-Calib, a novel and robust toolbox dedicated to the calibration of com-
plex synchronized multi-camera systems using an arbitrary number of fiducial marker-based patterns.
Calibration results are obtained via successive stages of refinement to reliably estimate both the poses
of the calibration boards and cameras in the system. Our method is not constrained by the number
of cameras, their overlapping field-of-view (FoV), or the number of calibration patterns used. More-
over, neither prior information about the camera system nor the positions of the checkerboards are
required. As a result, minimal user interaction is needed to achieve an accurate and robust calibra-
tion which makes this toolbox accessible even with limited computer vision expertise. In this work,
we put a strong emphasis on the versatility and the robustness of our technique. Specifically, the
hierarchical nature of our strategy allows us to reliably calibrate complex vision systems even in
the presence of noisy measurements. Additionally, we propose a new strategy for best-suited image
selection and initial parameter estimation dedicated to non-overlapping FoV cameras. Finally,
our calibration toolbox is compatible with both perspective and fisheye cameras. Our solution has
been validated on a large number of real and synthetic sequences including monocular, stereo, mul-
tiple overlapping cameras, non-overlapping cameras, and converging camera systems. Project page:
https://github.com/rameau-fr/MC-Calib
© 2022 Elsevier Ltd. All rights reserved.
∗∗ Corresponding author: Tel.: +82-42-350-5465; e-mail: [email protected] (In So Kweon)

1. Introduction

Recent years have seen a rapid increase in the demand for polydioptric (multi-camera setup) vision systems in multiple fields such as autonomous vehicle navigation (Heng et al., 2019), human reconstruction (Alexiadis et al., 2016), indoor robotics applications (Urban et al., 2016; Kuo et al., 2020) and video surveillance (Rameau et al., 2014). These systems are particularly desirable since they allow covering a large field-of-view (FoV) and computing metric scale 3D information from a scene. Despite these advantages, such systems remain complex to deploy in practice due to their tedious calibration and the absence of efficient and versatile publicly available calibration toolboxes.

While significant efforts have been invested towards robust and effective software dedicated to Structure from Motion (SfM) (Schönberger and Frahm, 2016; Moulon et al., 2016; Wu et al., 2011) and Simultaneous Localization and Mapping (SLAM) (Mur-Artal and Tardós, 2017; Rosinol et al., 2020; Qin et al., 2018), the development of novel calibration toolboxes for complex vision systems has attracted significantly less attention. As a result, most existing software for camera calibration focuses on monocular and stereo systems (Bouguet, 2004; Mei and Rives, 2007; Scaramuzza et al., 2006) but does not consider the problem of a multi-camera rig.

The relevance of our work can be understood in the context where existing calibration frameworks dedicated to polydioptric systems are often designed to deal with specific and restricted setups: for instance, a given and limited number of cameras (Bouguet, 2004), an overlapping FoV (Rehder et al., 2016), prior knowledge of the intrinsic parameters (Lébraly et al., 2010), external vision systems (Zhao et al., 2018), a mirror (Kumar et al., 2008; Lébraly et al., 2010), limited motion (Liu et al., 2016), or a pre-computed reconstruction of the environment (Lin et al., 2020; Ataer-Cansizoglu et al., 2014).
Fig. 1. Representative calibration result obtained with our calibration pipeline. (left) Camera rig composed of 3 Intel RealSense (Keselman et al., 2017)
cameras where the 6 infrared cameras are being calibrated, (middle) calibration result, (right) image samples from each camera.
Fig. 4. Overview of a multi-camera system to be calibrated. In this figure, the inter-board, the inter-camera, inter-object, and inter-group transformations
are depicted in red, blue, purple, and orange respectively. (a) A multi-camera system composed of three cameras {c0 , c1 , c2 } observing three boards
{b0 , b1 , b2 } at a time t. Notice that the two cameras c0 and c1 have an overlapping field of view such that they form a camera group g0 while the camera c2
shares no FoV and forms a group g1 alone. (b) The two graphs used in our pipeline. Note that the objects o0 and o1 as well as the camera groups g0 and g1
are similar to the configuration of the left figure. A third object and camera group has been included to show the expandability of the approach.
(column headers, rotated in the original layout, list the supported configurations: perspective, fisheye, stereo, multi-board, hybrid, non-overlapping, and converging)
Scaramuzza et al. (2006) ✓ ✓
Caron and Eynard (2011) ✓ ✓ ✓ ✓
Kalibr (Rehder et al., 2016) ✓ ✓ ✓ ✓ ✓
Mei and Rives (2007) ✓ ✓
Bouguet (2004) ✓ ✓
Itseez (2015) ✓ ✓ ✓
Lin et al. (2020) ✓ ✓ ✓ ✓ ✓ ✓
Li et al. (2013) ✓ ✓ ✓ ✓ ✓
Heng et al. (2013) ✓ ✓ ✓ ✓ ✓ ✓ ✓
Liu et al. (2016) ✓ ✓ ✓ ✓ ✓
Ours ✓ ✓ ✓ ✓ ✓ ✓ ✓

Table 1. Summary of existing camera calibration toolboxes.

3. Bibliography

Camera calibration is the initial stage of most 3D reconstruction techniques; thus, it has been an important research topic since the very beginning of photogrammetry. One of the first practical camera calibration pipelines was proposed by Tsai (1987). This approach requires a single image of the calibration pattern assuming its coplanarity with the camera plane. While this assumption is difficult to ensure in practice, the method proposed by Zhang (2000) only needs multiple observations of a checkerboard without any restriction regarding its position. Zhang's calibration pipeline is currently available in widely used toolboxes such as OpenCV (Itseez, 2015), and Matlab. To deal with cameras with large radial distortions, multiple ad hoc solutions have also been proposed (Scaramuzza et al., 2006; Mei and Rives, 2007).

The extension of single-camera calibration techniques to a stereo-vision system with a large overlapping field-of-view is trivial and has been implemented in most camera calibration toolboxes (Bouguet, 2004; Mei and Rives, 2007). However, these toolboxes do not extend to the calibration of more than two cameras. This limitation can be partly explained by the type of checkerboard utilized. Indeed, to calibrate a multi-camera system, indexed observations of the 3D points on the board should be visualized simultaneously by different cameras. Using a traditional checkerboard, the entire board needs to be visible (to estimate the indexing of the points), which is hardly applicable when a large number of cameras is utilized and/or if the baseline between the cameras is large. To cope with this limitation, Li et al. (2013) propose to use a randomly textured calibration pattern on which unique keypoints can be detected to perform the calibration of a vision system even when only a limited overlap between the fields of view is available.
More recently, the development of effective fiducial marker systems allows estimating the index of the observed corners without the need to visualize the entire board. Among well-known Augmented Reality (AR) markers, we can mention Charuco (Itseez, 2015) and AprilTag (Olson, 2011; Wang and Olson, 2016), which have been widely used for multi-camera calibration systems. Such specific calibration markers have drastically eased the calibration of complex vision systems (Xing et al., 2017; Rehder et al., 2016; Strauß et al., 2014).

The previously mentioned calibration approaches assume that at least a partially overlapping field of view between the cameras is available (Rehder et al., 2016) or that the checkerboards can be observed together such that they can be merged into a single 3D calibration object (Strauß et al., 2014). However, they do not allow generic calibration of non-overlapping and converging systems without specific a priori knowledge. To conduct the calibration of non-overlapping cameras, many approaches following the hand-eye estimation strategy have been proposed (Tsai et al., 1989). In the literature, this type of calibration is often achieved under certain assumptions (Lébraly et al., 2010; Im et al., 2016): known intrinsic parameters; one board per camera is used (and this board remains the same for the entire sequence); the motion used for calibration should not be degenerate (translation in each direction).

More complex calibration strategies, involving additional hardware, have also been developed for the calibration of non-overlapping systems. For instance, in (Kumar et al., 2008), a mirror is utilized to virtually obtain a shared view of the calibration board. A more flexible technique has also been proposed by Zhao et al. (2018) where an external camera is used to compute the displacement of the multi-camera rig to be calibrated. Despite their complexity and lack of scalability, these approaches have the advantage of being more robust against degenerate motions than their hand-eye-based counterparts.

However, hand-eye-based approaches tend to be more generic and do not require any specific and cumbersome setup. A good illustration of the versatility of hand-eye-based approaches is the toolbox "Caliber" (Liu et al., 2016) which shares similarities with our technique. This toolbox has also been designed to be theoretically compatible with any configuration of cameras. However, in practice, this technique suffers from many technical shortcomings. For instance, it is not compatible with fisheye or hybrid (combination of perspective and fisheye cameras) vision systems. Moreover, no fiducial markers are used, which implies that many manual manipulations of the data are required. On the contrary, our proposed toolbox can deal with various camera distortion models and is fully automatic. Furthermore, neither the motion of the camera rig nor the number of boards is limited by our strategy.

In this work, we also rely on a hand-eye calibration strategy to estimate the pose between the non-overlapping camera groups. However, we have extended it with multiple strategies to improve the stability and accuracy of the proposed approach. Another notable difference with (Liu et al., 2016) is the overall structure of our method. In our case, we design a hierarchical strategy where problems are solved gradually with a systematic non-linear refinement leading to a good convergence. Finally, we make our method particularly robust by including a RANSAC process (preventing wrong marker detections) and robust non-linear optimizations – allowing us to deal with outliers.

On the application side, the calibration of non-overlapping cameras is particularly critical for automotive vision systems to provide an all-around view of the scene. For the sake of practicability, numerous strategies have been proposed to simplify the calibration of such systems without the need for calibration boards. A seminal work has been proposed in (Heng et al., 2013), where the displacement of each camera is estimated (using visual odometry) to calibrate the system via a hand-eye calibration technique. While this approach does not require setting up multiple calibration boards, additional information to estimate the scale of the displacement (i.e. a wheel encoder or a stereo vision system) is needed. Moreover, the accuracy obtained with such a strategy is scene-dependent, and the method is relatively complex to deploy due to the large number of stages involved. To simplify this calibration process, Ataer-Cansizoglu et al. (2014) take advantage of a prior reconstruction of an arbitrary calibration scene (from a single RGB-D camera) to calibrate a set of non-overlapping cameras. This approach, simple and effective, provides a 3D metric scale calibration estimation, but it requires a prior reconstruction of the scene and cannot guarantee repeatable calibration results.

The previously described techniques assume pre-calibrated intrinsic camera parameters. To ease the automatic calibration process further, Lin et al. (2020) propose to use a radial projection model to estimate the poses of the cameras in a pre-reconstructed 3D scene. The advantage of this strategy is that no prior intrinsic parameter is needed to estimate the extrinsic and intrinsic parameters of the cameras. While this approach is practical and versatile, a scene reconstruction at a metric scale remains a complex task that can be affected by drifts and artifacts. Moreover, the approach developed by Lin et al. (2020) suffers from structural limitations related to the pose estimation via the radial projection model. For instance, the vision system cannot be calibrated unless at least two cameras have non-parallel principal axes. Admittedly, checkerboard-free strategies are very desirable for practical tasks but remain complex to deploy in practice due to their lack of accuracy, use-case limitations (i.e. they cannot be utilized for converging field-of-view camera calibration), and repeatability issues.

While this literature covers the problem of multiple camera systems used in robotics, another type of multi-camera system, which we call a converging camera system, is often needed for single-shot 3D scanners (Pesce et al., 2015) (see Fig. 3). This kind of camera system can hardly be calibrated using traditional planar patterns and often requires 3D calibration patterns. Only a few approaches address this particular problem. A representative work is (Forbes et al., 2002) where the geometry of a 3D cube is refined using multiple observations to estimate the relative poses between the cameras composing the system. The major limitation of this work is the need for prior information regarding the position of the 3D points on the calibration object. In our work, this object structure is automatically estimated without any prior 3D information provided by the user. It is worth noting that such vision systems can also be calibrated with hand-eye calibration techniques. However, such strategies require each camera to observe a unique board during the calibration
process which drastically restricts the variety of possible motions, leading to biased calibration results. On the contrary, our technique can take advantage of the entire 3D object, allowing a wider range of motions. Therefore, our system simplifies the calibration process and avoids degenerate configurations.

Our solution is a fully functional calibration toolbox called "MC-Calib". To underline the relevance of this software, we provide a quick overview of the existing multi-camera calibration toolboxes with their inherent limitations in Table 1.

4. Methodology

In this section, we describe the technical details of the proposed multi-camera calibration pipeline. Before digging into the detailed description of each stage composing our strategy, we propose an overview of the entire calibration framework. First of all, a Charuco board detection (Sec. 4.1) is performed for all the images acquired by the cameras and the detected 2D locations are stored. These 2D observations and their respective 3D locations (expressed in their board referential) are used to initialize the intrinsic parameters (Sec. 4.2) for every camera. After estimating the internal parameters of the cameras, the pose of each camera with respect to the observed boards is estimated (Sec. 4.3) via a perspective-n-point technique.

The inter-board transformations between all boards visible in a single image are computed (Sec. 4.4) to merge the boards sharing co-visibility into 3D objects (a 3D object is a set of 3D boards). After the refinement of the 3D object structures, we group the cameras (Sec. 4.5) which have seen similar 3D objects synchronously via a graph-based strategy. At this stage, if all the cameras in the system share a globally or non-globally overlapping field of view, the calibration is finalized via a final bundle adjustment (Sec. 4.7). If multiple camera groups remain, a non-overlapping camera group calibration (Sec. 4.6) is performed between each pair of groups. This estimation is used to merge all the camera groups and the 3D objects before being refined to obtain the entire set of parameters of the camera system.
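For illustration, the following is a minimal Python sketch of the kind of graph-based grouping mentioned above (Sec. 4.5): cameras that synchronously observed the same 3D object are linked, and each connected component of this co-visibility graph forms a camera group. The input layout and function name are hypothetical and are not part of the toolbox, which is implemented in C++.

```python
from collections import defaultdict

def group_cameras(observations):
    """Group cameras into connected components of the co-visibility graph.

    observations: iterable of (camera_id, object_id) pairs, one per
    synchronized detection (hypothetical input format).
    Returns a list of sets of camera ids, one set per camera group.
    """
    # Two cameras are connected if they observed the same object.
    cams_per_object = defaultdict(set)
    for cam, obj in observations:
        cams_per_object[obj].add(cam)

    adjacency = defaultdict(set)
    for cams in cams_per_object.values():
        for c in cams:
            adjacency[c] |= cams - {c}

    # Depth-first traversal to extract connected components.
    groups, visited = [], set()
    for cam in adjacency:
        if cam in visited:
            continue
        group, stack = set(), [cam]
        while stack:
            c = stack.pop()
            if c in visited:
                continue
            visited.add(c)
            group.add(c)
            stack.extend(adjacency[c] - visited)
        groups.append(group)
    return groups

# Example mirroring Fig. 4: cameras 0 and 1 both see object 0 while
# camera 2 only sees object 1, so two camera groups are formed.
print(group_cameras([(0, 0), (1, 0), (2, 1)]))  # [{0, 1}, {2}]
```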
4.1. Checkerboard detection and keypoints extraction

The initial stage of our calibration process is the detection of the checkerboards in the images and the precise localization of their 2D corners (see Fig. 1). To deal with any complex setup, we propose to utilize fiducial checkerboard markers. Specifically, we take advantage of the CharucoBoard (Itseez, 2015) mixing a standard planar checkerboard pattern with ArUco fiducial markers (Garrido-Jurado et al., 2014).

During this stage, all the images from all the cameras are processed to store their 2D keypoint locations and the corresponding 3D points in their board's referential. This detection is critical since the entire calibration of the system strongly depends on the robustness and accuracy of these 2D keypoints. Thus, to improve the accuracy of the calibration, we apply an effective corner refinement process (Ha et al., 2017). To avoid degenerate configurations, we additionally apply a collinearity check. Moreover, to improve the overall robustness, the boards with less than a certain percentage of visible keypoints are discarded from further consideration – we typically set this threshold to 40%. Note that this threshold can be adjusted by the user in case of small overlapping FoV camera systems calibration.
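As an illustration of this detection stage, the snippet below is a minimal Python/OpenCV sketch (the toolbox itself is written in C++). It relies on the aruco module of opencv-contrib; note that the CharucoBoard API changed in OpenCV 4.7+, so the calls below follow the older interface. The board dimensions, dictionary, and the 40% visibility threshold are placeholders.

```python
import cv2

# Board definition: one CharucoBoard per physical pattern; the square/marker
# sizes below are placeholders and must match the printed boards.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
board = cv2.aruco.CharucoBoard_create(5, 5, 0.04, 0.03, dictionary)

def detect_charuco_corners(image_path, min_visibility=0.4):
    """Detect the indexed Charuco corners of one board in one image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # 1) detect the ArUco markers embedded in the board
    marker_corners, marker_ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if marker_ids is None or len(marker_ids) == 0:
        return None, None
    # 2) interpolate the indexed checkerboard corners from the markers
    n_corners, corners, ids = cv2.aruco.interpolateCornersCharuco(
        marker_corners, marker_ids, gray, board)
    # discard boards with less than ~40% visible keypoints (cf. text above)
    if corners is None or n_corners < min_visibility * len(board.chessboardCorners):
        return None, None
    return corners.reshape(-1, 2), ids.reshape(-1)
```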
4.2. Intrinsic parameters initialization

For each ith camera ci, we collect the 3D↔2D correspondence pairs from all the images containing checkerboards. These matches are used to initialize the intrinsic parameters Kci and the distortion coefficients kci. For perspective cameras, we adopt the well-known Zhang (2000) calibration technique (Brown distortion model), while the calibration of fisheye cameras is accomplished with the implementation available in OpenCV (Itseez, 2015) (Kannala distortion model). This initialization is relatively slow if a large number of images is used; therefore, we subsample the images by randomly selecting a subset of 50 board observations per camera. If less than 50 board observations are available, all images are utilized. Notice that the intrinsic parameters are refined using all the images in the next stage.

For certain complex scenarios involving a large number of non-overlapping cameras, it is sometimes tedious to acquire enough diversified viewpoints to reach an accurate intrinsic calibration of the individual cameras. Therefore, our toolbox can also use pre-computed intrinsic parameters provided by the user. As another functionality, it is compatible with hybrid systems mixing fisheye and perspective cameras under the condition that the user specifies the type of each camera in the system.
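A rough Python equivalent of this initialization step, assuming per-view 3D↔2D correspondences have already been collected, is sketched below. The data layout is hypothetical, and the array shapes expected by cv2.fisheye.calibrate are notoriously strict and may need adjusting; this only illustrates the Zhang/Kannala-Brandt initialization with the 50-view subsampling mentioned above.

```python
import random
import cv2
import numpy as np

def init_intrinsics(obj_pts, img_pts, image_size, is_fisheye=False, max_views=50):
    """Initialize K and distortion coefficients for one camera.

    obj_pts / img_pts: lists with one (N, 3) / (N, 2) float array per board
    observation (hypothetical layout). At most `max_views` randomly selected
    observations are used, mirroring the subsampling described above.
    """
    idx = list(range(len(obj_pts)))
    if len(idx) > max_views:
        idx = random.sample(idx, max_views)
    obj = [obj_pts[i].reshape(-1, 1, 3).astype(np.float32) for i in idx]
    img = [img_pts[i].reshape(-1, 1, 2).astype(np.float32) for i in idx]

    if is_fisheye:
        # Kannala-Brandt model (4 coefficients), cv2.fisheye module
        K, D = np.eye(3), np.zeros((4, 1))
        rms, K, D, _, _ = cv2.fisheye.calibrate(
            obj, img, image_size, K, D,
            flags=cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC)
    else:
        # Zhang's method with the Brown distortion model
        rms, K, D, _, _ = cv2.calibrateCamera(obj, img, image_size, None, None)
    return K, D, rms
```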
Fig. 5. Board pose estimation at a time t. Here, a camera c0 can see two boards b1 and b3 in a single frame and estimate their inter-board pose.

4.3. Board pose estimation and intrinsic refinement

Given the initial intrinsic parameters computed in the previous stage (Sec. 4.2), we estimate the relative pose of all the cameras for each observed board. An illustration of this process for an arbitrary frame t is visible in Fig. 5. Notice that a single image can contain multiple boards; for instance, the camera c0 at frame t sees two boards b1 and b3 simultaneously, thus both transformations ${}^{t}M^{b_1}_{c_0}$ and ${}^{t}M^{b_3}_{c_0}$ are computed and stored. The estimation of these poses is achieved with a PnP algorithm (Gao et al., 2003) wrapped in a RANSAC robust estimation process (Fischler and Bolles, 1981). This RANSAC stage is intended to remove very large outliers only (e.g. reprojection errors greater than 10 pixels), improving the overall robustness of our pipeline. The inlier points are then used to refine the pose estimation of the camera w.r.t. the board via a Levenberg-Marquardt non-linear refinement.
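A compact Python/OpenCV sketch of this RANSAC-wrapped PnP followed by a Levenberg-Marquardt refinement is given below. The data layout and the choice of the default iterative solver are assumptions; a P3P minimal solver, as cited above, can be selected through the flags argument.

```python
import cv2
import numpy as np

def estimate_board_pose(board_pts_3d, corners_2d, K, dist):
    """Estimate the board-to-camera pose from indexed Charuco corners.

    board_pts_3d: (N, 3) corner coordinates in the board referential.
    corners_2d:   (N, 2) detected image locations of the same corners.
    Returns (rvec, tvec) or None when the estimation fails.
    """
    # RANSAC-wrapped PnP: only removes gross outliers (threshold ~10 px);
    # flags=cv2.SOLVEPNP_P3P would use a P3P minimal solver instead.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        board_pts_3d.astype(np.float32), corners_2d.astype(np.float32),
        K, dist, reprojectionError=10.0, iterationsCount=1000,
        flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok or inliers is None or len(inliers) < 4:
        return None
    # Levenberg-Marquardt refinement on the inlier set only
    inl = inliers.ravel()
    rvec, tvec = cv2.solvePnPRefineLM(
        board_pts_3d[inl].astype(np.float32),
        corners_2d[inl].astype(np.float32), K, dist, rvec, tvec)
    return rvec, tvec
```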
4.6.1. Hand-eye calibration for non-overlapping cameras

Fig. 7. Representation of two camera groups (g0 in blue and g1 in yellow) with non-overlapping field-of-view, observing one object each (for the sake of clarity and simplicity, each camera group contains a single camera and each object is composed of a single board in this example).

$$ {}^{t_1}_{t_0}M_{g_0} = {}^{t_1}M^{o_0}_{g_0}\,{}^{t_0}M^{g_0}_{o_0} \qquad (9) $$
$$ {}^{t_1}_{t_0}M_{g_1} = {}^{t_1}M^{o_1}_{g_1}\,{}^{t_0}M^{g_1}_{o_1}. \qquad (10) $$

As a result, the relationship linking camera groups can be written ${}^{t_1}_{t_0}M_{g_1}\,M^{g_0}_{g_1} = M^{g_0}_{g_1}\,{}^{t_1}_{t_0}M_{g_0}$; without loss of generality, this relation can be generalized for all the frames:

$$ \forall t_i \in [1 \cdots T],\ \forall t_j \in [1 \cdots T],\quad {}^{t_j}_{t_i}M_{g_1}\,M^{g_0}_{g_1} = M^{g_0}_{g_1}\,{}^{t_j}_{t_i}M_{g_0}. \qquad (11) $$

This problem takes the form of a system AX = XB which can be resolved using a hand-eye calibration technique. In this work, we employ the approach proposed by Tsai et al. (1989) which consists of a hierarchical resolution of the problem: 1) rotation estimation first and 2) translation computation. To perform the rotation estimation, the authors propose to utilize a variation of the Rodrigues angle-axis representation such that the rotation vector $r^{g_0}_{g_1}$ can be linearly resolved as follows:

$$ \left[{}^{t_j}_{t_i}r_{g_1} + {}^{t_j}_{t_i}r_{g_0}\right]_{\times} r'^{g_0}_{g_1} = {}^{t_j}_{t_i}r_{g_0} - {}^{t_j}_{t_i}r_{g_1}, \qquad (12) $$

where the operator $[\cdot]_{\times}$ stands for the transformation of a 3D vector to a skew-symmetric matrix and $r^{g_0}_{g_1} = 2\,r'^{g_0}_{g_1} / (1+|r'^{g_0}_{g_1}|^{2})$. After the rotation is solved, the translation $t^{g_0}_{g_1}$ can be computed via the following set of linear equations:

$$ \left({}^{t_j}_{t_i}R_{g_1} - I\right) t^{g_0}_{g_1} = R^{g_0}_{g_1}\,{}^{t_j}_{t_i}t_{g_0} - {}^{t_j}_{t_i}t_{g_1}, \qquad (13) $$

where I is the identity matrix. Note that at least two pairs of motions are needed to perform this hand-eye calibration. For further details, we invite the reader to refer to Tsai et al. (1989). Note that this initial calibration is achieved using only the two objects that have been seen simultaneously by both camera groups the largest number of times.
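For illustration, the sketch below maps the two camera groups onto OpenCV's generic hand-eye solver (Tsai's method) rather than re-implementing Eqs. (12)-(13): group g1 tracked in the frame of its static object o1 plays the "gripper to base" role, while object o0 expressed in the frame of group g0 plays the "target to camera" role. The function name, pose layout, and frame conventions in the comments are assumptions made for this example only.

```python
import cv2
import numpy as np

def intergroup_pose(g1_in_o1, o0_in_g0):
    """Estimate the fixed transform mapping the g0 frame into the g1 frame.

    g1_in_o1: list of 4x4 poses of camera group g1 expressed in the frame of
              the static object o1 it observes, one per frame t_i.
    o0_in_g0: list of 4x4 poses of object o0 expressed in the frame of camera
              group g0, one per frame t_i.
    Both lists must be time-synchronized and contain at least three frames
    with non-degenerate motion (two independent motion pairs, cf. Eq. (13)).
    """
    R_g, t_g, R_t, t_t = [], [], [], []
    for A, B in zip(g1_in_o1, o0_in_g0):
        R_g.append(A[:3, :3]); t_g.append(A[:3, 3])   # "gripper to base"
        R_t.append(B[:3, :3]); t_t.append(B[:3, 3])   # "target to camera"
    # Tsai-Lenz hierarchical solution: rotation first, then translation.
    R, t = cv2.calibrateHandEye(R_g, t_g, R_t, t_t,
                                method=cv2.CALIB_HAND_EYE_TSAI)
    M = np.eye(4)
    M[:3, :3], M[:3, 3] = R, t.ravel()
    return M
```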
4.6.2. Best view selection and bootstrapped initialization

The hand-eye calibration strategy can be applied to all possible combinations of frames, but the complexity of the problem grows quadratically with the number of frames, which is problematic (computational time-wise) for large video sequences. Moreover, successive frames exhibit similar poses which do not contribute much to the final solution. Furthermore, using all the frames at once can be problematic if the set of frames contains outliers. Therefore, we propose an effective manner to select the best views to calibrate each pair of non-overlapping camera groups in the system in a fast and robust manner.

Our best view selection is designed to maximize the diversity of the views used for the calibration, to avoid degenerate configurations, and to improve the robustness against outliers. For this purpose, for each frame, we concatenate the translational components of both camera groups to cluster the frames as depicted in Fig. 8. Therefore, the most similar poses are assigned to the same cluster. In this situation, the rotational component does not need to be considered to ensure the diversity of the poses across clusters, since a rotation of one of the groups will inevitably lead to a translation of the second group. In our framework, a k-means clustering (Lloyd, 1982) technique is utilized with the number of clusters fixed to 20.

After this initial clustering, we initiate our bootstrapping strategy which consists of the successive estimation of the inter-group pose via the selection of multiple mini-batches of frames. Specifically, for each iteration of our bootstrapping algorithm, 6 clusters are randomly sampled (from the initial 20 clusters), and among each of these 6 clusters one pose is chosen randomly. The resulting set of 6 pairs of poses is used to perform the hand-eye calibration (as described in Sec. 4.6) such that the inter-group pose can be computed. The validity of the set is then evaluated by estimating the consistency of the rotational solution provided by the hand-eye calibration algorithm. If the maximum rotational error in the set is superior to 5°, the solution is rejected; if the set is consistent, the result is stored. This mini-batch hand-eye estimation is repeated 200 times, leading to a set of plausible inter-group poses. Finally, the estimation of the inter-group pose is obtained by computing the median value of each translation and rotation (Rodrigues representation) element which passed the rotational test. This initial solution is, thereafter, refined in a non-linear manner. Our bootstrapped initialization procedure has proved to be very effective against outliers and noisy measurements.
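The steps above can be summarized by the following Python sketch. It uses scikit-learn's KMeans as a stand-in for the k-means step (Lloyd, 1982); the handeye_fn callable, its interface, and the pose layout are hypothetical placeholders for the hand-eye estimation of Sec. 4.6.1.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bootstrap_intergroup_pose(poses_g0, poses_g1, handeye_fn, n_clusters=20,
                              batch_size=6, n_iter=200, rot_tol_deg=5.0):
    """Bootstrapped inter-group pose estimation (sketch of the steps above).

    poses_g0 / poses_g1: lists of 4x4 poses of the two camera groups, one per
    synchronized frame. handeye_fn(frame_indices) is a hypothetical callable
    returning (R 3x3, t 3-vector, rot_err_deg), where rot_err_deg is the
    maximum rotational inconsistency within the mini-batch.
    """
    # Cluster frames on the concatenated translations of both groups so that
    # each mini-batch spans diverse viewpoints.
    feats = np.hstack([np.array([M[:3, 3] for M in poses_g0]),
                       np.array([M[:3, 3] for M in poses_g1])])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)

    rng = np.random.default_rng()
    rvecs, tvecs = [], []
    for _ in range(n_iter):
        # sample `batch_size` clusters, then one random frame inside each
        clusters = rng.choice(n_clusters, size=batch_size, replace=False)
        frames = [rng.choice(np.flatnonzero(labels == c)) for c in clusters]
        R, t, rot_err = handeye_fn(frames)
        if rot_err > rot_tol_deg:          # rotation consistency test
            continue
        rvecs.append(cv2.Rodrigues(R)[0].ravel())
        tvecs.append(np.asarray(t).ravel())
    if not rvecs:
        raise RuntimeError("no consistent mini-batch found")
    # element-wise median over all accepted solutions
    return np.median(rvecs, axis=0), np.median(tvecs, axis=0)
```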
Table 2. Average intrinsic and extrinsic errors over all the cameras on synthetic data generated by image rendering.

Error \ Seq         Seq01   Seq02   Seq03   Seq04   Seq05
focal (px)          27.601  2.229   27.611  27.648  0.124
pp (px)             0.396   2.060   0.514   0.464   0.718
Rotation (°)        0.002   0.056   0.002   0.005   0.046
Translation (m)     0.000   0.006   0.000   0.000   0.002
Reprojection (px)   0.022   0.023   0.014   0.017   0.090

Fig. 8. Example of the resulting clustering (3 clusters: blue, yellow, and red) for a single camera orbiting around a calibration board. We can see that nearby frames belong to the same cluster. In practice, a concatenation of two non-overlapping cameras or camera groups is used for this clustering.

4.7. Merging camera groups and Bundle adjustment

After the initial poses between all the non-overlapping pairs of camera groups are estimated, the camera groups and the objects are merged using a graph strategy similar to the one described in Section 4.4. Finally, the entire system (relative positions between all boards, camera poses, and the intrinsic parameters) can be refined to minimize the reprojection error in all frames:

$$ \min_{r^{b_j}_{b_{ref}},\, t^{b_j}_{b_{ref}},\, r^{c_i}_{c_{ref}},\, t^{c_i}_{c_{ref}},\, K_{c_i},\, k_{c_i}} \; \sum_{i=1}^{N_c} \sum_{j=1}^{M_b} \sum_{t=1}^{T} \sum_{s=1}^{S} \left\| {}^{t}_{s}p^{b_j}_{c_i} - P\!\left(M^{c_{ref}}_{c_i}\, {}^{t}M^{b_{ref}}_{c_{ref}}\, M^{b_j}_{b_{ref}}\, P^{b_j}_{s},\ K_{c_i},\ k_{c_i}\right) \right\| \qquad (14) $$

Our hierarchical calibration strategy provides an initialization that is in the vicinity of convergence, leading to very stable and accurate results.
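A minimal sketch of the kind of reprojection residual minimized in Eq. (14) is shown below, using Python and SciPy's non-linear least-squares solver; the toolbox's C++ implementation differs, and here the intrinsics and the per-frame reference poses are kept fixed for brevity. The parameter packing, the observation layout, and the function name are assumptions made for this illustration.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, observations, n_cams, n_boards, board_pts):
    """Residuals of the objective of Eq. (14), simplified: intrinsics and the
    per-frame pose of the reference camera w.r.t. the reference board fixed.

    params packs 6-DoF poses (Rodrigues rotation + translation): one per
    camera w.r.t. the reference camera, followed by one per board w.r.t. the
    reference board. observations is a list of tuples
    (cam_idx, board_idx, K, dist, pts_2d, M_bref_to_cref) -- a hypothetical
    layout used only for this sketch.
    """
    def to_mat(p):
        M = np.eye(4)
        M[:3, :3] = cv2.Rodrigues(p[:3])[0]
        M[:3, 3] = p[3:6]
        return M

    cam_T = [to_mat(params[6 * i:6 * i + 6]) for i in range(n_cams)]
    off = 6 * n_cams
    brd_T = [to_mat(params[off + 6 * j:off + 6 * j + 6]) for j in range(n_boards)]

    residuals = []
    for cam, brd, K, dist, pts_2d, M_bref_to_cref in observations:
        # chain of Eq. (14): board j -> reference board -> reference camera
        # (at this frame) -> camera i
        M = cam_T[cam] @ M_bref_to_cref @ brd_T[brd]
        rvec = cv2.Rodrigues(M[:3, :3])[0]
        proj, _ = cv2.projectPoints(board_pts[brd], rvec, M[:3, 3], K, dist)
        residuals.append((proj.reshape(-1, 2) - pts_2d).ravel())
    return np.concatenate(residuals)

# The refinement itself would then be a call such as
#   least_squares(reprojection_residuals, x0,
#                 args=(observations, n_cams, n_boards, board_pts),
#                 method="trf", loss="huber")
# where the robust loss echoes the robust optimization mentioned in the text.
```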
5. Experiments

This section contains a large number of assessments on real and synthetic data. Various use cases are proposed to reflect a broad spectrum of scenarios commonly faced in practice. Multiple metrics are used to evaluate the quality of the retrieved parameters. The rotational error is calculated as follows:

$$ \epsilon_R = \arccos\!\left(\frac{1}{2}\left(\operatorname{tr}\!\left(R_{est}^{\top} R_{GT}\right) - 1\right)\right), \qquad (15) $$

where $R_{est}$ is the estimated rotation matrix and $R_{GT}$ the ground-truth rotation. Regarding the translational and the internal parameters' (principal point pp and focal length) errors, we report the Euclidean distance between the ground truth and the estimated values. The reported reprojection error is the mean of the Euclidean distances between the detected and reprojected points for all the corners observed by all cameras. Note that, for the sake of conciseness, we do not provide comparative results for the aspect ratio λ, while the skew factor is assumed to be zero.
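For reference, the rotational metric of Eq. (15) is a one-liner; the sketch below is a direct transcription (the clipping is only a numerical safeguard, not part of the formula).

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angular error of Eq. (15) between estimated and ground-truth rotations."""
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # the trace can drift slightly outside [-1, 1] due to rounding
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```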
5.1. Experiments on rendered images

To provide a quantitative estimation regarding the accuracy and versatility of the proposed technique, we use Blender (Community, 2018) to generate a synthetic dataset composed of 5 different calibration scenarios (see Fig. 9): 1) Stereo system (2 cameras); 2) Non-globally overlapping vision system (3 cameras); 3) Non-overlapping system (4 cameras); 4) Unbalanced non-overlapping vision system (3 overlapping cameras and one non-overlapping camera); 5) Converging vision system (4 cameras). The dataset, the codes, and the Blender 3D models used for its creation are available to the public via the following link1. For each sequence, 100 synchronized and distortionless frames per camera have been captured at a resolution of 1824 × 1376 px. A set of representative synthetically generated images is available in Fig. 9. For this experiment, the field of view of the synthetic cameras is fixed at 65° of horizontal FoV. The mean calibration errors against the ground truth (over all the cameras in the rigs) are available in Table 2.

Seq01: Stereo. While the proposed technique is designed to calibrate complex vision systems, it can be employed for the calibration of rather simple and common vision rigs such as monocular or stereo vision systems. Such calibration can be achieved with our toolbox using a single calibration board. To challenge our technique, we utilize 3 individual boards. In this scenario, our toolbox outputs highly accurate results with a sub-millimetric translational error and a very low reprojection error.

Seq02: Non-globally overlapping camera system. For this experiment, we simulate an omnidirectional vision system that shares similarities with (Schroers et al., 2018). This system is composed of 5 cameras arranged in a semi-circle (see Fig. 9(b)). A single calibration board is used for the entire calibration of the vision rig; since each camera shares a partial FoV with its neighbors, the relative poses of the cameras can be obtained by chaining all the transformations. This process is automatically achieved in our calibration pipeline. The difficulty of such a calibration scenario is the possibility of accumulating a drift between the reference camera and the other cameras in the system. Despite this challenging scenario, our calibration technique is able to reliably estimate the camera poses in the system. We can notice a higher mean translational error in this sequence, which mostly comes from one camera located on the extreme left of the system with partial visibility of the boards.

1 link to rendered images dataset: https://bosch.frameau.xyz/index.php/s/pLc2T9bApbeLmSz
Fig. 9. Calibration results from synthetically rendered images: (a) Seq01: Stereo, (b) Seq02: Non-globally overlapping, (c) Seq03: Non-overlapping, (d)
Seq04: Unbalanced, (e) Seq05: Converging vision system. (f-j) Sample of rendered images from each sequence.
Seq03: Non-overlapping camera system. Our calibration toolbox can be used to calibrate any multi-camera vision system. In particular, we propose a robust strategy for the calibration of non-overlapping camera systems (see Fig. 9(c)). In this experiment, we proposed one of the most common multi-camera systems used to obtain an all-around view (as depicted in Fig. 9(c)). This multi-camera vision system is surrounded by a set of eight calibration boards placed in a circular manner such that the boards can be visible by multiple cameras simultaneously. Despite the limited amplitude of the motions used for this calibration, the obtained results are very close to the ground truth with nearly no translational or rotational error (see Table 2).

Seq04: Unbalanced non-overlapping vision system. To evaluate the applicability of our calibration strategy on an unbalanced non-overlapping vision system, we simulate 3 overlapping cameras on one side and a single camera pointing in the opposite direction (see Fig. 9(d)). This scenario is complex since a wrongly initialized calibration may lead to a wrong convergence of the calibration due to the unbalanced reprojection error on both sides of the system. Our technique is both robust and effective even for such a specific scenario. This satisfying performance can be explained by the design of our method. Specifically, our method is built to solve the calibration problem in a progressive step-by-step manner where each step is designed to provide an accurate initialization to the next one.

Seq05: Converging vision system. Most existing calibration toolboxes are incompatible with converging vision systems. Our toolbox can deal with such scenarios by estimating the structure of the 3D calibration object. While our method can function with any calibration object composed of multiple planar boards, in this experiment we propose to simulate the most common 3D calibration object: a cube composed of 6 planar boards. A group of 4 converging cameras orbit randomly around the cube such that every face of the cube can be observed. The results presented in Table 2 demonstrate sub-pixel accuracy.

5.2. Real vision system experiments

To demonstrate the effectiveness of the calibration approach under realistic scenarios, we propose to calibrate diverse vision systems ranging from stereo to multiple groups of non-overlapping cameras. Moreover, the proposed scenarios involve different numbers of calibration boards and 3D objects. All the sequences used in this paper can be downloaded freely2.

The stereo vision system calibrated in this experiment is a ZED camera capturing synchronized pairs of 1280 × 720px resolution images. For the multi-camera system configurations, we use up to 4 synchronized Intel Realsense D415i RGB-D cameras. Specifically, we utilize the 2 infrared sensors of each camera (spatial resolution of 1280 × 720px). While the RGB sensor can also be used for the calibration, the large motion blur and the rolling shutter of this color camera disqualified it for this experiment. Due to hardware limitations, the mis-synchronization of the RGB-D cameras can reach up to 5ms. This delay does not seem to cause a significant problem during the calibration process since a low reprojection error has been reported. Finally, the hybrid stereo-vision system is composed of two PointGrey Flea3 cameras FL3-U3-13E4C-C (1280 × 512px spatial resolution). The left one is equipped with a lens (LM3NCM) providing a 90° horizontal field of view with low radial distortion. On the right camera, we install a fisheye lens (Fujinon FE185C046HA-1) yielding a 182° horizontal field of view and very large radial distortion. These two cameras are spaced by a baseline of 20cm. All the experiments presented in this paper have been conducted on a desktop computer equipped with 32GB of RAM and a CPU i7-6800K.

Stereo vision system. To validate our method, we calibrate a stereo camera to allow comparison against a widely used strategy proposed in (Bouguet, 2004). The stereo ZED camera is calibrated with a single board in both cases. While we utilize

2 link to real images dataset: https://bosch.frameau.xyz/index.
Fig. 10. Experimental setup for our non-overlapping vision system. (a)
Camera rig, (b) obtained calibration result, (c) boards used for calibration.
Table 3. Stereo calibration parameters comparison between our approach and Bouguet (2004). fL, fR, ppL and ppR are the focal lengths and principal points of the left and right cameras respectively.

Methods          fL (px)  fR (px)  ppL (px)           ppR (px)           XYZ Rotation Euler (°)       XYZ Translation (cm)
Bouguet (2004)   703.81   707.12   (631.27, 372.15)   (650.37, 386.23)   (−0.05, 0.61, −0.47)         (−11.98, 0.02, −0.07)
Ours             701.52   704.33   (629.06, 375.08)   (651.27, 387.25)   (−0.0214, 0.3731, −0.5487)   (−12.0231, 0.0250, −0.1151)
Difference       2.29     2.79     3.67               1.35               0.25                         0.05
The resulting reconstruction is visible in Fig. 5.2. We can notice that this reconstruction is consistent, suggesting an accurate calibration of our system.

Fig. 15. Hybrid stereo-vision system calibration. (a) Picture of the system, (b, first row) perspective and fisheye images acquired by the cameras, (b, second row) rectified images with multiple epipolar lines displayed in color.

Fig. 16. Experimental setup and calibration results for a converging multi-camera system. (top-left) Calibration result (note that only the left camera of each RGB-D camera is displayed for clarity), (top-right) 3D calibration cube and its reconstruction, (bottom-left) experimental setup with a 3D object placed in the center of the cameras, (bottom-right) images with detected corners from the cameras 1 and 3 respectively.

Even under very high noise, the rotation and translation errors never exceed 2° and 0.08 meters regardless of the cameras' configuration. It is worth noting that, owing to the filtering and the corner refinement processes, in practice it is very unlikely to reach such inaccurate corner localization. Regarding the intrinsic parameters, the deviation from the ground truth remains reasonable with a maximum of 70px for the principal point and 80px for the focal length. Noticeably, the light-field configuration has higher intrinsic parameter errors, which can be related to the limited variety of viewpoints in the sequence. Interestingly, the non-overlapping scenario, which is assumed to be more complex to resolve, seems to reach higher accuracy. This can be attributed to the robustness of the proposed bootstrapping strategy.

5.4. Robustness against outliers

This section confirms the stability of our approach against outliers. Following the same evaluation environments as in Sec. 5.3, we evaluate the robustness of our approach under the presence of hard outliers which may occasionally occur during the detection of fiducial markers. In contrast to Sec. 5.3, for this assessment, no noise is added to the inlier points. We compute the success rate (mean reprojection error inferior to 5px) over 100 trials for different levels of outlier contamination ranging from 0 to 70%. An outlier is a point with a deviation of at least 10px from its real pixel position (the outliers are generated randomly in the image with a uniform distribution). The same number of outliers is enforced for each image. The computed success rate is available in Fig. 19(a) where, in most scenarios, our solution is robust up to 60% of outlier contamination, leaving only 14 points per board to perform the calibration of the system. This resilience can be mostly attributed to the RANSAC algorithm used to reject incorrect points. Since the boards contain a relatively low number of points, 1000 RANSAC iterations are usually enough to discover a set of uncorrupted points to perform the system calibration. At a level of 70% of outliers, the success rate falls to zero for all three studied scenarios. In practice, such an extreme presence of outliers is highly unlikely. Nevertheless, without our RANSAC filtering and robust optimization scheme, even very few outliers lead to a complete failure of the calibration.

5.5. Robustness evaluation of the hand-eye calibration

In this section, we highlight the relevance of our bootstrapped hand-eye calibration technique (covered in Sec. 4.6.2) for non-overlapping vision systems under the presence of wrongly estimated poses (outliers). Not only does our technique allow a fast and constant-time hand-eye calibration, but it is also significantly more robust to outliers thanks to our mini-batch estimation and our rotation consistency testing stage. To evaluate the level of robustness offered by our framework, we synthetically generate 100 pairs of poses from two non-overlapping cameras.

Fig. 17. Reconstructed object using our calibration parameters. (a) Picture of the object, (b) aligned 3D point clouds from the 4 RGB-D cameras displayed in red, green, blue, and yellow respectively, (c) meshed result.
Fig. 18. Robustness against various quantities of noise with 100 iterations per level of noise; the thick lines represent the mean error value and the transparent envelopes depict the standard deviation. (first column) Translation and rotation error assessment for the three sequences: stereo, light-field, and non-overlapping, (second column) focal length and principal point error for the three sequences: stereo, light-field, and non-overlapping.

Fig. 19. Robustness against outliers. (a) Success rate of the entire calibration pipeline versus various outlier percentage contaminations (for 3 scenarios depicted in red, green and blue), (b) success rate of our hand-eye calibration technique for various levels of outlier contamination (the red and green bars depict the standard (Tsai et al., 1989) and our hand-eye calibration procedure respectively).

Fig. 20. (a) Computational time vs number of cameras (8 cameras, 1200 images per camera and 4 calibration boards). (b) Computational time vs number of boards (2 cameras, 550 images, 6 calibration boards). The transparent red envelope depicts the standard deviation.
In Fig. 19(b), we provide the success rate for 500 trials at different levels of outlier corruption. In this experiment, we do not include any non-linear refinement of the pose. We consider the pose estimation successful if the errors in rotation and translation are lower than 5° and 2cm respectively.

To better understand the importance of the proposed technique, we compare our solution with a standard hand-eye calibration solution that directly tries to resolve the problem using all the available poses (Tsai et al., 1989). These conventional hand-eye calibration techniques have not been designed to deal with outliers; thus, the presence of a single outlier leads to very large errors, as underlined in Fig. 19(b). On the contrary, our technique is specifically designed to deal with outliers and allows estimating the inter-camera pose even if multiple outliers are contaminating the set (with 6% of outliers our algorithm can return a successful pose estimation 60% of the time).

5.6. Computational time

While our strategy has not been deliberately designed to be computationally effective – since the calibration stage is usually conducted offline – our C++ implementation allows a relatively quick calibration of any camera system. To give an overview of the method's speed, we have performed tests with various numbers of cameras (see Fig. 20(a)) and boards (see Fig. 20(b)). In these experiments, the calibration is repeated 25 times for each instance to analyze the mean and standard deviation of the elapsed time. To test with various numbers of cameras, we examined a non-overlapping system composed of 4 stereo cameras as described in Sec. 5.2. In this experiment, 1200 images per camera are captured, raising the total number of images to be processed to 9600, while 4 boards are utilized. In this context, it can take up to 15 minutes to calibrate the entire system. However, the computational time decreases significantly when reducing the number of cameras utilized. To evaluate the computational time versus the number of employed boards, we use a stereo sequence of 550 images of a calibration cube composed of 6 boards. We decrease the number of boards and measure the computational time for each scenario ranging from 1 to 6 boards. Once again, we can notice (see Fig. 20(b)) that the calibration time decreases with fewer boards. The reason is that a larger number of boards to be detected also leads to more computation for their detection.

To better understand which part of the algorithm is time-consuming, we propose to analyze the mean computational time per stage of the algorithm (see Table 4). This evaluation is achieved with 4 non-overlapping stereo cameras (1200 frames per camera and 4 calibration boards). As can be seen, the most time-consuming part is the detection of the Charuco boards, followed by the initialization of the intrinsic parameters. The rest of the proposed calibration process is very light and takes under one minute to calibrate complex multi-camera systems.
Table 4. Computational time per stage for 4 non-overlapping stereo systems (8 cameras) with 1200 frames per camera and 4 boards.

Stage                 Time (s)  Time (%)
Boards detection      1049.4    90.8
Intrinsic estimation  92.0      7.9
Objects merging       1.4       0.1
Camera merging        2.2       0.19
Non-overlap. calib.   6.6       0.6
Final Optimization    4.2       0.3
Total                 1155.9    100

6. Conclusion

In this paper, we have presented one of the most flexible, robust, and user-friendly camera calibration toolboxes to date. It allows calibrating fisheye, perspective, and hybrid vision systems composed of an arbitrary number of cameras without any priors or restrictions on their location. Moreover, an arbitrary number of calibration boards can be used and placed without specific limitations. Regarding the stability of the technique, our hierarchical calibration strategy ensures a good convergence of the intrinsic and extrinsic parameters of the camera rig. This architecture combines robust estimation strategies (i.e. bootstrapped initialization for non-overlapping cameras, RANSAC, and robust non-linear optimization) to ensure a satisfying calibration. Through a large series of experiments, we have demonstrated the robustness, accuracy, and relevance of the approach for multiple use cases.

Our toolbox still has a few limitations. In its current form, it can only exploit Charuco markers, while the addition of more advanced AR markers, such as AprilTag (Olson, 2011), might be an interesting extension. Besides, additional camera models could be included, such as spherical camera models (Usenko et al., 2018; Barreto, 2006). Finally, MC-Calib is not designed for unsynchronized camera systems or for rolling shutter cameras. Aside from these restrictions, our technique does not suffer from limitations other than the usual corner-detection-related problems (e.g. motion blur, out-of-focus blur, etc.). We believe this work can be useful for most applications requiring multi-camera systems, in particular in robotics and for autonomous cars where multiple fisheye cameras are often employed.

Acknowledgement

Francois Rameau was supported under the framework of an international cooperation program managed by the National Research Foundation of Korea (NRF-2020M3H8A1115028, FY2022). Jinsun Park was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1I1A1A01060267).

References

Alexiadis, D.S., Chatzitofis, A., Zioulis, N., Zoidi, O., Louizis, G., Zarpalas, D., Daras, P., 2016. An integrated platform for live 3d human reconstruction and motion capturing. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 27, 798–813.
Ataer-Cansizoglu, E., Taguchi, Y., Ramalingam, S., Miki, Y., 2014. Calibration of non-overlapping cameras using an external slam system, in: International Conference on 3D Vision.
Barreto, J.P., 2006. A unifying geometric representation for central projection systems. Computer Vision and Image Understanding (CVIU).
Bouguet, J.Y., 2004. Camera calibration toolbox for matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/index.html.
Caron, G., Eynard, D., 2011. Multiple camera types simultaneous stereo calibration, in: ICRA.
Community, B.O., 2018. Blender - a 3D modelling and rendering package. Blender Foundation. Stichting Blender Foundation, Amsterdam.
Dijkstra, E.W., 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 269–271.
Duane, C.B., 1971. Close-range camera calibration. Photogramm. Eng 37, 855–866.
Fischler, M.A., Bolles, R.C., 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24, 381–395.
Forbes, K., Voigt, A., Bodika, N., 2002. An inexpensive, automatic and accurate camera calibration method, in: South African Workshop on Pattern Recognition.
Gao, X.S., Hou, X.R., Tang, J., Cheng, H.F., 2003. Complete solution classification for the perspective-three-point problem. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 25, 930–943.
Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F.J., Marín-Jiménez, M.J., 2014. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognition 47, 2280–2292.
Ha, H., Perdoch, M., Alismail, H., So Kweon, I., Sheikh, Y., 2017. Deltille grids for geometric camera calibration, in: ICCV.
Heng, L., Choi, B., Cui, Z., Geppert, M., Hu, S., Kuan, B., Liu, P., Nguyen, R., Yeo, Y.C., Geiger, A., et al., 2019. Project autovision: Localization and 3d scene perception for an autonomous vehicle with a multi-camera system, in: ICRA.
Heng, L., Li, B., Pollefeys, M., 2013. Camodocal: Automatic intrinsic and extrinsic calibration of a rig with multiple generic cameras and odometry, in: IROS.
Im, S., Ha, H., Rameau, F., Jeon, H.G., Choe, G., Kweon, I.S., 2016. All-around depth from small motion with a spherical panoramic camera, in: ECCV.
Itseez, 2015. Open source computer vision library.
Kannala, J., Brandt, S.S., 2006. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. PAMI 28, 1335–1340.
Keselman, L., Iselin Woodfill, J., Grunnet-Jepsen, A., Bhowmik, A., 2017. Intel realsense stereoscopic depth cameras, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Kumar, R.K., Ilie, A., Frahm, J.M., Pollefeys, M., 2008. Simple calibration of non-overlapping cameras with a mirror, in: CVPR.
Kuo, J., Muglikar, M., Zhang, Z., Scaramuzza, D., 2020. Redesigning slam for arbitrary multi-camera systems, in: ICRA.
Lébraly, P., Deymier, C., Ait-Aider, O., Royer, E., Dhome, M., 2010. Flexible extrinsic calibration of non-overlapping cameras using a planar mirror: Application to vision-based robotics, in: IROS.
Lébraly, P., Ait-Aider, O., Royer, E., Dhome, M., 2010. Calibration of non-overlapping cameras - application to vision-based robotics, in: BMVC.
Li, B., Heng, L., Koser, K., Pollefeys, M., 2013. A multiple-camera system calibration toolbox using a feature descriptor-based calibration pattern, in: IROS.
Lin, Y., Larsson, V., Geppert, M., Kukelova, Z., Pollefeys, M., Sattler, T., 2020. Infrastructure-based multi-camera calibration using radial projections, in: ECCV.
Liu, A., Marschner, S., Snavely, N., 2016. Caliber: Camera localization and calibration using rigidity constraints. International Journal of Computer Vision (IJCV) 118, 1–21.
Lloyd, S., 1982. Least squares quantization in pcm. IEEE Transactions on Information Theory 28, 129–137.
Mei, C., Rives, P., 2007. Single view point omnidirectional camera calibration from planar grids, in: ICRA.
Moulon, P., Monasse, P., Perrot, R., Marlet, R., 2016. Openmvg: Open multiple view geometry, in: International Workshop on Reproducible Research in
Pattern Recognition.
Munoz-Salinas, R., 2012. Aruco: a minimal library for augmented reality ap-
plications based on opencv. Universidad de Córdoba 386.
Mur-Artal, R., Tardós, J.D., 2017. Orb-slam2: An open-source slam system
for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics
(TRO) 33, 1255–1262.
Olson, E., 2011. Apriltag: A robust and flexible visual fiducial system, in:
ICRA.
Pesce, M., Galantucci, L., Percoco, G., Lavecchia, F., 2015. A low-cost multi
camera 3d scanning system for quality measurement of non-static subjects.
Procedia CIRP 28, 88–93.
Qin, T., Li, P., Shen, S., 2018. Vins-mono: A robust and versatile monocular
visual-inertial state estimator. IEEE Transactions on Robotics (TRO) 34,
1004–1020.
Rameau, F., Demonceaux, C., Sidibé, D., Fofi, D., 2014. Control of a ptz
camera in a hybrid vision system, in: VISAPP.
Rehder, J., Nikolic, J., Schneider, T., Hinzmann, T., Siegwart, R., 2016. Ex-
tending kalibr: Calibrating the extrinsics of multiple imus and of individual
axes, in: ICRA.
Rosinol, A., Abate, M., Chang, Y., Carlone, L., 2020. Kimera: an open-source
library for real-time metric-semantic localization and mapping, in: ICRA.
Scaramuzza, D., Martinelli, A., Siegwart, R., 2006. A toolbox for easily cali-
brating omnidirectional cameras, in: IROS.
Schönberger, J.L., Frahm, J.M., 2016. Structure-from-motion revisited, in:
CVPR.
Schroers, C., Bazin, J.C., Sorkine-Hornung, A., 2018. An omnistereoscopic
video pipeline for capture and display of real-world vr. ACM Transactions
on Graphics (TOG) 37, 1–13.
Strauß, T., Ziegler, J., Beck, J., 2014. Calibrating multiple cameras with non-
overlapping views using coded checkerboard targets, in: 17th international
IEEE conference on intelligent transportation systems (ITSC), IEEE. pp.
2623–2628.
Sturm, P., Ramalingam, S., 2011. Camera models and fundamental concepts
used in geometric computer vision. Now Publishers Inc.
Tsai, R., 1987. A versatile camera calibration technique for high-accuracy 3d
machine vision metrology using off-the-shelf tv cameras and lenses. IEEE
Journal on Robotics and Automation 3, 323–344.
Tsai, R.Y., Lenz, R.K., et al., 1989. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Transactions on Robotics and Automation 5, 345–358.
Urban, S., Wursthorn, S., Leitloff, J., Hinz, S., 2016. MultiCol Bundle
Adjustment: A Generic Method for Pose Estimation, Simultaneous Self-
Calibration and Reconstruction for Arbitrary Multi-Camera Systems. Inter-
national Journal of Computer Vision (IJCV) , 1–19.
Usenko, V., Demmel, N., Cremers, D., 2018. The double sphere camera model,
in: 3DV.
Wang, J., Olson, E., 2016. Apriltag 2: Efficient and robust fiducial detection,
in: IROS.
Wu, C., et al., 2011. Visualsfm: A visual structure from motion system .
Xing, Z., Yu, J., Ma, Y., 2017. A new calibration technique for multi-camera
systems of limited overlapping field-of-views, in: IROS.
Yu, Z., Yoon, J.S., Lee, I.K., Venkatesh, P., Park, J., Yu, J., Park, H.S., 2020.
Humbi: A large multiview dataset of human body expressions, in: CVPR.
Zhang, Z., 2000. A flexible new technique for camera calibration. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence (TPAMI) 22, 1330.
Zhao, F., Tamaki, T., Kurita, T., Raytchev, B., Kaneda, K., 2018. Marker-based
non-overlapping camera calibration methods with additional support camera
views. Image and Vision Computing 70, 46–54.