1. Introduction
In this study, we present a fiduciary-free and robust image stabilization approach for simultaneous multi-sample time-lapse microscopy (SMSTM). Frame-to-frame image stabilization is an important task in many fields of microscopy, such as fluorescence microscopy [1], single-molecule localization, super-resolution microscopy [2], and intravital video microscopy [3]. In particular, when estimating cell activity (cell movement, movement directionality, etc.), it is important to eliminate lateral jitter between neighboring frames of associated cells [4,5,6] in order to track them and compute their velocities. Here, the challenge lies in differentiating the individual cell’s motion from the motion of the field of view (FOV).
SMSTM is utilized in drug-related studies to quickly evaluate multiple compounds distributed over several wells of a well plate within the same time frame. During an experiment, the individual samples are scanned serially: for each time step, an image of each sample is taken, and the images are combined into a time-lapse sequence in post-processing. The sample stage is moved laterally from well to well to switch between a large number of cell-culture samples. Maintaining the same region of interest (RoI) for each sample over multiple switching cycles is restricted by long stage travel and the limited precision of the stage actuators. Relocating the RoI for each sample can be realized by utilizing fiduciary sample holders (also called image lock-plates) in combination with an internal device feedback loop. This enables the microscopic system to reacquire the previous sample RoI and keep the image stable for the entire observation, independent of directed or random cell movement. This method can be classified as an active stabilization method, where the stabilization is performed during observation. It has the advantage of locating the exact same location within the well plate, independently of the morphology or brightness of the observed objects (e.g., cells). In this study, we present a passive method: the frames are aligned in post-processing, using computer vision to maintain the lateral RoI. In a more general sense, this task can be defined as image stabilization or fiduciary-free frame alignment for images without a clear point of reference.
Video stabilization and time-lapse observation can be achieved through image registration, i.e., the process of identifying features in images or maps with the goal of aligning them relative to a common coordinate system and origin. There are two types of image registration procedures: brightness-based and feature-based.
Brightness-based procedures are often used in applications containing unresolved objects, e.g., astronomical applications to match stars in the sky [7], or microscopic observations using fluorescence microscopy. Unresolved particles from fluorescence images were detected and utilized for frame-to-frame drift correction [8]. Stabilizing a set of images based on unresolved features has the advantage that, if features are point-like, the features can be localized very precisely, depending on the underlying broadening mechanism, which is either introduced by the media between instrument and the observed object or the limited resolution capabilities of the optical instrument. In either case, sub-pixel precise localization is possible using a point spread function [9].
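To illustrate why point-like features permit sub-pixel localization, the following minimal sketch estimates the centre of a synthetic Gaussian spot. It assumes a Gaussian stand-in for the point spread function and uses an intensity-weighted centroid instead of a full PSF fit. The function name `weighted_centroid`, the spot size, and the chosen centre are illustrative assumptions that do not come from the cited works.

```python
import numpy as np

def weighted_centroid(spot):
    """Return the intensity-weighted centroid (x, y) of a 2D spot image."""
    total = spot.sum()
    ys, xs = np.indices(spot.shape)  # row (y) and column (x) coordinates
    return float((xs * spot).sum() / total), float((ys * spot).sum() / total)

# Synthetic, noise-free Gaussian spot (assumed stand-in for a PSF)
yy, xx = np.indices((17, 25))
true_x, true_y, sigma = 12.3, 7.6, 1.5
spot = np.exp(-((xx - true_x) ** 2 + (yy - true_y) ** 2) / (2.0 * sigma ** 2))

cx, cy = weighted_centroid(spot)  # recovers the centre to well below one pixel
```

With real data, background and noise bias a plain centroid, which is one reason a fitted point spread function is usually preferred.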
Feature-based methods, such as phase cross-correlation (PCC) and optical flow (OF), use resolved features to determine the correspondence between a set of images. PCC is based on the Fourier shift theorem: the normalized cross-power spectrum is computed to factor out the phase difference between two images that are shifted by (u,v) relative to each other. The approach relies on a frequency-domain representation and returns the transversal displacement components (u,v), while most OF approaches, such as the Lucas–Kanade (LK) [10] and the TV-L1 method, rely on the assumption that the flow (motion) is stable in a predefined region surrounding each pixel [11,12].
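The PCC principle described above can be sketched in a few lines of NumPy. This is a minimal illustration restricted to integer-pixel, circular shifts; it is not the implementation evaluated in this study, and the function name `phase_cross_correlation_shift` and the random test frames are assumptions made for the example.

```python
import numpy as np

def phase_cross_correlation_shift(reference, moved):
    """Estimate the integer translation (u, v) that maps `reference` onto `moved`.

    u is the horizontal (column) shift and v the vertical (row) shift.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Normalized cross-power spectrum: only the phase difference remains
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    # The inverse transform peaks at the displacement (Fourier shift theorem)
    correlation = np.fft.ifft2(cross_power).real
    peak_row, peak_col = np.unravel_index(np.argmax(correlation), correlation.shape)
    rows, cols = reference.shape
    # Peaks beyond half the image size correspond to negative shifts
    v = peak_row - rows if peak_row > rows // 2 else peak_row
    u = peak_col - cols if peak_col > cols // 2 else peak_col
    return int(u), int(v)

rng = np.random.default_rng(0)
frame_a = rng.random((64, 96))
# Circularly shift the content 5 px down and 7 px to the left
frame_b = np.roll(frame_a, shift=(5, -7), axis=(0, 1))
u, v = phase_cross_correlation_shift(frame_a, frame_b)
```

Sub-pixel estimates require an additional refinement step, typically by upsampling the correlation around the peak, which is omitted here.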
However, for applications using real-life data, previous studies claimed that it is necessary to separate moving from non-moving objects. This becomes especially difficult in datasets where many objects exhibit a directional group motion [2] that resembles turbulent rather than laminar flow. Chen et al. (2023) introduced a branch-and-bound algorithm to find subsets of point clouds, as well as complementary information about cell shape and location, to compute the matching likelihood of cell pairs in two imaging modalities [13]. This approach is based on feature detection and requires precise cell segmentation, performed using, e.g., Cellpose, which usually requires parameter tuning to achieve sufficient segmentation precision [14]. However, this approach is problematic because, at the time of writing, there is no single approach that can be considered robust enough to segment any image dataset and track all objects without requiring parameter tuning or retraining [15].
In this manuscript, we introduce a frame-to-frame matching approach for SMSTM, based on recurrent all pairs field transforms (RAFT) [16], as presented in Figure 1. Here, the translation vector between two images is estimated by first computing the displacement field and then computing the median of its x and y components. We were able to produce precise matching results for a range of time-lapse observations. The RAFT model trained on the Sintel [17] dataset performed better than traditional approaches such as PCC, LK, and TV-L1, and also significantly better than the RAFT model trained on KITTI [18], without properly characterizing the individual cell movements. The sample stage of SMSTM exhibited only lateral (x-horizontal and y-vertical) movement. Therefore, transformations such as rotations, shearing, and non-linear holomorphic transformations were not considered. In the following, we introduce the data acquisition approaches and elaborate on the methods for testing and comparing the different registration approaches. Next, we elaborate on the frame alignment workflow and how it is used to correct for frame drift. In the results section, we compare different image registration approaches on SMSTM and synthesized data with variable time spacing. Finally, we discuss the viability and elaborate on possible trade-offs.
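The median-based translation estimate introduced above can be sketched as follows. The example assumes that a dense displacement field is already available as an array of shape (H, W, 2) holding per-pixel (dx, dy) values. RAFT inference itself is not included, and the array layout, the function name `translation_from_flow`, and the synthetic field are assumptions made for illustration.

```python
import numpy as np

def translation_from_flow(flow):
    """Reduce a dense displacement field (H, W, 2) to one translation (dx, dy).

    The median is taken separately over the x and y components, so a minority
    of pixels with independent motion does not bias the estimate.
    """
    dx = float(np.median(flow[..., 0]))
    dy = float(np.median(flow[..., 1]))
    return dx, dy

# Synthetic field: a global stage shift of (12.5, -3.25) px ...
flow = np.empty((100, 120, 2), dtype=np.float32)
flow[..., 0] = 12.5
flow[..., 1] = -3.25
# ... plus a patch (5% of all pixels) that moves independently, like motile cells
flow[20:40, 30:60] += np.array([8.0, 6.0], dtype=np.float32)

dx, dy = translation_from_flow(flow)    # median recovers the stage shift
mean_dx = float(flow[..., 0].mean())    # the mean is pulled towards the patch
```

In this toy case the mean of the x component is shifted by the moving patch, whereas the median is not, which is the motivation for a robust statistic when the FOV contains moving objects.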
2. Materials
2.1. Cell Cultures and Reagents
We performed image stabilization for observations containing the following cell cultures and reagents.
2.1.1. Cell Cultures
Human Astrocytes (HA, iPSC-derived, Normal, iX cells Technologies): 8–15 cells/well (medium: 150 µL); human iPSC-derived astrocytes that display typical astrocytic morphology and express key markers, e.g., GFAP and ALDH1L1, when cultured in Human Astrocyte Maintenance Medium (Cat# MD-0109-100ML).
2.1.2. Reagents
QD-A: 30 nM, A: 5 µM, Plant extra (KNK XXX extra, MIT142 extra, KNK808 extra (final concentrations: 4 ng/µL)). DMSO (Control: final concentrations: ), Romaric Acid (RA) (Negative Control: final concentrations: 50 µM).
2.2. Simultaneous Multi-Sample Time-Lapse Observations (SMSTM)
Time-lapse imaging was conducted by the Regenerative Medicine and Cell Therapy Laboratories of the KANEKA CORPORATION, using an Incucyte SX1 (Sartorius Ltd., Goettingen, Germany) to perform SMSTM, i.e., time-lapse observations of multiple samples were performed within the same experiment. Images were recorded every 20 min. Dynamics of QD-labeled A
and A
peptides were recorded with an Incucyte-SX1 (Sartorius, Bohemia, New York, NY, USA), and the exposure time was set to 90 h. A ×20 objective lens was used for image acquisition. The scan setting parameters were set as follows: acquisition time: 400 ms; Iwaki 96-well plate (Catalog Number 3860-096) or image lock-plate (Catalog Number 4379); three images per well; estimated blob diameter: 5; threshold (RCU): 0.8. When a solvent (e.g., dimethyl sulfoxide) was necessary to prepare the dilution of plant extra, the same concentration of solvent was used for the pretreatment solution. The field of view (FOV) had a physical size of
mm ×
mm and an image resolution of 1408 pixel × 1040 pixel. The frame rate was 20 min/frame for human astrocyte observations and 60 min/frame for 2.5 d-neural cell observations. The individual frames exhibited a strong displacement, as presented in Figure 2, left. The lateral x-y displacement was caused by relocating the sample stage between each observation, to screen multiple samples. Focus drift (vertical displacement), as presented by Ma et al. (2023), was prevented by re-focusing before each observation [19].
2.3. Single-Sample Time-Lapse Observations
The data acquisition and data analysis software of the Incucyte SX1 is closed-source; therefore, we were not able to produce a proper reference dataset utilizing the image lock-plate without applying the locking mechanism. Further, the fiduciary markers of the image lock-plate that the internal feedback mechanism of the Incucyte SX1 uses were not accessible. Therefore, we introduced an artificial frame-to-frame lateral jitter to serially acquired datasets that were physically unable to exhibit any frame jitter or drift. The properties of these datasets are described next.
SH-SY5Y cells (0.1–0.2 × cells) were re-plated onto mg/mL poly-D-lysine coated glass-bottomed 96-well micro-plates (IWAKI, Haibara, Japan). Cells were incubated overnight at 37 °C in humidified air containing 5% CO2. To inhibit actin polymerization and/or microtubule depolymerization, cells were treated with cytochalasin D and/or taxol at various concentrations. After incubation with inhibitors at 37 °C in humidified air containing 5% CO2 for one hour, cells were observed, and time-lapse images were captured, with an inverted microscope (Ti-E; Nikon, Tokyo, Japan) equipped with a color CMOS camera (DS-Ri2; Nikon, Tokyo, Japan) and an objective lens (PlanApo 20×/0.75 NA; Nikon, Tokyo, Japan), resulting in a FOV with a physical size of 640 µm × 640 µm and an image resolution of 1608 pixel × 1608 pixel. During observation, cells were maintained in DMEM/F12 (1:1) (Gibco/Life Technologies, Waltham, MA, USA) supplemented with 10% FBS, 100 U/mL penicillin, and 100 µg/mL streptomycin and warmed in a chamber set to 37 °C (INUBTF-WSKM-B13I; Tokai Hit, Fujinomiya, Japan). Bright-field images were captured every minute for six to seven hours and exported using NIS-Elements AR software version 4.5 (Nikon). The images were captured in 8-bit RGB and exported by the camera (internally processed) in 8-bit greyscale.
2.4. Single-Sample Jitter and Stage Drift Synthesis
To establish a baseline for evaluating our image stabilization approach and comparing it to previous methods, we introduced artificial translational frame jitter into single-sample time-lapse observations. As presented in Figure 2 Left, the simultaneously produced data taken with the Incucyte SX1 (without image lock-plate) exhibited a periodical jitter in the horizontal (x) and vertical (y) directions with a primary amplitude of ∼100 pixels and also seemed to exhibit an underlying modulation with a secondary amplitude of ∼20 pixels. We defined the translation transformation as shown in Equations (1) and (2) to imitate the lateral jitter evolution of the sample stage horizontally and vertically, with t being the frame number associated with a regular time interval and the displacement amplitude given as a random vector. Using an affine transformation with bi-linear interpolation, we produced a new set of displaced images that were used as the ground truth in this study. A comparison of the frame-to-frame displacement behavior between the Incucyte SX1 (without image lock-plate) observations and the synthesized data is presented in Figure 2 Left and Right, respectively.
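The jitter synthesis can be illustrated with a short NumPy sketch. The jitter model below (a primary oscillation of about 100 pixels plus a slower modulation of about 20 pixels, each with a random phase) is a hypothetical stand-in chosen only to mimic the amplitudes stated above; it does not reproduce Equations (1) and (2). The function names, periods, and seed are likewise assumptions. The translation is a pure-shift special case of an affine transformation and uses bi-linear interpolation.

```python
import numpy as np

def synthetic_jitter(n_frames, primary=100.0, secondary=20.0,
                     primary_period=6.0, secondary_period=40.0, seed=0):
    """Hypothetical jitter model; returns (n_frames, 2) displacements (dx, dy) in pixels."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_frames, dtype=np.float64)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(2, 2))  # random phase per axis and term
    shifts = np.empty((n_frames, 2))
    for axis in range(2):
        shifts[:, axis] = (
            primary * np.sin(2.0 * np.pi * t / primary_period + phases[axis, 0])
            + secondary * np.sin(2.0 * np.pi * t / secondary_period + phases[axis, 1])
        )
    return shifts

def translate_bilinear(image, dx, dy, fill=0.0):
    """Shift image content by (dx, dy) pixels using bi-linear interpolation.

    Pixels whose source location falls outside the image are set to `fill`.
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    src_x, src_y = xx - dx, yy - dy                 # where each output pixel samples from
    x0, y0 = np.floor(src_x).astype(int), np.floor(src_y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = src_x - x0, src_y - y0                 # fractional interpolation weights
    valid = (x0 >= 0) & (y0 >= 0) & (x1 <= w - 1) & (y1 <= h - 1)
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y1, 0, h - 1)
    top = (1.0 - wx) * image[y0c, x0c] + wx * image[y0c, x1c]
    bottom = (1.0 - wx) * image[y1c, x0c] + wx * image[y1c, x1c]
    return np.where(valid, (1.0 - wy) * top + wy * bottom, fill)

frame = np.tile(np.arange(10.0), (8, 1))   # intensity ramp: value equals column index
shifts = synthetic_jitter(50)              # one (dx, dy) pair per frame
shifted = translate_bilinear(frame, dx=2.5, dy=1.0)
```

Because the applied displacements are known exactly, a sequence generated this way provides a reference against which estimated translations can be compared.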
5. Discussion
In this study, we developed a method to correct 2D microscopic time-lapse observations with sub-pixel precision, without the need for fiduciary methods, using RAFT. We compared our approach to established image registration and optical flow approaches. We visually confirmed the stabilization results, applied the stabilization to stable data, and tested our approach on synthesized time-lapse observations. Each registration approach had a set of variable parameters, and we estimated the best parameter for each approach from the distributions presented in Figure 10.
On the left-hand side of Figure 10, cross-correlation-based stabilization exhibited strong fluctuations in the maximum displacement error for low upscaling factors, which disappeared as the error became almost stable at an upscaling factor of 81. As presented in the center of Figure 10, Lucas–Kanade-based stabilization exhibited a minimum dispersion at a radius of 23 and was relatively stable for higher radii. RAFT-based stabilization only showed a very minor dependence of the displacement error on the number of iterations.
Figure 11A presents the first and second frames of the sample denoted “230208E3-3 RA”. The displacement maps for the x and y directions are presented in Figure 11B. Regions R1–3 indicate regions where cell movement was clearly visible but not recognized by RAFT. In general, the cell structure could be categorized as weakly modulated structures in comparison to the image dimensions. The background features could be considered as strongly modulated features. RAFT seemed to be especially sensitive to small (high-frequency) features. The motility of individual cells was not reflected in the displacement maps. Therefore, the RAFT approach appeared to work well because it characterized the entire scene well, but not the individual cells. We conclude that the Sintel-trained model was not suitable for tracking individual cells. The movement of the individual cells could be categorized as elastic, which means that the cell morphology of the individual cells changed drastically from one frame to another. The detailed functions of the Sintel-based RAFT model were not fully comprehensible to the authors of this paper, because its behavior results from the optimization (training) of a model with many parameters. The general process describing the optical flow method (e.g., Lucas–Kanade) was comprehensible, and it is generally known that optical flow is robust against shape-invariant translations, but not against structures exhibiting strong morphological changes, as presented in our study.
When analyzing the displacement histograms presented in Figure 11C, it becomes clear that the mode represents the best solution for an optimal dataset. However, in our approach, we chose the median (median = mode for symmetric, unimodal distributions), since it better characterizes flat tops, as presented in the top histogram of Figure 11C. For comparison, black arrows indicate the NN displacement (background of B) characterized by the median of the histograms (C).
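The preference for the median over the mode in the presence of a flat-topped histogram can be illustrated with synthetic numbers. The sketch below assumes a uniform plateau of displacement values between 9 and 11 pixels; the distribution, the bin count, and the helper `histogram_mode` are illustrative assumptions and are not taken from the measured data.

```python
import numpy as np

def histogram_mode(values, bins=50):
    """Return the centre of the most populated histogram bin."""
    counts, edges = np.histogram(values, bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

rng = np.random.default_rng(1)
# Flat-topped distribution: every displacement on the plateau is about equally likely
flat_top = rng.uniform(9.0, 11.0, size=5000)

median_estimate = float(np.median(flat_top))       # lies close to the plateau centre
mode_estimate = float(histogram_mode(flat_top))    # can land anywhere on the plateau
```

On a plateau, the most populated bin is determined by sampling noise and by the binning, so the mode may fall anywhere within the flat region, whereas the median remains near its centre.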
As presented in Figure A1 in Appendix A, RAFT-based stabilization is computationally expensive and can be accelerated using a GPU. This has the drawback that, for most commercial (affordable) GPUs, the floating-point precision (32 bit) is limited to half-precision (16 bit). We computed all samples presented above using half-precision. This did not affect the overall performance significantly, as presented in Figure 12, where the residual was mostly below 0.01 pixels and occasionally spiked to 0.1 pixels.
During the investigation stage of this study, we were not aware that the drift was only lateral and tried to solve for the holomorphic (off-axis) components of the transformation matrix (1). However, even if the off-axis components are truly zero, a holomorphic solver will still produce minor non-zero off-axis values and therefore not stabilize the frames correctly (because of the large number of degrees of freedom). Therefore, our approach is limited to instruments exhibiting similar frame dislocation properties as the Incucyte-SX1 and cannot correct for holomorphic dislocations and transformations.
In general, image alignment strongly depends on the degree of feature change within images. In this case, this can be seen as the degree of motility and morphology of the individual cells within the FOV. We confirmed visually that the cells within the FOV are motile and undergo permanent morphological change. For future research, it will be necessary to compute cell motility and morphology precisely and investigate their relationship to stabilization accuracy.