ISSCC 2020 Session 5 Digest
The session opens with two LiDAR papers, first a 40-channel, 225m max-distance, 240×192 resolution LiDAR SoC, followed by a 1M pixel,
VAPD-based LiDAR system capable of 10cm resolution. A 500MHz-demodulation indirect time-of-flight (iToF) sensor based on Ge-on-Si is then
presented after which a QVGA iTOF sensor featuring motion artifact suppression and background light cancelation is described. Rolling and
global shutter pixel scaling continues with the presentation of a 2.3μm, 1Mpixel voltage-domain global shutter sensor followed by a 44Mpixel
CMOS image sensor with 0.7μm pixel pitch. A 132dB dynamic range, single exposure sensor is then described followed by a 1.45μm pixel, BSI
sensor featuring an in-pixel differential amplifier for improved read noise. The session ends with two event-driven vision sensors, the first
presenting a multi-mode sensor capable of motion and saliency detection, followed by a 3D-stacked, 4.86μm event-based sensor.
1:30 PM
5.1 A 240×192Pixel 10fps 70klux 225m-Range Automotive LiDAR SoC Using a 40ch 0.0036mm2 Voltage/Time
Dual-Data-Converter-Based AFE
S. Kondo, Toshiba, Kawasaki, Japan
In Paper 5.1, Toshiba describes a 240×192 pixel automotive LiDAR SoC capable of 225m maximum range distance
at 10fps, while operating at up to 70klux ambient light based on 40-channel voltage/time dual-data-converter-based
AFE.
2:00 PM
5.2 A 1200×900 6μm 450fps Geiger-Mode Vertical Avalanche Photodiodes CMOS Image Sensor for a 250m
Time-of-Flight Ranging System Using Direct-Indirect-Mixed Frame Synthesis with Configurable-Depth-Resolution
Down to 10cm
T. Okino, Panasonic, Nagaokakyo, Japan
In Paper 5.2, Panasonic presents a 1200×900 VAPD-based time-of-flight system capable of both direct and indirect
ranging up to a maximum range distance of 250m with a 10cm minimum resolution.
2:15 PM
5.3 An Up-to-1400nm 500MHz Demodulated Time-of-Flight Image Sensor on a Ge-on-Si Platform
C.-F. Liang, Artilux, Hsinchu, Taiwan
In Paper 5.3, Artilux presents an indirect time-of-flight range sensor based on a Ge-on-Si platform capable of 500MHz
demodulation frequency, while offering sensitivity at wavelengths up to 1400nm.
2:30 PM
5.4 A Dynamic Pseudo 4-Tap CMOS Time-of-Flight Image Sensor with Motion Artifact Suppression and Background
Light Cancelling Over 120klux
D. Kim, Sungkyunkwan University, Suwon, Korea
In Paper 5.4, Sungkyunkwan University presents a QVGA iTOF CMOS sensor. Based on a pseudo 4-tap scheme, this
device features motion artifact suppression and switching ΔΣ background light cancelation. The sensor offers ranging
up to 4m under 120klux.
3:15 PM
5.5 A 2.1e- Temporal Noise and -105dB Parasitic Light Sensitivity Backside-Illuminated 2.3μm-Pixel Voltage-Domain
Global Shutter CMOS Image Sensor Using High-Capacity DRAM Capacitor Technology
J-K. Lee, Samsung Electronics, Hwaseong, Korea
In Paper 5.5, Samsung Electronics presents a 2.3μm pitch, 1Mpixel, voltage-domain global shutter CMOS image
sensor realizing a low parasitic light sensitivity of less than -105dB, random noise of 2.1e- RMS and 42% QE at
940nm light, utilizing high-density in-pixel capacitors.
3:30 PM
5.6 A 1/2.65in 44Mpixel CMOS Image Sensor with 0.7μm Pixels Fabricated in Advanced Full-Depth Deep-Trench
Isolation Technology
H. Kim, Samsung Electronics, Hwaseong, Korea
In Paper 5.6, Samsung Electronics demonstrates a 1/2.65in, 44Mpixel, CMOS image sensor with 0.7μm pixel pitch
fabricated using advanced full-depth DTI technology.
3:45 PM
5.7 A 132dB Single-Exposure-Dynamic-Range CMOS Image Sensor with High Temperature Tolerance
Y. Sakano, Sony Semiconductor Solutions, Atsugi, Japan
In Paper 5.7, Sony Semiconductor Solutions presents a 132dB single-exposure dynamic range, 5.4Mpixel, stacked
back-illuminated CMOS image sensor using a sub-pixel architecture with high temperature tolerance. The sensor
achieves effective handling of 2400ke- of electrical charge and random noise of 0.6e- rms, while maintaining a
minimum composition SNR of 25dB at 100°C.
4:15 PM
5.8 A 0.50e-rms Noise 1.45μm-Pitch CMOS Image Sensor with Reference-Shared In-Pixel Differential Amplifier at
8.3Mpixel 35fps
M. Sato, Sony Semiconductor Solutions, Atsugi, Japan
In Paper 5.8, Sony Semiconductor Solutions presents a 1.45μm-pixel, BSI-stacked, 1/2.8-inch CIS achieving 0.5e-
RMS read noise using a reference-shared in-pixel differential amplifier, improving read noise by 30% versus a
conventional in-pixel differential amplifier.
4:30 PM
5.9 A 0.8V Multimode Vision Sensor for Motion and Saliency Detection with Ping-Pong PWM Pixel
T-H. Hsu, National Tsing Hua University, Hsinchu, Taiwan
In Paper 5.9, National Tsing Hua University presents a multi-mode vision sensor for motion and saliency detection
based on a ping-pong PWM pixel operating with a 0.8V supply. The sensor features in-pixel frame differencing, and
mixed-mode event counting capability, achieving a frame rate of 510/890fps for frame difference / saliency detection
modes at 74.4/121.6μW.
4:45 PM
5.10 A 1280×720 Back-Illuminated Stacked Temporal Contrast Event-Based Vision Sensor with 4.86μm Pixels,
1.066GEPS Readout, Programmable Event-Rate Controller and Compressive Data-Formatting Pipeline
T. Finateu, PROPHESEE, Paris, France
In Paper 5.10, PROPHESEE presents a 3D-stacked, 1280×720 resolution, event-based vision sensor. Featuring a
4.86μm pixel and 1.066GEPS readout, the sensor achieves a dynamic range of >124dB with a programmable event
rate controller and dynamic data formatting.
5.1 A 240×192Pixel 10fps 70klux 225m-Range Automotive LiDAR SoC Using a 40ch 0.0036mm² Voltage/Time Dual-Data-Converter-Based AFE

Satoshi Kondo¹, Hiroshi Kubota², Hisaaki Katagiri², Yutaka Ota², Masatoshi Hirono³, Tuan Thanh Ta¹, Hidenori Okuni¹, Shinichi Ohtsuka², Yoshinari Ojima², Tomohiko Sugimoto², Hirotomo Ishii², Kentaro Yoshioka¹, Katsuyuki Kimura², Akihide Sai¹, Nobu Matsumoto²

¹Toshiba, Kawasaki, Japan
²Toshiba Electronic Devices & Storage, Kawasaki, Japan
³Toshiba, Yokohama, Japan

A safe and reliable self-driving system is a key enabling technology for a society without traffic jams or accidents; LiDAR plays an essential role in such systems. To ensure higher levels of safety and comfort, early detection of small objects (e.g., debris/children) is crucial. To achieve this, state-of-the-art LiDARs [1-3] must attain even more finely scaled pixel resolution: for example, a 0.1-degree angle resolution (2× finer than [3]) is required to detect a 20×20cm² object 100m away. However, hybrid LiDAR systems [3] require a pair of TDC/ADC AFEs per pixel to obtain both precise short-range (SR) distance measurement (DM) and 200m long-range (LR) DM. Scaling the pixel resolution will significantly enlarge the SoC area and explode its cost.

We report a dual-data converter (DDC) that consolidates the functions of ADC and TDC into a single circuit; as such, a significant reduction in the area of the hybrid LiDAR AFE is achieved. The DDC acquires both high-precision time and voltage data from the input: although it achieves 5× smaller AFE area than prior arts [2,3], the conversion accuracy is also higher. This innovation leads to a 40-channel AFE integration of the SoC without increasing cost. Moreover, owing to the high ADC performance of DDCs, DM under 70klux sunlight was 12% longer than that in [3], achieving 225m.

Figure 5.1.1 shows the block diagram of our high-pixel-resolution LiDAR system with the DDC-based AFE. The conventional hybrid AFE [3] (bottom left) consists of 2 channels: the SR-AFE measuring the short-range distance with a precise TDC and area-hungry CFDs [4], and the LR-AFE measuring the long-range distance. Also, 2 SiPM channels are utilized because saturation-tolerant SiPMs are required for SR-DM. Thus, a pair of SiPM/AFE channels is required to obtain a single pixel; the AFE implementation is highly redundant and the pixel resolution is not scalable. Our DDC-based AFE permits a 5× smaller area and a single-channel implementation to further scale the LiDAR pixel resolution. The DDC is based on VCO frequency modulation, which enables the acquisition of both 40ps-resolution time data and 6b 400MS/s voltage data from a single SiPM output, unifying the TDC/ADC function. The DDC has higher ADC resolution than [3] and achieves 12% longer DM; 0-to-225m range measurement is accomplished with a single-channel implementation. Moreover, we utilize a short-dead-time SPAD to unify the conventional 2-channel SiPMs, which has higher saturation tolerance with fast quenching. Furthermore, we use a peak-triggering technique to move the DM non-linearity correction to the digital domain, and the area-hungry CFD is removed.

Figure 5.1.2 shows the operation principle of the voltage/time dual-data conversion. By modulating the VCO pulse density with the SiPM output and differentiating/integrating the VCO pulses, we obtain both voltage and time conversion results from a single circuit, respectively. The DDC consolidates the functions of ADC and TDC into a single VCO, thus achieving a significant area reduction for hybrid LiDAR AFEs. The concept of voltage acquisition is the same as in VCO-ADCs [5-8]: by counting the VCO pulses within a certain period (or differentiating), the input signal is converted to the digital domain. Simultaneously, high-precision time data is acquired by integrating (or counting) the VCO output pulses from the laser output to the timing when the input exceeds a set threshold (Trigger). Since a VCO is packed into each AFE channel, our system does not require GHz-clock distribution to the AFEs as in [2,3], eliminating power-hungry high-speed buffers in the SoC. Since we utilize 16-phase ring oscillators for the VCO, the time LSB resolution and voltage data resolution are improved to 40ps and 6b, respectively. Moreover, the DDC segments the time data into coarse/fine ToF to eliminate the effect of the VCO's poor jitter performance. The DDC calculates the coarse ToF by integrating the stable 400MHz clock pulses used for voltage sampling, and the fine ToF by integrating the pulse count between the 400MHz clock and Trigger. Thus, the DDC only utilizes the VCO pulse information during a very short term for time conversion. Importantly, since almost all of the DDC block consists of digital circuitry, the DDC area copes well with process scaling.

Figure 5.1.3 shows the DDC circuitry (focused on VCO-ADC functions) with the MP-PDM-based feedback and the operation principle. Suppressing the VCO non-linearity is necessary to enhance its effective resolution [5]. Foreground calibration [6] is complex and unscalable to several tens of channels. Closed-loop VCO-ADCs [7,8] are calibration-free but require area-hungry analog circuits (e.g., op-amps, DACs). We realize an area-efficient and calibration-free VCO-ADC by a multi-phase-PDM (MP-PDM) based feedback. The principle of MP-PDM is simple: the VCO output pulse is directly utilized as the pulse-current signal fed back to virtual ground. Since the voltage-current conversion can be realized with a simple pulser and current source, the method only requires small area. Similar to DAC-based feedback, the ADC accuracy is affected by the settling error and the switching spurs of the output pulse current. MP-PDM solves this by utilizing multiple phases and frequency-divided VCO output pulses. Compared to a single-phased PDM [7], due to the multiple-phase implementation, the operation frequency and the magnitude of the pulse current can be relaxed to 1/4 and 1/8, respectively. While current mismatches in DAC-based feedback degrade the ADC linearity, the mismatches in MP-PDM just become a gain offset. Therefore, sufficient SNDR can be achieved without any bulky analog circuitry. Last, the measured ADC performance is shown on the right. Thanks to the MP-PDM, the ADC achieves 39.3dB SNDR at 400MS/s without any calibration. The performance comparison shows that the ADC achieves extremely small area: its area is 4× smaller than that of [3], and it achieves 2dB higher SNDR.

The LiDAR SoC was fabricated in 28nm CMOS (Fig. 5.1.7). Figure 5.1.4 shows the DM performance of the LiDAR using the SoC, where the 10%-reflectivity target was moved from 5 to 225m under 100klux sunlight (70klux at the target surface). The DM success rate of SR-DM is 100% up to 15m, and that of LR-DM is 100% at 225m. Moreover, smaller σ error can be achieved in SR-DM compared to that of LR-DM, up to 15m (bottom-left). Since variance in the arrived photons modulates the SiPM amplitude, it causes non-linearity in SR-DMs; thus, area-hungry CFD circuits are conventionally utilized to cancel such effects [3,4]. As shown on the right, we report a peak-triggering (PT) technique to move the photon-variance cancellation to the digital domain and minimize the AFE area. PT averages the ToF of both rising/falling edges to estimate the photon-invariant peak time of the SiPM output. The measured SR-DM is reported with/without PT in the bottom-right; PT successfully suppresses the photon-variant non-linearity.

Figure 5.1.5 shows the measured point cloud using SR/LR composite data and the signal intensity images captured by the LiDAR system. These images are much clearer than those of [3], and multiple pedestrians can be clearly recognized. The bottom shows the DM success rate when using 96/192-pixel vertical-pixel LiDARs, where the vertical size of the target (75m away) was changed by 7.5cm. When compared at 90% DM success, the 192-pixel LiDAR detects objects half the size detectable by the 96-pixel one, and is thus suitable for self-driving applications.

Figure 5.1.6 shows the LiDAR performance comparison. The proposed LiDAR using the DDC-based AFE achieves both the highest pixel resolution and the longest DM range in the table, with the smallest AFE/channel area, at the same background light condition.

References:
[1] Velodyne, "High Definition Lidar White Paper." Accessed Aug. 28, 2019, https://velodynelidar.com/lidar/products/white_paper/HDL%20white%20paper_OCT2007_web.pdf
[2] C. Niclass et al., "A 0.18-μm CMOS SoC for a 100-m-Range 10-Frame/s 200×96-Pixel Time-of-Flight Depth Sensor," IEEE JSSC, vol. 49, no. 1, pp. 315-330, Jan. 2014.
[3] K. Yoshioka et al., "A 20ch TDC/ADC Hybrid SoC for 240×96-Pixel 10%-Reflection <0.125%-Precision 200m-Range Imaging LiDAR with Smart Accumulation Technique," ISSCC, pp. 92-93, Feb. 2018.
[4] D. Binkley, "Performance of Non-Delay-Line Constant-Fraction Discriminator Timing Circuits," IEEE Trans. Nucl. Sci., vol. 41, no. 4, pp. 1169-1175, Aug. 1994.
[5] A. Iwata et al., "The Architecture of Delta Sigma Analog-to-Digital Converters Using a Voltage-Controlled Oscillator as a Multibit Quantizer," IEEE TCAS-II, vol. 46, no. 7, pp. 941-945, July 1999.
[6] M. Baert and W. Dehaene, "A 5GS/s 7.2 ENOB Time-Interleaved VCO-Based ADC Achieving 30.5fJ/conv-step," ISSCC, pp. 328-329, Feb. 2019.
[7] W. El-Halwagy et al., "A Programmable 8-bit, 10MHz BW, 6.8mW, 200MSample/sec, 70dB SNDR VCO-Based ADC Using SC Feedback for VCO Linearization," IEEE ICECS, pp. 157-160, Dec. 2013.
[8] M. Park and M. Perrott, "A 0.13μm CMOS 78dB SNDR 87mW 20MHz BW CT ΔΣ ADC with VCO-Based Integrator and Quantizer," ISSCC, pp. 170-171, Feb. 2009.

Figure 5.1.1: Block diagram of conventional hybrid AFE (left) and DDC-based AFE (right).
Figure 5.1.2: Operation principle of dual-data (voltage/time) conversion utilizing differentiation/integration of the VCO output pulse.
Figure 5.1.5: Captured LiDAR images (top: SR/LR composite point cloud, middle: signal intensity image) and measured success-rate dependency on the vertical target size when using 96/192-pixel vertical-pixel LiDARs (bottom).
Figure 5.1.6: Comparison with state-of-the-art LiDAR systems.
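The coarse/fine ToF segmentation and the peak-triggering correction described above can be sketched numerically. This is an illustrative model built only from the figures quoted in the text (400MHz sampling clock, 40ps VCO LSB from the 16-phase ring oscillator), not the actual SoC logic:

```python
CLK_HZ = 400e6        # stable voltage-sampling clock quoted in the paper
VCO_LSB_S = 40e-12    # 40ps time LSB from the 16-phase ring oscillator

def encode_tof(trigger_s):
    """Segment a trigger timestamp into a coarse count of 400MHz clock
    periods and a fine count of VCO LSBs since the last clock edge."""
    period = 1.0 / CLK_HZ
    coarse = int(trigger_s // period)
    fine = int((trigger_s - coarse * period) / VCO_LSB_S)
    return coarse, fine

def decode_tof(coarse, fine):
    """Reassemble a ToF estimate; the error is bounded by one 40ps LSB."""
    return coarse / CLK_HZ + fine * VCO_LSB_S

def peak_trigger(t_rise_s, t_fall_s):
    """Peak triggering (PT): average the rising/falling threshold crossings
    to estimate the photon-count-invariant peak time of the SiPM pulse."""
    return 0.5 * (t_rise_s + t_fall_s)
```

For example, a 101.3ns trigger encodes to (40, 32) and decodes back to within one 40ps LSB, which is why the VCO's jitter only matters over the short fine-ToF interval.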
5.2 A 1200×900 6μm 450fps Geiger-Mode Vertical Avalanche Photodiodes CMOS Image Sensor for a 250m Time-of-Flight Ranging System Using Direct-Indirect-Mixed Frame Synthesis with Configurable-Depth-Resolution Down to 10cm

Toru Okino, Shota Yamada, Yusuke Sakata, Shigetaka Kasuga, Masato Takemoto, Yugo Nose, Hiroshi Koshida, Masaki Tamaru, Yuki Sugiura, Shigeru Saito, Shinzo Koyama, Mitsuyoshi Mori, Yutaka Hirose, Masayuki Sawada, Akihiro Odagawa, Tsuyoshi Tanaka

Panasonic, Nagaokakyo, Japan

Long-range (~250m) measurement systems with 3D resolution are highly anticipated for various applications such as automotive, surveillance and robotics systems. Direct time-of-flight (ToF) systems based on CMOS image sensors (CIS) with Geiger-mode avalanche photodiodes, or single-photon avalanche diodes (SPAD), have been developed for these purposes. One difficulty is to meet practical requirements of long-range (~250m) measurement capability together with both lateral and depth resolution. Although indirect-ToF systems based on modulation and phase-sensitive detection have fine depth resolution of the order of sub-cm, their full range is limited only to ~4m with standard Si photodiodes [1] and ~20m with the gain assist of APDs [2]. Direct-ToF systems based on SPAD pixels with time-to-digital converters (TDC) have long-range measurement capability but have not reached megapixel resolution [3,4]. Another direct-ToF system, based on sub-range synthesis (SRS), which detects single photons returned from an object located in each sub-range by synchronous gating, was demonstrated to have long-range (250m) measurement capability with a high lateral resolution of 10cm. However, its depth resolution is limited by the width of the optical pulse source [5]. In order to circumvent this issue, in this work we develop an SRS-ToF system based on a 6μm-pitch, 1200×900-pixel, Geiger-mode-operated vertical avalanche photodiode (VAPD) CIS that enables one to configure the depth resolution down to 10cm for short-distance (<20m) ranging by building an indirect-ToF system on the previous one [5], while the capability for long-distance (250m) ranging with a lateral resolution of 10cm is preserved. The full range comprising 15 subranges is measured in real time (30fps) with a 450fps speed for each subrange capture. Charge packets of Geiger-mode pulses accumulated on an in-pixel charge accumulator (ICA) are converted to a precise photon-count image by a single-slope analog-to-digital converter (SSADC). In addition, by phase-differentiating the accumulated Geiger-mode charge on the ICA, resolving the near side of a sub-range down to 10cm is demonstrated. Thus, a Geiger-mode ToF system capable of imaging a frame with direct-indirect mixed subranges by a single CIS chip is achieved.

A block diagram of the system is illustrated in Fig. 5.2.1. The pixel circuit consists of VAPDs and of an in-pixel charge accumulator (ICA) sourced by a charge-packet-spatula (CPS), i.e., a subthreshold-biased reset transistor (RST), equalizing the charge amount of each Geiger-mode pulse by removing avalanche noise. A 7b single-slope column analog-to-digital converter (SSADC) enables wide-dynamic-range photon counting as well as phase-difference operation on the accumulated charge. In the post-processor (FPGA), a photon-count equalizer (PCE), which removes fixed-pattern noise (FPN) due to the CPS operation, is implemented in addition to a depth-mapping engine to construct the subrange synthesis and a photon (intensity) imaging engine.

The top side of Fig. 5.2.2 shows a block diagram of the VAPD-CIS. The pixel arrays are driven by the vertical circuits in a global shutter mode synchronous to 50kHz light pulses. After ADC, the digital data are transferred through horizontal shift registers (HSR) to LVDS, both driven by 240MHz clocks. The output comprises 8 parallel LVDS channels. The total power consumption is 2.5W. The bottom half of Fig. 5.2.2 illustrates the pixel and column circuit. The VAPD is biased at -26V, generating a Geiger-mode pulse on single-photon absorption, and is self-quenched by capacitor quenching [6]. At a specified timing of each subrange, the generated pulse charge is transferred by TRN from the VAPD to the floating diffusion (FD) and then to the in-pixel charge accumulator ICA (a MIM capacitor) through the second transistor, CNT. For each subrange acquisition, 50 pulses are gated and the maximum count on the ICA is set at 15.

Generally, Geiger-mode pulses contain variances of avalanche noise (QAN) and reset-transistor (RST) Vt variation (QΔVt) between pixels. To minimize them, we propose two methodologies: the charge-packet-spatula (CPS) and the photon-count-equalizer (PCE). First, before transferring each charge packet to the ICA, the reset transistor (RST) is turned on into a subthreshold regime to drain out QAN (CPS operation), as illustrated in the top left of Fig. 5.2.3. As a result, the amount of charge is controlled to be identical for different pulses within each pixel, i.e., ~2500e-. As shown in the top-right of Fig. 5.2.3, with CPS the variance is reduced to 12mV, from 120mV without CPS. Second, QΔVt is removed by a photon-count-equalizer (PCE) implemented in an FPGA, as shown in the bottom half of Fig. 5.2.3. Histograms of a photon-count image before (left) and after (right) PCE processing, consisting of peaks due to 0 (dark count) to 15 photons, are shown. The improvement made by PCE is clear, in that the variance is reduced by more than a factor of 2.

A chip micrograph is presented in Fig. 5.2.7.

The size of the sub-range, i.e., the depth resolution, is limited by the 10ns global shutter pulse, giving a 1.5m resolution: enough for over-100m ranging but insufficient for measurement of a few meters. To improve the depth resolution, the present CIS has an operation mode in which the phase difference is calculated between pulses obtained from an object located within a sub-range but taken with different gate timings or subranges, as illustrated in Fig. 5.2.4. Suppose an object is located near the boundary of the first and second sub-ranges; the respective signals S1 and S2 from the object are detected in the two corresponding sub-range images. The displacement of the object from the center of the assumed first sub-range, Δd, can be obtained as the phase difference of the two pulse signals accumulated during each sub-range measurement as

Δd = d × {(S1 - S2)/(S1 + S2)} × 1/2, (1)

where d is the original sub-range depth. In principle, the resolution of Δd is set by that of the ADC, i.e., 7b, down to 1cm. In practice, we obtain a finest resolution of 10cm, as shown in Fig. 5.2.4, where square wood bars of 10cm size are clearly measured and the standard deviation is about 5cm. We confirm that this performance is preserved up to 20m distance.

Figure 5.2.5, right side, shows a long-range (~60m) ToF image comprising 15 sub-ranges of direct-ToF subranges with 7.5m depth resolution for long distance (>13m) and of indirect-ToF images with 10cm resolution for short ranges (<13m), together with a direct-ToF image (left side) with a depth resolution of 7.5m (distance >13m) and of 1.5m (distance <13m). Each vertical bar of the railing with a 10cm period is clearly resolved on the near side of the iToF image on the right. Note that the right-side image constitutes the direct-indirect-mixed ToF subranges all in one frame.

In the comparison table of Fig. 5.2.6, the present CIS is compared with state-of-the-art ToF systems and is found to have superiority in pixel resolution, frame rate, and the realization of direct-indirect integration on a single chip.

In conclusion, we report a Geiger-mode VAPD-based direct-indirect-mixed imaging system with long-range (~250m) and fine-depth-resolution (10cm) measurement capability. The system is expected to be useful not only for automotive applications but also for various applications such as surveillance, human sensing, traffic sensing and control, and others.

Acknowledgement:
The authors are grateful to Dr. E. Fujii for his continuous support and encouragement.

References:
[1] M. Keel et al., "A 640×480 Indirect Time-of-Flight CMOS Image Sensor with 4-tap 7-μm Global-Shutter Pixel and Fixed-Pattern Phase Noise Self-Compensation Scheme," IEEE Symp. VLSI Circuits, pp. C258-C259, 2019.
[2] B. Park et al., "A 64×64 APD-Based ToF Image Sensor with Background Light Suppression up to 200klx Using In-Pixel Auto-Zeroing and Chopping," IEEE Symp. VLSI Circuits, pp. C256-C257, 2019.
[3] A. R. Ximenes et al., "A 256×256 45/65nm 3D-Stacked SPAD-Based Direct TOF Image Sensor for LiDAR Applications with Optical Polar Modulation for up to 18.6dB Interference Suppression," ISSCC, pp. 96-97, Feb. 2018.
[4] M. Perenzoni et al., "A 64×64-Pixel Digital Silicon Photomultiplier Direct ToF Sensor with 100Mphotons/s/pixel Background Rejection and Imaging/Altimeter Mode with 0.14% Precision up to 6km for Spacecraft Navigation and Landing," ISSCC, pp. 118-119, Feb. 2016.
[5] S. Koyama et al., "A 220m-Range Direct Time-of-Flight 688×384 CMOS Image Sensor with Sub-Photon Signal Extraction (SPSE) Pixels Using Vertical Avalanche Photo-Diodes and 6kHz Light Pulse Counters," IEEE Symp. VLSI Circuits, pp. 71-72, 2018.
[6] Y. Hirose et al., "A 400×400-Pixel 6μm-Pitch Vertical Avalanche Photodiodes (VAPD) CMOS Image Sensor Based on 150ps-Fast Capacitive Relaxation Quenching (RQ) in Geiger Mode for Synthesis of Arbitrary Gain Images," ISSCC, pp. 104-105, Feb. 2019.

Figure 5.2.3: Effectiveness of the Charge Packet Spatula (CPS) and Photon-Count Equalizer (PCE) removing avalanche noise and Vt variance of RSTs.
Figure 5.2.4: Demonstration of indirect-ToF operation by phase-differentiating pulses from different subranges, showing 10cm resolution.
Figure 5.2.5: Comparison of a Geiger-mode direct-ToF-indirect-ToF mixed image with an ordinary Geiger-mode long-range direct-ToF image.
Figure 5.2.6: Chip and system performance summary and comparison with state-of-the-art APD-based and standard-PD CISs.
Figure 5.2.S2: Principle of the Geiger-mode indirect-ToF (phase-sensitive ranging).
Figure 5.2.S3: Demonstration of 10cm resolution of an object (pole) at 250m distance ranged by the direct-ToF mode.
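Equation (1) above is simple enough to sketch directly. The default sub-range depth of 1.5m below is the value implied by the 10ns gate quoted in the paper; the function itself is an illustration, not Panasonic's FPGA implementation:

```python
def subrange_displacement(s1, s2, d=1.5):
    """Eq. (1): displacement Δd of an object from the center of the first
    sub-range, from signals S1/S2 accumulated in two adjacent sub-ranges.
    d is the original sub-range depth in meters (1.5m for the 10ns gate)."""
    return d * (s1 - s2) / (s1 + s2) / 2.0
```

Equal signals place the object at the sub-range center (Δd = 0); a 75/25 split yields Δd = 0.375m, i.e., a quarter of the sub-range depth. Because only the ratio of the two accumulated signals enters, the result is independent of the absolute photon count, which is what lets the 7b ADC set the resolution floor.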
5.3 An Up-to-1400nm 500MHz Demodulated Time-of-Flight Image Sensor on a Ge-on-Si Platform

C.-L. Chen*, S.-W. Chu*, B.-J. Chen*, Y.-F. Lyu*, K.-C. Hsu*, C.-F. Liang*, S.-S. Su, M.-J. Yang, C.-Y. Chen, S.-L. Cheng, H.-D. Liu, C.-T. Lin, K. P. Petrov, H.-W. Chen, K.-C. Chu, P.-C. Wu, P.-T. Huang, N. Na, S.-L. Chen

Artilux, Hsinchu, Taiwan; *Equally-Credited Authors (ECAs)

Indirect time-of-flight (iToF) image sensors on silicon platforms [1-3] are attracting rising attention because of their better immunity to background light and potential to achieve higher pixel resolution. Shifting the laser wavelength to 940nm [3] may further improve the outdoor performance, but at a cost of reduced responsivity and lower modulation speed. As shown in red in Fig. 5.3.1, the solar irradiance at 940nm is only one-fourth of that at 850nm, and it keeps going down with increasing wavelength, even vanishing at ~1350nm. To further increase the system SNR, using larger laser power is a feasible choice, but this introduces a laser safety issue [4]. The maximum permissible exposure (MPE) at the cornea versus laser wavelength for a 0.1s exposure time is plotted in blue in Fig. 5.3.1 as well. Obviously, the exposure limit is quite stringent at 940nm but much relaxed at longer near-infrared (NIR) wavelengths. Both curves in Fig. 5.3.1 indicate that a better SNR can be achieved with lasers emitting at a longer NIR wavelength, while the system can still satisfy the MPE constraint. Unfortunately, traditional silicon solutions fail to operate at wavelengths beyond 1000nm because of silicon's 1.1eV indirect bandgap. However, it has been shown that this can be solved by using germanium-on-silicon (Ge-on-Si) as the absorption material [5]. Since Ge has a direct bandgap of 0.8eV, Ge-on-Si technology can extend the absorption wavelength up to nearly 1550nm.

In this work, we present an iToF image sensor on a Ge-on-Si platform to achieve improved absorption efficiency and higher modulation speed at wavelengths up to 1400nm. A block diagram of the demonstrated image sensor is shown in Fig. 5.3.2. In this prototype, a 240×180 image sensor array is implemented as a proof of concept. Four-phase system clocks are generated from a conventional integer-N PLL for modulation and demodulation. Before being sent to the pixel array and the external illumination driver, these clocks are gated and conditioned by the timing generator for a preset integration time and different operation modes. A programmable delay line is added in the illumination driver path, which is explained below. A readout circuit bridges the image sensor array to column ADCs, and the ADC outputs are further processed and integrated in the digital domain before reaching the output interface. The output interface is mainly composed of a 2-lane, 1.2Gb/s D-PHY MIPI transmitter. Additional CMOS outputs are also available for low-speed/low-cost systems. Two voltage domains are needed for this image sensor: the higher one (VDDH) receives a voltage >2.7V and the lower one (VDDL) receives 1.8V. A temperature sensor is also implemented for possible use in depth calibration and power control. All the blocks can be accessed through an I2C interface.

The iToF image sensor is developed in a back-side illumination (BSI) configuration, in which the Ge region is only available in the top wafer and the circuits are located in the bottom wafer to simplify the integration process. These 2 wafers are bonded together through a wafer-bonding interface. A cross-section of the pixel region is shown in Fig. 5.3.3. The pixels follow a 2-tap lock-in pixel architecture [5]. The differential demodulation clocks (CLKP and CLKN) are distributed on the top wafer so as to create a continuously switching lateral electric field at the Ge surface (the side closer to VIA) between nodes Demod1 and Demod2 in every pixel. The photo-charges are collected through nodes FD1 and FD2. Since most of the photo-charges are generated inside the Ge layer and the

in the bottom of Fig. 5.3.3. Before every exposure, all pixels are reset through Msh1/Msh2 and Mrt1/Mrt2. After optical exposure, integration, and demodulation, the collected photo-charges are stored on C1 and C2. Finally, the readout is accomplished through the source followers Msf1/Msf2 and the row-select switches Mbt1/Mbt2. 4-quad measurements are implemented to recover the depth information without suffering from analog non-idealities.

All the demodulated pixel voltages are passed to the readout circuit and ADC before digital processing. As shown in Fig. 5.3.4, the readout circuit consists of sample-and-hold circuits and buffers. 2 signal paths are realized and can be activated in a time-interleaved way. Every pixel can be read out twice to reduce the circuit noise by 3dB. The ADC is designed with a SAR architecture, and the output bit width is 9-12 depending on the system's operating scenario. The full-scale voltage (Vfs) of this ADC can be adjusted according to the input voltage swing, which results in a lower quantization-noise contribution. The programmable delay line and a pre-driver for the external illumination driver are also depicted in the bottom of Fig. 5.3.4. The programmable delay line covers a range from 0° to 90°, and more delays can be obtained by the quadrature post-divider inside the PLL. Assuming the modulation/demodulation waveforms are all square waves, the standard deviation of the recovered depth (σdepth) shows a dependency on the initial phase offset, which is plotted in the bottom-left of Fig. 5.3.4. Under the same intensity of illumination, σdepth goes up and down with a period of 90°. This is because the noise transfer function changes with phase offset, assuming a fixed amount of collected photo-charges. Theoretically, choosing the optimized phase offset results in a 1.4× improvement in SNR. For some applications where only relative depth is important, e.g., robotic arms and 3D scanning, an optimized phase offset can be introduced deliberately to maximize performance. If needed, the introduced offset can be calibrated at the module level.

This image sensor is developed on TSMC's 12-inch image sensor platform. Figure 5.3.5 shows a brief summary and comparison with recently published iToF image sensors. Because of the limited choice of commercial NIR camera lenses and bandpass filters, the image sensor module, including the NIR coating, is optimized at 940nm. All pixel parameters are measured by wafer-level probing. The external responsivity of the pixels is about 0.5A/W up to 1400nm, although the microlens is optimized at 940nm as well. The modulation frequency is between 10 and 500MHz, which is limited by the on-chip PLL. The demodulation contrast can be set between 0.7 and 0.9 by controlling the voltage of CLKP/CLKN. The standard deviations for indoor and outdoor scenarios are less than 0.5% and 0.55%, respectively. This image sensor consumes 200mW under typical operation. The depth images at 940nm wavelength are shown in Fig. 5.3.6. To create a consistent environment, we emulate the outdoor environment with a controlled high-power halogen lamp, whose power is calibrated with an NIR meter to make sure the optical power intensity at 940nm is equivalent to that from ~100klux daylight. Thanks to the high absorption efficiency of Ge, we found very little difference between the indoor and outdoor results. The die photo with labeled size is shown in Fig. 5.3.7. We believe the demonstrated iToF image sensor opens the possibility of operating at longer NIR wavelengths. The performance can be further improved by optimizing the microlens, bandpass filter, camera lens, and laser source at the desired wavelengths.

References:
[1] C. S. Bamji et al., "1Mpixel 65nm BSI 320MHz Demodulated TOF Image Sensor with 3.5μm Global Shutter Pixels and Analog Binning," ISSCC, pp. 94-95, Feb. 2018.
[2] Y. Kato et al., "320×240 Back-Illuminated 10μm CAPD Pixels for High-Speed Modulation Time-of-Flight CMOS Image Sensor," IEEE Symp. VLSI Circuits, pp. C288-C289, June 2017.
[3] M.-S. Keel et al., "A 640×480 Indirect Time-of-Flight CMOS Image Sensor with 4-tap 7-μm Global Shutter Pixel and Fixed-Pattern Phase Noise Self-Compensation
Ge layer is very thin, the lateral electric field at the Ge surface can effectively sweep Scheme,” IEEE Symp. VLSI Circuits, pp. C258-C259, June 2019.
the photo-charges to node FD1 or FD2. Moreover, the transit time for the photo- [4] IEC 60825-1 Ed. 2.0 (2007); Safety of laser products – Part 1: Equipment
charges drifting to node FD1 or FD2 is short, again, due to the thin Ge layer, and Classification and Requirements.
therefore the demodulation speed is significantly improved. To minimize the [5] N. Na et al., “High-Performance Germanium-on-Silicon Lock-in Pixels for
coupling to any sensitive high-impedance node and relax the design rule Indirect Time-of-Flight Applications,” IEEE IEDM, pp. 751-754, Dec. 2018.
requirement, only FD1 and FD2 are interacted with the wafer bonding interface
overlapping with the pixel area. The CLKP and CLKN are routed to the bottom-
wafer clock drivers outside the pixel region. The pixel demodulation drivers are
realized with tapered inverter chains and the supply of the inverter chains can be
adjusted to maximize performance. The pixel circuits are implemented as a
differential 4-transistor architecture. A simplified timing diagram is also shown
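The 2-tap, four-phase demodulation described above yields an in-phase and a quadrature differential charge, from which phase and depth follow by an arctangent. A minimal sketch of that standard single-frequency arithmetic (the function names are mine, and this is not the authors' on-chip pipeline):

```python
import math

C = 299_792_458.0  # speed of light [m/s]

def itof_depth(dq0: float, dq90: float, f_mod: float) -> float:
    """Depth from 2-tap lock-in samples at a single modulation frequency.

    dq0  = Q(0)    - Q(pi)      (in-phase differential charge)
    dq90 = Q(pi/2) - Q(3pi/2)   (quadrature differential charge)
    """
    phase = math.atan2(dq90, dq0) % (2.0 * math.pi)   # demodulated phase in [0, 2pi)
    return C * phase / (4.0 * math.pi * f_mod)        # light travels 2x the depth

def unambiguous_range(f_mod: float) -> float:
    """Depth at which the phase wraps: c / (2 * f_mod)."""
    return C / (2.0 * f_mod)
```

At the chip's 500MHz upper limit the unambiguous range is only about 0.3m, which is why iToF systems typically trade modulation frequency against range or combine measurements at several frequencies.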
Figure 5.3.1: MPE/Solar Irradiance versus wavelength.
Figure 5.3.2: Block diagram of the iToF image sensor.
Figure 5.3.3: Architecture and operation principle of the pixel.
Figure 5.3.4: Simplified circuit schematics.
Figure 5.3.5: Summary and comparison table.
Figure 5.3.6: Depth images at 940nm.
Figure 5.3.7: Die photo.
Figure 5.3.S1: Measurement setup for 3D images. The halogen lamp power is calibrated with an NIR meter to make sure the optical power intensity at 940nm is equivalent to that from ~100klux daylight.
Figure 5.3.S2: Depth images at 1310nm. Because all the optical components are optimized at 940nm, the sensor receives little signal power and large ambient noise, which results in poor image quality.
Figure 5.3.S3: σdepth dependency on phase offset. The phase offset can be adjusted in the pre-driver so as to achieve better image quality.
5.4 A Dynamic Pseudo 4-Tap CMOS Time-of-Flight Image Sensor with Motion Artifact Suppression and Background Light Cancelling Over 120klux

Donguk Kim1, Seunghyun Lee2, Dahwan Park2, Canxing Piao1, Jihoon Park1, Yeonsoo Ahn1, Kihwan Cho1, Jungsoon Shin3, Seung Min Song3, Seong-Jin Kim2, Jung-Hoon Chun1, Jaehyuk Choi1

1Sungkyunkwan University, Suwon, Korea
2Ulsan National Institute of Science and Technology, Ulsan, Korea
3Zeeann, Hanam, Korea

An indirect time-of-flight (iToF) CMOS image sensor (CIS) is a device that provides depth as well as the two-dimensional shape of an object by measuring the phase difference of reflected pulse trains of light. Because the iToF CIS offers high spatial resolution from scaled photodetectors such as pinned photodiodes (PPD) [1] or photogates [2], as well as high depth accuracy, it is advantageous for object or gesture recognition with higher accuracy compared with conventional 2D imagers. Even though iToF CISs have been limited to indoor applications, such as gaming, there is strong demand for outdoor applications that include 3D face recognition and augmented reality for mobile devices, gesture recognition for vehicles, and service robots. The operation principle of the iToF CIS is a 4-phase demodulation (4PH) that acquires four charges from four phases (0, π/2, π, 3π/2) of demodulation. A conventional iToF CIS employs 2-tap (2T) pixels that have a photodetector with two readout paths such that charges from two phases can be separated. For the 4PH, we need two successive frames: 0 and π in the first, and π/2 and 3π/2 in the second frame [1-5]. Therefore, this conventional 4PH induces significant motion artifact, which is critical for applications such as gesture recognition. Another critical problem is the weak immunity to background light (BGL), including strong indoor lighting or sunlight. Several iToF CISs with integrated BGL cancelling have been reported. However, they sacrifice sensitivity by connecting an additional capacitor to avoid saturation, or sacrifice spatial resolution by adding huge analog memories in the CIS core [3].

In this paper, we present a 320×240 iToF CIS with motion artifact suppression and BGL cancelling over 120klx. With an on-chip image signal processor (ISP) with an up-scaler, the iToF CIS generates 640×480 depth images. To enhance the depth accuracy of the 2T pixels in high-frequency demodulation, we implement a field-accelerated PPD with a trident-shaped implant (trident PPD). For the motion artifact suppression, we report a dynamic pseudo-4-tap (P4T) scheme that performs 4PH in a single frame by driving two 2T pixels with spatially-temporally alternated phases. With the P4T scheme, a region with motion generates a depth image in a single frame, whereas a region without motion generates a depth image with higher accuracy in two frames. For cancelling BGL with high area efficiency while suppressing fixed pattern noise (FPN), we implement a switching ΔΣ BGL cancelling (BGLC) scheme with an over-pixel MIM capacitor using a backside illumination (BSI) process.

The sensor architecture is illustrated in Fig. 5.4.1. The sensor has a 320×240 pixel array with a 4-shared pixel structure [4]. For the P4T, odd and even rows of pixels are demodulated with spatially and temporally different phases from the alternate phase driver (APHD). The 320MHz on-chip PLL generates the demodulation signals of the APHD. The two outputs of the 2T pixel are input to the ΔΣ BGLC circuit. The total integration time is divided into sub-integration times (TS1 to TSN) such that saturation from BGL is avoided while the Δ-signals from the pixels (ΔQ0 = Q0 – Qπ and ΔQπ/2 = Qπ/2 – Q3π/2) are accumulated in the over-pixel analog memories. The accumulated signals through the BGLC are converted into 10b digital signals using the single-slope (SS) ADC. The on-chip ISP filters out noise and dead pixels before signal processing. In order to compensate for possible suppression of depth edges in the row direction from the P4T, depth interpolation is performed. Finally, the QVGA image is up-scaled to VGA resolution.

Figure 5.4.2 shows the structure of the trident PPD. The trident PPD, initially developed for the FSI process [4], is further optimized for the BSI CIS process. In order to enhance the speed of charge transfer from the PPD to the floating diffusion (FD) node, the trident-shaped n- layer in the PPD increases the electron potential. In addition, the secondary implant of the n+ layer near the two transfer gates lowers the electron potential, generating the lateral electric field that accelerates charge transfer. Owing to this acceleration, high-frequency demodulation (>80MHz) with a lower TX voltage (2.3V) is applicable.

Figure 5.4.3 shows the operation principle of the 4PH using the proposed dynamic P4T architecture. Two 2T pixels in adjacent rows operate as a P4T pixel that is demodulated with four phases of demodulation signals in a single frame. In frame 1, even rows are demodulated with 0 and π phases to acquire ΔQ0, whereas odd rows are demodulated with π/2-shifted phases for ΔQπ/2. Because each row has only one ΔQ, the other should be interpolated from adjacent rows. Then the depth image can be generated in a single frame. In frame 2, the phases are alternated by changing the destination of even/odd rows by the APHD. If no motion is detected, the depth image is generated using conventional 4PH with 2T pixels for higher depth accuracy. However, if any motion is detected, the depth image from the P4T is acquired again to suppress the motion artifact that occurs in the two-frame demodulation of 2T. To reduce the depth error from the interpolation, we devise a P4T interpolator based on the depth gradient. By calculating the dual depth gradients in the y-direction, ΔQ can be interpolated by applying a higher weight (based on the gradient) to the correlated row for reduced depth error. By using this hybrid acquisition with the P4T and 2T, a depth image with high depth accuracy in the background region and reduced motion artifact for moving objects can be synthesized.

Figure 5.4.4 shows the switching BGLC circuit, including the over-pixel analog memory. Because we get the Δ-signal from the two FD nodes, sampling the reset signals for cancelling the pixel FPN into separate sample-and-hold circuits is redundant [3]. Instead, we switch the FD nodes by interchanging the demodulation signals (TX0/π and TXπ/0) at every sub-integration time. In every other integration time, the FPN is cancelled after accumulation (Σ). Each pixel includes two access switches and one MIM capacitor as an analog memory to store ΔQ. Because the capacitance of the MIM capacitor is fixed, the input capacitor Cin consists of a capacitor bank to provide programmable gain (1~4).

A prototype sensor was fabricated using a 90nm BSI CIS process. Figure 5.4.5 shows the captured depth images from the sensor. With hand movement, significant motion artifact was measured in the conventional 4PH with 2T. This motion artifact is suppressed in the proposed P4T scheme. In order to show the depth error around the edge of the target object, four magnified images are shown: 4PH with 2T, P4T with regular interpolation, P4T with the proposed interpolation, and up-scaled P4T with the proposed interpolation. As illustrated, the proposed P4T interpolation suppresses the depth error that was induced by the P4T scheme. Two additional depth images for testing the BGLC are shown. To mimic strong sunlight, we placed both a visible light lamp with 120klx and an IR light source with 142μW/cm² at 1m distance. Without activating the BGLC, the depth image was corrupted by saturation. This saturation is completely removed using the BGLC.

Figure 5.4.6 shows the chip characteristics and performance comparison. By using the P4T scheme with 2T pixels, motion artifact can be suppressed while achieving a high fill factor (43%) at the 8μm pixel pitch in a 90nm process. Even though several previous works include on-chip BGLC, their spatial resolution was limited to below 84×64 because of large in-pixel circuits or separate frame memories. This work significantly expands the spatial resolution of depth images with BGLC by applying over-pixel analog memories with the BSI process. The chip photograph is shown in Figure 5.4.7.

References:
[1] S.-J. Kim et al., "A 1920×1080 3.65μm-Pixel 2D/3D Image Sensor with Split and Binning Pixel Structure in 0.11μm Standard CMOS," ISSCC, pp. 396-397, Feb. 2012.
[2] C. S. Bamji et al., "1Mpixel 65nm BSI 320MHz Demodulated TOF Image Sensor with 3.5μm Global Shutter Pixels and Analog Binning," ISSCC, pp. 94-95, Feb. 2018.
[3] J. Cho et al., "A 3-D Camera with Adaptable Background Light Suppression Using Pixel-Binning and Super-Resolution," IEEE JSSC, vol. 49, no. 10, pp. 2319-2332, Oct. 2014.
[4] S. Lee et al., "Design of a Time-of-Flight Sensor with Standard Pinned-Photodiode Devices Toward 100-MHz Modulation Frequency," IEEE Access, vol. 7, pp. 130451-130459, Sept. 2019.
[5] Y. Kato et al., "320×240 Back-Illuminated 10-μm CAPD Pixels for High-Speed Modulation Time-of-Flight CMOS Image Sensor," IEEE JSSC, vol. 53, no. 4, pp. 1071-1078, Apr. 2018.
[6] M. Keel et al., "A 640×480 Indirect Time-of-Flight CMOS Image Sensor with 4-tap 7-μm Global-Shutter Pixel and Fixed-Pattern Phase Noise Self-Compensation Scheme," IEEE Symp. VLSI Circuits, pp. C258-C259, June 2019.
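The gradient-weighted ΔQ interpolation of the P4T scheme can be sketched as follows. The paper does not give the exact weighting function, so the inverse-gradient weights below are an assumption; only the idea of favoring the more correlated (lower-gradient) neighboring row comes from the text:

```python
def interpolate_dq(dq_above: float, dq_below: float,
                   grad_above: float, grad_below: float,
                   eps: float = 1e-6) -> float:
    """Interpolate the missing differential charge of a row from its neighbors.

    grad_above/grad_below are the y-direction depth gradients toward the rows
    above and below; the neighbor across the smaller gradient (the more
    correlated row) receives the larger weight. eps avoids division by zero
    on perfectly flat regions.
    """
    w_above = 1.0 / (abs(grad_above) + eps)
    w_below = 1.0 / (abs(grad_below) + eps)
    return (w_above * dq_above + w_below * dq_below) / (w_above + w_below)
```

On a flat region (equal gradients) this reduces to a plain average; across a depth edge it suppresses the contribution of the uncorrelated row, which is the kind of edge depth-error reduction illustrated in the magnified images of Fig. 5.4.5.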
100 • 2020 IEEE International Solid-State Circuits Conference 978-1-7281-3205-1/20/$31.00 ©2020 IEEE
ISSCC 2020 / February 17, 2020 / 2:30 PM
Figure 5.4.1: Sensor architecture.
Figure 5.4.2: PPD with trident implant: structure (top-left), simulation of charge transfer (top-right), and potential diagrams (bottom).
Figure 5.4.3: Dynamic pseudo 4-tap (P4T) scheme: P4T scheme (top), P4T interpolation (bottom-left), and hybrid depth imaging with P4T and 2T (bottom-right).
Figure 5.4.4: Switching ΔΣ background-cancelling circuit.
Figure 5.4.5: Captured depth images: 320×240 depth images with hand motion (top), enlarged depth images with various processing (middle), and 320×240 images with BGL (bottom).
Figure 5.4.6: Chip characteristics and performance comparison.
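The FD-swapping FPN cancellation inside the switching ΔΣ BGLC can be illustrated numerically. This toy model is mine (the additive per-node offset model and all names are assumptions, not the paper's circuit); it accumulates ΔQ over sub-integrations while the demodulation signals alternate:

```python
def accumulate_dq(sub_frames):
    """Accumulate differential charge over sub-integration times TS1..TSN.

    sub_frames: list of (q_fd1, q_fd2) samples. Because TX0/pi and TXpi/0 are
    interchanged every sub-integration, the signal polarity on the physical FD
    nodes alternates; flipping the subtraction order realigns the signal while
    static per-node offsets (pixel FPN) cancel over each pair of sub-frames.
    """
    total = 0.0
    for i, (q_fd1, q_fd2) in enumerate(sub_frames):
        total += (q_fd1 - q_fd2) if i % 2 == 0 else (q_fd2 - q_fd1)
    return total

# Signal of 1.0 per sub-integration, static FD offsets +0.3 and -0.1:
frames = [(1.0 + 0.3, 0.0 - 0.1),   # normal demodulation phase
          (0.0 + 0.3, 1.0 - 0.1)]   # swapped demodulation phase
# accumulate_dq(frames) -> 2.0 (up to float rounding): offsets drop out,
# only twice the signal remains.
```

Accumulating the already-differenced signal also keeps the constant BGL component out of the stored value, which is why the analog memory does not saturate under strong background light.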
5.5 A 2.1e- Temporal Noise and -105dB Parasitic Light Sensitivity Backside-Illuminated 2.3μm-Pixel Voltage-Domain Global Shutter CMOS Image Sensor Using High-Capacity DRAM Capacitor Technology

Jae-kyu Lee, Seung Sik Kim, In-Gyu Baek, Heesung Shim, Taehoon Kim, Taehyoung Kim, Jungchan Kyoung, Dongmo Im, Jinyong Choi, KeunYeong Cho, Daehoon Kim, Haemin Lim, Min-Woong Seo, JuYoung Kim, Doowon Kwon, Jiyoun Song, Jiyoon Kim, Minho Jang, Joosung Moon, HyunChul Kim, Chong Kwang Chang, JinGyun Kim, Kyoungmin Koh, HanJin Lim, JungChak Ahn, Hyeongsun Hong, Kyupil Lee, Ho-Kyu Kang

Samsung Electronics, Hwaseong, Korea

As the automotive and AI industries expand rapidly, global-shutter (GS) image sensors are playing a more significant role in perception systems. More specifically, GS image sensors are required in various fields involving IR, including face-ID in mobile devices, driver monitoring systems in automotive applications, and factory automation. GS image sensors are necessary for these applications because they can capture freeze-frame images without motion distortion, owing to their pixel operation method. The simultaneous pixel exposure and in-pixel storing capability allow GS image sensors to achieve high-quality imaging, while the sequential pixel exposure and readout of rolling-shutter (RS) image sensors result in image distortion known as the jello effect. For mobile and automotive applications, a small form factor while maintaining low parasitic light sensitivity (PLS) and low noise is crucial. In conventional back-side illuminated (BSI) charge-domain GS image sensors, a light-shielding structure over the storage area must be formed in order to suppress the influence of parasitic light during the readout operation. The introduction of such a light-shielding structure reduces the effective photodiode area, which results in a loss of full-well capacity (FWC), light sensitivity, and pixel scalability.

In this paper, a BSI voltage-domain GS CMOS image sensor (CIS) using a high-capacity DRAM capacitor technology is presented, realizing both low PLS and low temporal random noise (RN) with a small form factor. A 1Mp (1280H×800V) GS image sensor with 2.3μm pixel pitch is implemented using a 65nm stacked CIS process technology. We achieve a low PLS of less than -105dB and a temporal RN of 2.1e-rms. Furthermore, 42% quantum efficiency (QE) at 940nm infrared (IR) light is obtained by adopting back-side scattering technology (BST) [1] and increasing the Si thickness to extend the beam path through the photodiode (PD). Such a small GS pixel is realized by adopting a high-capacity capacitor using an advanced DRAM technology and a vertical transfer gate (VTG) technology that maximizes the area of the silicon surface for the pixel transistors.

A schematic diagram of the voltage-domain GS pixel, with a dual conversion gain (DCG) transistor for higher dynamic range (DR) and a bypass transistor for RS-mode readout, is shown in Fig. 5.5.1. The switched-capacitor structure is similar to that of a previously reported scheme [2]. In the proposed pixel, however, the operating voltages are optimized for better hot-carrier immunity, dark leakage, voltage stability, and RN, which can otherwise be increased by the gain factor of the charge-sharing operation of the previous capacitor scheme [3]. Two high-capacity capacitors, C1 and C2, are used as analog memories to store the signal and reset voltages with low RN, and they are also very effective in reducing the PLS because the sensitivity of the storage node is inversely proportional to the capacitance of the node. The bypass transistor, SEL2, simplifies the column readout chain by sharing the same column signal line. Another transistor, DCG, is used to select the conversion-gain mode, which extends the saturation signal from 6,000e- at the high CG (HCG) of 140μV/e- to 12,000e- at the low CG (LCG) of 70μV/e-.

The clock timing for the GS image sensor is shown in Fig. 5.5.2. After the charge integration, the RX and Cal switches toggle to initialize all floating nodes in the pixel, such as FD, X, and Y. The clock TX toggles to transfer the signal charge from the PD to the FD node, where the charge level is converted to a voltage level. The 1st source follower (SF1) transistor transfers the voltage level to the X-node via the Sample transistor and changes the voltage level of the Y-node from its initialized value by capacitive coupling through C2. This operation is called the global dump for signal sampling. Once the global dump operation is completed, the rolling readout (row-by-row) operation starts. Each pixel signal stored at the Y-node is first digitized by a column-parallel ADC, and then the reset level of the same pixel is digitized sequentially by turning the Cal switch on. With these signals, the double sampling operation is performed to remove the FPN caused by the variation of the threshold voltage of the SF2 transistor and the readout circuitry.

Despite this effort to improve the noise performance, pixel noise is generated due to the pixel's scheme and operation. The two major components of the pixel noise are the flicker noise from SF1 and the thermal noise from the PC transistor. The thermal noise of SF1 and PC follows the kT/C relation in equation (1):

vn,rms² = kT/C, where C = C1 = C2 (1)

According to the equation, the thermal noise is inversely proportional to the capacitor size. In order to reduce the noise of the voltage-domain GS image sensor, it is important to increase the capacitance of the storage capacitors in each pixel, because increasing the capacitance reduces the bandwidth of the SF1 and PC transistors, and the kT/C noise caused by the Sample/Cal transistors. Even with the small pixel size of 2.3μm, we integrated a high-capacity capacitor over 0.7pF to achieve an RN as low as 2.1e-.

Figure 5.5.3 shows the design advantages of the BSI voltage-domain GS pixel structure compared to conventional front-side illuminated (FSI) charge-domain GS pixel structures. The BSI GS pixel enables a fill factor (FF) of nearly 100%, such that the PD can convert most of the incident light into photoelectrons, and deep-trench isolation (DTI) can reflect the scattered light, reducing optical crosstalk. A concave-type capacitor is also developed for the GS pixel; it has an order-of-magnitude higher capacitance than other types of trench capacitors with comparable leakage current control, as shown in Fig. 5.5.3 [4,5]. In addition, the VTG allows more space for integrating the pixel transistors, and effectively controls the vertically extended PD for higher FWC and sensitivity.

Figure 5.5.4 shows PLS and RN as a function of the capacitance used in the pixel. We also develop a GS pixel for enhanced sensitivity in the near-infrared (NIR) region, which has a different silicon thickness and surface treatment for higher QE at a wavelength of 940nm. Image sensors with a high QE are required to solve issues involving eye safety and power consumption. The QE of the GS image sensor is 32% at 940nm with a silicon thickness of 8μm, and increases to 42% with the help of the BST pattern (see Fig. 5.5.4) on the backside of the silicon. This special pattern is designed considering the optical properties and the backside etching process.

Figure 5.5.5 shows sample images captured in GS mode (left) and RS mode (right). These images demonstrate that there is no image distortion like the jello effect in the GS mode.

Chip specifications are summarized in Fig. 5.5.6. The GS image sensor is fabricated in a 65nm stacked BSI CMOS process. The supply voltages are 2.8V for the pixel and analog circuits, and 1.05V for the digital circuits. The sensor achieves a readout time of 8.3ms in the GS mode (1Mpixels). An RN level of 2.1e-rms at an analog gain of +18dB is maintained under dark conditions. A saturation signal level of 6,000e- in the HCG mode is extended to 12,000e- in the LCG mode. A sensitivity of 18ke-/lux·sec and a PLS of less than -105dB are achieved. An extremely high QE of 42% at 940nm IR is also achieved.

References:
[1] J.H. Park et al., "Pixel Technology for Improving IR Quantum Efficiency of Backside-Illuminated CMOS Image Sensor," Int. Image Sensor Workshop, 2019.
[2] Y. De Wit and T. Geurts, "A Low Noise Low Power Global Shutter CMOS Pixel Having Single Readout Capability and Good Shutter Efficiency," Int. Image Sensor Workshop, 2011.
[3] C. Xu et al., "A Stacked Global-Shutter CMOS Imager with SC-Type Hybrid-GS Pixel and Self-Knee Point Calibration Single-Frame HDR and On-Chip Binarization Algorithm for Smart Vision Applications," ISSCC, pp. 94-95, Feb. 2019.
[4] M. Suzuki et al., "An Over 1Mfps Global Shutter CMOS Image Sensor with 480 Frame Storage Using Vertical Analog Memory Integration," IEEE IEDM, pp. 212-215, 2016.
[5] M. Takase et al., "An Over 120dB Wide-Dynamic-Range 3.0μm Pixel Image Sensor with In-Pixel Capacitor of 41.7fF/μm² and High Reliability Enabled by BEOL 3D Capacitor Process," IEEE Symp. VLSI Tech., pp. 71-72, 2018.
[6] T. Geurts et al., "A 25Mpixel, 80fps, CMOS Imager with an In-Pixel-CDS Global Shutter Pixel," Int. Image Sensor Workshop, p. 7.02, June 2015.
[7] L. Stark et al., "Back-Illuminated Voltage-Domain Global Shutter CMOS Image Sensor with 3.75μm Pixels and Dual In-Pixel Storage Nodes," IEEE Symp. VLSI Tech., pp. 1-2, June 2016.
[8] Y. Oike et al., "8.3M-Pixel 480-fps Global-Shutter CMOS Image Sensor with Gain-Adaptive Column ADCs and Chip-on-Chip Stacked Integration," IEEE JSSC, vol. 52, no. 4, pp. 985-993, Apr. 2017.
[9] M. Sakakibara et al., "A Back-Illuminated Global-Shutter CMOS Image Sensor with Pixel-Parallel 14b Subthreshold ADC," ISSCC, pp. 80-81, Feb. 2018.
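The kT/C dependence expressed in equation (1) can be sanity-checked against the stated numbers. The sketch below assumes the noise of one storage node is plain kT/C referred through the quoted high conversion gain; the real transfer involves the SF1/PC bandwidth and both capacitors, so this is only an order-of-magnitude check:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]

def ktc_noise_electrons(cap_farad: float, conv_gain_v_per_e: float,
                        temp_k: float = 300.0) -> float:
    """rms kT/C sampling noise of a storage capacitor, referred to electrons."""
    v_rms = math.sqrt(K_B * temp_k / cap_farad)   # volts rms on the capacitor
    return v_rms / conv_gain_v_per_e

# A 0.72pF storage capacitor with the 140uV/e- HCG gives ~76uV rms, i.e.
# ~0.54 e- rms from this term, comfortably below the reported 2.1 e- total RN;
# a 10x smaller capacitor would already contribute ~1.7 e- on its own.
```

This illustrates why a high-capacity (DRAM-style) capacitor is the enabler for low-noise voltage-domain global shutter at a 2.3μm pitch.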
ISSCC 2020 / February 17, 2020 / 3:15 PM
Figure 5.5.3: Comparison of the pixel structure with conventional charge-domain and voltage-domain global shutter. The vertical transfer gate (VTG) and DRAM capacitor technology are used for the pixel shrink. The capacitance per unit area is compared for different capacitor structures.
Figure 5.5.4: Parasitic light sensitivity (PLS) and random noise (RN) with respect to capacitance. Two 0.72pF capacitors are used in a 2.3μm pixel. A measured quantum efficiency (QE) curve is shown on the right. Inset figures are SEM images of the BST and a simulation result for optical power flux density.
Figure 5.5.5: Sample images (left: global-shutter mode, right: rolling-shutter mode).
Figure 5.5.6: Performance summary of the 2.3μm global-shutter pixel CMOS image sensor.
5.6 A 1/2.65in 44Mpixel CMOS Image Sensor with 0.7μm Pixels Fabricated in Advanced Full-Depth Deep-Trench Isolation Technology

HyunChul Kim, Jongeun Park, Insung Joe, Doowon Kwon, Joo Hyoung Kim, Dongsuk Cho, Taehun Lee, Changkyu Lee, Haeyong Park, Soojin Hong, Chongkwang Chang, Jingyun Kim, Hanjin Lim, Youngsun Oh, Yitae Kim, Seungjoo Nah, Sangill Jung, Jaekyu Lee, JungChak Ahn, Hyeongsun Hong, Kyupil Lee, Ho-Kyu Kang

Samsung Electronics, Hwaseong, Korea

As the smart mobile device market continues to grow and the number of cameras per device rapidly increases, demand for CMOS image sensors (CIS) also increases. Two major trends in mobile device cameras are: (1) adopting smaller pixels that enable a greater pixel count at a similar optical format, and (2) bigger pixels for higher image quality. To be more specific, front-facing cameras trend towards smaller pixels, while rear main cameras trend towards both smaller and bigger pixels. The optical format of front-facing cameras is especially limited due to bezel-less or border-less display designs, yet higher-resolution still-shot and video (such as 4K UHD) recording is desired. To implement a greater pixel count in a limited camera module size, scaling of pixel size is required. The main challenges are to maintain acceptable photodiode full-well capacity (FWC) and sensitivity while suppressing optical crosstalk [1]. To completely eliminate both electrical and optical crosstalk, deep-trench isolation (DTI) has evolved from early BDTI (back-side DTI) to current FDTI (front-side DTI) technology, which is also called full-depth DTI. In this paper, a 44Mpixel CIS with 0.7μm pixels using full-depth DTI is demonstrated.

The evolution of FDTI structures as the pixel scales is shown in Fig. 5.6.1. In general, silicon thickness increases and the DTI critical dimension (CD) decreases in order to obtain pixel and optical characteristics similar to previous generations [2,3]. Therefore, the aspect ratio (A/R) of the FDTI abruptly increases, and the technical difficulty of implementing a high A/R rises accordingly. In this work, a technique similar to the dry-etch and polysilicon gap-fill technology used in advanced logic or memory processes is introduced. The success of high-aspect-ratio silicon etching depends on controlling the lateral etch and enhancing the vertical etch rate [4]. The final desired structure is implemented using the latest silicon etch equipment, even though the DTI etch A/R is increased from 45 to 69. A dry etch process inevitably generates by-products and requires removing them through a cleaning process. Historically, advanced logic or memory processes based on design-rule shrink have suffered from pattern leaning caused by the cleaning process added after high-aspect-ratio contact (HARC) etching [5]. In this work, a similar pattern leaning is now observed in sensor devices. Because the silicon photodiode structure was determined to have attained the desired sensor characteristics, we had to lower the surface tension through a change from a spin deionized (DI) water dryer to a hot isopropyl alcohol (IPA) dryer, without changing the structure itself. In order to finalize the full-depth DTI along the periphery of each pixel for complete isolation, insulating oxide and polysilicon are used. As the DTI aspect ratio increases, a new polysilicon in-situ deposition-etching-deposition (DED) process is introduced to improve polysilicon gap-fill capability. As a result, even with a higher aspect ratio, improved polysilicon gap-fill is demonstrated.

Negative-biased DTI (NDTI) is also used, as in previous FDTI generations. A negative bias is applied to the polysilicon for 'hole accumulation' at the photodiode surface. Dark current is suppressed by locating holes at the interface damaged by the DTI etching process, allowing electrons generated from the interface to quickly recombine with the accumulated holes. As pixel size shrinks and the number of pixels increases, this NDTI scheme begins to suffer from a chip-level dark-current spatial gradient, which comes from its intrinsic RC delay. This RC delay increases with the number of pixels (increasing resistance and capacitance) and with smaller pixel size (increasing resistance). The RC delay comes from the large resistivity of the un-doped polysilicon used as the DTI gap-fill material (simulated to be ~10²Ω·m; R = ρl/A, R: resistance, ρ: resistivity, l: length of the resistor, A: cross-sectional area of the resistor), and the capacitance from the larger surface area due to the increased number of pixels (C = εA/d, C: capacitance, […]). To solve this problem, we change the DTI gap-fill material from un-doped polysilicon to boron-doped polysilicon, whose resistivity is lowered to ~10⁻⁶Ω·m, which means the RC delay is on the order of nanoseconds rather than a few seconds. A dark current of 1.3e-/s, which is equivalent to or lower than the previous generation, is obtained.

Figure 5.6.3(a) shows the results of a TCAD simulation used to demonstrate that this work achieves a full well capacity (FWC) similar to the previous generation despite the physically reduced volume, and (b) shows the actual FWC generation-by-generation trend. In order to achieve the same FWC as the previous generation, two approaches are applied: physical and electrical. First, as mentioned earlier, with the development of an advanced DTI process technology, we minimize the physical volume reduction of the actual photodiode by reducing the DTI CD based on a higher-aspect-ratio DTI process. Second, plasma doping (PLAD) at the silicon sidewall right after the DTI etching process is applied to reduce the dark current of the FDTI pixel [6]. The FDTI pixel has a three-dimensional structure, so we can maintain the FWC by reducing the p-type doping concentration of the sidewalls to effectively increase the electrical photodiode volume. That means the P/N junction line of the photodiode is farther outward from the center of the pixel. With these two approaches, a FWC of 6,000e-, equivalent to the previous generation, is obtained.

Pixel scaling gives more pixels within the same optical format, or a smaller form factor at the same resolution. But sensitivity drops as the number of incoming photons per unit pixel decreases, and usually quantum efficiency (QE) also drops because pixel isolation structures (i.e., DTI) do not scale linearly and photons are lost to adjacent color filters and the metal grid as these are placed closer together. Therefore, one of the challenges in decreasing pixel size is to maintain or increase the QE performance. We increase the QE of the pixel by (1) increasing the depth of the photodiode, (2) decreasing the DTI CD to maximize the effective photodiode area, (3) optimizing the red color filter absorption spectrum to increase the red peak transmission, and (4) increasing the height of the LRI grid structure to minimize optical loss to adjacent color filters and metal grid structures [7]. A full-height low-refractive-index (LRI) grid structure is applied to the 0.7μm pixel to guide light effectively into the silicon photodiode. The TCAD simulation result in Fig. 5.6.4 shows that the incoming light intensity in the full-height LRI structure case is higher than in the conventional metal grid structure case.

The pixel performance is summarized in Fig. 5.6.5. Though the 20lux YSNR degraded from 32.7dB to 31.7dB due to the shrinkage of pixel size, the dark current is superior to the previous generation, and the remaining characteristics are the same as the previous generation. Figure 5.6.6 shows sample images captured with the 0.7μm pixels and the 0.8μm ones.

In summary, we have demonstrated a 1/2.65-inch 44Mpixel high-resolution CMOS image sensor with 0.7μm pixels fabricated by an advanced full-depth DTI process. Dark current, full well capacity, and quantum efficiency, which are challenging characteristics to maintain as the pixel scales, are equivalent to or better than the previous generation thanks to the process technologies introduced in this work.

References:
[1] S. Takahashi et al., "A 45nm Stacked CMOS Image Sensor Process Technology for Submicron Pixel," Sensors, vol. 17, p. 2816, 2017.
[2] J.C. Ahn et al., "A 1/4-inch 8Mpixel CMOS Image Sensor with 3D Backside-Illuminated 1.12μm Pixel with Front-Side Deep-Trench Isolation and Vertical Transfer Gate," ISSCC, pp. 124-125, Feb. 2014.
[3] Y. Kim et al., "A 1/2.8-inch 24Mpixel CMOS Image Sensor with 0.9μm Unit Pixels Separated by Full-Depth Deep-Trench Isolation," ISSCC, pp. 84-85, Feb. 2018.
[4] B. Wu et al., "High Aspect Ratio Silicon Etch: A Review," J. Applied Physics, vol. 108, no. 5, 2010.
[5] C.H. Kim et al., "Improved Drying Technology of Single Wafer Tool by Using Hot IPA/DIW," Solid State Phenomena, vol. 195, pp. 243-246, 2012.
[6] C.R. Moon et al., "Application of Plasma-Doping (PLAD) Technique to Reduce Dark Current of CMOS Image Sensors," IEEE Electron Device Letters, pp. 114-116, Feb. 2007.
[7] C.H. Lin et al., "1.1μm Back-Side Illuminated Image Sensor Performance […]
capacitance, ε: dielectric constant, A: area of a parallel plate, d: distance between
Improvement," Int. Image Sensor Workshop, 5.06, 2013.
two parallel plates). As seen in Fig. 5.6.2, un-doped polysilicon shows high dark
current right after streaming on, and decays with a time constant (~2.3s). In order
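The RC-delay argument for the NDTI bias (R = ρl/A along the un-doped polysilicon fill, C = εA/d from the DTI sidewalls) can be sketched numerically. In this minimal Python sketch, only the ~10²Ω·m resistivity comes from the text; every geometry value (trench width, depth, liner thickness, array sizes, routing path) is hypothetical and chosen purely to illustrate why the settling time grows with pixel count and shrinking pitch:

```python
def trench_resistance(rho, length, area):
    """R = rho * l / A for the un-doped polysilicon fill."""
    return rho * length / area

def trench_capacitance(eps, plate_area, gap):
    """C = eps * A / d for the DTI sidewall 'parallel plate'."""
    return eps * plate_area / gap

RHO_POLY = 1e2           # ~10^2 ohm*m simulated resistivity (from the text)
EPS_OX = 3.9 * 8.85e-12  # SiO2 liner permittivity, F/m (assumed liner material)

def rc_delay(pixel_pitch_m, n_pixels_side, depth_m=3e-6,
             trench_width_m=0.1e-6, liner_m=10e-9):
    # Illustrative worst case: bias routed from the driver to the array
    # corner, as in the grid-resistor model of Fig. 5.6.2(b).
    path = pixel_pitch_m * n_pixels_side                      # corner path length
    r = trench_resistance(RHO_POLY, path, trench_width_m * depth_m)
    sidewall_area = 4 * pixel_pitch_m * depth_m * n_pixels_side**2
    c = trench_capacitance(EPS_OX, sidewall_area, liner_m)
    return r * c

# Shrinking the pitch 0.8um -> 0.7um while growing the array raises tau:
tau_old = rc_delay(0.8e-6, 5000)   # ~24Mpixel-class array (hypothetical)
tau_new = rc_delay(0.7e-6, 6600)   # ~44Mpixel-class array (hypothetical)
print(tau_new > tau_old)  # True: more pixels and smaller pitch -> longer settling
```

The absolute numbers are meaningless with invented geometry; the point is the monotonic trend the text describes, with resistance rising as the trench cross-section shrinks and capacitance rising with total sidewall area.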
104 • 2020 IEEE International Solid-State Circuits Conference 978-1-7281-3205-1/20/$31.00 ©2020 IEEE
ISSCC 2020 / February 17, 2020 / 3:30 PM
Figure 5.6.1: (a) Schematic diagram of the full-depth DTI structure, by generation. (b) DTI etching aspect ratio and (c) polysilicon gap-fill aspect ratio, by generation.
Figure 5.6.2: (a) Equivalent circuit for an individual pixel with the NDTI scheme. (b) Schematic for the grid resistor model. We assume that the component of the longest RC-delayed signal comes from the corner of the APS area. (c) Dark current results when the stream is toggled on/off.
Figure 5.6.3: (a) Comparison of schematic pixel design with TCAD simulation and (b) comparison of predicted and achieved FWCs, by generation.
Figure 5.6.4: (a) Comparison of the difference in light incidence with a W-grid and a full-height LRI grid and (b) normalized quantum efficiency between the 0.8μm pixel and the 0.7μm pixel.
5.7 A 132dB Single-Exposure-Dynamic-Range CMOS Image Sensor with High Temperature Tolerance

Yorito Sakano¹, Takahiro Toyoshima¹, Ryosuke Nakamura¹, Tomohiko Asatsuma¹, Yuki Hattori¹, Takayuki Yamanaka², Ryoichi Yoshikawa², Naoki Kawazu¹, Tomohiro Matsuura¹, Takahiro Iinuma¹, Takahiro Toya¹, Tomohiko Watanabe², Atsushi Suzuki¹, Yuichi Motohashi¹, Junichiro Azami¹, Yasushi Tateshita¹, Tsutomu Haruta¹

¹Sony Semiconductor Solutions, Atsugi, Japan
²Sony Semiconductor Manufacturing, Kikuyo, Japan

Dynamic range is becoming a key performance parameter for CMOS image sensors (CISs), especially in the surveillance and automotive fields. Several well-known multi-exposure methods are widely used to expand dynamic range (DR). However, those methods cause issues such as motion artifacts and light-emitting-diode (LED) flicker in the composite images. Therefore, a single-exposure method is critical to the quality of high-DR images. The most obvious way to expand single-exposure DR is to employ in-pixel capacitors. One of the best-known technologies with in-pixel capacitors is the lateral overflow integration capacitor (LOFIC), which is characterized by a unique signal read-out method for in-pixel capacitors [1,2]. A CIS that uses an organic photoconductive film and an in-pixel capacitor is reported in [3]. Spatial sampling with a sensitivity ratio is another common method to expand single-exposure DR [4]. However, assuming practical resolution, LOFIC technology or spatial sampling with a sensitivity ratio achieves less than 100dB of single-exposure DR. A method combining spatial sampling with a sensitivity ratio and an in-pixel capacitor is presented in [5] to expand single-exposure DR further. This sub-pixel architecture can achieve over 120dB single-exposure DR, but another challenge arises: at high temperature, the degradation in signal-to-noise ratio (SNR) at the composite boundary becomes significant. In this work, we present a prototype 5.4Mpixel stacked back-illuminated CIS using the sub-pixel architecture with high-temperature tolerance.

Figure 5.7.1 shows a simplified block diagram and schematic of a pixel and comparator. The pixel employs a large photodiode (SP1), a small photodiode (SP2), an in-pixel floating capacitor (FC), and seven transistors. SP2 has 1/14.5 of the SP1 sensitivity and a linear full-well capacity (FWC) of 165800e- attributed to the FC, thus effectively handling electrical charges reaching 2400ke-. Considering the SP1 random noise of 0.6e-rms in high-conversion-gain mode, the simple DR is calculated as 132dB. The seven transistors are as follows: transfer gate of SP1 (TGL), transfer gate of SP2 (TGS), floating diffusion gate (FDG), floating capacitor gate (FCG), reset transistor (RST), select transistor (SEL), and source-follower amplifier (AMP). The floating diffusion (FD) is separated into FD1, FD2, and FD3 by FDG and FCG, which serve as switches to connect those FDs. The two electrodes of the FC are connected to FD3 and to a counter electrode whose supply voltage is FCVDD, respectively.

This prototype consists of a pixel chip and a logic chip. The pixel chip is fabricated using a 90nm process for the FEOL and a 65nm process for the BEOL. Read-out circuits (load MOS transistors, column ADCs, DAC), driver circuits (row drivers, row decoders), an image signal processor, and other circuits (PLL, MIPI I/F, CPU, etc.) are all mounted on the logic chip, which uses a 40nm process. The comparators are characterized by 2-input systems (two input capacitances, two differential transistors, and two auto-zero switches).

Figure 5.7.2 shows the simplified pixel cross-sectional view. The FC and some pixel transistors are embedded above SP1. Due to this structure, the FWCs of both SP1 and FC can be improved. A vertical transistor needs to be implemented for TGL to transfer the photoelectrons from SP1 to FD1.

The simplified timing diagram is shown in Fig. 5.7.3. The electrical charges accumulated in SP1 are converted to a signal voltage in two modes, namely high-conversion-gain mode (SP1H) and low-conversion-gain mode (SP1L), by switching FDG. The conversion gain of SP1H reaches 197μV/e- by reducing its capacitance, so the random noise of SP1H can be 0.6e-rms. The electrical charges accumulated in SP2 are read out as SP2H, and the electrical charges accumulated in FC are merged with those of SP2 and read out as SP2L. In this manner, four signals are read out in one exposure. The signal with the highest SNR is selected for each pixel, and one image is composited.

Figure 5.7.4 shows a simplified potential diagram corresponding to Fig. 5.7.3. Each cross section of Fig. 5.7.4 shows the path of the dotted lines A-B and C-B on the left side of Fig. 5.7.4. At the beginning of the exposure period, the photoelectrons of SP1 are reset by switching TGL on/off (a), and those of SP2 and FC are also reset by switching TGS and FCG on/off (b). Here, FDG and RST are always on. During the exposure period, FDG and RST remain continuously on; hence, FD1 and FD2 are always reset, and the photoelectrons that come from SP2 are accumulated in both FD3 and FC (c). At the end of the exposure period, the reset level of SP1L (R2) is first sampled when RST is switched off (d). Next, the reset level of SP1H (R1) is sampled when FDG is switched off (e). Then, the photoelectrons accumulated in SP1 are transferred to FD1 by switching TGL on/off once, and the signal level of SP1H (S1) is sampled (f). At this time, the photoelectrons exceeding the capacity of FD1 are left in SP1. After FD1 is connected to FD2 by switching FDG on and the photoelectrons remaining in SP1 are fully transferred by switching TGL on/off again, the signal level of SP1L (S2) is sampled (g); thus, both SP1H and SP1L can be read out by performing correlated double sampling (CDS) on each reset and signal level. In addition, due to the 2-input system of the comparators mentioned above, the auto-zero function can be performed for both read-outs, and the offset of the comparators can be cancelled by switching AZP0 and AZP1 on/off. Subsequently, the reset level of SP2 (R3) is sampled after FD1 and FD2 are reset by switching RST on/off, and FD3 is connected to FD1 and FD2 by switching FCG on (h). Following that, the photoelectrons accumulated in SP2 are fully transferred to FD3 by switching TGS on/off, and the signal level of SP2 (S3) is sampled (i); thus, SP2H can be read out by performing CDS on R3 and S3. Here, R3 and S3 also include the photoelectrons accumulated in FC. Then, SP2L is read out by performing delta reset sampling (DRS), in which a signal level of FC (S4) is sampled first (j), followed by a reset level of FC (R4). S4 is at the same level as S3 in the voltage domain. R4 is sampled by switching RST on/off, and the photoelectrons of FD1, FD2, and FD3 are reset (k). The drawback of DRS is that kTC noise cannot be removed; however, it can be suppressed by sufficiently enlarging the capacitance of the FC. Also, the noise of SP2L has a large temperature dependency due to fixed-pattern noise (FPN) caused by dark current generated in FD3. FCVDD during the accumulation period is set lower than during the read-out period of SP2H and SP2L to reduce the FPN of SP2L.

Figure 5.7.5 shows composite images of a subject with intermediate tone at RT and at Tj=100°C, as well as in the absence of SP2H at each temperature. Thanks to the presence of SP2H, 25dB of minimum composition SNR is maintained even at Tj=100°C, which clearly confirms the effectiveness of the architecture. Figure 5.7.6 summarizes the characteristics of the prototype and shows the chip micrograph. Figure 5.7.7 shows the performance comparison. Within the comparison table, this work realizes the best performance from low-light to high-light conditions. We fabricated a 5.4Mpixel CIS using the sub-pixel architecture with high-temperature tolerance. The FC and some pixel transistors are embedded above SP1. In addition, SP2H is implemented between SP1L and SP2L. Thanks to this architecture, a single-exposure DR of 132dB is achieved while maintaining 25dB of minimum composition SNR at Tj=100°C.

References:
[1] K. Miyauchi et al., "High Optical Performance 2.8μm BSI LOFIC Pixel with 120ke- FWC and 160μV/e- Conversion Gain," Int. Image Sensor Workshop, pp. 246-249, 2019.
[2] M. Oh et al., "3.0μm Backside Illuminated, Lateral Overflow, High Dynamic Range, LED Flicker Mitigation Image Sensor," Int. Image Sensor Workshop, pp. 262-265, 2019.
[3] M. Takase et al., "An Over 120dB Wide-Dynamic-Range 3.0μm Pixel Image Sensor with In-Pixel Capacitor of 41.7fF/μm² and High Reliability Enabled by BEOL 3D Capacitor Process," IEEE Symp. VLSI Technology, pp. 71-72, 2018.
[4] J. Solhusvik et al., "A 1280×960 2.8μm HDR CIS with DCG and Split-Pixel Combined," Int. Image Sensor Workshop, pp. 254-257, 2019.
[5] S. Iida et al., "A 0.68e-rms Random-Noise 121dB Dynamic-Range Sub-Pixel Architecture CMOS Image Sensor with LED Flicker Mitigation," IEDM, pp. 10.2.1-10.2.4, 2018.
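The 132dB figure of Paper 5.7 follows directly from the quoted numbers: DR = 20·log10(maximum signal / noise floor) with a 2400ke- effective full well and 0.6e-rms read noise. A small sketch of that arithmetic, plus the kTC-noise scaling that motivates enlarging the FC (the capacitance values in the kTC part are illustrative, not from the paper):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def single_exposure_dr_db(fwc_electrons, read_noise_erms):
    """DR = 20*log10(max signal / noise floor), both in electrons."""
    return 20 * math.log10(fwc_electrons / read_noise_erms)

# Numbers from the text: 2400ke- effective full well (SP2 + FC),
# 0.6e-rms SP1 read noise in high-conversion-gain mode.
dr = single_exposure_dr_db(2_400_000, 0.6)
print(round(dr))  # -> 132

def ktc_noise_vrms(cap_farads, temp_k=300.0):
    """Reset (kTC) noise voltage: sqrt(kT/C)."""
    return math.sqrt(K_B * temp_k / cap_farads)

# Why a large FC tames the un-cancelled DRS kTC noise (caps hypothetical):
print(ktc_noise_vrms(1e-15) > ktc_noise_vrms(100e-15))  # True: bigger C, lower noise voltage
```

The same formula explains why both terms matter: raising the full well (via the FC) and lowering the noise floor (via the 197μV/e- high-CG read) each add decibels to the single-exposure DR.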
ISSCC 2020 / February 17, 2020 / 3:45 PM
Figure 5.7.1: Simplified block diagram and schematic.
Figure 5.7.2: Simplified pixel cross-sectional view.
Figure 5.7.3: Simplified timing diagram.
Figure 5.7.4: Simplified potential diagram.
Figure 5.7.5: Composite image comparison.
Figure 5.7.6: Chip characteristics and micrograph.
Figure 5.7.7: Performance comparison.
Figure 5.7.S1: The noise of SP2L is larger than that of SP2H due to the kTC noise even at room temperature (RT), and also has a large temperature dependency.
Figure 5.7.S2: The synthetic photo-response characteristics have sufficient linearity.
Figure 5.7.S3: Thanks to the presence of SP2H, 25dB of minimum composition SNR is maintained even on the signal-to-noise-ratio curve (Tj=100°C).
5.8 A 0.50e-rms Noise 1.45μm-Pitch CMOS Image Sensor with Reference-Shared In-Pixel Differential Amplifier at 8.3Mpixel 35fps

Mamoru Sato, Yuhi Yorikado, Yusuke Matsumura, Hideki Naganuma, Eriko Kato, Takuya Toyofuku, Akihiko Kato, Yusuke Oike

Sony Semiconductor Solutions, Atsugi, Japan

Low noise and high dynamic range are essential features for imaging under both dark and bright conditions. In recent years, these features have been required of small-pixel CMOS image sensors (CISs) to realize high S/N and high resolution while maintaining a high frame rate. An effective way to improve readout noise is to apply high gain at the front stage of the read chain to suppress noise originating at the back stage; sub-electron read-noise CISs have been developed this way [1-3]. These approaches, however, are not suited to pixel-size shrinkage, as they require additional transistors within the pixel area: a PMOS, or an extra NMOS that configures a low-conversion-gain mode to read out the entire full-well-capacity signal. A quanta image sensor that achieves photon-count-level noise (<0.3e-rms) by minimizing the FD node capacitance of small pixels [4] is not cost-effective, as its full-well capacity is small and a frame-integration system is necessary for shooting bright scenes. An in-pixel differential common-source amplifier (DA) applies high gain at the pixel and is suitable for shrinking because it can switch to a low-gain mode without adding extra transistors within the pixel area [5]. However, its noise-reduction effect is limited, as the noise-source component of the DA doubles. This paper presents a 1.45μm-pixel, back-illuminated stacked 1/2.8-inch CIS that uses a reference-shared in-pixel differential common-source amplifier (RSDA) with correlated multiple sampling at 8.3Mpixel 35fps and has a readout noise of 0.50e-rms.

The RSDA equivalent circuit and its timing diagram are shown in Fig. 5.8.1. A readout pixel and a non-readout reference pixel are selected to constitute a DA. The drain of the RST transistor is connected to a VRD line, the source of the AMP transistor is connected to a current source via a COM line, and the PMOS cascode current-mirror circuit is connected to each VSL line. The number of configured DAs (each labeled 'k') is equal to the number of simultaneous readout pixels. Every reference-side VSL (VSL_REF) and COM line is connected and shared through a shared VSL_REF line and a shared COM line, thus configuring a reference-shared in-pixel differential amplifier. Since thousands of reference pixels are connected in parallel, their effective W/L becomes extremely large and the noise from the reference-pixel side of the differential pair is reduced to a negligible level, thereby preventing the noise deterioration otherwise caused by using a DA. While the conversion gain (CG) of a conventional source-follower (SF) readout is determined by the FD node capacitance (CFD), the CG of the DA readout, CGDA, is expressed by e/((CFD+CB)/A+CB). Here, A is the absolute value of the in-pixel amplifier's open-loop gain, CB is the feedback capacitance, and e is the elementary charge. Since the dominant term of CGDA is CB, which is one order of magnitude smaller than CFD, an extremely high CG can be realized.

The RSDA circuit operation is shown in the timing diagram. First, the SEL transistors of each readout pixel and reference pixel are turned on. Then, the RST transistors of both readout and reference pixels are turned on in the reset phase. The reset level VRST is applied to the reference-pixel FD, and the readout-pixel RST transistor short-circuits the VSL to the FD, forming negative feedback. The pixel-to-pixel mismatch of the common-source transistor threshold voltages is stored in each FD by the negative-feedback reset [3], and the offset is cancelled out by CDS. Thereby, the deterioration from variations in CG can be suppressed. In addition, the reset level of the VSL can be set to any appropriate voltage in the reset phase because the common-mode reset feedthrough does not affect the operating point of the DA; therefore, a certain amount of output swing can be guaranteed. The signal charge of the readout pixel is transferred to the FD, and its voltage is inverted and amplified.

We fabricated a stacked back-illuminated CMOS image sensor (BI-CIS). Figure 5.8.2 shows the chip block diagram and the circuit diagram of the RSDA. The top and bottom parts are stacked using Cu-Cu connection technology. A 2 (H) × 2 (V) pixel unit shares an FD and consists of 7 transistors / 4 PDs. The pixel in the row adjacent to the readout pixel is selected as the reference pixel. Together with the readout pixel, it forms a differential pair, and the selected row coordinates of such a pair change in accordance with a conventional rolling-shutter readout. The differential pair is connected to the PMOS current mirror and current source through vertical lines. The same composition is configured in each column, and each column circuit is connected to the shared COM and shared VSL_REF lines. The sensor can switch to a conventional SF mode, by turning switches in the column-circuit area on/off, when high-dynamic-range imaging is needed. Therefore, no additional transistors are required within the pixel area compared to a conventional SF. Although the number of vertical metal lines in the pixel area increases to configure the RSDA, there is no deterioration in sensitivity thanks to the BI structure.

This work combines the RSDA with correlated multiple sampling (CMS) to improve noise. While the settling time of the common-source amplifier is longer than that of the source follower, its gain is high and the quantization noise is relaxed; therefore, the ADC resolution and AD conversion time are reduced. We achieved a speed over 30fps even when utilizing two multiple samplings; the reset level and signal level were each sampled and acquired twice by the ADC.

Figure 5.8.3 shows the measurement results of random noise and the maximum frame rate for each mode. The random noise of SF and DA modes is 1.14e-rms at 12b 60fps and 0.90e-rms at 10b 49fps, respectively. The noise improvement is limited despite the high-gain readout of the DA because the noise-source component of the differential pair increases. The RSDA reduces the noise originating from the reference pixel to a negligible level and thus achieves a random-noise level of 0.63e-rms at 10b 49fps. This sensor realizes a random noise of 0.50e-rms at 10b 35fps by combining conventional CMS techniques with the RSDA architecture. As the random noise is 0.86e-rms at 12b 36fps when CMS is applied to SF, the RSDA has the larger noise-reduction effect. Figure 5.8.4 shows the results of random-noise measurements at room temperature as histograms. Images captured using this sensor are shown in Fig. 5.8.5. The upper image was captured in SF mode under a high-illuminance condition of 1600lux; the bottom image was captured in RSDA mode with CMS under a low-illuminance condition of 0.5lux.

Figure 5.8.6 shows the summary of chip characteristics. The top and bottom parts are fabricated using a 90nm 1P4M BI-CIS process and a 55nm 1P7M logic process, respectively. The pixel size is 1.45μm (H) × 1.45μm (V). SF mode operates at 12b 60fps with a power consumption of 450mW and a CG of 75μV/e-. RSDA mode combined with CMS (M=2) operates at 10b 35fps, uses 550mW of power, and has a CG of 560μV/e-. The increase in power consumption in RSDA mode is mainly due to the current of the reference pixel required to configure the differential pair. While the random noise of SF mode is 1.14e-rms at peak, RSDA mode achieves 0.50e-rms. The VSL voltage swing of RSDA mode is 200mV, which is limited to ensure the operating point of the PMOS cascode transistor. The PRNU of RSDA mode is suppressed to 2.5% by the negative-feedback reset technique. We achieve low-noise, high-resolution, and high-frame-rate results with an FOM of 6.6, where the FOM is defined as random noise / (number of pixels (V) × frame rate). The chip micrograph is shown in Fig. 5.8.7.

References:
[1] S. Wakashima et al., "A Linear Response Single Exposure CMOS Image Sensor with 0.5e- Readout Noise and 76ke- Full Well Capacity," IEEE Symp. VLSI Circuits, pp. C88-C89, June 2015.
[2] M.W. Seo et al., "A 0.44e-rms Read-Noise 32fps 0.5Mpixel High-Sensitivity RG-Less-Pixel CMOS Image Sensor Using Bootstrapping Reset," ISSCC, pp. 80-81, Feb. 2017.
[3] C. Lotto et al., "A Sub-Electron Readout Noise CMOS Image Sensor with Pixel-Level Open-Loop Voltage Amplification," ISSCC, pp. 402-403, Feb. 2011.
[4] S. Masoodian et al., "Room Temperature 1040fps, 1 Megapixel Photon-Counting Image Sensor with 1.1μm Pixel Pitch," Proc. SPIE 10212, Advanced Photon Counting Techniques XI, 102120H, May 2017.
[5] J. Choi et al., "A 1.36μW Adaptive CMOS Image Sensor with Reconfigurable Modes of Operation From Available Energy/Illumination for Distributed Wireless Sensor Network," ISSCC, pp. 112-113, Feb. 2012.
[6] V.C. Venezia et al., "1.5μm Dual Conversion Gain, Backside Illuminated Image Sensor Using Stacked Pixel Level Connections with 13ke- Full-Well Capacitance and 0.8e- Noise," IEEE IEDM, pp. 217-220, Dec. 2018.
[7] S.F. Yeh et al., "A 0.66e-rms Temporal-Readout-Noise 3D-Stacked CMOS Image Sensor with Conditional Correlated Multiple Sampling (CCMS) Technique," IEEE Symp. VLSI Circuits, pp. C84-C85, June 2015.
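The conversion-gain comparison in Paper 5.8 can be made concrete. Taking the stated SF-mode CG of 75μV/e- to back out C_FD, and assuming illustrative values for C_B and the open-loop gain A (neither is given in the text), the formula CG_DA = e/((C_FD+C_B)/A + C_B) lands in the few-hundred-μV/e- range reported for RSDA mode:

```python
Q_E = 1.602176634e-19  # elementary charge, C

def cg_sf(c_fd):
    """Source-follower conversion gain: q / C_FD (volts per electron)."""
    return Q_E / c_fd

def cg_da(c_fd, c_b, a_gain):
    """In-pixel DA conversion gain: q / ((C_FD + C_B)/A + C_B)."""
    return Q_E / ((c_fd + c_b) / a_gain + c_b)

# C_FD is implied by the reported 75uV/e- SF conversion gain; C_B and A
# are assumptions chosen one order below C_FD and "large", respectively.
C_FD = Q_E / 75e-6      # ~2.1fF
C_B = 0.28e-15          # feedback cap (assumed)
A = 50                  # open-loop gain magnitude (assumed)

print(f"SF CG: {cg_sf(C_FD) * 1e6:.0f} uV/e-")          # 75 uV/e-
print(f"DA CG: {cg_da(C_FD, C_B, A) * 1e6:.0f} uV/e-")  # a few hundred uV/e-
```

With A large, the (C_FD+C_B)/A term becomes negligible and CG_DA approaches q/C_B, which is why a feedback capacitance an order of magnitude below C_FD yields an order-of-magnitude CG boost over the source follower.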
ISSCC 2020 / February 17, 2020 / 4:15 PM
Figure 5.8.1: Equivalent circuit diagram for a reference-shared in-pixel differential amplifier and its timing diagram.
Figure 5.8.2: Block and circuit diagrams for a reference-shared in-pixel differential common-source amplifier.
Figure 5.8.3: Results of random noise measurements at room temperature and maximum frame rate for each mode.
Figure 5.8.4: Measurement results of random noise histograms at room temperature.
Figure 5.8.5: Captured images with an F number of 2.8, exposure time of 1/30s, and no frame averaging.
Figure 5.8.6: Characteristics of the chip.
Figure 5.8.7: Micrograph of the chip.
Figure 5.8.S1: Active switch circuit of SF mode and RSDA mode.
Figure 5.8.S2: Measurement results for conversion gain.
Figure 5.8.S3: Performance comparison for low-noise CMOS image sensors.
5.9 A 0.8V Multimode Vision Sensor for Motion and Saliency detection operation. With the interleaving ping-pong pixel and global-shutter
Detection with Ping-Pong PWM Pixel operation, the vision sensor achieves a full-rate consecutive event frame reporting
without motion blur.
Tzu-Hsiang Hsu*, Yen-Kai Chen*, Jun-Shen Wu, Wen-Chien Ting,
Figure 5.9.3 shows the readout circuit and timing diagrams in FD and SD modes.
Cheng-Te Wang, Chen-Fu Yeh, Syuan-Hao Sie, Yi-Ren Chen, In FD mode, the in-pixel FD signal (VGO/E) can be either quantized as on/off event
Ren-Shuo Liu, Chung-Chuan Lo, Kea-Tiong Tang, detection by applying time-variant threshold (VTH) or readout through PWM
Meng-Fan Chang, Chih-Cheng Hsieh operation. For on/off event detection, two thresholds (VTH_H and VTH_L) are applied
National Tsing Hua University, Hsinchu, Taiwan, *Equally-Credited Authors (ECAs) sequentially to get a 2b code 11 (off event: ΔVF1>ΔVF2), 00 (on event ΔVF1<ΔVF2),
or 10 (no event) on VPWO/E and readout through column bus (COL). For high-
Energy-efficient always-on motion-detection (MD) sensors are in high demand resolution (8b) readout of FD signal (VGO/E), a ramping reference (VTH_RAMP) is
and are widely used in machine vision applications. To achieve real-time and applied to get a PWM output VPWO/E and quantized by the column-wised counter
continuous motion monitoring, high-speed low-power temporal difference to get an 8b code. In SD mode, by turning on 8 rows simultaneously and
imagers with corresponding processing architectures are widely investigated [1- connecting the outputs (COL) of 8×1 pixels together to a column-wise integrator,
6]. Event-based dynamic vision sensors (DVS) have been reported to reduce the the 8 currents (on/off currents of MRD0~MRD7) controlled by the binary event
redundant data and power through the asynchronous timestamped event-address levels (VPW0~VPW7) are summed up, converted to voltage VSD, and quantized by
readout [1,2]. However, DVS needs special data processing to collect enough the following SS ADC to achieve the event counting of sub-block and 8× reporting
events for information extraction, and suffers from noise and dynamic effects, rate improvement.
which limits the advantages of low-latency pixel event reporting. Furthermore,
low sensitivity (no integration) and lack of static information are also drawbacks A 0.8V 64×64 vision sensor was fabricated in 180nm standard CMOS with a pixel
of DVS. Frame-based MD rolling-shutter sensors [3,4] were reported to reduce size of 15μm×15μm. Figure 5.9.4 shows the captured images of static patterns
the data bandwidth and power by sub-sampling operation with the tradeoff of low in IC mode and moving patterns in FD mode. In FD mode, the 1st pattern is a black-
resolution and motion blur. Global-shutter MD sensors were reported [5,6] using and-white “SNOOPY” moving horizontally from left to right and the 2nd pattern is
in-pixel analog memory for reference image storage. However, such sensors a clockwise rotating triangle with 3 grey levels. The 2b event frames and 8b FD
require a special process technology for low off-state current device frames are all reported successfully in 960μs. It shows the 2b event frames with
implementation. In a frame-based MD sensor, the required analog processing “off event” (in white) and “on event” (in black) at the leading and behind edges of
circuit and two successive frames for temporal difference operation comes at a moving pattern respectively, and “no event” (in grey) in the background. In the
cost in power, area, and speed. To address these drawbacks, we present a frame- rotating pattern test, it shows the event report accuracy using a fixed event-
based MD vision sensor featuring three operation modes: image-capture (IC), detection threshold is degraded on the moving pattern with a smaller contrast (3
frame-difference (FD) with on/off event detection, and saliency-detection (SD). grey levels and white background). However, the lost motion information can be
Using a low-voltage ping-pong PWM pixel and multi-mode operation, it achieves recovered using the 8b FD level output as shown.
high-speed low-power full-resolution MD, consecutive event frame reporting, and Figure 5.9.5 shows the saliency frame sequence in SD mode and the
image capture functions. Moreover, saliency detection by counting the block-level corresponding event frame sequence in FD mode. The test pattern is a
event number is also implemented for efficient optic flow extraction of the counterclockwise rotating white plate with two black circles located in center and
companion processing chip using simple neuromorphic circuits. outer positions, respectively. The event frame sequence shows the on/off events
Figure 5.9.1 shows the chip architecture and block diagram of the 64×64 vision of outer circle only since the circle in center is stationary. The saliency frame
sensor. In IC mode, the threshold-voltage-cancelling (TVC) pulse-width- sequence shows the block-level off-event count corresponding the moving object
modulation (PWM) pixel [7] using auto-zeroing (AZ) reset achieves low-power position in an 8×8 resolution (8×8 sub-block in a 64×64 array). The reporting time
high-dynamic image capture. In FD mode, the in-pixel frame difference is of SD frame is 120μs to achieve real-time object tracking and optic flow extraction.
implemented using the extra series capacitor CM for storage and coupling to Figure 5.9.6 shows the measured performance and comparison table. The
generate the voltage difference of two successive frames. Since one frame-based achieved frame rate is limited by using an exposure time of 1ms. For image-
event detection needs two successive frames, the maximum event reporting rate capturing operation, it consumes 71.2μW@360fps with an iFOM of 48.3
is degraded by 2 using the conventional storage-and-subtraction operation [5,6]. pJ/pixel·frame. For motion-detection operation, it consumes 74.4μW@510fps in
In this work, a ping-pong pixel (ODD/EVEN) with global shutter is implemented full-resolution (64×64) event-reporting mode and 121.6μW@890fps in block-level
to achieve full-rate consecutive event reporting without motion blur. Unlike the (8×8) saliency reporting mode with iFOMs of 35.6pJ/pixel·frame and
conventional complex column-wise analog processing circuit for event detection 2.1nJ/block·frame, respectively.
[3-5], the in-pixel on/off event detection is implemented to generate a robust
binary output (COL<j>) for high-speed readout without static power consumption. Acknowledgements:
In SD mode, the event counts of an 8×8 sub-block are summed up and read out to improve the saliency reporting rate by 8× compared to event reporting.

Figure 5.9.2 shows the circuit implementation of the ping-pong PWM pixel and the timing diagram of FD operation. Each ODD/EVEN pixel consists of a capacitive memory (CM) and an in-pixel comparator (MCP and MCN). Take the FD operation of frame1-frame2 as an example. In frame1, VS is first reset to VRST (zone A), controlled by TX and RST_PD. After the exposure (zone B), VS is discharged with a voltage difference of ΔVF1 on CME by keeping VGE = VAZE (AZ level reset by RSTE). Meanwhile, the level of frame0-frame1 is realized (VGO) and then read out in zone C. In frame2, VS is first reset to VRST to couple VGE = VAZE + αEΔVF1, where αE is the degradation factor defined by the ratio of CME and the parasitic capacitance on node VGE. After the exposure of frame2, VS is discharged with a voltage difference of ΔVF2 on CMO by keeping VGO = VAZO (AZ level reset by RSTO). Meanwhile, the level of frame1-frame2 is realized as VGE = VAZE + αEΔVF1 - αEΔVF2. With the ping-pong PWM pixel, because the parasitic conditions during reset and exposure are identical, αE is constant, avoiding the error caused by a varying degradation factor in a single-capacitor implementation. In zone C, the FD level of successive frames is quantized by the in-pixel comparator for on/off event detection and readout. With the AZ reset operation, the in-pixel comparator's threshold variation (including the mismatch of VAZE and VAZO) is cancelled out in event detection and in PWM operation as well. The mismatch of αE and αO is small and can be further compensated by applying different thresholds in the corresponding event

The authors acknowledge the support of MOST-Taiwan, TSRI-Taiwan, and the Signal Sensing and Application Lab (SiSAL), NTHU.

References:
[1] C. Li et al., “A 132 by 104 10μm-Pixel 250μW 1kefps Dynamic Vision Sensor with Pixel-Parallel Noise and Spatial Redundancy Suppression,” IEEE Symp. VLSI Circuits, pp. 216-217, June 2019.
[2] B. Son et al., “A 640×480 Dynamic Vision Sensor with a 9μm Pixel and 300Meps Address-Event Representation,” ISSCC, pp. 66-67, Feb. 2017.
[3] G. Kim et al., “A 467nW CMOS Visual Motion Sensor with Temporal Averaging and Pixel Aggregation,” ISSCC, pp. 480-481, Feb. 2013.
[4] K. D. Choo et al., “Energy-Efficient Low-Noise CMOS Image Sensor with Capacitor Array-Assisted Charge-Injection SAR ADC for Motion-Triggered Low-Power IoT Applications,” ISSCC, pp. 96-97, Feb. 2019.
[5] T. Ohmaru et al., “25.3μW at 60fps 240×160-Pixel Vision Sensor for Motion Capturing with In-Pixel Non-Volatile Analog Memory Using Crystalline Oxide Semiconductor FET,” ISSCC, pp. 118-119, Feb. 2015.
[6] K. Lee et al., “A 272.49pJ/pixel CMOS Image Sensor with Embedded Object Detection and Bio-Inspired 2D Optic Flow Generation for Nano-Air-Vehicle Navigation,” IEEE Symp. VLSI Circuits, pp. 294-295, June 2017.
[7] A. Y. Chiou et al., “An ULV PWM CMOS Imager with Adaptive-Multiple-Sampling Linear Response, HDR Imaging, and Energy Harvesting,” IEEE JSSC, vol. 54, no. 1, pp. 298-306, Jan. 2019.
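The SD-mode sub-block aggregation can be modeled in a few lines. The following is an illustrative NumPy sketch (the function name and frame dimensions are hypothetical, and on chip the summation happens in the readout circuit, not in software):

```python
import numpy as np

def saliency_map(event_frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Sum per-pixel event counts over non-overlapping block×block
    sub-blocks, mimicking the SD-mode readout aggregation."""
    h, w = event_frame.shape
    assert h % block == 0 and w % block == 0, "frame must tile evenly"
    return (event_frame
            .reshape(h // block, block, w // block, block)
            .sum(axis=(1, 3)))

# Toy 16×16 event frame: only the top-left 8×8 region is active.
frame = np.zeros((16, 16), dtype=np.uint32)
frame[:8, :8] = 1
print(saliency_map(frame))   # [[64  0]
                             #  [ 0  0]]
```

In the paper, reducing each 8×8 block to one summed count is what improves the saliency reporting rate by 8× compared with per-event readout.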
110 • 2020 IEEE International Solid-State Circuits Conference 978-1-7281-3205-1/20/$31.00 ©2020 IEEE
ISSCC 2020 / February 17, 2020 / 4:30 PM
Figure 5.9.3: Readout circuit and timing diagrams in frame-difference and saliency-detection modes.
Figure 5.9.4: Captured images of static patterns in IC mode and moving patterns in FD mode.
Figure 5.9.5: Saliency frame sequence in SD mode and the corresponding on/off event frame sequence in FD mode for moving object position tracking.
Figure 5.9.6: Performance summary and comparison table.
Figure 5.9.S1: Captured images of 360fps IC mode (8b) and 510fps FD mode (8b). It shows that the spatial information (IC mode) and temporal information (FD mode) are both well reconstructed for low-power real-time object identification and motion detection, respectively.
Figure 5.9.7: Chip micrograph.
5.10 A 1280×720 Back-Illuminated Stacked Temporal Contrast Event-Based Vision Sensor with 4.86μm Pixels, 1.066GEPS Readout, Programmable Event-Rate Controller and Compressive Data-Formatting Pipeline

Thomas Finateu1, Atsumi Niwa2, Daniel Matolin1, Koya Tsuchimoto2, Andrea Mascheroni1, Etienne Reynaud1, Pooria Mostafalu3, Frederick Brady3, Ludovic Chotard1, Florian LeGoff1, Hirotsugu Takahashi2, Hayato Wakabayashi2, Yusuke Oike2, Christoph Posch1

1PROPHESEE, Paris, France
2Sony Semiconductor Solutions, Atsugi, Japan
3Sony Electronics, Rochester, NY

Event-based (EB) vision sensors pixel-individually detect temporal contrast exceeding a preset relative threshold [1,2] to follow the temporal evolution of relative light changes (contrast detection, CD) and to define sampling points for frame-free pixel-level measurement of absolute intensity (exposure measurement, EM) [3,4]. EB sensors are gaining popularity in high-speed, low-power machine vision applications thanks to the temporal precision of the recorded data, the inherent suppression of temporal redundancy resulting in reduced post-processing cost, and wide intra-scene dynamic range operation.

Information about temporal contrast (CD) is encoded in the form of “events”: data packets containing the originating pixel’s X,Y coordinate, time stamp, and contrast polarity. To maximally benefit from the ability of the individual pixel to sample visual information at high temporal precision, early time-stamping and high readout throughput are crucial to preserve event timings. Different readout schemes that trade off temporal precision in favor of simpler architectures and reduced bit rates at the output (frames-of-events) have been proposed [5,6]. This approach, however, is not well matched to the fundamental principle of asynchronous EB pixel arrays and limits the sensor’s usability in high-speed applications such as fast time-coded structured-light depth sensing, high-speed tracking and optical flow with kHz update rates, or visible-light communication (VLC). As shown below, other methods to reduce the number of bits per event can be devised that do not impact the temporal precision of the event data.

EB vision sensors, due to complex in-pixel circuitry, have long suffered from large pixel sizes and low resolution; they therefore particularly benefit from direct wafer bonding technology and the resulting possibility of stacking efficient phototransduction of BI CIS on top of high-density analog signal processing on CMOS to achieve competitive resolution and size. This paper presents a 3D-stacked EB vision sensor that uses pixel-level Cu-Cu bonding interconnects to achieve a 4.86μm pixel pitch, yielding HD resolution below the ½” optical format. In-pixel circuits communicate CD events via a stall-safe low-latency interface to an asynchronous row-selection tree. Events from active rows are immediately time-stamped and arranged for bit-efficient vector readout. A digital event signal processing (ESP) pipeline features a programmable event-rate controller (ERC) and an event data formatter (EDF) for dynamic bit compression. A parallel output interface (EOI) with data packing feeds the event data directly to a processor or to an external MIPI or USB transceiver.

Figure 5.10.1 shows a pixel block diagram illustrating the CIS front-end with Cu-Cu interconnect to the CMOS asynchronous delta-modulation (ADM) based CD circuit and the asynchronous readout interface and state-logic (ISL) block. Input latches for CON/COFF reduce power in case of slow comparator switching and prevent ringing around the switching point. Gated latches (K) robustify event registration by preventing late reqX generation and resulting event loss; events arriving after row-readout has started are latched for subsequent readout as soon as the current cycle is finished (ackY removed). Only pixels with events stored in the input latches locally generate CTRLADM to reinitialize their CD circuitry upon receiving ackY, thereby removing the necessity for column-wise ackX signals and simplifying readout control logic, column readout circuitry, and pixel-array signal routing.

Figure 5.10.2 shows a block diagram of the chip, illustrating pixel-array readout and event data flow via the ESP pipeline to the output data interface. The active-row selection operation is pipelined; new row selection happens in parallel to processing of previous row data. The asynchronous-to-synchronous interface is supervised by a time-out column at the far end of the pixel rows, ensuring correct timing for synchronizing the VecX data into clocked latches at the interface to the ESP pipeline and preventing event loss due to run-time variations in the asynchronous data paths. Timestamps are attached to the events at this interface. This readout architecture preserves the event temporal precision out of the pixel array. Typical pixel-activation-to-time-stamp delays of 60-to-70ns and about 120ns row-to-row selection timing are achieved.

Figure 5.10.3 shows a block diagram of the ESP pipeline and output interface. ESP functions include address filtering, throughput regulation and data formatting. The digital readout (RO) block synchronizes and timestamps pixel event data row-wise and splits them into 32-event vectors. A LUT-based address filter removes selected events, e.g., from defective pixels. The ERC block allows the output event rate to be capped at a predefined limit by dynamically dropping data on peaks that exceed this value. The limit rate is programmable over a wide range between 5k events per second (EPS) and 1GEPS. ERC continuously monitors the input and output rates of a FIFO buffer and, in closed-loop configuration, regulates the instantaneous FIFO output rate by removing events, following various drop strategies that combine spatial and temporal criteria. An ROI-based dropping scheme subdivides the pixel array into 40-by-23 32×32-pixel blocks that can each be programmed to exercise a different one of 64 selectable weights on drop rate, allowing for application-specific optimization of ERC operation (e.g., preferentially dropping events from the sky in an automotive scene). The EDF block converts the event stream to vectorized data formats (EVT). Advanced EVT formats combine differential and vectorized encoding to dynamically optimize the number of bits per event by exploiting spatial and temporal relations between events. In situations of highest throughput, 1.6b on average encodes the full temporal and spatial information of an event. The EOI manages off-chip data transmission. Endianness support for natively managing several event granularities is provided. In addition to a 16b parallel 100MHz synchronous mode, the interface can be configured for packet mode to better adapt to USB/MIPI transceivers. Clock gating reduces EOI power consumption. ESP integrates automatic test pattern generators (ATPG) in different places along the pipeline to ease chip verification.

Figure 5.10.4 shows sensor operation in example applications. A ~1lux traffic scene recording demonstrates low-light contrast sensitivity in line with lab test results. The high temporal precision due to low-latency pixel and high-speed readout operation allows the sensor to decode temporally encoded structured-light patterns in a 3D depth sensing application.

Figure 5.10.5 shows test results on CD contrast sensitivity, dynamic range, BG noise and pixel uniformity at a single sensor setting. Nominal contrast threshold (NCT) is ~16% log contrast. Lower NCT down to 11% can be achieved at different settings (insets). The CDP curve defines low- and high-light cutoff points (LLCO, HLCO) that limit the DR of CD. LLCO was measured at 0.04lux for c=40% contrast; HLCO has not been seen up to 100klux, and no increase in BG rates at high light levels due to parasitic photocurrents is observed. The resulting DR is >124dB. BG rates peak at ~8.3 events/pixel/s at 0.6lx and go down to below 10⁻¹ for illuminances above a few tens of lux. FPN-like CTNU is at ~3% contrast.

A 1280×720 ½” temporal contrast EB vision sensor is designed and fabricated on a Cu-Cu bonded wafer stack of 90nm BI CIS on 40nm CMOS with 4.86μm pixels achieving >77% fill factor. The chip consumes 32mW (static) to 84mW at high activity (300MEPS). Readout with 1μs time-stamping handles internal peaks of up to 2.5GEPS out of the pixel array and a sustained 1.066GEPS at the chip output. Bits-per-event are dynamically compressed down to 1.6b while maintaining full spatial and temporal information. Wide DR (>124dB) is achieved due to good low-light CIS performance (40mlx LLCO) and the absence of leakage activity from parasitic photocurrents at high light. Step-response latencies around 200μs are typical at illuminance levels >10lux.

References:
[1] P. Lichtsteiner et al., “A 128×128 120dB 30mW Asynchronous Vision Sensor that Responds to Relative Intensity Change,” ISSCC, pp. 2060-2069, Feb. 2006.
[2] B. Son et al., “A 640×480 Dynamic Vision Sensor with a 9μm Pixel and 300Meps Address-Event Representation,” ISSCC, pp. 66-67, Feb. 2017.
[3] C. Posch et al., “A QVGA 143dB Dynamic Range Asynchronous Address-Event PWM Dynamic Image Sensor with Lossless Pixel-Level Video Compression,” ISSCC, pp. 400-401, Feb. 2010.
[4] J. Huang et al., “Live Demonstration: A 768×640 Pixels 200Meps Dynamic Vision Sensor,” IEEE ISCAS, 2017.
[5] C. Li et al., “A 132 by 104 10μm-Pixel 250μW 1kefps Dynamic Vision Sensor with Pixel-Parallel Noise and Spatial Redundancy Suppression,” Int. Image Sensor Workshop, 2019.
[6] H. E. Ryu, “Industrial DVS Design; Key Features and Applications,” Conf. on Computer Vision and Pattern Recognition, 2019, Accessed Nov. 3, 2019, <http://rpg.ifi.uzh.ch/docs/CVPR19workshop/CVPRW19_Eric_Ryu_Samsung.pdf>.
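The closed-loop rate capping performed by the ERC can be sketched in software. The model below assumes simplified semantics — a fixed evaluation window with a per-window event budget and uniform dropping — whereas the chip combines spatial and temporal drop strategies with per-ROI weights; all names here are illustrative:

```python
def erc_regulate(events, limit_rate_eps, window_s=1e-3):
    """Cap the output event rate at limit_rate_eps by dropping events
    once the current window's budget is exhausted. `events` is an
    iterable of (t_seconds, x, y, polarity) tuples in time order."""
    budget = int(limit_rate_eps * window_s)  # events allowed per window
    out = []
    window_start, used = None, 0
    for ev in events:
        t = ev[0]
        if window_start is None or t - window_start >= window_s:
            window_start, used = t, 0        # open a new budget window
        if used < budget:                    # within budget: pass through
            out.append(ev)
            used += 1                        # over budget: drop silently
    return out

# A 5000-event burst inside 0.5ms against a 1MEPS cap passes 1000 events.
burst = [(i * 1e-7, i % 1280, i % 720, 1) for i in range(5000)]
print(len(erc_regulate(burst, limit_rate_eps=1_000_000)))   # 1000
```

The real controller additionally monitors the FIFO input/output rates continuously, so the effective drop probability tracks instantaneous activity instead of resetting at hard window boundaries.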
ISSCC 2020 / February 17, 2020 / 4:45 PM
Figure 5.10.3: Event signal processing (ESP) digital pipeline with event output interface (EOI). Measurement of output event-rate.
Figure 5.10.4: Sensor output event data illustrating low-light contrast sensitivity in an automotive scenario and fast 3D depth map reconstruction.