CMOS-SPAD Camera Prototype for Single-Sensor 2D/3D Imaging

One of the research lines explored in the 'iCaveats' project has been the combined capture of 2D and 3D visual information. For power-efficient feature learning and extraction, combined 2D/3D imaging helps build a lightweight but rich description of the scene. Capturing both modalities with a single sensor can reduce cost and improve efficiency. In this demo, we present the performance and features of a CMOS-SPAD camera prototype that performs photon counting and direct time-of-flight (d-ToF) measurement. The central elements of the camera module are a 64×64 SPAD imager and an FPGA board for real-time histogramming and image reconstruction at 1 kfps.


INTRODUCTION
The possibility of acquiring 2D and 3D images from a single device represents a reduction in size, power consumption and complexity in embedded vision systems. Traditional methods for 3D imaging, i.e. triangulation/stereo vision, require considerable computational resources. Capturing and integrating 2D and 3D information in the same silicon substrate facilitates the mapping between the two views, a procedure that is much more computationally demanding in stereo vision. For this reason, estimating the time-of-flight (ToF) is an attractive option. Some approaches to indirect ToF estimation rely on photonic mixers to demodulate the illumination signal [1], which requires access to special fabrication steps. Others rely on charge integration [2], which requires very precise charge transfers for accurate background cancellation. This is usually achieved by modifying the doping profiles of the fabrication process.
An interesting alternative in CMOS technology is the avalanche diode operated in Geiger mode, also known as the single-photon avalanche diode (SPAD). These devices are sensitive to the arrival of a single photon. Such events can be counted, thus providing an intensity map, i.e. a 2D image, even in low-illumination conditions; or they can be time-stamped with the help of a fine time-to-digital converter (TDC) for direct estimation of the ToF [3].

VISITOR EXPERIENCE
The principle of d-ToF is explained in the diagram of Fig. 1(a). Short light pulses are sent by a picosecond laser. These pulses are reflected by objects situated at image planes between Pi and Pf. At the same time, an electrical signal is sent to the global shutter of the imager. Some of the outgoing photons reflected by the object reach the active surface of the sensor and are eventually detected.
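As a minimal sketch of the relation behind d-ToF (an illustration, not the camera firmware), the delay between the emitted pulse and the detected photon covers the path to the object and back, so the distance is d = c·t/2. The function names below are hypothetical.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
C_M_PER_S = 299_792_458.0


def tof_to_distance_m(round_trip_s: float) -> float:
    """Convert a round-trip time of flight (seconds) to a one-way distance (metres)."""
    return C_M_PER_S * round_trip_s / 2.0


def distance_to_tof_s(distance_m: float) -> float:
    """Convert a one-way distance (metres) to the expected round-trip delay (seconds)."""
    return 2.0 * distance_m / C_M_PER_S


if __name__ == "__main__":
    # A round-trip delay of 1 ns corresponds to roughly 15 cm.
    print(f"{tof_to_distance_m(1e-9):.4f} m")
```

Under this relation, an object 15 cm away returns photons about 1 ns after the pulse leaves the laser, which is why picosecond-scale pulses and timing are needed at short range.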
In this demo, the operation of the 2D/3D camera will be presented with the help of a poster and recorded videos. The d-ToF system is composed of the SPAD camera itself and a picosecond laser (Fig. 1(b)). The housing of the camera accommodates 8 mm F1.2 lenses focused on the 64×64 array of SPADs and pixel-level TDCs [4]. The control signals are provided by an FPGA, which also implements real-time 2D and 3D image reconstruction and a USB link used for image streaming and camera configuration. A user-friendly GUI has been built in Matlab and OpenCV. The camera can also record videos at 1000 fps and send them to the computer in raw format. Afterwards, the user can play them back off-line at 50 fps.

Different experiments are featured in the videos:
2D imaging: The scene is illuminated by a 30 W lamp. This amount of light is enough to ensure the minimum count rate required at the input of the CMOS counters, which are driven by the SPAD detectors. The brightness map is obtained by counting photons. This is done at pixel level by a gated 8-bit counter, which is in fact the coarse counter used by the TDC for 3D imaging. The 2D image is obtained by averaging 100 inter-frames. The frame rate of the reconstructed images is about 10 fps.
3D ranging: We have employed a panel to provide a homogeneous background for the shapes to be reconstructed. This panel has been moved towards the sensor-laser ensemble from 74 cm down to 4 cm in 1 cm steps (see Fig. 1). ToF estimation is done in parallel by means of in-pixel TDCs with a 145 ps LSB. In addition, time-gating of the SPADs mitigates the influence of spurious avalanches. The depth map is obtained from the aggregated readings of the pixel ToF estimations. 65k inter-frames are employed to generate this map off-chip at 1000 fps. The SNR of the ToF measurement is about 24 dB. The total jitter FWHM is about 735 ps.
High-speed burst capture: The high-speed photon counting capabilities of the sensor and the camera system memory allow a burst to be captured at 1000 fps, which can then be downloaded and played back off-line in slow motion.
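
The two reconstruction steps described in the experiments above can be sketched in a few lines. This is only an illustration under stated assumptions, not the FPGA implementation: the function names are hypothetical, the depth estimate takes the most frequent TDC code of a pixel (one simple way to aggregate the readings), and TDC code 0 is assumed to correspond to zero delay, with no offset calibration.

```python
from collections import Counter

C_M_PER_S = 299_792_458.0  # speed of light in vacuum, m/s
TDC_LSB_S = 145e-12        # TDC resolution quoted in the text, in seconds


def average_counts(inter_frames):
    """Average per-pixel photon counts over a list of inter-frames.

    Each inter-frame is a list of rows, each row a list of integer counts.
    """
    n = len(inter_frames)
    rows = len(inter_frames[0])
    cols = len(inter_frames[0][0])
    return [
        [sum(frame[r][c] for frame in inter_frames) / n for c in range(cols)]
        for r in range(rows)
    ]


def depth_from_codes(tdc_codes):
    """Estimate the depth (metres) seen by one pixel from its TDC codes.

    Builds a histogram of the codes collected over many inter-frames and
    converts the most frequent code to a distance (assumed: code 0 = 0 s).
    """
    histogram = Counter(tdc_codes)
    peak_code, _ = histogram.most_common(1)[0]
    return C_M_PER_S * peak_code * TDC_LSB_S / 2.0


if __name__ == "__main__":
    frames = [[[0, 2], [4, 6]], [[2, 2], [4, 8]]]  # two tiny 2x2 inter-frames
    print(average_counts(frames))
    # Codes clustered around 10, plus two spurious readings (3 and 47).
    print(f"{depth_from_codes([10, 10, 10, 3, 47, 10, 11]):.3f} m")
```

With a 145 ps LSB, one TDC code spans roughly 2.2 cm of one-way distance. Taking the histogram peak rather than a plain mean keeps isolated spurious readings from shifting the estimate.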

CONCLUSIONS
A camera prototype for 2D/3D image reconstruction based on photon counting and d-ToF has been developed in this project. It demonstrates that single-sensor 2D/3D imaging is possible in CMOS technology. However, noise and background illumination, which translate into spurious avalanches, can seriously degrade image quality. The overall frame rate is limited by the large number of inter-frames needed and by image serialization. Real-time on-chip image reconstruction and parallel readout channels can help overcome these limitations.