# In-Pixel ADC for a Vision Architecture on CMOS-3D Technology

M. Suárez, V.M. Brea, Carlos Domínguez Matas\*, Ricardo Carmona\*\*, Gustavo Liñán\*\*, Ángel Rodríguez-Vázquez\*\*

Dept. of Electronics and Computer Science
University of Santiago de Compostela
Santiago de Compostela
E-15706, Spain
AnaFocus\*, Sevilla, Spain
Instituto de Microelectrónica de Sevilla (IMSE-CNM)\*\*
Spain

Email: manuel.suarez.cambre@usc.es

Abstract- This paper addresses the design of an 8-bit single-slope in-pixel ADC for a 3D chip architecture intended for airborne surveillance and reconnaissance applications. The 3D chip architecture comprises a sensor layer with a resolution of 320 x 240 pixels bump-bonded to a three-tier chip on the 150 nm FDSOI CMOS-3D technology from MIT-Lincoln Laboratories. The top tier is a mixed-signal layer with 160 x 120 processing elements. The ADC is distributed between the top two tiers. The top tier contains both global and local circuitry. The ramp generation is implemented with global circuitry through an 8-bit unary current-steering DAC. The end of conversion at every pixel or processing element is triggered by a local comparator. The digital words are stored in a frame-buffer in an intermediate tier. The area of the local circuitry in the ADC is consumed by the comparator, capable of reaching less than 3 mV of resolution in less than 150 ns with less than 220 µm<sup>2</sup>, and by the memory cells, each one storing 6 8-bit words along with two additional bits in less than 50 µm x 50 µm. Every ADC conversion is performed in less than 120 µs.

## I. INTRODUCTION

The huge amount of data to deal with during the early stage of image processing along with the heterogeneity of the tasks performed towards decision-making render the hardware implementation of vision chips challenging [1, 2].

Conceived as computers that see, vision chips face tasks as different as image acquisition, spatial filtering, logic and morphological processing, blob analysis, classification and, eventually, decision-making. Meeting these needs on vision chips with fine-grained parallelism requires pixels or processing elements that are able not only to sense, but also to process images. This leads to larger vision chips, chips with very low resolution, and/or processing elements with limited functionality.

CMOS-3D technology allows processing to be distributed among different tiers, paving the way for vision chips with higher resolution, better fill factor and more functionality at every processing element.

The ADC addressed in this paper is embedded in a 3D vision architecture made up of a sensing layer with a resolution of 320 x 240 pixels bump-bonded to a three-tier CMOS-3D chip on the 150 nm FDSOI CMOS-3D technology from MIT-Lincoln Laboratories (MIT-LL). The chip is oriented to airborne surveillance and reconnaissance applications, performing image registration and feature extraction at high speed [3].

The ADC not only maps the pre-processed analog images onto digital representations, but is also employed to implement mathematical operations among the pre-processed analog images, particularly the detection of maximum and minimum. The ADC is hence a critical component of the architecture. The ADC introduced in this paper is implemented as an 8-bit single-slope ADC.

Given that only the comparator and the registers that store the digital words are local to every processing element, single-slope ADC's are a natural solution as in-pixel ADC's on vision chips with fine-grained parallelism. The drawback is the conversion time when the bit resolution is high (n>10), since every conversion has to wait for the end of the ramp (2<sup>n</sup> cycles) [4]. In our case, the single slope is provided by an 8-bit unary current-steering DAC. The conversion time is not penalized excessively: it stays below 120 μs, which is sufficient for our purposes. The digital words are stored in an intermediate tier, in six 8-bit memories and two additional 1-bit registers.

The outline of this paper is as follows. Section II briefly presents the architecture of the 3D-vision chip, what we have called VISCUBE, along with a description of its functionality. Section III addresses the design of the in-pixel ADC itself. Finally, the conclusions of the paper are drawn.

## II. 3D VISION ARCHITECTURE

Fig. 1 displays the schematic view of the so-called VISCUBE chip. The sensing layer is bump-bonded to a three-tier structure composed of a mixed-signal layer, a frame-buffer as intermediate layer, and a digital processing layer on the bottom tier. The resolution of the sensing layer is 320 x 240 pixels. The mixed-signal layer has a resolution of 160 x 120 pixels or processing elements, thus every pixel in the mixed-signal layer is assigned to four photosensors.

Fig. 1. Schematic view of the VISCUBE chip, displaying a processing element in the top tier (the mixed-signal layer) and the global circuitry for the in-pixel ADC (digital counter and analog ramp generator) distributed in the top two tiers.

The VISCUBE chip is intended for airborne visual navigation and reconnaissance applications. These applications are based on segmentation, which in the case of a moving platform requires image registration. Image registration consists of finding and calculating the affine transformation that describes the ego-motion of the camera. In this context, some of the functions performed by the chip are: feature selection by the mixed-signal layer, displacement calculation and feature extraction by the digital signal layer, and registration of successive frames and tracking of moving objects by an external processor.

Low-pass filtering and maximum and minimum detection at different scales, i.e. image resolutions, are key to finding salient features and performing a successful registration. These operations are realized in the mixed-signal layer, where the in-pixel ADC is embedded.

Fig. 1 sketches a processing element. The sensor interface is implemented with a capacitive transimpedance amplifier with offset sampling. The resultant voltages are stored in the so-called local analog memories (LAM's). The in-pixel ADC not only maps the pixel values onto digital representations, but also realizes the maximum-minimum detection. The comparator is the critical component of the ADC in the mixed-signal layer, not only because of the accuracy requirements (see Section III), but also because of the area constraint. The relatively high resolution (160 x 120) of the mixed-signal layer makes a compact comparator necessary in order to keep a reasonable chip area. The area of the processing element was limited to $50~\mu m \times 50~\mu m$.

## III. IN-PIXEL ADC

As mentioned above, an 8-bit single-slope in-pixel ADC was designed for VISCUBE. The in-pixel ADC is distributed in two tiers. The comparator lies in Tier C (see Fig. 1). The analog ramp is generated with an 8-bit counter, allocated in Tier B, and with an 8-bit unary current-steering DAC in Tier C. The registers that store the digital words lie in Tier B. Every memory cell is made up of six 8-bit digital words and two additional 1-bit registers. Every processing element in the top tier is pitch-matched to every memory cell in the intermediate tier. Both are connected with Through-Silicon-Vias (TSV's) of 5 $\mu$m x 5 $\mu$m.

Fig. 2. Schematic view of the analog ramp generator for the 8-bit single slope in-pixel ADC.

Fig. 3. Thermometric DAC used to generate the ramp of the 8-bit single-slope in-pixel ADC.

### A. The Analog Ramp Generator

The ramp or single-slope is provided by the global block analog ramp generator, which produces a staircase signal of 255 steps. Fig. 2 shows such an analog ramp generator. It comprises an 8-bit current steering DAC and the corresponding buffering to cope with the large fan-out (160 x 120 processing elements in the mixed-signal layer). In turn, the DAC is driven by a digital counter.
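The conversion principle can be illustrated with a short behavioural sketch in Python. It is not the circuit implementation: it only assumes an ideal staircase between 0.4 V and 1.1 V, an ideal comparator, and a counter value latched when the comparator trips. The function name `single_slope_convert` and its parameters are hypothetical.

```python
def single_slope_convert(v_in, v_min=0.4, v_max=1.1, n_bits=8):
    """Return the counter value at which the ideal staircase first reaches v_in."""
    n_steps = 2 ** n_bits - 1            # 255 steps for 8 bits
    step = (v_max - v_min) / n_steps     # height of one staircase step
    for count in range(n_steps + 1):     # global counter: 0 .. 255
        ramp = v_min + count * step      # level broadcast to all processing elements
        if ramp >= v_in:                 # local comparator trips
            return count                 # counter value latched in the memory cell
    return n_steps                       # conversion forced at the end of the ramp


# Example: convert three input voltages inside the operating range.
for v in (0.4, 0.75, 1.1):
    print(f"{v:.2f} V -> code {single_slope_convert(v)}")
```

Every processing element sees the same ramp, so only the comparison and the latching are local; the loop over `count` stands for the global counter shared by the whole array.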

Fig. 3 shows a view of the thermometric DAC used to generate the ramp of the 8-bit single-slope in-pixel ADC. Fig. 4 displays its layout, occupying an area of 296 $\mu$m x 270 $\mu$m.



Fig. 4. Layout of the 8-bit thermometric current-steering DAC used in the in-pixel ADC.



Fig. 5. Current source of the thermometric DAC.

The DAC is implemented as a thermometric current-steering DAC. Thermometric current-steering DAC's give a good DNL figure and ensure monotonicity. The INL is the challenge in this type of converter [5]. Area consumption is not an issue, as the DAC is global circuitry allocated outside the array of processing elements in the mixed-signal layer.

The DAC consists of two 4:15 thermometer decoders to convert from binary to thermometer code, an array of $2^{N}-1$ sources, each generating the unary reference current $I_{u}$ (N being the DAC resolution), an external and adjustable resistance to convert the output current into a voltage, and an offset calibration block to cancel the offset or to add a constant value to the output current from the array.
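A minimal Python sketch of row/column thermometer decoding for a 16 x 16 unary array follows. The local logic of the actual chip is not detailed in the text, so the enabling rule used here (rows below the MSB value fully on, the next row partially on) is an assumption based on the usual row/column scheme for unary DAC's. The names `thermometer` and `active_cells` are hypothetical.

```python
def thermometer(value, width=15):
    """4-to-15 thermometer code: the lowest `value` outputs are 1."""
    return [1 if i < value else 0 for i in range(width)]


def active_cells(code):
    """Return the 16 x 16 on/off map of unit current sources for an 8-bit code."""
    row_t = thermometer(code >> 4)       # 4 MSBs: how many rows are fully on
    col_t = thermometer(code & 0xF)      # 4 LSBs: how many cells in the next row
    cells = []
    for r in range(16):
        row_full = r < 15 and row_t[r] == 1
        # The partially filled row is the first row that is not fully on.
        row_partial = (r == 0 or row_t[r - 1] == 1) and not row_full
        cells.append([
            1 if row_full or (row_partial and c < 15 and col_t[c] == 1) else 0
            for c in range(16)
        ])
    return cells


# The number of sources switched on equals the input code (0 .. 255).
print(sum(map(sum, active_cells(200))))
```

With this rule, increasing the code by one only ever switches one additional source on, which is what makes a unary DAC monotonic by construction.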

The core of the DAC is the array of current sources. In our case 255 current sources are needed. The current sources are laid out in a square array of 16 x 16 elements. Every element of the array comprises the current source itself and local logic. The local logic works together with the thermometric decoders to switch the current through every element of the array on or off.

Fig. 6. Current variation in every current source due to finite output impedance at the full-output swing.

Fig. 5 displays a schematic view of the current sources of the array with the transistor dimensions in microns. Every current source is a differential pair biased by a current source implemented with a PMOS cascode topology. The reason for the cascode topology is the low output impedance provided by the transistors of the 150 nm FDSOI CMOS-3D technology from MIT-LL. High output impedance is required in order to guarantee a good INL in the DAC.

Fig. 6 shows the magnitude of the systematic errors caused by the finite output impedance in a current source (Fig. 5) of the DAC. In this simulation, the voltage across the resistor that collects all the currents from the array (Fig. 2) is swept from 0 to $255 \cdot I_u \cdot R$ volts (full-output swing). The error caused by the finite output impedance is just 0.16 % of the nominal unit current $I_u = 1.4~\mu\mathrm{A}$. The resistor value was chosen to be 1.96 k$\Omega$, leading to an output voltage range of [0.4, 1.1] V. The resistor will be external to the chip, allowing for error correction after chip fabrication.
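The figures quoted above can be cross-checked with a few lines of Python. The values of the unit current, the resistor and the 0.16 % error are taken from the text; the variable names are only illustrative.

```python
I_U = 1.4e-6       # unit current from the text, in amperes
R_OUT = 1.96e3     # external resistor from the text, in ohms
N_SOURCES = 255    # unary sources of the 8-bit DAC

step_v = I_U * R_OUT              # voltage step contributed by one source
swing_v = N_SOURCES * step_v      # full-output swing, 255 * Iu * R
error_a = 0.0016 * I_U            # 0.16 % current error quoted for Fig. 6

print(f"step  = {step_v * 1e3:.3f} mV")
print(f"swing = {swing_v:.4f} V")
print(f"current error per source = {error_a * 1e9:.2f} nA")
```

The swing comes out at about 0.7 V, which matches the width of the [0.4, 1.1] V output range once the offset is added.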

The DAC also includes an offset calibration block. The circuitry for offset calibration comprises two sets of current sources. One set is always on, drawing a fixed current. The other set is user-programmable with 5 bits of resolution. The goal of the offset calibration circuitry is two-fold. On the one hand, it adds a constant value to the currents from the 8-bit current-steering DAC so that the output voltage stays within the operating range of the remaining circuits of the cells in the mixed-signal layer of the VISCUBE chip, that is, the [400 mV, 1.1 V] range. On the other hand, it offers user-programmable offset calibration. 129 current sources permanently on give a fixed output voltage around 356 mV. In addition, there are 32 current sources that provide programmability with 5 bits of resolution. This allows the desired offset, i.e. 400 mV, to be adjusted over a range of $\pm 15$ LSBs. The power consumption of the DAC goes from 270 $\mu W$ when all the switchable sources are OFF to 871 $\mu W$ when all of them are ON.
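As a rough check of the offset figures, the sketch below assumes that every offset source carries the same unit current $I_u$ as the sources of the main array and that the 5-bit code switches on between 0 and 31 programmable sources. Neither assumption is stated in the text, and the function names are hypothetical.

```python
I_U = 1.4e-6          # unit current from the text (A)
R_OUT = 1.96e3        # external resistor from the text (ohm)
FIXED_SOURCES = 129   # sources that are permanently on


def offset_voltage(n_prog):
    """Offset voltage with n_prog programmable sources on (unit current assumed)."""
    return (FIXED_SOURCES + n_prog) * I_U * R_OUT


def sources_for_target(v_target, n_max=31):
    """Number of programmable sources giving the offset closest to v_target."""
    return min(range(n_max + 1), key=lambda n: abs(offset_voltage(n) - v_target))


n = sources_for_target(0.400)
print(f"fixed offset: {offset_voltage(0) * 1e3:.1f} mV")
print(f"{n} programmable sources -> {offset_voltage(n) * 1e3:.1f} mV")
```

Under these assumptions the fixed sources alone give about 354 mV, close to the 356 mV quoted above, and the 400 mV target sits near the middle of the programmable range.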

The analog ramp produced by the 8-bit current-steering DAC has to be distributed to the 160 x 120 processing elements in the mixed-signal array, leading to a large fan-out. An analog buffer is therefore needed between the DAC and the mixed-signal layer in order to reach an acceptable time response. As the output from the DAC has to arrive at the comparators of every pixel, the load seen by the DAC is mostly capacitive, which allows the buffer to be implemented with an OTA. The OTA implemented for this purpose on the VISCUBE chip is sketched in Fig. 7. It is a differential structure with two complementary differential pairs and a folded-cascode output stage. Table I lists the sizes of the transistors of the OTA buffer.

The use of two complementary differential pairs ensures an input range wide enough to work within the 400 mV-1.1 V range. High linearity is achieved with a high open-loop gain, which in turn is obtained with a high output impedance through cascode structures. The gain also depends on the common-mode signal. The lowest gain achieved with the buffer in the [400 mV, 1.1 V] range was 57 dB, which leads to a linearity error below 0.14 %. This is enough to achieve low linearity errors in the analog ramp generator.
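The link between open-loop gain and buffer error can be sketched as follows, assuming that the OTA is connected as a unity-gain follower so that the relative gain error is 1/(1 + A). The feedback configuration is an assumption, and the function name is hypothetical.

```python
def unity_buffer_error(gain_db):
    """Relative gain error of a unity-gain feedback buffer, 1 / (1 + A)."""
    a = 10 ** (gain_db / 20)     # open-loop gain as a linear ratio
    return 1 / (1 + a)


err = unity_buffer_error(57)     # lowest open-loop gain quoted in the text
print(f"gain error at 57 dB: {err * 100:.3f} %")
```

Under this assumption the error at 57 dB is roughly 0.14 %, in line with the linearity figure quoted above.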

Table I OTA Dimensions

|         | M1   | M2  | M3  | M4   | M5 | M6   | M7  | M8  |
|---------|------|-----|-----|------|----|------|-----|-----|
| W (µm)  | 6    | 1   | 0.8 | 3    | 11 | 9.25 | 4   | 4   |
| L (µm)  | 0.65 | 0.2 | 0.2 | 0.85 | 1  | 1    | 1.2 | 1.5 |
| Fingers | 40   | 40  | 34  | 34   | 40 | 40   | 40  | 40  |



Fig. 7. OTA used as buffer in the analog ramp generator of the in-pixel ADC.

Fig. 8. The output of the analog ramp generator.

Fig. 9. Linearity, DNL and INL figures of the analog ramp generator composed of the thermometric DAC and the buffer.

Fig. 8 shows the output of the analog ramp generator. Fig. 9 illustrates its linearity errors. The INL errors were found to be around 0.2 LSB. DNL errors were found to be below 0.01 LSB. These errors are caused by systematic effects such as finite gain and output impedance. Random errors might be compensated for with an external resistor and with the offset calibration block. Also, lower linearity errors are possible with higher gain in the folded-cascode OTA used as buffer through higher output impedance. Nevertheless, this is constrained by the settling time, which in turn is determined by the conversion time of the ADC. In our case, the conversion time for the ADC was fixed at 120 $\mu s$, which leaves 400 ns for every step in the ramp. This time should be enough for the output of the voltage buffer to settle and for the comparator to make a comparison. The settling time for every step in the ramp was found to be 212 ns by simulation. The power consumption was found to be 570 $\mu W$.
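The timing budget of one ramp step can be checked with the short Python sketch below. The step time, settling time and conversion time come from the text; counting 256 counter states for the 8-bit ramp is an assumption.

```python
N_LEVELS = 256          # counter states of an 8-bit ramp (assumption: 0 .. 255)
T_STEP_NS = 400         # time allotted to every ramp step (from the text)
T_SETTLE_NS = 212       # simulated settling time of the ramp generator
T_CONV_MAX_US = 120     # conversion time fixed for the ADC

ramp_time_us = N_LEVELS * T_STEP_NS / 1000       # total ramp duration
comparator_slack_ns = T_STEP_NS - T_SETTLE_NS    # time left for the comparison

print(f"ramp duration: {ramp_time_us:.1f} us (limit {T_CONV_MAX_US} us)")
print(f"time left per step for the comparator: {comparator_slack_ns} ns")
```

With these numbers the full ramp fits within the 120 µs limit, and each step leaves 188 ns for the comparator once the ramp has settled.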

### B. The Comparator

The limited area available for every processing element is the major issue in the comparator design. Besides, the operating voltage range is limited to [0.4, 1.1] V, which requires a resolution of 2.7 mV to reach 8 bits of accuracy. Instead of an OTA-based comparator, an inverter was used as gain stage because of its low area requirements [6]. The resolution constraint is met with offset-compensation techniques.

Fig. 10 shows the schematic of the comparator implemented on the VISCUBE chip. The comparator works in two phases. During the reset phase, controlled by signals $Comp\_Rst$ and $Comp\_RstD$, the voltage difference $V_1 - V_Q$ is stored in the capacitor C, with $V_Q$ being the quiescent point of the inverter. The two signals $Comp\_Rst$ and $Comp\_RstD$, the latter slightly delayed with respect to the former, implement the well-known bottom-plate sampling technique, required for dealing with charge injection and feedthrough effects. During the comparison phase, signals $Comp\_Rst$ and $Comp\_RstD$ are set to LO, while signal $Comp\_Sig$ is set to HI. In this situation, the input voltage to the inverter stage (-K) is given by:

Fig. 10. Schematic of the comparator implemented on the VISCUBE chip.

$$V_{i} = V_{Q} + (V_{2} - V_{1}) = V_{Q} + V_{diff} \tag{1}$$

producing an output voltage expressed in Eq. (2).

$$V_{out} = V_Q - K(V_2 - V_1) = V_Q - KV_{diff} \tag{2}$$

The sign of the differential input signal determines the output. Higher K values allow for lower  $V_{diff}$  voltages, resulting in better resolution.
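The two-phase operation described by Eqs. (1) and (2) can be sketched behaviourally in Python. The quiescent point, the gain and the supply rails used as defaults below are illustrative values, not data from the chip, and the function name is hypothetical. Charge injection and feedthrough are ignored.

```python
def comparator_output(v1, v2, v_q=0.75, k=2000.0, v_low=0.0, v_high=1.5):
    """Inverter output after the comparison phase, following Eqs. (1)-(2)."""
    v_stored = v1 - v_q                  # reset phase: capacitor holds V1 - VQ
    v_i = v2 - v_stored                  # comparison phase: Vi = VQ + (V2 - V1)
    v_out = v_q - k * (v_i - v_q)        # inverting gain stage around VQ
    return min(max(v_out, v_low), v_high)    # output limited by the rails


# A 3 mV difference of either sign drives the output to opposite rails.
print(comparator_output(0.500, 0.503))
print(comparator_output(0.503, 0.500))
```

Because the capacitor stores $V_1 - V_Q$, the quiescent point cancels out of the comparison and only the sign of $V_{diff}$ decides the output.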

The low intrinsic gain of the FDSOI transistors when compared to their bulk counterparts yields a very poor resolution in a comparator built with a simple inverter. Cascode topologies are needed in order to distinguish differential signals in the order of mV at the input. Fig. 11 displays the schematic of the inverter gain stage used in the comparator, along with its small-signal equivalent circuit. As shown in Fig. 12, the double cascode inverter gives a gain of 66 dB. Although additional stages are required for the output voltage to settle to either *Vdd* or *-Vss*, this value is high enough to reach 2.7 mV of resolution. Transistor sizes of the double cascode inverter are collected in Table II. The power consumption of the comparator was found to be 1.12 µW at simulation level.

Table II
Transistor Sizes in the Dual-Cascode Inverter

|        | M1  | M2  | M3  | M4  |
|--------|-----|-----|-----|-----|
| W (µm) | 0.6 | 0.6 | 0.6 | 0.6 |
| L (µm) | 0.6 | 0.2 | 0.6 | 1.5 |



Fig. 11. Double cascode topology used as inverter gain stage in the comparator.


Fig. 13 shows the actual comparator with its time diagram. Two additional NAND gates were included, rendering a multistage comparator with a high gain. In this case the resolution can be expressed as:

$$\xi_s = \frac{V_{dd}}{K_{s1}K_{s2}K_{s3}} + E_{off1} + \frac{E_{off2}}{K_{s2}} + \frac{E_{off3}}{K_{s3}} \tag{3}$$

where  $K_{si}$  is the dc gain, and  $E_{offi}$  is the offset of every gain stage in the multi-stage comparator. Equation (3) shows that if the gain is sufficiently high, the offset of the first gain stage is the limiting factor to reach a high resolution. In our case, the gain provided by the comparator amounts to 115 dB, which would eventually lead to a very high resolution. Nevertheless, and as we will see below, the residual offset caused by charge injection and feedthrough effects puts a limit on the resolution of the comparator.
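The gain-limited term of Eq. (3), $V_{dd}/(K_{s1}K_{s2}K_{s3})$, can be evaluated for the 115 dB total gain quoted above. The supply voltage is not given in the text, so the 1.5 V used here is an assumption, and the function name is hypothetical.

```python
def gain_limited_resolution(v_dd, total_gain_db):
    """First term of Eq. (3): V_dd divided by the product of the stage gains."""
    return v_dd / 10 ** (total_gain_db / 20)


res = gain_limited_resolution(v_dd=1.5, total_gain_db=115)   # v_dd is assumed
print(f"gain-limited resolution: {res * 1e6:.2f} uV")
```

Under that assumption the gain alone would allow a resolution of a few microvolts, far below the 2.7 mV target, which is why the offset terms set the practical limit.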

On the other hand, the first NAND gate was sized 1) to settle its output to LO during the reset phase, drawing only leakage currents and thus consuming low power, and 2) to filter possible transient peaks caused by the transition from the reset to the comparison phase of the comparator. The second NAND gate allows an external signal to enable/disable the comparator for test purposes and to force a conversion at the end of the ramp in case it has not occurred before. The first NAND gate was sized to $0.6~\mu m/0.6~\mu m$, while the second NAND gate and the final inverter were sized to the minimum dimensions ($0.6~\mu m/0.2~\mu m$).

Concerning the switching scheme, the input switches were implemented with transmission gates in order to have full input swing, which eases the connection to other circuits in the processing element. Such transmission gates were sized to the minimum dimension. Besides, one more transistor was added to the feedback loop in order to minimize leakage currents. The transistor driven by $Comp\_Rst$ was sized to the minimum dimension in order to reduce charge injection and feedthrough effects. The transistor driven by $Comp\_RstD$ in the feedback loop was sized with a longer channel length, 0.6 $\mu$m/0.6 $\mu$m, in order to minimize leakage currents.

Fig. 12. Bode plots of the dual-cascode inverter used as gain stage in the comparator.

Fig. 13. Actual comparator with its time diagram.

The multi-stage architecture of the comparator makes it possible to achieve the required 2.7 mV of resolution. Nevertheless, the charge injection and feedthrough errors from the hardware for offset cancellation leave a residual offset of around 3 LSB's. The capacitor used in the comparator, 162 fF, was chosen to keep the residual offset constant throughout the ramp. Larger capacitors would lower the offset. Eventually, it would be possible to improve the resolution down to the limit imposed by the gain, on the order of $\mu V$'s.

Fig. 14. Speed-resolution trade-off in the comparator.

Fig. 15. Comparator layout.

The speed of the comparator is determined by the dynamic resolution, defined as the resolution reached within a given time slot. Every comparator features a natural speed-resolution trade-off, i.e. higher resolutions are reached at the expense of longer comparison times. Quantitatively, the dynamic resolution is studied with the comparator subject to a step input. In the case of a multi-stage comparator with N identical cascaded stages, the speed-resolution trade-off is given by Equation (4), where $\Delta_d$ is the dynamic resolution, $T_c$ is the comparison time and $\tau_u = C_L/g_m$. As seen in Eq. (4), better resolutions, i.e. lower $\Delta_d$ values, for a given number of stages, N, lead to longer comparison times. Fig. 14 plots the resolution-speed trade-off in our comparator with the offset canceled. It can be seen that resolutions of 2.7 mV lead to comparison times around 120 ns. Given that the ramp should be completed in 120 µs, every step in the ramp takes 400 ns. This time is needed for the output voltage of the analog ramp generator to settle and for the comparison to be performed. As seen in the former subsection, the analog ramp generator needs around 212 ns, which added to the comparison time amounts to less than 400 ns.

$$\Delta_d \left(\frac{T_c}{\tau_u}\right)^N \approx N! \cdot V_{dd} \tag{4}$$
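Eq. (4) can be solved for the normalized comparison time, $T_c/\tau_u \approx (N! \cdot V_{dd}/\Delta_d)^{1/N}$. The Python sketch below evaluates it for a few resolutions. The values N = 3 and $V_{dd}$ = 1.5 V are assumptions made for illustration, and Eq. (4) itself assumes identical stages, which the actual comparator does not have, so the numbers only show the trend.

```python
from math import factorial


def comparison_time_ratio(delta_d, n_stages, v_dd):
    """Solve Eq. (4) for T_c / tau_u given the dynamic resolution delta_d."""
    return (factorial(n_stages) * v_dd / delta_d) ** (1.0 / n_stages)


# Finer resolution (smaller delta_d) requires a longer comparison time.
for delta_mv in (10.0, 2.7, 1.0):
    ratio = comparison_time_ratio(delta_mv * 1e-3, n_stages=3, v_dd=1.5)
    print(f"delta_d = {delta_mv} mV -> T_c / tau_u = {ratio:.1f}")
```

The 1/N exponent shows why multi-stage comparators are attractive: the comparison time grows only with the N-th root of the required resolution improvement.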

Finally, the main data on the comparator are shown in Table III. The final area of the comparator is less than 220 µm<sup>2</sup>, around 9% of the area occupied by the processing element in the mixed-signal tier. A view of the comparator layout can be seen in Fig. 15.

Table III Comparator Data

| Parameter                        | Value                                 |
|----------------------------------|---------------------------------------|
| Area                             | 220 μm²                               |
| Operating Range                  | 0.4-1.1 V                             |
| Resolution                       | 8 bits, 2.7 mV                        |
| Max. Resolution- Residual Offset | μV's                                  |
| Speed- Resolution Trade-off      | $\Delta_d$ < 1 LSB, $T_c$ > 120 ns    |
| C=162 fF                         | Constant offset- 3 LSB's              |
| Power Consumption                | 1.12 μW                               |

### C. Memory Cells

The memory cells in the intermediate tier (Tier B in Fig. 1) store the digital words from the in-pixel ADC. Every memory cell is made up of six 8-bit words and two additional 1-bit registers. The memory cells are implemented with static latches, whose schematic is displayed in Fig. 16. The PMOS transistor in the second inverter is oversized (1.4 $\mu$m/0.2 $\mu$m) with respect to the minimum dimension (0.6 $\mu$m/0.2 $\mu$m) in order to guarantee an adequate write/read cycle. Also, the use of static latches keeps floating-body effects from the FDSOI transistors low [8].
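The storage implied by this organization can be worked out in a few lines of Python, assuming one memory cell per processing element of the 160 x 120 array, as suggested by the pitch-matching between tiers. The variable names are illustrative.

```python
WORDS_PER_CELL = 6       # 8-bit words per memory cell
BITS_PER_WORD = 8
EXTRA_BITS = 2           # two additional 1-bit registers
ROWS, COLS = 120, 160    # one cell per processing element (assumption)

bits_per_cell = WORDS_PER_CELL * BITS_PER_WORD + EXTRA_BITS
total_bits = bits_per_cell * ROWS * COLS

print(f"{bits_per_cell} latches per cell, {total_bits} latches in the frame buffer")
```

Each 50 µm x 50 µm cell therefore has to hold 50 static latches, which gives an idea of the density required in the intermediate tier.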



Fig. 16. Memory cells in the intermediate tier to store the digital words from the in-pixel ADC.

## IV. CONCLUSIONS

This paper has addressed the design of an 8-bit single-slope in-pixel ADC for a 3D vision architecture oriented to airborne surveillance and reconnaissance applications. The 3D vision architecture is composed of a sensor layer with a resolution of 320 x 240 pixels bump-bonded to a three-tier structure with TSV's of 5 µm x 5 µm on the 150 nm FDSOI CMOS-3D technology from MIT-LL. The top tier of the array is a mixed-signal layer of 160 x 120 processing elements. The intermediate tier is a frame buffer. The bottom tier is a digital layer for high-level image processing. Image registration is among the main tasks performed by such a 3D vision architecture, the so-called VISCUBE. To that purpose, low-pass filtering and detection of maximum and minimum across different scales (image resolutions) are needed. The in-pixel ADC embedded in VISCUBE not only maps the pre-processed images onto a digital representation, but is also used to detect maximum and minimum. Hence, the ADC is a critical component of the system. The in-pixel ADC is implemented with an 8-bit thermometric current-steering DAC, a comparator and memory cells to store the digital words. Due to area constraints, the comparator is the major issue in the design of the ADC. The use of an inverter as gain stage instead of an OTA-based topology yields low area. Offset-compensation techniques make it possible to reach the required resolution. The comparator achieved 2.7 mV of resolution in less than 220 µm<sup>2</sup> with a time response below 120 ns. Finally, the VISCUBE chip has been submitted to fabrication. If available, experimental results will be shown at the conference.

## REFERENCES

- [1] ENIAC working group, Strategic Research Agenda (2nd edition), European Technology Platform Initiative, 2007.
- [2] International Technology Roadmap for Semiconductors (ITRS), 2007 Edition, Emerging Research Devices. http://www.itrs.net/Links/2007ITRS/Home2007.htm.
- [3] Ángel Rodríguez-Vázquez et al., "A 3D Chip Architecture For Optical Sensing and Concurrent Processing", SPIE Photonics Europe, Optical Sensing and Detection, 12-15 April 2010. Proceedings of SPIE Volume: 7726-39. DOI: 10.1117/12.855027.
- [4] M.F. Snoeij et al., "Multiple-Ramp Column-Parallel ADC Architectures for CMOS Image Sensors", IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2968-2977, Dec. 2007.
- [5] A. van den Bosch et al., "A 10-bit 1-G sample/s Nyquist current-steering CMOS D/A Converter", IEEE J. Solid-State Circuits, vol. 36, no. 3, pp. 315-324, Mar. 2001.
- [6] Heng Zhang, Mohamed Mostafa Elsayed, and Edgar Sánchez-Sinencio, "New applications and technology scaling driving next generation A/D converters", European Conference on Circuit Theory and Design (ECCTD 2009), 23-27 Aug. 2009, pp. 109-112, ISBN: 978-1-4244-3896-9.
- [7] C. Toumazou, B. Gilbert, G.S. Moschytz, "Trade-Offs in Analog Circuit Design - The Designer's Companion", London: Kluwer Academic Publishers, 2002.
- [8] Kerry Bernstein, Norman J. Rohrer, "SOI: Circuit Design Concepts", Springer, 2007.