## 10 Gbit/s 0.25 μm CMOS 1:4 demultiplexer

Ding Jingfeng, Wang Zhigong, Zhu En, Wang Gui, Xia Chunxiao, Xiong Mingzhen

(Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, China)

**Abstract:** A 10 Gbit/s (STM-64, OC-192) 1:4 demultiplexer (DEMUX) with a 4-phase clock was implemented in TSMC's standard 0.25 μm complementary metal-oxide-semiconductor (CMOS) technology. All circuits use source-coupled FET logic (SCFL) to maximize speed and suppress common-mode distortion. The DEMUX uses constant-delay buffers both to generate the 4-phase clock and to adjust the skews of the four channel outputs. On-wafer testing shows error-free operation at 10 Gbit/s with a 2<sup>31</sup> − 1 pseudorandom bit sequence (PRBS). The measured root mean square (rms) jitter, rise time and fall time of the eye diagram are 11, 123 and 137 ps, respectively. The chip size is 0.9 mm × 1.2 mm and the power dissipation is 550 mW from a 3.3 V supply.

**Key words:** optical receiver; complementary metal-oxide-semiconductor (CMOS); demultiplexer (DEMUX); latch

The DEMUX is one of the key components of a high-speed data transmission system. It normally sits at the end of an optical receiver and recovers the original low-speed parallel bit streams from a high-speed serial input. To date, most multiplexers (MUXes) and DEMUXes operating at bit rates above 10 Gbit/s have been fabricated in GaAs HEMT<sup>[1]</sup>, SiGe BiCMOS<sup>[2]</sup>, InP HEMT<sup>[3]</sup> and InP HBT<sup>[4]</sup> technologies. However, they share the drawback of high power dissipation and a higher supply voltage. Recent achievements in CMOS have shown that ICs for data transmission systems can be designed with low cost, high yield and high integration. However, most CMOS circuits operating at 10 Gbit/s and above use more advanced processes with smaller feature sizes, such as 0.18 μm<sup>[5]</sup> and 0.12 μm<sup>[6]</sup>.

This paper describes a 10 Gbit/s (STM-64, OC-192) 4-phase-clock 1:4 DEMUX implemented in TSMC's standard 0.25 μm CMOS technology. It uses constant-delay buffers to generate the 4-phase clock and to adjust the skews of the four outputs. Compared with a conventional tree-type DEMUX at the same bit rate, it needs fewer latches and dissipates less power. The fabricated 4-phase-clock 1:4 DEMUX operates error-free at 10 Gbit/s with a 2<sup>31</sup> − 1 PRBS in on-wafer testing. The chip size is only 0.9 mm × 1.2 mm and the power dissipation is 550 mW from a 3.3 V supply.

#### 1 Conventional Tree-Type 1:4 DEMUX

As shown in Fig. 1, the conventional tree-type 1:4 DEMUX consists of two stages. The first 1:2 stage uses a half-rate clock to split the serial input bit stream into two parallel output streams, one carrying the odd-numbered data bits and the other the even-numbered ones. This 1:2 stage consists of one rising-edge-triggered master-slave D-type flip-flop (MSDFF) and one falling-edge-triggered master-slave-slave D-type flip-flop (MSSDFF). The MSSDFF has one more latch than the MSDFF so that the two outputs are synchronized. Both output streams then drive the second-level 1:2 stages, which convert them into four parallel streams. In this structure, a toggle flip-flop (TFF) 1:2 frequency divider is required to generate the quarter-rate clock for the second-level 1:2 stages. Therefore, a conventional tree-type 1:4 DEMUX contains 17 latches in total: five in the first 1:2 stage, clocked at half rate; ten in the two second-level 1:2 stages, clocked at quarter rate; and two forming the high-speed TFF 1:2 frequency divider.

Fig. 1 Conventional tree-type 1:4 DEMUX

Received 2004-10-11.

Foundation item: The National High Technology Research and Development Program of China (863 Program) (No. 2002AA312230, 2003AA31G030).

**Biographies:** Ding Jingfeng (1976–), male, graduate student; Wang Zhigong (corresponding author), male, doctor, professor, zgwang@seu.edu.cn.
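The data path and latch budget of the tree-type structure described in Section 1 can be sketched behaviorally. This is a minimal sketch under stated assumptions: timing, clocking and the retiming role of the MSSDFF's extra latch are ignored, bit indices stand in for bit values, and the names (`split_1to2`, `tree_demux_1to4`, `tree_latch_count`) are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Latch counts per cell, as stated in the text (behavioral sketch only).
LATCHES_MSDFF = 2    # master + slave
LATCHES_MSSDFF = 3   # one extra slave latch for synchronization
LATCHES_TFF = 2      # toggle flip-flop used as the 1:2 clock divider


def split_1to2(bits: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Behavioral 1:2 stage: even-numbered bits to one output, odd-numbered to the other."""
    return list(bits[0::2]), list(bits[1::2])


def tree_demux_1to4(bits: Sequence[int]) -> List[List[int]]:
    """Two levels of 1:2 stages; returns the four output lanes in tree order."""
    even, odd = split_1to2(bits)   # first stage, half-rate clock
    e0, e1 = split_1to2(even)      # second level, quarter-rate clock
    o0, o1 = split_1to2(odd)
    return [e0, e1, o0, o1]


def tree_latch_count() -> int:
    """Three 1:2 cells (MSDFF + MSSDFF each) plus the TFF divider."""
    one_to_two_cell = LATCHES_MSDFF + LATCHES_MSSDFF   # 5 latches
    return 3 * one_to_two_cell + LATCHES_TFF


if __name__ == "__main__":
    print(tree_demux_1to4(list(range(8))))  # [[0, 4], [2, 6], [1, 5], [3, 7]]
    print(tree_latch_count())               # 17
```

The lanes start at bits 0, 2, 1 and 3 because of the two-level even/odd split.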

#### 2 Four-Phase Clock 1:4 DEMUX

As shown in Fig. 2, this DEMUX uses constant-delay buffers to generate a 4-phase clock and to adjust the skews of the four outputs. Unlike the tree-type DEMUX, it has only one stage: the 4-phase clock directly splits the 4f (bit/s) serial input stream into four parallel output streams of f (bit/s) each. Because the high-speed first stage, which consumes much more power, is eliminated, this DEMUX saves substantial power. For a higher-speed DEMUX of this structure, the constant-delay buffers could be replaced by passive transmission lines (TLs) to save more power and obtain more precise delays. Furthermore, this DEMUX needs only 8 latches, far fewer than the 17 of the tree-type 1:4 DEMUX.
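For comparison with the tree-type structure, the single-stage 4-phase DEMUX can be sketched in the same behavioral style. The sketch assumes that the MSDFF clocked at phase k × 90° of the quarter-rate clock captures every fourth bit starting at bit k; timing and skew are ignored, and all names are hypothetical.

```python
from typing import List, Sequence

N_PHASES = 4            # 0, 90, 180 and 270 degree clock phases
LATCHES_PER_MSDFF = 2   # master latch + slave latch


def four_phase_demux(bits: Sequence[int]) -> List[List[int]]:
    """Behavioral model: the MSDFF clocked at phase k * 90 deg captures
    every fourth bit starting at bit k (timing and skew are ignored)."""
    return [list(bits[k::N_PHASES]) for k in range(N_PHASES)]


def four_phase_latch_count() -> int:
    """One MSDFF per output lane, giving the 8 latches stated in the text."""
    return N_PHASES * LATCHES_PER_MSDFF


def lane_rate_gbps(serial_rate_gbps: float) -> float:
    """Each of the four lanes carries one quarter of the serial bit rate."""
    return serial_rate_gbps / N_PHASES


if __name__ == "__main__":
    print(four_phase_demux(list(range(8))))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
    print(four_phase_latch_count())          # 8
    print(lane_rate_gbps(10.0))              # 2.5
```

Both architectures deliver the same four lanes. Only the lane order differs, and here all flip-flops run from quarter-rate clock phases.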



Fig. 3 shows the schematic of the latch, which uses typical source-coupled FET logic (SCFL). It samples the input data while the clock is high and holds the sampled data while the clock is low. The latches are clocked by the quarter-rate 4-phase clock and therefore run at a lower speed, but each one must still sample and hold the full-speed input data precisely. Two methods are applied to enhance the latch performance: ① the size ratio of the sampling pair (NM3, NM4) to the hold pair (NM5, NM6) is kept at 1:1 to strengthen the hold ability; ② the load is made symmetric, consisting of an NMOS and a PMOS active load. The high resistance of the active loads gives a higher gain, which reduces the setup and hold times and speeds up the whole circuit. Furthermore, the symmetric load eases layout and thereby improves noise performance.
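A logic-level model of the sample/hold behavior follows. It also shows how two such latches form the MSDFFs used throughout. This is a hypothetical behavioral model with illustrative class and function names. It treats time as discrete samples and does not capture the SCFL circuit, the 1:1 device sizing or the active loads.

```python
from typing import List, Sequence


class DLatch:
    """Level-sensitive latch (logic model only, not the SCFL circuit):
    transparent while clk == 1, holds its output while clk == 0."""

    def __init__(self, init: int = 0) -> None:
        self.q = init

    def step(self, d: int, clk: int) -> int:
        if clk:          # sample phase: output follows the input
            self.q = d
        return self.q    # hold phase: output keeps the last sampled value


def msdff_waveform(d: Sequence[int], clk: Sequence[int]) -> List[int]:
    """Master latch (transparent while clk is low) feeding a slave latch
    (transparent while clk is high): the output changes only on rising edges."""
    master, slave = DLatch(), DLatch()
    out = []
    for di, ci in zip(d, clk):
        m = master.step(di, 1 - ci)    # master samples while clk is low
        out.append(slave.step(m, ci))  # slave passes the master value while clk is high
    return out


if __name__ == "__main__":
    data = [1, 1, 0, 0, 1, 1, 0, 0]
    clock = [0, 1, 0, 1, 0, 1, 0, 1]
    print(msdff_waveform(data, clock))  # [0, 1, 1, 0, 0, 1, 1, 0]
```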



Fig. 4 shows the schematic of a constant-delay buffer. Two stages of this buffer provide a delay $\tau \approx 100$ ps, which corresponds to a 90° phase shift of the 2.5 GHz clock (period 400 ps). Because the clock is differential, its two sides provide the 0° and 180° phases, and delaying each side by $\tau$ yields the 90° and 270° phases, so clocks with phases of 0°, 90°, 180° and 270° are obtained. The NRZ outputs of successive MSDFFs triggered by the 4-phase clock are each delayed by a further $\tau$, so delays of $3\tau$, $2\tau$ and $\tau$ are added after the MSDFFs triggered by the 0°, 90° and 180° clocks, respectively, to synchronize the output signals. Fig. 5 shows the corresponding timing diagram. All of the above assumes that the buffers provide a precisely constant delay. To remove the difference between simulation and measurement, the delay is adjusted by changing the bias tail current $I_{\rm SS}$ according to<sup>[7]</sup>:

$$\tau \propto \frac{V_{\rm DD} C_{\rm L}}{I_{\rm SS}} \tag{1}$$

where $I_{\rm SS}$ is the tail current, $C_{\rm L}$ is the total intrinsic and parasitic load capacitance and $V_{\rm DD}$ is the supply voltage. In addition, a symmetrical layout minimizes the signal mismatch between channels.
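The clock-phase arithmetic, the output-alignment rule and a first-order use of Eq. (1) can be checked numerically. This is a sketch under idealized assumptions. Every buffer pair is taken to delay by exactly $\tau$. Eq. (1) is applied with $V_{\rm DD}$ and $C_{\rm L}$ fixed and no other delay components. The tail-current numbers in the usage lines are hypothetical, not measured values. Function names are illustrative.

```python
from typing import List

TAU_PS = 100.0    # two-stage buffer delay quoted in the text (approximate)
F_CLK_GHZ = 2.5   # quarter-rate clock for 10 Gbit/s data


def phase_shift_deg(delay_ps: float, f_ghz: float) -> float:
    """Phase shift produced by a fixed delay at clock frequency f_ghz."""
    period_ps = 1000.0 / f_ghz
    return 360.0 * delay_ps / period_ps


def four_phases(tau_ps: float, f_ghz: float) -> List[float]:
    """0 and 180 deg are the two sides of the differential clock;
    delaying each side by tau gives the other two phases."""
    shift = phase_shift_deg(tau_ps, f_ghz)
    return sorted([0.0, 180.0, shift % 360.0, (180.0 + shift) % 360.0])


def aligned_output_times(tau_ps: float) -> List[float]:
    """Lane k (clocked at k * 90 deg) outputs k * tau after lane 0; the
    extra buffer delay (3 - k) * tau aligns every lane at 3 * tau."""
    return [k * tau_ps + (3 - k) * tau_ps for k in range(4)]


def retuned_tail_current(i_ss: float, tau_measured: float, tau_target: float) -> float:
    """First-order use of Eq. (1): with V_DD and C_L fixed, tau scales as 1 / I_SS."""
    return i_ss * tau_measured / tau_target


if __name__ == "__main__":
    print(phase_shift_deg(TAU_PS, F_CLK_GHZ))  # 90.0
    print(four_phases(TAU_PS, F_CLK_GHZ))      # [0.0, 90.0, 180.0, 270.0]
    print(aligned_output_times(TAU_PS))        # [300.0, 300.0, 300.0, 300.0]
    # Hypothetical example: 2.0 mA gave 110 ps, but 100 ps is wanted.
    print(retuned_tail_current(2.0, 110.0, 100.0))  # about 2.2 (mA)
```

In a real buffer, changing $I_{\rm SS}$ may also change the output swing, so a correction like this would be a starting point for iteration against measurement rather than a final value.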





#### 3 Fabrication and Measurement

The circuit was fabricated in TSMC's standard 0.25 μm single-poly five-metal (5M1P) CMOS process, whose cutoff frequency $f_T$ is 18.6 GHz. The microphotograph of the fabricated chip is shown in Fig. 6. The chip size is 0.9 mm × 1.2 mm, and the chip contains 324 elements. The input data and clock are AC-coupled and terminated with 50 Ω on-chip resistors. The output buffers are designed to drive 50 Ω external loads. A 100 Ω on-chip output termination resistor reduces output reflections (i.e., improves the output return loss) compared with an open-drain configuration.
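The benefit of the 100 Ω back-termination can be estimated from the reflection coefficient $\Gamma = (Z_{\rm s} - Z_0)/(Z_{\rm s} + Z_0)$ seen from the 50 Ω line. This is a rough sketch that assumes an ideal open drain (infinite output impedance) and an ideal 100 Ω resistor. It neglects pad capacitance and the transistors' output resistance. The function names are illustrative.

```python
import math


def reflection_coefficient(z_source: float, z0: float = 50.0) -> float:
    """Reflection coefficient of the chip output seen from a z0 line.
    z_source = math.inf models an ideal open drain (no back-termination)."""
    if math.isinf(z_source):
        return 1.0
    return (z_source - z0) / (z_source + z0)


def return_loss_db(gamma: float) -> float:
    """Return loss in dB; a larger value means less reflected power."""
    if gamma == 0:
        return math.inf
    return 20.0 * math.log10(1.0 / abs(gamma))


if __name__ == "__main__":
    for label, z in (("open drain", math.inf), ("100 ohm on-chip", 100.0)):
        g = reflection_coefficient(z)
        print(f"{label}: gamma = {g:.3f}, return loss = {return_loss_db(g):.2f} dB")
    # open drain: gamma = 1.000, return loss = 0.00 dB
    # 100 ohm on-chip: gamma = 0.333, return loss = 9.54 dB
```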



Fig. 6 Microphotograph of the chip

The performance of the fabricated DEMUX was measured on-wafer on a Cascade Microtech probe station. The test set-up is shown in Fig. 7. The 4f (Gbit/s) $2^{31} - 1$ PRBS input data and the f (GHz) clock were generated by an Advantest D3186 pattern generator, and the outputs were measured with a wide-bandwidth oscilloscope, the Agilent DCA 86100A. Error-free operation was verified from 9 to 11 Gbit/s with input data and clock amplitudes of 500 mV. Fig. 8(a) shows the measured eye diagram of one single-ended output with 10 Gbit/s $2^{31} - 1$ PRBS input data and a 2.5 GHz sinusoidal clock. The measured rms jitter, rise time and fall time of the eye diagram are 11, 123 and 137 ps, respectively. Figs. 8(b) and (c) show the measured eye diagrams with 11 Gbit/s and 9 Gbit/s $2^{31} - 1$ PRBS input data. The four parallel eye diagrams in Fig. 8(d) show that the skew among the outputs is less than 40 ps, which confirms that the timing adjustment method adopted in this chip is practicable. Because of equipment limitations, the clock phase margin was not measured accurately, but a rough estimate puts it above $180^{\circ}$. The typical DC power consumption of the chip is about 550 mW from a single 3.3 V supply, and the chip works properly with supply voltages from 3.1 to 3.5 V.

**Fig. 8** Measured eye diagrams. (a) 10 Gbit/s; (b) 11 Gbit/s; (c) 9 Gbit/s PRBS input; (d) All four output signals at 10 Gbit/s PRBS input
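The $2^{31} - 1$ test pattern comes from a maximal-length linear feedback shift register. A minimal software sketch is given below. It assumes the common Fibonacci form with taps (31, 28), that is, the polynomial $x^{31} + x^{28} + 1$. It does not claim to reproduce the bit ordering or seed of the Advantest generator. The short 7-bit variant is used to check the period because a full $2^{31} - 1$ period is impractical to run here. The function name `prbs` is hypothetical.

```python
from typing import List


def prbs(order: int, tap: int, n_bits: int, seed: int = 1) -> List[int]:
    """Fibonacci LFSR: each new bit is s[k - order] XOR s[k - tap].

    (order, tap) = (7, 6) and (31, 28) are the tap pairs commonly used for
    x^7 + x^6 + 1 and x^31 + x^28 + 1; bit ordering relative to a specific
    pattern generator is not guaranteed.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must have at least one 1 in the low `order` bits")
    out: List[int] = []
    for _ in range(n_bits):
        bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | bit) & mask
        out.append(bit)
    return out


if __name__ == "__main__":
    seq = prbs(7, 6, 254)
    print(seq[:127] == seq[127:])   # True: the period is 2^7 - 1 = 127
    print(sum(seq[:127]))           # 64 ones per period
    stimulus = prbs(31, 28, 4000)   # start of the 2^31 - 1 pattern
    lanes = [stimulus[k::4] for k in range(4)]  # what an ideal 1:4 DEMUX would output
```

For scale, at 10 Gbit/s one bit period is 100 ps, so each 2.5 Gbit/s output lane has a 400 ps bit period. The measured skew of under 40 ps is therefore below about 10 % of an output bit period.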

#### 4 Conclusion

This work demonstrates that standard 0.25 μm CMOS technology is practical for ultra-high-speed ICs operating at 10 Gbit/s and above. The method used to generate the 4-phase clock and to adjust the output skews has proved practical, and the DEMUX can be applied in STM-64/OC-192 optical receivers.

#### References

- [1] Lang M, Wang Z, Lao Z, et al. 20–40 Gb/s 0.2-μm GaAs HEMT chip set for optical data receiver [J]. *IEEE Journal of Solid-State Circuits*, 1997, 32(9): 1384–1393.
- [2] Meghelli M, Rylyakov A V, Shan L. 50 Gb/s SiGe BiCMOS 4:1 multiplexer and 1:4 demultiplexer for serial communication systems [A]. In: *ISSCC* [C]. San Francisco, 2002. 260–261.
- [3] Sano K, Murata K, Kitabayashi H, et al. 50-Gbit/s InP HEMT 4:1 multiplexer/1:4 demultiplexer chip set with a multiphase clock architecture [J]. *IEEE Transactions on Microwave Theory and Techniques*, 2003, 51(12): 2548–2554.
- [4] Yen J, Case M G, Nielsen S, et al. A fully integrated 43.2 Gb/s clock and data recovery and 1:4 DEMUX IC in InP HBT technology [A]. In: *ISSCC* [C]. San Francisco, 2003. 240–241.
- [5] Tanabe A, Umetani M, Fujiwara I, et al. 0.18-μm CMOS 10-Gb/s multiplexer/demultiplexer ICs using current mode logic with tolerance to threshold voltage fluctuation [J]. *IEEE Journal of Solid-State Circuits*, 2001, 36(6): 988–996.
- [6] Kehrer D, Wohlmuth H, Knapp H, et al. 40-Gb/s 2:1 multiplexer and 1:2 demultiplexer in 120 nm CMOS [A]. In: *ISSCC* [C]. San Francisco, 2003. 344–349.
- [7] Plouchart J, Kim J, Zamdmer N, et al. A 31 GHz CML ring VCO with 5.4 ps delay in a 0.12-μm SOI CMOS technology [A]. In: *ESSCIRC* [C]. Lisbon, Portugal, 2003. 357–360.

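# 10 Gbit/s 0.25 µm CMOS 1:4 demultiplexer

Ding Jingfeng, Wang Zhigong, Zhu En, Wang Gui, Xia Chunxiao, Xiong Mingzhen

(Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, China)

**Abstract:** A 10 Gbit/s (STM-64, OC-192) 4-phase-clock 1:4 demultiplexer designed in TSMC 0.25 μm CMOS technology is described. To achieve the highest operating frequency and suppress common-mode noise, all circuits adopt the source-coupled logic (SCFL) structure. The demultiplexer uses constant-delay buffers to realize the 4-phase clock and to align the output edges. On-wafer testing shows that the chip demultiplexes correctly with a 10 Gbit/s 2<sup>31</sup> − 1 pseudorandom bit stream input. Under this condition, the measured rms jitter, rise time and fall time of the eye diagram are 11, 123 and 137 ps, respectively. The chip area is 0.9 mm × 1.2 mm, and the typical power dissipation is 550 mW from a single 3.3 V supply.

Key words: optical receiver; CMOS; demultiplexer; latch

CLC number: TN722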