Abstract

A clock generator based on an edge-combiner DLL (ECDLL) has been developed for USB 2.0 applications. The clock generator produces 10-tap 480 MHz output signals from a 12 MHz reference signal and consists of three DLLs, which makes its design area smaller than that of a conventional PLL-based clock generator. Each DLL uses our proposed shot pulse reset technique to prevent harmonic lock and a voltage-controlled delay line (VCDL) with a trimming function to operate correctly under process, voltage, and temperature (PVT) variations. The proposed clock generator was fabricated in a 90 nm CMOS process. The 10-tap 480 MHz output signals satisfy the USB 2.0 specifications. The power consumption is less than 1.3 mW and the locking time is less than 3.5 μs; both are far lower than those of the conventional one, whose locking time is 10.0 μs. The design area is 200 × 225 μm², which is half that of the conventional one.

1. Introduction

The clock generator may be one of the largest blocks of the physical layer (PHY) in wireline communications because it usually consists of a phase-locked loop (PLL) whose lowpass filter requires large capacitors. Many reports have proposed ways to shrink the design area of PLLs. A capacitance-multiplication technique was reported to shrink the capacitance of the loop filter [1, 2]. However, the shrink ratio of a capacitor may be less than five when the leakage current of the capacitor and PVT variation are taken into consideration. Thus, the total design area cannot be drastically reduced. An all-digital PLL (ADPLL) technique has also been reported [3]. However, the issue of accurate operation under PVT variation remains. Therefore, a new approach is desirable for fundamentally reducing the design area.

The DLL has several advantages over the PLL. First, it can be designed to be smaller than the PLL. While the PLL is a higher-order system, the DLL is a first-order system and is always stable. Thus, the DLL needs only small capacitors to keep its loop stable, whereas the PLL needs large capacitors to build a stable lowpass filter. Second, the DLL can achieve a shorter locking time than the PLL. Third, the DLL consumes less power than the PLL, whose VCO and divider consume a large amount of power in order to reduce the jitter. However, the DLL also has several disadvantages compared with the PLL when used as a clock generator. First, the DLL cannot generate clock signals as fast as those of the PLL. Second, the DLL has a limited locking range while the PLL does not. This means that the DLL cannot achieve a fractional multiplication ratio, whereas a fractional-N PLL can.

An edge-combiner DLL (ECDLL) has been reported as an alternative high-speed clock generator because it is based on the DLL and can multiply the reference frequency [4–8]. The ECDLL has potential as a clock generator, but it has rarely been used in this capacity because several challenges need to be overcome. The first is operation under PVT variation. The second is the limitation on the output signal frequency.

With regard to the frequency limitation, the operating frequency of the DLL has been increasing with recent CMOS processes. The DLL can operate at frequencies below 1 GHz in a submicron CMOS process. Thus, the DLL might be usable as the clock generator for wireline communications whose operating frequency is less than about 1 GHz. USB 2.0 is the most popular wireline communication standard in the world and operates at 480 MHz. A USB 2.0 PHY needs a small design area and low power consumption for use in portable devices. Therefore, USB 2.0 may be one of the most suitable applications for a clock generator based on the ECDLL.

To apply the ECDLL to USB 2.0, we propose techniques that overcome the above-mentioned challenges. The first is a shot pulse generator to prevent harmonic lock. The second is a VCDL trimming function to operate correctly under PVT variation.

In this paper, we propose a clock generator that uses the ECDLL for a USB 2.0 PHY to shrink the design area [1]. The paper is organized as follows. Section 2 describes the overall architecture of the proposed clock generator. Section 3 describes the ECDLL and DLL in detail. Section 4 presents our measurement results, and Section 5 concludes with a short summary of the key points.

2. Overall Clock Generator Architecture

Figure 1 shows a block diagram of a USB 2.0 PHY. The PHY consists of a clock generator, a band-gap reference (BGR), a controller (CNT), a driver (DRV), a receiver (RCV), and a logic block (LGC). Figure 2 shows a block diagram of a conventional clock generator based on a PLL with a ring oscillator. The reference signal (F_REF) is 12 MHz. The output signals (F_O[9:0]) are 10-tap 480 MHz signals. The design area is 200 × 550 μm². The loop bandwidth is set to 1.6 MHz using a second-order lowpass filter that consists of 130 pF and 34 pF capacitors and a 2.7 kΩ resistor. Thus, the lowpass filter occupies a large portion of the design area. Therefore, we propose using the ECDLL as the clock generator to shrink the design area.

We considered three candidate structures for this clock generator: one ECDLL and one DLL, two ECDLLs and one DLL, and three ECDLLs and one DLL, as shown in Figures 3 and 4. In the one-ECDLL structure, DLL1 is an ECDLL with a multiplication ratio of 40, and DLL2 is a DLL that generates the 10-tap 480 MHz signals, as shown in Figure 3. In the two-ECDLL structure, DLL1 is an ECDLL with a multiplication ratio of five, DLL2 is an ECDLL with a multiplication ratio of eight, and DLL3 is a DLL that operates in the same manner as the DLL above, as shown in Figure 4. In the three-ECDLL structure, DLL1, DLL2, and DLL3 are ECDLLs with multiplication ratios of two, four, and five, respectively, and DLL4 is a DLL that operates in the same manner as the DLL above, as shown in Figure 3.
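The frequency plan of the three candidates can be checked with a short calculation. The following Python sketch is only an illustration: the function name `stage_frequencies` and the dictionary layout are our own, while the 12 MHz reference and the multiplication ratios are the values given above.

```python
F_REF_HZ = 12_000_000  # 12 MHz reference signal

# Multiplication ratios of the cascaded ECDLLs in each candidate structure.
CANDIDATES = {
    "one ECDLL and one DLL": [40],
    "two ECDLLs and one DLL": [5, 8],
    "three ECDLLs and one DLL": [2, 4, 5],
}


def stage_frequencies(f_ref_hz, ratios):
    """Return the output frequency of each cascaded ECDLL, in order."""
    freqs = []
    f = f_ref_hz
    for ratio in ratios:
        f *= ratio
        freqs.append(f)
    return freqs


for name, ratios in CANDIDATES.items():
    mhz = [f // 1_000_000 for f in stage_frequencies(F_REF_HZ, ratios)]
    print(f"{name}: ECDLL outputs {mhz} MHz")
```

Every candidate ends at 480 MHz; the candidates differ only in how the overall ratio of 40 is split between the stages.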

First, it is reasonable for each ECDLL to have a multiplication ratio of less than 10, in view of the design area and operation under PVT variation. The larger the multiplication ratio, the larger the number of VCDL stages, which results in a large design area. The one-ECDLL structure is twice as large as the two-ECDLL structure because the ECDLL with a multiplication ratio of 40 is large. The two-ECDLL structure has almost the same design area as the three-ECDLL structure. In addition, an earlier DLL in the cascade operates more slowly than a later one. Thus, the delay cells in the earlier DLL may be smaller than those in the later one. Therefore, to shrink the design area, the earlier DLL should have a smaller multiplication ratio than the later one.

Second, the number of cascaded DLL blocks should be as small as possible so that the whole clock generator operates stably and the settling period is short. In our proposed clock generator, a standby sequence is necessary because a DLL may fall into the unlock state if it starts to operate before the preceding DLL completes its lock, as shown in Figure 4.

Considering the above points, we propose the structure with two ECDLLs and one DLL. DLL1 and DLL2 have multiplication ratios of five and eight, respectively.

The counter (CNT) generates the standby signals for each block (ST1, ST2, and ST3) from the standby signal of the clock generator (ST) to create a standby sequence for the DLLs, as shown in Figure 4. If DLL2 starts the lock operation before DLL1 completes its lock, DLL2 might fall into the unlock state. Thus, the CNT controls the sequential wake-up operation by generating ST1, ST2, and ST3.
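The sequential wake-up can be pictured as a counter that releases the standby signals one after another. The sketch below is a behavioral illustration under our own assumptions: the function name `standby_signals` and the release thresholds of 0, 12, and 24 reference cycles are hypothetical values chosen for the example, not the settings of the fabricated counter. The standby signals are treated as active high, so a DLL starts when its signal goes low.

```python
def standby_signals(cycles_since_wakeup, st, release_cycles=(0, 12, 24)):
    """Return (ST1, ST2, ST3); 1 means standby and 0 means active.

    cycles_since_wakeup: number of F_REF cycles counted since ST went low.
    st: standby signal of the whole clock generator (1 = standby).
    release_cycles: hypothetical counter values at which each DLL is released.
    """
    if st == 1:
        # The whole clock generator is in standby, so every DLL is held.
        return (1, 1, 1)
    return tuple(0 if cycles_since_wakeup >= r else 1 for r in release_cycles)


for cycle in (0, 11, 12, 23, 24):
    print(cycle, standby_signals(cycle, st=0))
```

With these example thresholds, DLL1 is released immediately, DLL2 after 12 reference cycles, and DLL3 after 24 reference cycles.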

3. ECDLL and DLL in Detail

3.1. Shot Pulse Reset Technique

Figures 5 and 6 show block diagrams of our proposed ECDLL, which is used for DLL1 and DLL2, and of the DLL, which is used for DLL3. Each consists of a phase detector (PD), a charge pump (CP), a switch (SW), a capacitor, a voltage-controlled delay line (VCDL), and a shot pulse generator (SHOT). In addition, the ECDLL has an edge-combiner (EC), and the DLL has an output buffer (BUF). The PD compares the phase of F_REF with the phase of the feedback clock (F_B) and generates the result signals (UP/DN). The CP charges and discharges the capacitor. The VCDL generates the output signals from F_REF. The delay time from F_REF to F_B is controlled by the control voltage (V_C). In Figure 6, the EC generates the output signal (F_O1) from the VCDL output signals.

Our ECDLL and DLL are susceptible to harmonic lock. We propose a shot pulse reset technique that uses the shot pulse generator to resolve this issue.

Figure 7 illustrates the proposed shot pulse reset technique. The PD compares a rising edge of F_REF with a rising edge of F_B and generates the UP and DN signals. In the intended operation, the PD compares the second rising edge of F_REF with the first rising edge of F_B. However, when the second rising edge of F_REF comes after the first rising edge of F_B, a harmonic lock occurs in which the PD compares the first rising edge of F_REF with the first rising edge of F_B. The shot pulse reset shown in Figure 7 is proposed to prevent this malfunction. The PD is reset by the pulse R at the point where it would compare the first rising edge of F_REF with the first rising edge of F_B. Thus, the PD compares the second rising edge of F_REF with the first rising edge of F_B. Even with the PD reset by R, a harmonic lock can still occur: when the first rising edge of F_B comes after the third rising edge of F_REF, the PD compares the third rising edge of F_REF with the first rising edge of F_B. To prevent this malfunction, the capacitor is charged through the SW so that V_C is close to V_DD. This makes the first rising edge of F_B come before the second rising edge of F_REF when the PD starts operation.
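The edge pairing described above can be summarized in a small behavioral sketch. It encodes only the cases stated in the text and is not a circuit model; the function names, the 1-based edge numbering, and the treatment of delays between one and two periods are our own simplifying assumptions.

```python
def compared_ref_edge(delay, period, pd_reset):
    """Return which F_REF rising edge (1-based) the PD pairs with the first F_B edge.

    F_REF rising edges 1, 2, 3 occur at t = 0, period, 2 * period.
    The first F_B rising edge occurs at t = delay (the VCDL delay).
    """
    if delay <= 0 or period <= 0:
        raise ValueError("delay and period must be positive")
    if not pd_reset:
        # Without the shot pulse reset, the PD starts from the first F_REF edge.
        return 1
    if delay > 2 * period:
        # F_B arrives after the third F_REF edge: harmonic lock is still possible.
        return 3
    # Assumption: an F_B edge arriving up to the third F_REF edge pairs with edge 2.
    return 2


def is_intended_lock(delay, period, pd_reset):
    """True when the PD compares the second F_REF edge with the first F_B edge."""
    return compared_ref_edge(delay, period, pd_reset) == 2


T = 1 / 12e6  # 12 MHz reference period in seconds
print(is_intended_lock(0.6 * T, T, pd_reset=False))  # no reset
print(is_intended_lock(2.5 * T, T, pd_reset=True))   # reset, V_C not precharged
print(is_intended_lock(0.6 * T, T, pd_reset=True))   # reset and short initial delay
```

Only the last call, which combines the reset with a short initial delay (V_C precharged close to V_DD), returns `True`.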

Figures 8 and 9 show a circuit diagram of the shot pulse generator (SHOT) and its simulation results. After ST1 is set to low, R issues the reset pulse and Q is set to high at the rising edge of F_REF. The pulse on R resets the PD operation, and the SW charges the capacitor during the period between the falling edge of ST1 and the rising edge of Q. After the capacitor is charged, V_C is almost equal to V_DD. The SHOT thus enables our ECDLL and DLL to operate accurately.

3.2. Charge Pump

Figure 10 shows a circuit diagram of the CP. M5 and M6 are the switches that charge and discharge the capacitor. When the CP charges the capacitor, UP, UN, DP, and DN are high, low, low, and high, respectively. The charge current, which is the M13 drain current, passes through the switch M5 and charges the capacitor connected to V_C, and the M12 drain current passes through the switches M1 and M4 and flows to M10. When the CP discharges the capacitor, UP, UN, DP, and DN are low, high, high, and low, respectively. The discharge current, which is the M10 drain current, passes through the switch M6 and discharges the capacitor connected to V_C, and the M13 drain current passes through the switches M3 and M2 and flows to M9. An op-amp is included in the CP to form a common-mode feedback loop. When V_C is not equal to V_CM, the difference between the charge current and the discharge current becomes larger, which causes a constant phase error. Figure 11 shows the simulation results of the CP. The lock range, which is the range in which V_C is equal to V_CM, is 0.427–0.884 V under the worst condition (ss/1.05 V/125°C).

3.3. VCDL

Figure 12 shows a block diagram of the voltage-controlled delay line (VCDL) for DLL1 and DLL3. It consists of a voltage-current converter (VIC) and a delay chain of eleven delay cells. The VCDL for DLL2 has seventeen delay cells. Figures 13 and 14 show the circuit diagrams of the VIC and the delay cell. The VIC converts the control voltage (V_C) to a control current (I_C). The VCDL delays F_IN by an amount that is controlled by I_C. The trimming signal (TRIM[1:0]) controls the sensitivity of the current-voltage characteristics of M1 by changing the gm of M1. The larger TRIM[1:0] is, the larger I_C is. The delay cell consists of two inverter buffers (M4-M2 and M5-M3) and current sources (M6 and M1).

The sensitivity of the VCDL is important for DLL operation. Figure 15 illustrates the sensitivity of the VCDL and the DLL settling operation. If the sensitivity of the VCDL delay-current characteristics is large at the lock point, the DLL settling operation may not be stable, as shown in Figure 15(d). This is because the delay change per clock cycle is large, which causes a large overshoot. To prevent this unstable state, the VCDL sensitivity is made small by using large delay cells, as shown in Figure 15(a). However, this design increases the power consumption, and a malfunction may occur under the worst condition if the sensitivity is made too small, as shown in Figure 15(b).
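The link between VCDL sensitivity and settling behavior can be illustrated with a generic first-order, discrete-time model of a DLL loop. This model is an assumption of ours and is not taken from the fabricated design: in each reference cycle a timing error e moves the charge I_CP · e onto the capacitor C, which changes the delay by K · I_CP · e / C, where K is the overall delay sensitivity in seconds per volt. The error is then multiplied by (1 − g) every cycle, with g = K · I_CP / C. The names `loop_gain` and `settle` and all numeric values except the 10 pF capacitor are hypothetical.

```python
def loop_gain(sensitivity_s_per_v, i_cp_a, c_f):
    """Dimensionless per-cycle loop gain g = K * I_CP / C of the simplified model."""
    return sensitivity_s_per_v * i_cp_a / c_f


def settle(gain, initial_error=1.0, cycles=20):
    """Return the timing error after each reference cycle: e[n+1] = (1 - g) * e[n]."""
    errors = [initial_error]
    for _ in range(cycles):
        errors.append(errors[-1] * (1.0 - gain))
    return errors


# Hypothetical example: 10 ns/V sensitivity, 10 uA CP current, 10 pF capacitor.
print(loop_gain(10e-9, 10e-6, 10e-12))

for g in (0.5, 1.5, 2.5):
    print(g, [round(e, 3) for e in settle(g, cycles=4)])
```

In this model a gain below one settles without overshoot, a gain between one and two overshoots on every cycle but still converges, and a gain above two diverges. A larger sensitivity K raises the gain, which is consistent with the overshoot described above.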

The delay is mainly determined by the control current, the input capacitance (the gate capacitance of M4 and M2), and the parasitic capacitance between delay cells. If the buffer MOSs (M4-M2 and M5-M3) are small, the necessary delay is obtained with a small current. However, this results in a large sensitivity. Thus, the buffer MOSs are not made small. Figure 16 shows the VCDL delay-current characteristics for various delay cell sizes. The larger the delay cell, the gentler the slope of the characteristics at the necessary delay point.

Figures 17, 18, and 19 show the postlayout simulation results of the VCDL delay-current characteristics for DLL1, DLL2, and DLL3, respectively. The VCDL for DLL1 achieves the target delay of 8.3 ns at about 9 μA under various conditions. The VCDL for DLL2 achieves the target delay of 1.04 ns at 80 μA under the typical and best conditions, which are tt/1.20 V/25°C and ff/1.35 V/−40°C, respectively. Under the worst condition, ss/1.05 V/125°C, however, the target delay is achieved at an I_C of about 120 μA. This difference is compensated by the trimming bits (TRIM[1:0]), as shown in Figure 14. The VCDL for DLL3 achieves the target delay of 0.208 ns at an I_C of about 200 μA and is also trimmed by TRIM[1:0].
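The three target delays are consistent with dividing one input clock period equally among the taps of each delay line. The sketch below reproduces this arithmetic. The tap counts of 10, 16, and 10 are our inference from the multiplication ratios of five and eight and from the 10-tap output; they are one fewer than the eleven and seventeen delay cells stated above, and the function name `target_cell_delay_ns` is hypothetical.

```python
def target_cell_delay_ns(f_in_hz, n_taps):
    """Per-cell delay (ns) when n_taps cells together span one input period."""
    return 1e9 / f_in_hz / n_taps


# (input frequency in Hz, assumed number of taps spanning one period)
STAGES = {
    "DLL1": (12e6, 10),
    "DLL2": (60e6, 16),
    "DLL3": (480e6, 10),
}

for name, (f_in_hz, n_taps) in STAGES.items():
    print(f"{name}: {target_cell_delay_ns(f_in_hz, n_taps):.3f} ns")
```

Under these assumptions the per-cell delays come out at about 8.33 ns, 1.04 ns, and 0.208 ns, which agree with the target delays quoted for DLL1, DLL2, and DLL3.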

3.4. Edge-Combiner

Figure 20 shows a block diagram of the EC for DLL1. It consists of SR flip-flops (SRFFs), NANDs, and a NOR. Figure 21 shows a circuit diagram of the SRFF. It consists of six MOSs. There are two floating nodes (the M3 drain node and the M5 source node). Figures 22 and 23 show the postlayout simulation results of the VCDL and EC for DLL1. The one-cycle delay is obtained at about V_C = 0.5 V in ff/1.30 V/−40°C, as shown in Figure 22(a). The EC can operate with various input signals, as shown in Figure 22(b). If leakage current prevented the SRFF from operating accurately, the EC output would skip some clock edges. However, the EC captures all clock edges of each signal at various V_C values, as shown in Figure 22(b), and it operates under various conditions, as shown in Figure 23.

Figure 24 shows the postlayout simulation results of the VCDL and EC for DLL2. The one-cycle delay is obtained at a V_C between 0.45 V and 0.50 V in ff/1.30 V/−40°C and tt/1.20 V/25°C, as shown in Figures 23(a) and 23(b), and between 0.50 V and 0.60 V in ss/1.05 V/125°C. The EC captures all clock edges of each signal at various V_C values, and it operates under various conditions, as shown in Figure 24.
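The frequency multiplication performed by the EC can be illustrated with an idealized behavioral model. The sketch assumes that the delay line is locked to exactly one input period, that its taps are equally spaced, and that even-numbered taps set the output while odd-numbered taps reset it. These are our simplifications and do not describe the SRFF, NAND, and NOR circuit of Figure 20; the function names and the tap counts of 10 and 16 are likewise assumptions.

```python
def tap_edge_times(period, n_taps):
    """Rising-edge times of the VCDL taps when the line delay equals one period."""
    return [k * period / n_taps for k in range(n_taps)]


def combine_edges(period, n_taps):
    """Return (time, level) transitions of the combined output in one period.

    Assumption: even-numbered taps set the output high and odd-numbered
    taps reset it low, as an SR flip-flop based combiner would.
    """
    if n_taps % 2 != 0:
        raise ValueError("an even number of taps is required")
    return [(t, 1 if k % 2 == 0 else 0)
            for k, t in enumerate(tap_edge_times(period, n_taps))]


def multiplication_ratio(n_taps):
    """Each set/reset pair of taps produces one output cycle."""
    return n_taps // 2


for name, f_in_hz, n_taps in (("DLL1", 12_000_000, 10), ("DLL2", 60_000_000, 16)):
    transitions = combine_edges(1.0 / f_in_hz, n_taps)
    rising = sum(level for _, level in transitions)
    f_out_mhz = f_in_hz * multiplication_ratio(n_taps) // 1_000_000
    print(f"{name}: {rising} output pulses per input period, {f_out_mhz} MHz")
```

With 10 taps the model produces five output pulses per 12 MHz period (60 MHz), and with 16 taps it produces eight pulses per 60 MHz period (480 MHz), matching the multiplication ratios of DLL1 and DLL2.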

3.5. Lock Operation

Figure 25 shows the postlayout simulation results of the DLL1 locking operation. The simulation condition is tt/1.2 V/25°C. DLL1 has a capacitor of 10 pF. After ST1 is set to low at about 100 ns, R is set to high, and a shot pulse occurs. The PD operation is reset by the shot pulse, as shown by the UP and DN signals in Figure 25. After that, the PD generates a wide DN pulse, and V_C decreases. Finally, DLL1 completes the lock at about 1 μs. When DLL1 completes the lock, V_C is about 0.6 V. Figure 26 shows the VCDL output signals after DLL1 completes the lock. The EC generates the 60 MHz output signal (F_O1).

Figure 27 shows the postlayout simulation results of the DLL2 locking operation. The simulation condition is tt/1.2 V/25°C. DLL2 has a capacitor of 0.5 pF. After ST2 is set to low at about 200 ns, R is set to high, and a shot pulse occurs. The PD operation is reset by the shot pulse, as shown by the UP and DN signals in Figure 27. After that, the PD generates a wide DN pulse, and V_C decreases. Finally, DLL2 completes the lock at about 400 ns. When DLL2 completes the lock, V_C is about 0.6 V. Figure 28 shows the VCDL output signals after DLL2 completes the lock. The EC generates the 480 MHz output signal (F_O2).

Figure 29 shows the postlayout simulation results of the DLL3 locking operation. The simulation condition is tt/1.2 V/25°C. DLL3 has a capacitor of 1 pF. After ST3 is set to low at about 50 ns, the PD operation is reset by the R shot pulse, as shown by the UP and DN signals in Figure 29. After that, the PD generates a wide UP pulse, and V_C increases. Finally, DLL3 completes the lock at about 100 ns. When DLL3 completes the lock, V_C is about 0.6 V. Figure 30 shows the VCDL output signals after DLL3 completes the lock. DLL3 generates the 480 MHz output signals.

Figure 31 shows the postlayout simulation results for the whole clock generator, which consists of DLL1, DLL2, and DLL3. After ST is set to low, ST1 is set to low first, while ST2 and ST3 remain high. F_REF is input to DLL1, and V_C is nearly V_DD because of the precharge. For the same reason, the PD first generates a wide DN pulse, which decreases V_C. At about 2 μs, DLL1 completes the lock and generates F_O1, the 60 MHz clock signal. ST2 is set to low at about 1 μs. In principle, ST2 should be set to low after DLL1 completes the lock; in this simulation, however, it is set to low before the lock is complete. V_C in DLL2 is almost V_DD because of the precharge. After ST2 is set to low, the PD in DLL2 generates a wide DN pulse, and V_C soon falls to almost 0.5 V. At about 2 μs, DLL2 completes the lock. ST3 is set to low at about 2 μs. In principle, it should be set to low after DLL2 completes the lock, but in this simulation it is also set to low before the lock is complete. After ST3 is set to low, V_C at first remains almost V_DD. It starts to decrease at about 2.6 μs and finally settles at about 0.5 V. DLL3 then completes the lock and generates the 10-tap 480 MHz clock signals. The total lock time of the clock generator is about 3.0 μs.

In general, the locking time of a DLL is determined by the capacitance and the CP current. When the CP current is large relative to the capacitance, the locking time is short, but the locking operation is only marginally stable. DLL1 is designed to be stable because its phase error directly influences the other DLLs. DLL2 and DLL3 are designed for a fast locking time so that the clock generator as a whole locks quickly. For this reason, DLL2 and DLL3 start to operate before the preceding DLL completes its lock. When DLL2 starts to operate, F_O1 is already almost 60 MHz. Thus, DLL2 can operate accurately.

4. Measurement Results

A 90 nm CMOS process was used to fabricate our proposed clock generator for use in a USB 2.0 PHY. Figure 32 shows the measurement results of the output signal F_O[9]. The measured signal is F_O[9] divided by eight. The clock generator output frequency is 480 MHz, and the jitter is less than 0.8 ps rms. Figure 33 shows the measured eye pattern against the USB 2.0 specifications. The USB 2.0 PHY with our proposed clock generator passes these specifications. Figure 34 shows the measurement results for the random data pattern in the USB 2.0 specifications. The USB 2.0 PHY with our proposed clock generator handles random data in a way that meets the USB 2.0 specifications. Figure 35 shows the layout of the chip. Our proposed clock generator, which consists of three DLLs, occupies half the design area of the conventional PLL-based one. Although it contains three DLLs, each DLL needs only a small capacitor to maintain loop stability. Thus, our clock generator is smaller than the conventional one, which has a large capacitor in its loop filter. Table 1 compares the two clock generators. The proposed clock generator has a power consumption of 1.3 mW, which is less than that of the conventional PLL-based one shown in Figure 2. In the ECDLL, the DLL loop that includes the VCDL operates at the reference signal frequency. Thus, the power consumption is less than that of a PLL, which has a VCO and a divider. A locking time of less than 3.5 μs is also achieved.

5. Conclusion

We proposed a novel clock generator architecture to shrink the design area. The proposed clock generator consists of two edge-combiner DLLs and a DLL. A shot pulse generator is used in the DLLs to prevent harmonic lock, and a CP with common-mode feedback is used in the DLLs to reduce the pattern jitter due to a constant phase error. A controller is used to control the wake-up sequence to prevent malfunctions. Our proposed clock generator was fabricated using a 90 nm CMOS process. It generates 10-tap 480 MHz clock signals that meet the USB 2.0 specifications. A power consumption of less than 1.3 mW was also achieved. Our USB 2.0 PHY with this clock generator also meets the USB 2.0 specifications. Our proposed clock generator needs only half the design area of the conventional PLL-based one.