CMOS High-Speed Dual-Modulus Frequency Divider for RF Frequency Synthesis

Navid Foroudi and Tadeusz A. Kwasniewski

Abstract—The architecture of a high-speed low-power-consumption CMOS dual-modulus frequency divider is presented. Compared to other designs fabricated with comparable CMOS technologies, this architecture has a better potential for high-speed operation. The circuit consumes less power than previously reported CMOS circuits, and it approaches the performance previously achieved only by bipolar or GaAs devices. The proposed circuit uses level-triggered differential logic to create an input-frequency-entrained oscillator performing a dual-modulus frequency division. In addition to high-speed and low-power consumption, the divider has a low-input signal level requirement which facilitates its incorporation into RF applications. Fabricated with a 1.2-µm 5-V CMOS technology, the divider operates up to 1.5 GHz, consuming 13.15 mW, and requiring less than 100 mV rms input amplitude.

I. INTRODUCTION

Dual-modulus frequency dividers are used in fractional-division phase-locked loop (PLL) synthesizers as shown in Fig. 1 [1], [2]. By changing the division ratio of the divider periodically between \( N \) and \( N + 1 \), the divider, on average, has a fractional division ratio. The divider divides by \( N \) when its modulus control input is a logic 0 and by \( N + 1 \) when the control input is a logic 1. The density of logic 1s at the output of the modulus controller, that is, the fraction of output cycles during which the divider divides by \( N + 1 \), is \( K/2^m \). Therefore, the divider, on average, divides by \( N + K/2^m \). As a result, the frequency step at the output of the synthesizer can be a fraction of the reference frequency. This lets the frequency synthesizer have both high frequency resolution and short settling time, two essential requirements of a mobile-radio front-end frequency synthesizer [3].
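
The averaging argument above can be checked numerically. The following is a minimal sketch, assuming the modulus controller is an m-bit accumulator whose carry-out drives the control input; this controller structure and the names `modulus_control_bits` and `average_division_ratio` are illustrative assumptions, not details given in this paper.

```python
def modulus_control_bits(k, m, cycles):
    """Return one modulus-control bit per divider output cycle.

    Assumption (not from the paper): the controller is an m-bit
    accumulator that adds k every output cycle and uses its carry-out
    as the control bit, so the density of 1s is k / 2**m.
    """
    modulus = 1 << m
    if not 0 <= k < modulus:
        raise ValueError("k must satisfy 0 <= k < 2**m")
    acc = 0
    bits = []
    for _ in range(cycles):
        acc += k
        if acc >= modulus:   # carry-out: divide by N + 1 this cycle
            acc -= modulus
            bits.append(1)
        else:                # no carry: divide by N this cycle
            bits.append(0)
    return bits


def average_division_ratio(n, k, m, cycles):
    """Average number of input cycles per output cycle."""
    bits = modulus_control_bits(k, m, cycles)
    return sum(n + bit for bit in bits) / cycles


if __name__ == "__main__":
    # N = 15, K = 3, m = 3 over one full accumulator period (8 cycles):
    # the average should equal 15 + 3/8.
    print(average_division_ratio(15, 3, 3, 8))
```

Over a whole number of accumulator periods the average equals \( N + K/2^m \) exactly; over other lengths it only approaches that value.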

However, if a PLL is to be used as the frequency synthesizer of a cellular phone, careful consideration must be given to the speed and power consumption of the VCO and the frequency divider, the two components operating at the highest frequency in the system. For GHz-range frequency synthesizers, these two blocks are often made using bipolar or GaAs technologies. This paper introduces a CMOS dual-modulus frequency divider which offers both a high operating frequency and low power consumption, facilitating the fabrication of the divider along with the low-frequency sections in a single CMOS technology.

To pursue high speed in CMOS integrated circuits, technologies with shorter gate lengths are developed, offering higher integration density, lower power consumption, and higher operating frequency. However, it is also possible to increase speed by improving circuit designs. Success in the latter, the main focus of the work presented here, enables us to fabricate ICs with higher operating frequencies using relatively conventional processes. The challenge is to explore novel circuit techniques which can fully exploit the capabilities of a given process.

The frequency division required in a PLL is performed by a counter [3] or a sequential circuit. In general, sequential circuits are built with edge-triggered D-type flip-flops. We propose the use of a level-triggered latch instead of an edge-triggered flip-flop as the building block for the sequential circuit, in order to achieve high speed and low power consumption. This increases the operating frequency, although the circuit then operates only in a limited range of input frequencies.

II. MAXIMUM OPERATING FREQUENCY

A sequential circuit consists of state storage devices, usually edge-triggered D-type flip-flops, and combinational logic circuits. D-type flip-flops themselves are composed of D-type level-triggered latches. The time required by a latch to accept or latch the input data is the latching period \( \tau_L \). This period starts either when the clock changes to the voltage level corresponding to the transparent mode or when the clock is already at the transparent level and the input data changes. The period ends when the clock can go back to the latch voltage level without harming the new latched data. The \( D \) input data must be prevented from changing during this period. The additional time required by the latch to propagate the new
Fig. 2. (a) Block diagram of a sequential circuit cell. (b) Positions of latch and combinational circuit time intervals with respect to the clock waveform.

Fig. 3. (a) Block diagram of a latch-operated sequential circuit cell. (b) Positions of latch and combinational circuit time intervals with respect to the clock waveform.

The clock signal must be kept low for a period longer than or equal to the latching period of the first latch so that the new data is stored in the first latch

$$T_L \geq (\tau_{L1})_{\text{max}}.$$ (1)

If the first latch has different latching delays for a logic 1 and a logic 0, $(\tau_{L1})_{\text{max}}$ corresponds to the longer delay.

The new data will be stored in the second latch only if the clock is held high for a period longer than or equal to its latching period

$$T_H \geq (\tau_{L2})_{\text{max}}.$$ (2)

Also, the clock period must be longer than or equal to the total delay through the cell. Satisfying this condition guarantees that in the worst case, the data will pass through the cell and reach the next cell in time for the next clock cycle

$$T_{\text{clk}} \geq (\tau_{L1} + \tau_{P1} + \tau_{L2} + \tau_{P2} + \tau_{C})_{\text{max}}.$$ (3)

Satisfying all of the aforementioned three conditions is necessary for the proper operation of the sequential circuit. The maximum operating frequency of the circuit is determined by the dominant condition.
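
As a rough illustration of how the dominant condition sets the speed limit, the sketch below evaluates conditions (1) to (3) for one cell. It assumes a clock with a fixed high-time fraction `duty_high`, so that \(T_H\) and \(T_L\) scale with \(T_{\text{clk}}\); that assumption, the function name, and the example numbers are illustrative and are not taken from the paper.

```python
def min_clock_period_flipflop(tau_l1, tau_p1, tau_l2, tau_p2, tau_c,
                              duty_high=0.5):
    """Smallest clock period satisfying conditions (1)-(3).

    All delays are worst-case values in the same time unit.
    Assumption: the clock is high for duty_high * T_clk and low for
    (1 - duty_high) * T_clk.
    """
    if not 0.0 < duty_high < 1.0:
        raise ValueError("duty_high must be between 0 and 1")
    period_from_low = tau_l1 / (1.0 - duty_high)   # condition (1)
    period_from_high = tau_l2 / duty_high          # condition (2)
    period_from_total = tau_l1 + tau_p1 + tau_l2 + tau_p2 + tau_c  # (3)
    return max(period_from_low, period_from_high, period_from_total)


if __name__ == "__main__":
    # Hypothetical delays in picoseconds; here condition (3) dominates.
    t_min = min_clock_period_flipflop(100, 50, 100, 50, 100)
    print("minimum period (ps):", t_min)
    print("maximum frequency (GHz):", 1000.0 / t_min)
```

With a symmetric clock, condition (3) dominates unless one latch is much slower than the rest of the cell.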

B. Latch-Operated Sequential Circuits

Replacing the edge-triggered flip-flops with level-triggered latches changes the constraints on the clock period. The block diagram of a cell after this replacement is shown in Fig. 3. All of the latches are transparent when the clock is high.
Condition (1) is no longer a requirement since there are no latches being activated by the low level of the clock. The clock must be high for a period longer than or equal to the latching period of the latch so that the data is stored in the latch. Therefore, condition (2) remains unchanged and must be satisfied

\[ T_{H} \geq (\tau_{L})_{\text{max}}. \] (4)

The clock period must be longer than or equal to the total delay through the latch and the combinational circuit. Otherwise, the data will not reach the input of the following latch before the next rising edge of the clock

\[ T_{\text{clk}} \geq (\tau_{L} + \tau_{P} + \tau_{C})_{\text{max}}. \] (5)

This condition replaces (3).

A negative side effect of removing the first latch of every flip-flop is the imposition of a minimum operating frequency on the sequential circuit. Propagation of a data bit through a cell takes \(\tau_{L} + \tau_{P} + \tau_{C}\). If the clock signal is kept high for longer than that, the following latch starts to latch the new data that has just passed through the cell. However, the following latch must not latch the new data until the next cycle. Therefore, \(T_{H}\) must be shorter than the total delay of every cell in the circuit

\[ T_{H} < (\tau_{L} + \tau_{P} + \tau_{C})_{\text{min}}. \] (6)

The clock signal must therefore satisfy a two-sided relation to guarantee the proper operation of the sequential circuit. Conditions (4) and (5), which determine the maximum operating frequency, guarantee that the circuit can go from any state to the next one in a single clock cycle. Condition (6), which determines the minimum operating frequency, ensures that the circuit does not go through more than one state in any cycle.

If \((\tau_{L} + \tau_{P} + \tau_{C})_{\text{min}}\), the minimum cell delay, is much shorter than \((\tau_{L} + \tau_{P} + \tau_{C})_{\text{max}}\), the maximum cell delay, it might not be possible to satisfy the two-sided relation. One example is a sequential circuit that does not have a combinational circuit in every cell. Therefore, it is necessary to keep the minimum cell delay as close to the maximum cell delay as possible.
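
The two-sided relation can be sketched as a window on the clock period. The code below is a minimal illustration, again assuming a fixed high-time fraction `duty_high` so that \(T_H\) = `duty_high` \(\times\ T_{\text{clk}}\); the function name and the numbers in the example are hypothetical.

```python
def latch_clock_window(tau_l_max, cell_delay_max, cell_delay_min,
                       duty_high=0.5):
    """Return (shortest, longest) allowed clock period, or None.

    Conditions (4) and (5) give the shortest period (inclusive bound);
    condition (6) gives the longest period (exclusive bound).
    Assumption: the clock is high for duty_high * T_clk.
    """
    if not 0.0 < duty_high < 1.0:
        raise ValueError("duty_high must be between 0 and 1")
    shortest = max(tau_l_max / duty_high,  # condition (4)
                   cell_delay_max)         # condition (5)
    longest = cell_delay_min / duty_high   # condition (6)
    if shortest >= longest:
        return None  # no clock period satisfies all three conditions
    return shortest, longest


if __name__ == "__main__":
    # Hypothetical delays in picoseconds.
    print(latch_clock_window(100, 300, 250))  # a usable window exists
    print(latch_clock_window(100, 300, 100))  # min delay too short: None
```

The second call shows the failure case described above: when the minimum cell delay is much shorter than the maximum, the window closes.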

C. Keeping the Latches in Their Transparent Region

In the previous section, we assumed that the latches were turned off after each change of state and remained off until the next rising edge of the clock which initiated another cycle. However, this is not a requirement. As will be explained, it is possible to keep the latches on and still have a functional frequency divider.

A dc voltage applied to the clock input of a counter constructed with positive level-triggered latches makes the circuit change states one after another as long as the input voltage is higher than the threshold of the latches. Note that the clock voltage does not need to be at the logic 1 level for the circuit to oscillate. Any other voltage above the threshold of the latches allows them to become transparent. The speed of a latch depends on its node capacitances and the resistance of paths that connect those nodes together. Some of the paths are through switching transistors. The clock voltage determines the resistance of the switching transistors. Therefore, the latching period \(\tau_{L}\) is not constant and varies depending on the voltage applied to the clock input.

Fig. 4 shows a Johnson counter employing identical latches as its storage elements. Assuming that the latches are on and the minimum and maximum cell delays are equal, the latching, propagation, and combinational circuit periods are positioned as shown. If the dc input voltage is increased, \(\tau_{L}\) becomes shorter and the frequency of oscillation increases. With the decrease of the input voltage, the delay \(\tau_{L}\) becomes longer and the frequency decreases. The circuit behaves like a voltage-controlled oscillator whose period of oscillation is

\[ T_{\text{out}} = 4(\tau_{L} + \tau_{P} + \tau_{C}). \] (7)

Note that only \(\tau_{L}\) varies as the input clock voltage changes and \(\tau_{P}\) and \(\tau_{C}\) are independent of the input voltage. If a sinusoidal signal (clock) with a frequency of \(4f_{\text{out}}\) is added to the input, the output signal \(Q_{2}\) becomes synchronized with the input signal as shown in Fig. 5(a). If \(Q_{2}\) falls behind, \(\tau_{L}\) occurs later when the input voltage is higher. A higher input voltage shortens \(\tau_{L}\) and causes \(Q_{2}\) (and \(Q_{1}\)) to occur earlier. On the other hand, if \(Q_{2}\) moves ahead with respect to the clock signal, \(\tau_{L}\) is moved to a period during which the clock voltage is lower and as a result becomes longer and slows down the circuit. Therefore, although the latches are always on, the circuit is still in phase with the clock signal and

\[ \tau_{L} + \tau_{P} + \tau_{C} = T_{\text{clk}}. \] (8)

If the clock frequency is increased, \(\tau_{L}\) is moved to a period during which the input voltage is higher so that the new \(\tau_{L}\) satisfies (8). The input frequency that causes \(\tau_{L}\) to coincide with the maximum input voltage (Fig. 5(b)) is the maximum operating frequency. If the clock frequency is increased above this limit, \(\tau_{L}\) cannot become any shorter and the circuit fails to satisfy (5). Since the latches are always on, satisfaction of (4) is guaranteed. Now, the maximum operating frequency is only constrained by (5), the cell delay requirement. The minimum operating frequency positions \(\tau_{L}\) so that it coincides with the minimum input voltage.
Comparing (3) and (5), it is obvious that the maximum operating frequency can be considerably increased by replacing flip-flops with level-triggered latches. Also, the number of nodes in every cell is decreased by this replacement, which results in a lower power consumption. The price for these is having a relatively narrow operating frequency range.
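
To make the entrainment limits concrete, the sketch below applies (7) and (8) to a pair of assumed extreme values of \(\tau_{L}\): one at the peak of the input voltage and one at its minimum. The function names and delay values are hypothetical, and the model ignores everything except the three delays.

```python
def free_running_output_frequency(tau_l, tau_p, tau_c):
    """Output frequency of the two-latch Johnson counter from (7)."""
    return 1.0 / (4.0 * (tau_l + tau_p + tau_c))


def lock_range(tau_l_fast, tau_l_slow, tau_p, tau_c):
    """Return (f_min, f_max) of the input clock from (8).

    tau_l_fast: latching period at the maximum input voltage.
    tau_l_slow: latching period at the minimum input voltage.
    Only tau_l depends on the input voltage; tau_p and tau_c do not.
    """
    if tau_l_fast > tau_l_slow:
        raise ValueError("tau_l_fast must not exceed tau_l_slow")
    f_max = 1.0 / (tau_l_fast + tau_p + tau_c)
    f_min = 1.0 / (tau_l_slow + tau_p + tau_c)
    return f_min, f_max


if __name__ == "__main__":
    # Hypothetical delays in nanoseconds, so frequencies are in GHz.
    f_min, f_max = lock_range(0.2, 0.4, 0.3, 0.3)
    print("input frequency range (GHz):", f_min, f_max)
    print("free-running f_out at tau_l = 0.2 ns (GHz):",
          free_running_output_frequency(0.2, 0.3, 0.3))
```

Because only \(\tau_{L}\) moves with the input voltage, the ratio of `f_max` to `f_min` stays small when \(\tau_{P} + \tau_{C}\) dominates the cell delay, which is consistent with the narrow operating range noted above.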

III. CIRCUIT DESIGN

Fig. 7 shows the functional block diagram of the dual-modulus frequency divider, which includes a divide-by-3-or-4 synchronous counter as the first (high-frequency) stage followed by a divide-by-4 asynchronous counter as the second (low-frequency) stage. The input signal, amplified by a logic inverter configuration, clocks the first stage. The first stage output clocks the second stage. Depending on the signal value at MC1 (Fig. 7), the first stage division ratio is 3 (MC1 = 0) or 4 (MC1 = 1). If there is a logic 1 on MC, the first stage always divides by 4, resulting in a total division ratio of 16 for the divider. For MC = 0, the OR gate (Fig. 7) forces the first stage to divide by 3 during one of the four states of the second stage, changing the total division ratio of the divider to 15.
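
The division ratios of 15 and 16 follow from counting input cycles over the four states of the second stage. The behavioral sketch below illustrates that count; which of the four states triggers the divide-by-3 cycle is an assumption made here (`SWALLOW_STATE`), as are the function names.

```python
SWALLOW_STATE = 3  # assumed: the second-stage state that allows divide-by-3


def first_stage_ratio(mc, second_stage_state):
    """Division ratio of the first stage for one of its output cycles.

    Models the OR gate: MC1 is 1 (divide by 4) if MC is 1 or if the
    second stage is not in the swallow state; otherwise MC1 is 0.
    """
    mc1 = 1 if (mc == 1 or second_stage_state != SWALLOW_STATE) else 0
    return 4 if mc1 else 3


def input_cycles_per_output_cycle(mc):
    """Total division ratio: sum over the four second-stage states."""
    return sum(first_stage_ratio(mc, state) for state in range(4))


if __name__ == "__main__":
    for mc in (0, 1):
        print("MC =", mc, "-> divide by", input_cycles_per_output_cycle(mc))
```

With MC = 1 every first-stage cycle takes four input cycles (4 + 4 + 4 + 4); with MC = 0 one of them takes three (4 + 4 + 4 + 3).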

The first stage, clocked at the high input frequency, determines the maximum operating frequency of the divider. The second stage only needs to be fast enough to operate at the first stage output frequency. Higher division-ratio designs can be obtained by simply increasing the number of flip-flops in the second stage.

The first stage of our divider, a two-bit Johnson counter, employs a single-phase-clock differential-logic level-triggered latch. The key design goal has been to increase the maximum operating frequency of the first stage without compromising its stable operation. Fig. 8 shows the schematic diagram of the latch. The two p-channel transistors form a regenerative circuit, and act like the two cross-coupled inverters of a static memory cell. The inverters shown in Fig. 8, with larger W/L ratios than those of the regenerative p-channel transistors, act as buffers. When the clock rises, the inverters are strong enough to toggle the regenerative circuitry of another latch if connected to its \(D\) and \(\overline{D}\) inputs. The latching period \(\tau_{L}\) is the time required to toggle the cross-coupled circuit, while the propagation period \(\tau_{P}\) is the delay introduced by the buffers.
Fig. 8. Differential-logic latch.

The schematic diagram of the first-stage divide-by-3-or-4 counter is shown in Fig. 9. The required inverting operation within the Johnson counter is achieved without additional combinational logic circuitry by flipping the differential outputs of the latch. The logic function of the AND and two-input OR gates shown in Fig. 7 is performed by one p-channel and four n-channel transistors and the first buffer of latch L1 as shown in Fig. 9. This circuit receives its input signals from the regenerative circuit outputs; therefore, it starts to operate before the latches settle to their final values. This results in a partial overlap of the propagation period \(\tau_{P}\) and the combinational logic delay \(\tau_{C}\), keeping the maximum cell delay from becoming much longer than the minimum cell delay. It has been possible to avoid series combinations of p-channel transistors (see Fig. 9), a slow configuration because the lower mobility of p-channel carriers is aggravated by the series connection [5].

The second stage is constructed from two true-single-phase-clock flip-flops [4] in a ripple counter configuration. The schematic diagram of the flip-flop is shown in Fig. 10. This type of flip-flop is fast enough to be clocked with the output of the synchronous counter.

The divider has been fabricated with a 1.2-micron double-metal N-well CMOS process from Northern Telecom. Table I includes a summary of the process parameters [6].

Device sizing plays an important role in increasing the operating frequency of a circuit [4]. Transistor sizes were carefully optimized for speed in a series of post-layout simulation and layout-modification trials. All internal nodes have rail-to-rail voltage swings. SPICE (level 3) was used for simulations. The numbers in Fig. 9 represent the channel widths of the transistors in micrometers. All of the transistors have the same channel length of 1.2 micrometers.

The first stage layout required special considerations in order to minimize its parasitics. Every effort was made to compact the layout and keep the parasitic capacitances and resistances as small as possible. Due to the high resistance of polysilicon and the high capacitance between poly and substrate, interconnections were made with metal, and the use of poly was limited to gates. Where necessary, 45° metal lines have been used so that they would be shorter and their parasitics minimized. Wide metal tracks were used for power lines, and decoupling capacitors were placed on the unused areas of the chip. The divider circuitry, which includes the first stage, the second stage, two inverters, and a three-input NOR gate, occupies an area of 130 $\mu$m $\times$ 120 $\mu$m. Fig. 11 is a photograph of the internal parts of the fabricated frequency divider.

**Table I**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>NMOS</th>
<th>PMOS</th>
<th>Units</th>
</tr>
</thead>
<tbody>
<tr>
<td>UO</td>
<td>566.3</td>
<td>200.8</td>
<td>cm²/(V·s)</td>
</tr>
<tr>
<td>VTO</td>
<td>0.7572</td>
<td>-0.8307</td>
<td>V</td>
</tr>
<tr>
<td>TOX</td>
<td>2.502E-08</td>
<td>2.502E-08</td>
<td>m</td>
</tr>
</tbody>
</table>
IV. PERFORMANCE EVALUATION

The fabricated frequency dividers were tested to determine their maximum operating frequency, power consumption, and input sensitivity. Measurements were performed on packaged devices, and therefore through bonding wires. A block diagram of the test setup is shown in Fig. 12.

Minimum and maximum operating frequencies at three values of the dc bias voltage are given in Table II. The divider was considered to be properly operating when it had a jitter-free output, verified by a spectrum analyzer, with a frequency of $f_{in}/15$ for $MC = 0$ and $f_{in}/16$ for $MC = 1$.

Fig. 13 shows the oscilloscope traces of the input and output signals at $f_{in} = 1.4$ GHz. A sine wave with an amplitude of 0.5 V clocks the divider, while the divider provides a rail-to-rail output signal.

As discussed in the previous section, the maximum operating frequency and the oscillation frequency of the divider are both inversely proportional to the latching and propagation delays of the latches and the combinational logic delays. To measure the effect of supply voltage on the maximum operating frequency, one can therefore measure the rate at which the oscillation frequency is affected by the supply voltage and then multiply it by the division ratio. The frequency at the output of the divider was measured by the spectrum analyzer, while the input buffer was connected to a 0-volt supply and the divider was set to divide by 15. Fig. 14 shows the measured output frequency $f_{out}$ as a function of the supply voltage. The corresponding input frequencies are also given in the same diagram.

The oscillation frequency as a function of the input bias voltage and the corresponding input frequencies are shown in Fig. 15.

To investigate the effect of power-supply noise on the operation of the divider, supply voltage was modulated with an ac component and the resultant output phase modulation was measured. A 50-Hz ac component with an amplitude of 50 mV caused a maximum time shift of 18 ps at the output.

The power consumed by the frequency divider was determined by measuring the current supplied to the divider circuitry at the core of the chip. Although the values reported here do not include the power consumption of the output pad circuitry, they do include the input and output buffer powers and the power dissipated in an on-chip divide-by-two stage which is clocked by the output of the divider. At 1.5 GHz, the divider consumes 13.15 mW and 12.5 mW while dividing by 15 and 16, respectively. As expected, the power consumption is a linear function of input frequency [5], [7], and has a slope of 5 mW/GHz. The power consumption as a function of supply voltage is shown in Fig. 16. As the supply voltage increases, the power consumption grows at a rate faster than \( V_{dd}^2 \), showing that in this circuit, power dissipation is not entirely due to the charge and discharge of capacitive loads, but also to short-circuit currents flowing from supply to ground [7]. This is expected, since at such high frequencies, transitions take considerable portions of every cycle.
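
The stated linear dependence can be turned into a simple estimate around the measured point. The sketch below is a hedged illustration only: it assumes the 5 mW/GHz slope applies to both division ratios and that the linear trend holds at the frequency of interest, neither of which is verified here. The function name is hypothetical.

```python
F_REF_GHZ = 1.5                      # frequency of the reported measurement
P_REF_MW = {15: 13.15, 16: 12.5}     # reported power at 1.5 GHz
SLOPE_MW_PER_GHZ = 5.0               # reported slope of power vs. frequency


def estimated_power_mw(f_in_ghz, ratio=15):
    """Linear estimate of power at f_in_ghz, anchored at 1.5 GHz.

    Assumption: the same slope applies to both division ratios, and
    f_in_ghz lies inside the range where the divider actually operates.
    """
    return P_REF_MW[ratio] + SLOPE_MW_PER_GHZ * (f_in_ghz - F_REF_GHZ)


if __name__ == "__main__":
    for f in (1.3, 1.4, 1.5):
        print(f, "GHz ->", round(estimated_power_mw(f, ratio=15), 2), "mW")
```

The estimate should not be extrapolated outside the divider's operating frequency range, where the circuit does not lock to the input.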

Fig. 17 shows the minimum input signal voltage required by the circuit as a function of frequency. These voltages are the signal amplitudes at the output of the signal generator clocking the chip. The signal at the input of the divider circuitry can be expected to have an even lower amplitude due to the impedance of the bonding wires and interconnects.

When a dual-modulus frequency divider is used as a fractional divider, its division ratio is periodically changed. This change should be possible once every output cycle. To evaluate this capability of the fabricated divider, the following test was performed.

An on-chip D-type flip-flop, wired to perform a divide-by-two function, was clocked by the frequency divider. The flip-flop output was connected to the divider modulus control input as shown in Fig. 18. This configuration applies \{1 0 1 0 \ldots\} to MC, changing the division ratio once every output cycle (average division ratio: 15.5). A divider functioning properly with this modulus control bit stream also guarantees operation with any arbitrary fractional division ratio. The divider was then clocked with a 1.4-GHz signal. The frequency spectrum of the divider output is shown in Fig. 19. In Fig. 19(a), the spectrum analyzer marker is on the fundamental frequency, but due to the resolution limitations of the spectrum analyzer, the marked frequency is not accurate. As shown in Fig. 19(b), the fundamental frequency is exactly equal to 1400/15.5 MHz confirming the proper operation of the frequency divider while performing fractional division.
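
The expected output frequency in this test follows from the average division ratio of the alternating control stream. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def fractional_output_frequency(f_in_mhz, mc_pattern,
                                ratio_mc0=15, ratio_mc1=16):
    """Return (average division ratio, output frequency in MHz).

    mc_pattern is one period of the modulus-control bit stream, with
    one bit per divider output cycle.
    """
    if not mc_pattern:
        raise ValueError("mc_pattern must contain at least one bit")
    total_input_cycles = sum(ratio_mc1 if bit else ratio_mc0
                             for bit in mc_pattern)
    average_ratio = total_input_cycles / len(mc_pattern)
    return average_ratio, f_in_mhz / average_ratio


if __name__ == "__main__":
    ratio, f_out = fractional_output_frequency(1400.0, [1, 0])
    print("average ratio:", ratio)
    print("output frequency (MHz):", f_out)
```

For the {1 0 1 0 ...} stream and a 1.4-GHz input, this gives an average ratio of 15.5 and an output near 90.32 MHz, the value 1400/15.5 MHz quoted above.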

V. CONCLUSIONS

We have proposed a CMOS level-triggered differential-logic latch to be used as a building block in a dual-modulus frequency divider. We have shown how keeping the latches on effectively increases the maximum operating frequency of the divider. This kind of divider is ideal for the frequency synthesizer of a cellular mobile radio since the required frequency range for that application is very narrow (less than 25 MHz) [8].

The fastest chip fabricated with a 1.2-μm CMOS process operated at 1.5 GHz with a power consumption of 13.15 mW. When fabricated with a submicron technology, the device could perform better than reported devices implemented with a special high-performance CMOS technology [9].

Fig. 15. Oscillation frequency as a function of bias voltage.

Fig. 16. Power consumption as a function of supply voltage.

Fig. 17. Minimum required input voltage as a function of input frequency.

Fig. 18. Fractional-division test configuration.

ACKNOWLEDGMENT

The authors are grateful to the Canadian Microelectronics Corporation for its continuing support of Carleton University’s Microelectronics Design Laboratory and for facilitating the access to the I.C. fabrication technology. They also thank Northern Telecom for providing access to the CMOS4S technology.

REFERENCES


Navid Foroudi received the B.Eng. degree in electronics from Ferdowsi University, Mashad, Iran, in 1987, and the M.Eng. degree in electrical engineering from Carleton University, Ottawa, Ontario, Canada, in 1992.

From 1992 to 1994, he was with Bell-Northern Research, Ottawa. He is currently with Northern Telecom, Ottawa, designing analog integrated circuits for communications systems. His research interests are in high-speed CMOS and BiCMOS integrated circuits.

Tadeusz A. Kwasniewski was born in Szczecin, Poland, on December 23, 1951. He received the M.Eng. degree in electrical engineering from the Warsaw University of Technology, Warsaw, Poland, in 1974, and the Ph.D. degree in electrical engineering from the Institute of Nuclear Research, Warsaw, Poland, in 1980.

From 1974 to 1981, he was with the Institute of Nuclear Research’s industrial nuclear electronics group as Research Engineer and then as Assistant Professor. In 1982, he joined Voest-Alpine in Linz, Austria, and in 1983 Lakehead University in Thunder Bay, Ont., Canada. Since 1985 he has been with Carleton University, Ottawa, Ont., where he is a Professor in the Department of Electronics. He has consulted for various government and private organizations in Canada and the United States. His interests are in architectures and circuits for signal processing, VLSI, and analog and digital circuits for communications.