( ESNUG 258 Item 3 ) -------------------------------------------- [2/7/96]
From: "Erich C. Whitney" <erich@datacube.com>
Subject: Pipelined Designs, Latches, Test Compiler and Flip-Flopping
Dear John,
I've been dealing with the question: "How do you avoid using latches in a
design when they're so darned convenient for holding control bits?" a lot
lately.
First you may ask, "Why would you want to avoid them?" And my answer is
simple, "Test Compiler!" I've read every bloody tech note on latches
with Test Compiler and the issues of testability, but nowhere does anyone
have the guts to come out and say that, "One, TC hates them, and two, they
cause more problems than they're worth in the context of an otherwise
beautifully synchronous design."
A latch presents a problem for ATPG because the LATCH ENABLE input blocks
the flow of data from D to Q. So the tool either has to figure out how to
open and close the latch properly during scan test so that all of the
faults are testable, or give up on those faults. (Yes, I know that TC v3.5
can handle this in the general case, and I've been told that TC+ is even
better at it, but in the context of a real design, latches are a big pain in
the butt! For those of us without a big wad of cash for TC+, this means that
for every D-Latch in a design, you can't cover the LE input or ANY of the
gates that drive it.)
An Example:
I have a synchronous pipeline design consisting of about 3000 flip-flops.
Just the sort of thing that makes Synopsys shine. BUT I need about 4000
bits worth of control registers to run that pipeline! That's 4000 of those
stinking D Latches! Now, I can't justify the price of TC+ to get fault
coverage on 4000 Latches and when you take into account the loss of coverage
of the decode logic, we're talking about a 20% deficit in coverage.
Unacceptable.
Here's the original circuit:
The latches are typically grouped together into words of up to 16 bits, so
there is more than one latch per comparator, even though the diagram shows
only one. There are probably about 1/10 as many comparators as latches.
                                        D-LATCH
                                        +-----+
Data_Bus     []-|>----------------------|D   Q|------>Off to the pipeline...
                                        |     |
                                        | LE  |
                    ADDRESS             +-----+
                    COMPARE                |
                    +------+               |
Address_Bus  []-|>--| A=B? |-+             |
                    +------+ |             |
                             |   +-----+   |
                     +-----+ +---|     |   |
Chip_Select  []-|>--O|     |     | AND |---+
                     | AND |-----|     |
Write_Enable []-|>--O|     |     +-----+
                     +-----+
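The ATPG problem is easier to see in a behavioral model. The Python sketch
below is illustrative only: `latch_enable` and `DLatch` are hypothetical
names, and it assumes Chip_Select and Write_Enable are active low (as the
inversion bubbles suggest). It shows that data can only move from D to Q
while the decoded write strobe holds LE high, so the address comparator and
the chip-select gating all sit in the path that opens the latch.

```python
# Behavioral sketch only (not a netlist, and not how Test Compiler models it).
# Assumption: Chip_Select and Write_Enable are active low, as the inversion
# bubbles and the timing diagrams suggest; A=B? is True on an address match.


def latch_enable(addr_match: bool, cs_n: int, we_n: int) -> bool:
    """LE = (A=B?) AND (NOT Chip_Select) AND (NOT Write_Enable)."""
    return bool(addr_match) and cs_n == 0 and we_n == 0


class DLatch:
    """Level-sensitive D latch: Q follows D while LE is high, holds otherwise."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def evaluate(self, d: int, le: bool) -> int:
        if le:
            self.q = d  # transparent: D flows through to Q
        return self.q  # opaque (LE low): Q keeps its last value


if __name__ == "__main__":
    reg = DLatch()
    # Address matches but Chip_Select is inactive: the latch stays closed.
    print(reg.evaluate(1, latch_enable(True, cs_n=1, we_n=0)))  # 0
    # Write strobe with a matching address: the latch opens and captures D.
    print(reg.evaluate(1, latch_enable(True, cs_n=0, we_n=0)))  # 1
    # Strobe ends: Q holds even though D changes.
    print(reg.evaluate(0, latch_enable(True, cs_n=1, we_n=1)))  # 1
```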
A CPU Write Cycle looks like this:
Chip_Select  --------\_____/-------------
Write_Enable ----\_____________/---------
Address_Bus  ====X=============X=========
Data_Bus     ZZZZX=============XZZZZZZZZZ
A CPU Read Cycle looks like this:
Chip_Select  --------\_____/-------------
Write_Enable ----------------------------
Address_Bus  ====X=============X=========
Data_Bus     ZZZZZZZZZZX=====XZZZZZZZZZZZ
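The drawings can also be read mechanically. The Python sketch below decodes
each waveform column by column under an assumed reading of the ASCII art
('-' high, '_' low, '\' and '/' as edges, '=' as valid bus data) and lists
the columns where both active-low strobes are asserted, i.e. where the latch
in the original circuit would be open if the address matches. The helper
names are hypothetical. In the write cycle that window falls inside the span
where Data_Bus is valid; in the read cycle it never opens.

```python
# Decode the ASCII timing diagrams column by column.
# Assumed reading of the drawings (not a standard notation):
#   '-' high, '_' low, '\' falling edge (low from that column on),
#   '/' rising edge (high from that column on), '=' valid bus data.

LEVEL = {"-": 1, "_": 0, "\\": 0, "/": 1}


def decode(wave):
    """Return one logic level (0 or 1) per character of a control waveform."""
    return [LEVEL[ch] for ch in wave]


def valid(bus):
    """Columns where a bus waveform shows stable data ('=')."""
    return [i for i, ch in enumerate(bus) if ch == "="]


def write_strobe(cs_n, we_n):
    """Columns where both active-low strobes are asserted (the latch LE window)."""
    return [
        i
        for i, (cs, we) in enumerate(zip(decode(cs_n), decode(we_n)))
        if cs == 0 and we == 0
    ]


# Copied from the write and read cycle drawings above.
WRITE_CS = r"--------\_____/-------------"
WRITE_WE = r"----\_____________/---------"
WRITE_DATA = "ZZZZX=============XZZZZZZZZZ"
READ_CS = r"--------\_____/-------------"
READ_WE = r"----------------------------"

if __name__ == "__main__":
    print(write_strobe(WRITE_CS, WRITE_WE))  # [8, 9, 10, 11, 12, 13]
    print(write_strobe(READ_CS, READ_WE))  # []
    # The strobe window lies entirely inside the valid-data window.
    print(set(write_strobe(WRITE_CS, WRITE_WE)) <= set(valid(WRITE_DATA)))  # True
```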
The Solution:
It has been suggested that I make the CPU interface to this design
synchronous, which would imply registering all of the input signals
shown above. That would gain some fault coverage, but I neither
need nor want a synchronous interface.
I scratched my head for a while until the answer came to me in the shower
one morning. First of all, I know that D latches are about half as big as D
flip-flops, but I'm using a 0.5 micron gate array and there are more gates
than I can shake a stick at. So I said to myself, "To hell with all of that
'70s logic minimization crap, be a guy of the '90s and eat gates! Test Compiler
wants to see DFFs, then I'll stuff so many DFFs down its shirt it'll beg for
mercy!" (Sorry, I guess I really shouldn't drink so much coffee at work.)
I want to use a D flip-flop but I don't want to clock it off of the pipeline
clock because that'll burn power AND make the interface synchronous. Yech.
So I'll make my own clock. Here's what I came up with:
                          +---------------------------+
                          |    2-1MUX                 |
                          |    +-----+        DFF     |
                          +----|B    |      +-----+   |
Data_Bus     []-|>-------------|A   Y|------|D   Q|---+->Off to pipeline...
                             +-|SEL  |      |     |
                             | +-----+   +--|>    |
                    ADDRESS  |           |  +-----+
                    COMPARE  |           |
                    +------+ |           |
Address_Bus  []-|>--| A=B? |-+           |
                    +------+             |
                                         |
                     +-----+             |
Chip_Select  []-|>---|     |             |
                     | AND |-------------+---->
Write_Enable []-|>--O|     |             (global gated chip select)
                     +-----+
Here, the D Latch is replaced by a DFF and a 2-1 MUX. All of the flip-flops
are clocked off of the gated chip select. New data is only written into the
control register IF it's a write cycle AND the address matches. Furthermore,
the control registers are ONLY clocked during CPU write cycles. And the
really cool thing is that this structure is handled by Test Compiler with no
complaints! You have to mark the gated chip select as a global clock, but
you can use that to your advantage because now you can easily constrain
Design Compiler to give you reasonable CPU access times to the chip!
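A behavioral sketch of one control-register bit in the flip-flop version,
in Python. It assumes the gated clock is Chip_Select AND NOT Write_Enable
(as drawn), that the DFF captures on the rising edge of that clock, and that
the 2-1 MUX selects the data bus on an address match and the DFF's own Q
otherwise. `ControlBit`, `gated_clock`, `run` and the sample cycles are
hypothetical. The register can only change on a rising edge of the gated
clock, a read cycle never produces one, and a non-matching register simply
reloads its own value.

```python
# Behavioral sketch of one control-register bit in the flip-flop version.
# Assumptions (illustrative, not from a netlist):
#   * gated clock = Chip_Select AND NOT Write_Enable, as drawn;
#   * the DFF captures on the rising edge of the gated clock;
#   * the 2-1 MUX selects the data bus on an address match and the DFF's
#     own Q otherwise, so a non-matching register reloads its old value.


def gated_clock(cs, we):
    """Output of the AND gate: Chip_Select AND (NOT Write_Enable)."""
    return cs & (1 - we)


class ControlBit:
    def __init__(self):
        self.q = 0
        self._last_clk = 0

    def step(self, cs, we, addr_match, data):
        clk = gated_clock(cs, we)
        if clk == 1 and self._last_clk == 0:  # rising edge of the gated clock
            self.q = data if addr_match else self.q  # 2-1 MUX: load or recirculate
        self._last_clk = clk


def run(bit, cycle, addr_match, data):
    """Apply a list of (Chip_Select, Write_Enable) samples; return Q afterwards."""
    for cs, we in cycle:
        bit.step(cs, we, addr_match, data)
    return bit.q


# Hypothetical samples that follow the order of edges in the drawings above.
WRITE_CYCLE = [(1, 1), (1, 0), (0, 0), (1, 0), (1, 1)]  # WE falls, CS pulses, WE rises
READ_CYCLE = [(1, 1), (0, 1), (1, 1)]  # Write_Enable stays high throughout

if __name__ == "__main__":
    bit = ControlBit()
    print(run(bit, WRITE_CYCLE, addr_match=True, data=1))  # 1: write, address matches
    print(run(bit, READ_CYCLE, addr_match=True, data=0))  # 1: read, no clock edge
    print(run(bit, WRITE_CYCLE, addr_match=False, data=0))  # 1: mismatch, Q recirculates
```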
Caveats:
There is one real-world point that I haven't made here (yet). Typically,
chips that have a CPU port such as the one I described here allow for
read-back of some kind. This opens up yet another can of worms with Test
Compiler and fault coverage. The problem is that a bidirectional data bus
implemented with BIDIR IO cells causes untestable faults, namely in one
side or the other of the tri-state gate in the BIDIR port. Although this
has absolutely nothing to do with the problem with latches, it does create
headaches in the design because we're trying to design a practical CPU
interface.
A typical CPU interface read back using BIDIR cells:
          BIDIR PAD    BIDIR CELL
             +-+       |\
Data_Bus     |X|----+--| >------X-> to control registers as shown above
             +-+    |  |/
                    |        /|
                    +-------< |----< from read back MUX
                             \|      (see ESNUG 257 #2)
                              |
                              +-----< from output enable logic
Yet another problem:
Test Compiler recommends that you force the BIDIR into output mode, which
means that you lose the testability of the data bus inputs we fought so hard
to get by using flip flops in the first place! You have to break the
combinational feedback loop introduced by the BIDIR at the point marked
by an 'X' in the diagram above (by using the set_test_isolate command).
Yet another solution:
My solution to this is to put another set of flip-flops in. "When you're a
hammer, everything looks like a nail." This provides Test Compiler with a
way to scan data into the input side of those 2-1 MUXs shown above. The 2-1
MUX shown below passes the data bus straight through to the control
registers during normal operation. The Test_Hold input is asserted (with 1)
during scan test, which selects the DFF and lets scan data pass through to
the control registers. You only have to build this structure once for each
data bus bit, and you will likely have many more control register bits than
data bus bits, so the loss of coverage introduced by the 2-1 MUX/Test_Hold
combination is dwarfed by the gain in coverage you get by being able to scan
into ALL of the control register flip-flop D inputs. You end up with 2 scan
clocks: one is the global system clock, and the other is the global chip
select clock we made above. The good news is that Test Compiler handles this
with no problem.
                                                2-1MUX
                                 +------------+ +-----+
          BIDIR PAD  BIDIR_CELL  |     DFF    +-|A    |
             +-+       |\        |   +-----+    |    Y|-> to control registers
Data_Bus     |X|----+--| >---X---+---|D   Q|----|B    |
             +-+    |  |/          +-|>    |    |  S  |
                    |              | +-----+    +-----+
                    |              |               |
                    |              |               +-----< Test_Hold
                    |              +---------------------< global chip select clock (from example above)
                    |       /|
                    +------< |----< from read back MUX
                            \|      (see ESNUG 257 #2)
                             |
                             +-----< from output enable logic
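The per-bit structure above can be sketched behaviorally in Python. This is
illustrative only: `InputStage`, `capture`, `scan_shift` and `mux_out` are
hypothetical names, and the scan chain is assumed to be a plain shift
register through the new DFFs. In scan test (Test_Hold = 1) the control
register D inputs see whatever was scanned into the DFFs; in normal
operation (Test_Hold = 0) they see the data bus directly.

```python
# Sketch of the per-bit input structure added for scan: one DFF per data bus
# bit, clocked by the gated chip select, plus a 2-1 MUX selected by Test_Hold.
# Assumptions (illustrative only): the DFFs form a plain shift-register scan
# chain, and the class and method names are hypothetical.


class InputStage:
    def __init__(self, width):
        self.q = [0] * width  # one DFF per data bus bit

    def capture(self, data_bus):
        """Normal capture on a gated chip-select clock edge: Q <= data bus."""
        self.q = list(data_bus)

    def scan_shift(self, bits):
        """Shift bits in one scan clock at a time; the first bit ends in the last DFF."""
        for b in bits:
            self.q = [b] + self.q[:-1]

    def mux_out(self, test_hold, data_bus):
        """2-1 MUX: Test_Hold=0 passes the bus through, Test_Hold=1 passes DFF Q."""
        return list(self.q) if test_hold else list(data_bus)


if __name__ == "__main__":
    stage = InputStage(4)
    stage.scan_shift([1, 0, 1, 1])
    print(stage.q)  # [1, 1, 0, 1]
    # Scan test: the control-register D inputs see the scanned-in values.
    print(stage.mux_out(1, [0, 0, 0, 0]))  # [1, 1, 0, 1]
    # Normal operation: the data bus goes straight through to the registers.
    print(stage.mux_out(0, [0, 1, 1, 0]))  # [0, 1, 1, 0]
```

Since one such stage serves each data bus bit, the extra MUX and DFF are paid
for once per bus bit rather than once per control register.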
Test Compiler Script snippet:
/*
 * make the gated chip select into a legal gated clock
 * (you need to fill in the timing details for your design)
 */
create_clock -name "Chip_Select" -period 25 -waveform {"0" "12.5"} \
    find(port, "Chip_Select")
set_dont_touch_network find(clock, "Chip_Select")
set_fix_hold find(clock, "Chip_Select")
/*
 * declare the pins that are statically held during scan test
 * (Write_Enable is held at 0 so the gated chip select follows
 * Chip_Select; this is what makes the gated clock scheme work)
 */
set_test_hold 1 find(port, "Test_Hold")
set_test_hold 0 find(port, "Write_Enable")
/*
 * disable the feedback loop through the BIDIR cells
 * (you need to edit BIDIR and X to fit your design and library)
 */
foreach (cell_found, find(cell, "BIDIR*")) {
    set_test_isolate cell_found + "/X"
}
NOTE: Depending on your ASIC vendor's technology library, you may want to
consider putting a clock tree in to drive the gated chip select signal.
Conclusion: I know this is a long and winding road just to make a CPU
interface, but it does work and it makes life easier in the long run.
Happy Flip Flopping!
- Erich C. Whitney
Datacube, Danvers, MA