I'd like to thank the 347 engineers who responded to my Cadence/Avant!
lawsuit survey last week. It's been kind of interesting doing it; I've
had three Wall Street investment banker types phone me trying to get the
inside scoop on it! ("Sorry, I can't tell you...") You (and the Wall
Street guys) can see the results in next week's EE Times. But because of
space limitations, I couldn't write up there where the engineers said
they got their news on the case. Here it is (respondents could name more
than one source, so the percentages add up to more than 100):
   74 percent said "EE Times"
   36 percent said "Internet/ESNUG discussions"
   23 percent said "Opinions From Co-workers"
   16 percent said "Direct Contact With Avant!"
   13 percent said "Direct Contact With Cadence"
    7 percent said "Opinions Heard At Conferences"
    6 percent said "Other (Please Specify)_________"
          11 mentions of San Jose Mercury News
           5 mentions of Silicon Valley friends
           3 mentions of former Cadence employees
           3 mentions of Electronic News
           1 mention of "British Trade Press"
    4 percent said "Opinions From Boss/Management"
One engineer's comment I particularly got a laugh from was:
"Say the term 'channel routing' and lawyers think you are probably
trying to guide a boat through narrow waterways... On the bright side
for Avant!, the Simpson trial has shown us that just because you find an
item in one place (blood or source code) and can trace it back to one
party (Cadence or OJ) doesn't mean anything. Hell, maybe Cadence's
source code was planted by a renegade detective! :^)"
Next week, if you have problems trying to reach me, it's because I've
enrolled in the U.S. Federal Witness Protection Program. (Let's just say
that the PR people from both Cadence & Avant! haven't been too happy with
me since I ran this survey! <grin>)
- John Cooley
the ESNUG guy
( ESNUG 245 Item 1 ) ---------------------------------------------- [8/1/96]
[ Editor's Note: Every now and then, as a change of pace, rather than
running the usual 6 to 9 separate topics in ESNUG, I like to run
one complete adventure a Synopsys user has gone through in chasing
an issue. In replying to this, because there's so much being mentioned
here, please cite back the exact part you're replying to in Andy's
"Multicycle Path Odyssey With Synopsys", OK? Enjoy! - John ]
From: jaf@arl.wustl.edu (Andy Fingerhut)
Subject: My Multicycle Path Odyssey With Synopsys So Far...
Hi John,
Note: Our designs are fully synchronous, and all registers are clocked
on the same edge of the clock signal.
Suppose you have a 32 bit register A, and you want to compute some
function on it and store the computed result in another register B.
To be concrete, let's choose the "+1" function, that is, we want to
store in B a value that is 1 more than the value in A.
Simple, right? Here's even some VHDL code to do it, assuming that A
and B are defined the right way so that "+" is defined for them.
    process (CLK)
    begin
      if (CLK'event and CLK = '1') then
        B <= A + 1;
      end if;
    end process;
So you synthesize your design, but it turns out that even with all of
the fancy carry lookahead behemoths that Synopsys DesignWare brings to
bear on the problem, it just can't compute A+1 in a single clock
period.
Now, I know that there are all kinds of fancy things that one can do
with counters, like using Gray codes so that the combinational logic
is fast, but the representation of the values changes. Those are
perfectly useful in some situations, but here are some conditions,
all of which apply in our designs, under which you probably wouldn't
want to bother:
1. It takes time to come up with such a circuit that is correct.
2. We aren't allowed to increase the clock period.
3. It *is* acceptable to take more than one clock period before the
correct value appears on B's outputs.
4. We want a solution that applies when conditions 2 and 3 hold,
but the function computed could be anything, because we have
many instances of this situation in our designs, and many of
them have different functions that need to be computed.
So, at some point we realize that Synopsys has "multicycle" path
constraints that can be set, which allow more than one clock period
for certain combinational paths to settle. Great, let's set one for
our example, say for 3 clock periods, and then synthesize.
    set_multicycle_path 3 -from { "A_reg*" } -to { "B_reg*" }
Things synthesize great. Design Compiler still seems to try too hard
to make the +1 logic as fast as a clock period, but we can live with
that, as long as every path that must settle in a clock period does.
One strange thing that we find is that when reporting the minimum
delay path in the design, we get hold violations on many bits of B,
the worst being on the least significant bit of B. Why? Well, we
didn't realize it when first trying the command above, but setting a
multicycle path of n clock periods as shown above not only makes the
maximum delay of the combinational logic allowed to be n clock
periods, it also sets a minimum delay of n-1 clock periods for hold
time checks. By default, "compile" doesn't worry about these minimum
delay constraints. It happily produces a single inverter from the least
significant bit of A to the least significant bit of A+1, which is
significantly faster than 1 clock period, let alone 3-1=2 clock
periods.
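To make that arithmetic concrete, here is a small Python sketch (a back-of-the-envelope check, not a Synopsys tool) of the delay window implied by a multicycle path of n periods under the rule described above: maximum delay n periods, minimum delay n-1 periods. It assumes an ideal clock with a hypothetical 10 ns period and ignores flip-flop setup, hold and clock-to-Q times; the names `multicycle_window` and `check_path` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical clock period. Ideal clock; flip-flop setup, hold and
# clock-to-Q times are ignored in this sketch.
CLOCK_PERIOD_NS = 10.0


@dataclass
class Window:
    min_ns: float  # hold check: the path must be at least this slow
    max_ns: float  # setup check: the path must be at most this slow


def multicycle_window(n: int, period_ns: float = CLOCK_PERIOD_NS) -> Window:
    """Delay window for a multicycle path of n periods: the setup check
    moves out to n periods and the hold check to n-1 periods."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Window(min_ns=(n - 1) * period_ns, max_ns=n * period_ns)


def check_path(delay_ns: float, window: Window) -> List[str]:
    """Names of the checks that a combinational path delay violates."""
    violations = []
    if delay_ns < window.min_ns:
        violations.append("hold")
    if delay_ns > window.max_ns:
        violations.append("setup")
    return violations


if __name__ == "__main__":
    w = multicycle_window(3)
    print(w)                    # Window(min_ns=20.0, max_ns=30.0)
    print(check_path(0.3, w))   # ['hold']  (one inverter on bit 0)
    print(check_path(25.0, w))  # []        (a slow carry chain)
```

With n = 1 the window reduces to the familiar single-cycle case of [0, T], which is why ordinary paths never show this kind of hold violation.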
I haven't tried it, but I would guess that using the set_fix_hold
command before compile would cause the resulting design to have a
bunch of extra delay added for the bits that are computed more
quickly, to avoid these hold violations. I don't want to do this for
two reasons. First, it seems like a waste of area and power. Second,
no matter how many inverters it puts in a chain, there is probably no
way that the delay on the path could fall within the [2 clock periods
min, 3 clock periods max] delay range for both the best and worst case
of the process we're using. (Perhaps most processes are like this;
I've only worked with the one we're using now.) We'd like everything
to check out OK over the whole range of process variations.
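The process-variation worry can be put into numbers under one simplifying assumption that is not in the original: every delay scales by a single factor k between the best-case and worst-case corners, so a padded path with best-case delay d has worst-case delay k*d. A legal d then exists only if k*(n-1)*T <= n*T, that is k <= n/(n-1), which is 1.5 for n = 3. Whether a particular process meets that depends on its corner spread; the hypothetical functions below just evaluate the inequality.

```python
# Simplifying assumption (not from the post): all delays scale by one
# common factor k between the best-case and worst-case process corners.


def window_is_achievable(n: int, k: float, period_ns: float = 10.0) -> bool:
    """True if some padded path can meet [(n-1)T, nT] at both corners.

    A path with best-case delay d has worst-case delay k*d. The hold check
    needs d >= (n-1)T and the setup check needs k*d <= nT, so the best
    choice is d = (n-1)T, which works only if k*(n-1)T <= nT.
    """
    lo, hi = (n - 1) * period_ns, n * period_ns
    return k * lo <= hi


def max_corner_ratio(n: int) -> float:
    """Largest worst/best delay ratio k that still leaves a legal window."""
    if n < 2:
        return float("inf")  # minimum delay is 0, so no padding is needed
    return n / (n - 1)


if __name__ == "__main__":
    print(max_corner_ratio(3))           # 1.5
    print(window_is_achievable(3, 1.4))  # True
    print(window_is_achievable(3, 2.0))  # False
```

Note that the limit tightens as n grows (4/3 for n = 4), so padding for hold gets harder, not easier, on longer multicycle paths.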
Well, the manual page for set_multicycle_path describes options like
-setup and -hold, so that the minimum and maximum delays can be
specified separately. I tried these without much luck. The manual
page also recommends using the set_min_delay command for setting the
minimum delay. I've used that command with success. Let's do it for
our design, like so:
    set_min_delay 0 -from { "A_reg*" } -to { "B_reg*" }
Throughout our scripts, we have such a set_min_delay 0 command for
every one of our set_multicycle_path commands.
Now when we check the min and max paths in the design, everything
comes out fine.
But what does the output of B look like over time? After a rising
clock edge when A changes, some bits of A+1 will be computed before
others. By the multicycle path command, B is guaranteed to have the
correct value of A+1 three clock periods after A changes, as shown in the
timing diagram below in Figure 1, but 1 or 2 clock periods after A
changes, the outputs of B could be a mixture of the correct and
incorrect results. Some flip-flops of B could even have setup/hold
violations during these times, and the flip-flops could go metastable
as a result. All of this ugliness goes away after 3 clock periods,
but you have to be careful that all of the logic depending on B's
outputs handles this appropriately.
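As a rough illustration of that "mixture of correct and incorrect results", the sketch below models each bit of A+1 as settling at a different hypothetical time after A changes on edge 2 (low bits early, high bits late, as with a carry chain) and reports what B's flip-flops capture on each later edge. Unsettled bits are shown as None, since such a flop could capture the old value, the new value, or go metastable. The settling times and the 10 ns period are invented for the example.

```python
from typing import List, Optional

WIDTH = 32
PERIOD_NS = 10.0   # hypothetical clock period
CHANGE_EDGE = 2    # A is loaded with a2 on clock edge 2 (as in Figure 1)


def settle_time_ns(bit: int) -> float:
    """Hypothetical time for bit `bit` of A+1 to settle after A changes:
    low bits are fast, high bits wait on a long carry chain."""
    return 0.5 + 0.8 * bit


def sample_b(edge: int, a1: int, a2: int) -> List[Optional[int]]:
    """Per-bit value captured by register B on a clock edge (bit 0 first).

    None marks a bit that had not settled when the edge arrived, so its
    flop could capture the old value, the new value, or go metastable.
    """
    mask = (1 << WIDTH) - 1
    old, new = (a1 + 1) & mask, (a2 + 1) & mask
    if edge <= CHANGE_EDGE:
        return [(old >> b) & 1 for b in range(WIDTH)]
    elapsed = (edge - CHANGE_EDGE) * PERIOD_NS
    return [(new >> b) & 1 if settle_time_ns(b) <= elapsed else None
            for b in range(WIDTH)]


if __name__ == "__main__":
    for edge in range(2, 6):
        bits = sample_b(edge, a1=0x0000FFFF, a2=0x12345678)
        print(edge, sum(b is None for b in bits), "unsettled bits")
    # 2 0 unsettled bits
    # 3 20 unsettled bits
    # 4 7 unsettled bits
    # 5 0 unsettled bits
```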
Notes on figure 1:
Vertical lines are times when a rising clock edge occurs, and are
numbered at the top so that they may be referred to in the text.
X's for a signal value immediately after a rising edge of clock are
just meant to indicate a possible change in the flip-flop output
value. If no X's occur, then the register outputs are the same as
they were during the previous clock period.
/'s indicate an unknown signal value.
          0      1      2      3      4      5      6      7
          |______|______| _____|______|______|______|______|
A         |  a1  |      |X a2  |      |      |      |      |
          |______|______|X_____|______|______|______|______|
          |      |      |      |      |      |      |      |
          |______|______|______|      |      | _____|______|
B         | a1+1 |      |      |X/////|X/////|X a2+1|      |
          |______|______|______|X/////|X/////|X_____|______|

Figure 1) The timing diagram for the multi-cycle problem
So what is a safe way to handle these unknown values? Well, for some
designs, it might not be an issue at all. The downstream circuits
might safely ignore such unknown values.
However, in several of our designs, it was important that B have the
old correct value of a1+1 until the combinational logic computing a2+1
was finished, and that B then switch over to a2+1 on a single clock
edge, with no garbage values in between.
Method 1
--------
Here is one way that one designer here used consistently. I don't
know if he realized the potential problems associated with Method 2
below, and therefore avoided using that method, or if he just got
lucky. I'll have to ask him next time I get a chance.
Assume that you can create a "timing pulse" signal as shown in Figure
2, where it is high on the rising edge of clock after B becomes good,
and low otherwise.
          0      1      2      3      4      5      6      7
          |______|______|______|      |      | _____|______|
B         | a1+1 |      |      |X/////|X/////|X a2+1|      |
          |______|______|______|X/////|X/////|X_____|______|
          |      |      |      |      |      |      |      |
          |      |      |      |      |      | _____|      |
sample_B  |______|______|______|______|______|/     |\_____|

Figure 2) A timing pulse approach
Then you can add another register B_clean to your design whose input
value is the output of a mux. The mux selects the output of B when
sample_B is high, or the output of B_clean when sample_B is low.
Here's the VHDL code for B_clean, and what its waveform looks like,
relative to all other signals.
    process (CLK)
    begin
      if (CLK'event and CLK = '1') then
        if (sample_B = '1') then
          B_clean <= B;
        end if;
      end if;
    end process;
          0      1      2      3      4      5      6      7      8
          |______|______| _____|______|______|______|______|______|
A         |  a1  |      |X a2  |      |      |      |      |      |
          |______|______|X_____|______|______|______|______|______|
          |      |      |      |      |      |      |      |      |
          |______|______|______|      |      | _____|______|______|
B         | a1+1 |      |      |X/////|X/////|X a2+1|      |      |
          |______|______|______|X/////|X/////|X_____|______|______|
          |      |      |      |      |      |      |      |      |
          |      |      |      |      |      | _____|      |      |
sample_B  |______|______|______|______|______|/     |\_____|______|
          |      |      |      |      |      |      |      |      |
          |______|______|______|______|______|______| _____|______|
B_clean   | a1+1 |      |      |      |      |      |X a2+1|      |
          |______|______|______|______|______|______|X_____|______|

Figure 3) The new timing from adding another 32-bit register
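Method 1 can be checked at the cycle level with a few lines of Python. Each list entry below is a value during one clock period of Figure 3, with None standing in for the /// garbage on B; a1 = 5 and a2 = 9 are arbitrary. This is a hypothetical sketch that treats the mux as ideal, which is exactly the assumption questioned in the aside that follows.

```python
from typing import List, Optional, Sequence

GARBAGE = None  # stands in for the /// values on B


def simulate_b_clean(b: Sequence[Optional[int]],
                     sample_b: Sequence[int],
                     initial: int) -> List[int]:
    """B_clean during each clock period, given B and sample_B per period.

    Period p runs from edge p to edge p+1. On edge p the mux passes B to
    B_clean if sample_B was high during period p-1; otherwise B_clean
    reloads its own value (the VHDL if-statement above).
    """
    out = [initial]
    for p in range(1, len(b)):
        nxt = b[p - 1] if sample_b[p - 1] else out[-1]
        if nxt is GARBAGE:
            raise ValueError(f"B_clean would capture garbage on edge {p}")
        out.append(nxt)
    return out


if __name__ == "__main__":
    # Figure 3 with a1 = 5, a2 = 9: B is garbage in periods 3 and 4.
    b = [6, 6, 6, GARBAGE, GARBAGE, 10, 10, 10, 10]
    sample = [0, 0, 0, 0, 0, 1, 0, 0, 0]
    print(simulate_b_clean(b, sample, initial=6))
    # [6, 6, 6, 6, 6, 6, 10, 10, 10]  (changes on edge 6)
```

Moving the sample_B pulse one period earlier in this model makes it raise, which is the software analogue of B_clean loading a half-settled B.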
This solution has the advantages of being safe and simple, but the
disadvantages of requiring another 32 bit register, and an additional
clock period before the value of a2+1 is ready. These disadvantages
were perfectly acceptable to the designer that used this method, since
he didn't have very many multicycle paths, and the timing schedule
wasn't so tight that the extra clock period hurt.
(Aside: I might be assuming too much here about the physical behavior
of a mux, since there are probably ways to implement one physically
that don't have this masking effect. For those who already know what
a hazard is, the mux cell itself might have a hazard. Anyone know of
standard cell mux implementations that do this?)
Method 2
--------
In a lot of my designs, I didn't even think of the solution above. I
thought instead of creating a timing signal, let's call it change_B,
with a high pulse one clock period earlier than sample_B does. Then I
wrote code for driving the inputs to register B differently, so that
it inferred combinational logic that is logically equivalent to the
circuit in Figure 4. Here's the code.
    process (CLK)
    begin
      if (CLK'event and CLK = '1') then
        if (change_B = '1') then
          B <= A + 1;
        end if;
      end if;
    end process;
change_B ------------------------------+
                                       |
         +----------------+            V
A ------>| combinational  |        +-----+     +-------+
         | logic          |------->|1  S |---->|D     Q|----+----> B
         | to compute A+1 |        |     |     |       |    |
         |                |     +->|0    |     | B_reg |    |
         |                |     |  +-----+     |       |    |
         +----------------+     |              +-------+    |
                                |                           |
                                +---------------------------+

Note: All data paths are 32 bits wide, except for change_B.

Figure 4) Moving the high pulse one clock earlier
The same multicycle path of 3 clock periods exists from A to B, with a
minimum delay of 0, but the path from the register driving change_B to
B is a single cycle path (the default).
Here's the behavior I expected for any circuit synthesized from this
code and timing specifications:
          0      1      2      3      4      5      6      7      8
          |______|______| _____|______|______|______|______|______|
A         |  a1  |      |X a2  |      |      |      |      |      |
          |______|______|X_____|______|______|______|______|______|
          |      |      |      |      |      |      |      |      |
          |      |      |      |      | _____|      |      |      |
change_B  |______|______|______|______|/     |\_____|______|______|
          |      |      |      |      |      |      |      |      |
          |______|______|______|______|______| _____|______|______|
B         | a1+1 |      |      |      |      |X a2+1|      |      |
          |______|______|______|______|______|X_____|______|______|

Figure 5) The expected timing from moving the high pulse earlier
If Design Compiler produced a circuit like Figure 4, that would be
safe, because any garbage data at the output of the logic computing
A+1 would be masked out by the mux, since at those times the mux
selects the current value of B_reg. In other words, the mux and
signal change_B "mask out the garbage".
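A similar cycle-level sketch of the Figure 4 structure, again with arbitrary values a1 = 5 and a2 = 9: the A+1 logic output is None (unsettled) during periods 2 and 3, and change_B is high during period 4. With an intact mux, B matches Figure 5. The `mux_masks=False` case is a crude, hypothetical stand-in for logic in which the select does not cleanly block the unsettled data, treated pessimistically as always capturing garbage.

```python
from typing import List, Optional, Sequence


def simulate_gated_b(logic_out: Sequence[Optional[int]],
                     change_b: Sequence[int],
                     initial: int,
                     mux_masks: bool = True) -> List[Optional[int]]:
    """B during each clock period for the Figure 4 structure.

    logic_out[p]: output of the A+1 logic during period p (None = unsettled)
    change_b[p]:  value of change_B during period p
    mux_masks:    True models an intact mux that holds B while change_B is
                  low. False is a pessimistic stand-in for merged logic in
                  which a glitch may reach B's D input anyway.
    """
    b: List[Optional[int]] = [initial]
    for p in range(1, len(logic_out)):
        data = logic_out[p - 1]
        if change_b[p - 1]:
            b.append(data)   # load; None here would be a real violation
        elif mux_masks or data is not None:
            b.append(b[-1])  # hold the current value of B_reg
        else:
            b.append(None)   # garbage leaked past the "mux"
    return b


if __name__ == "__main__":
    logic = [6, 6, None, None, 10, 10, 10, 10, 10]  # a1 = 5, a2 = 9
    change = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(simulate_gated_b(logic, change, initial=6))
    # [6, 6, 6, 6, 6, 10, 10, 10, 10]  (Figure 5)
    print(simulate_gated_b(logic, change, initial=6, mux_masks=False))
    # [6, 6, 6, None, None, 10, 10, 10, 10]
```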
(Aside: As mentioned in the earlier aside, this may be assuming a bit
more about the behavior of a mux than is warranted. However, I think
it's probably safe for most mux implementations.)
Unfortunately, Design Compiler is not *guaranteed* to produce such a
safe circuit. I don't have an example of a synthesized circuit
produced by Design Compiler that is bad, but that may only be due to
lack of spending lots and lots of time looking for such a bad example.
I'd rather spend my time thinking about ways to guarantee that the
circuit operates as desired than hunting for evidence that a design
practice which might fail only one time out of a hundred really is
unsafe.
However, I can make up examples that might scare you into reading
further! :^) Even if those examples aren't enough, you could go to
Synopsys SolvIt and read note SYNTH-1016, titled "Potential glitch
hazards when synthesizing and the methodology to avoid them". It
mentions that having multicycle paths with different durations
ending at the same point can be dangerous. That is exactly what I'm
doing above, since there is a multicycle path of 3 from A to B, and a
"multicycle path of 1" from change_B to B.
                              +-----+
change_B -------------------->|     |
                              |     |
A --+------------------------>| XOR |      +-----+
    |                         |     |----->|D   Q|----------> B
    +---delay of time T------>|     |      |     |
                              +-----+      | DFF |
                                           |     |
                                           +-----+

Figure 6) An example to explain the Synopsys definition of a "hazard"
This example isn't meant to be a functionally useful circuit, since
all it does logically is pass the value of change_B through. The
interesting thing is that a change in input A produces two changes in
the output of the combinational logic, time T apart. This can happen
in functionally useful circuits, too, and the term used to describe
this behavior is that the combinational logic has a _hazard_ -- a
change in an input produces more than one change in the output.
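The hazard in Figure 6 can be reproduced with a tiny event model. The sketch assumes, for illustration only, that A steps from 0 to 1, that both of A's paths into the XOR share a common delay `fast_ns`, and that the lower path adds the extra delay T. A single change on A then yields two output changes T apart, or none at all if T is 0.

```python
def step(t: float, at: float) -> int:
    """A signal that is 0 before time `at` and 1 from then on."""
    return 1 if t >= at else 0


def xor_out(t: float, a_rise: float, fast_ns: float, delay_t_ns: float,
            change_b: int = 0) -> int:
    """Figure 6 XOR output at time t: change_B ^ A ^ (A delayed by T),
    with both copies of A also passing through a common fast_ns delay."""
    direct = step(t, a_rise + fast_ns)
    delayed = step(t, a_rise + fast_ns + delay_t_ns)
    return change_b ^ direct ^ delayed


def transitions(a_rise: float, fast_ns: float, delay_t_ns: float,
                change_b: int = 0) -> list:
    """(time, new value) for each change of the XOR output after A rises."""
    events = []
    prev = change_b  # before A rises, both A inputs to the XOR are 0
    for t in sorted({a_rise + fast_ns, a_rise + fast_ns + delay_t_ns}):
        v = xor_out(t, a_rise, fast_ns, delay_t_ns, change_b)
        if v != prev:
            events.append((t, v))
            prev = v
    return events


if __name__ == "__main__":
    print(transitions(20.0, 3.0, 18.0))  # [(23.0, 1), (41.0, 0)]
    print(transitions(20.0, 3.0, 0.0))   # []
```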
Note that the maximum delay from change_B to B could easily be less
than 1 clock period, and the maximum delay from A to B could easily be
at most 3 clock periods. However, look at the timing diagram of B's
output in Figure 7. Assume that the fastest path from A to B is less
than a clock period, and the slowest is at least 2 but less than 3
clock periods.
          0      1      2      3      4      5      6      7      8
          |      |      | _____|______|______|______|______|______|
A         |______|______|/     |      |      |      |      |      |
          |      |      |      |      |      |      |      |      |
change_B  |______|______|______|______|______|______|______|______|
          |      |      |      |      |      |      |      |      |
          |      |      |      | _____|______|      |      |      |
B         |______|______|______|/     |      |\_____|______|______|

Figure 7) The timing diagram for Figure 6
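Sampling that pulse at the clock edges gives Figure 7. The sketch below assumes a hypothetical 10 ns period, A rising exactly at edge 2, a 3 ns direct path and a 25 ns delayed path (inside the "at least 2 but less than 3 periods" range), and ideal flip-flops that sample exactly at each edge.

```python
PERIOD_NS = 10.0  # hypothetical clock period


def b_samples(a_rise_edge: int, fast_ns: float, slow_ns: float,
              edges, change_b: int = 0) -> dict:
    """Value the Figure 6 flip-flop captures on each listed clock edge.

    A rises exactly at clock edge a_rise_edge; the XOR sees the change
    after fast_ns on the direct path and after slow_ns on the delayed path.
    Flip-flops are ideal and sample exactly at the edge.
    """
    a_rise = a_rise_edge * PERIOD_NS
    captured = {}
    for e in edges:
        t = e * PERIOD_NS
        direct = 1 if t >= a_rise + fast_ns else 0
        delayed = 1 if t >= a_rise + slow_ns else 0
        captured[e] = change_b ^ direct ^ delayed
    return captured


if __name__ == "__main__":
    print(b_samples(2, fast_ns=3.0, slow_ns=25.0, edges=range(2, 9)))
    # {2: 0, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}
```

Logically B should simply follow change_B (0 here) on every edge, yet edges 3 and 4 capture a 1. Every max-delay constraint is still met, because the 3-cycle A-to-B path is only checked at edge 5.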
The basic idea is that there _could_ be a hazard from a single input
to an output if there are two different paths from the input to the
output, and they have different delays (which is hard to avoid if
there are two different paths). I see combinational logic like this
all the time coming out of Design Compiler, and Synopsys explicitly
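makes no guarantees that the synthesized logic avoids hazards. If you
have a synchronous design where every path is single-cycle, then
hazards in combinational logic don't hurt you at all functionally,
because the setup checks guarantee that every glitch has died out
before the next clock edge samples the logic's output.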
The basic method that we used to fix this was to force Design Compiler
to produce a design that is structurally like that in Figure 4. The
most straightforward way I could think of for doing that, given our
existing VHDL code, is to create a sub-design containing the
mux/flip-flop pair of cells, inputs A and change_B, and output B. A
hierarchy boundary should prevent Synopsys from getting any funny
ideas about mixing up the combinational logic of the mux and the +1.
Actually, the +1 logic would already likely be separated by a
hierarchy boundary, because it would be in a DesignWare part. But, as
noted before, I want a solution that works for other combinational
functions, too. Here is the final code:
    process (A)
    begin
      B_combinational <= A + 1;
    end process;

    B_GATED_REG:
      GATED_REG
        port map (
          CLK      => CLK,
          LOAD     => change_B,
          DATA_IN  => B_combinational,
          DATA_OUT => B
        );
This might look awfully ugly to you. I agree. However, it has the
nice feature that it can also work if the original process had
multiple signals like "change_B" that are mutually exclusive of each
other, and several multicycle paths from different sources, perhaps
with different lengths. Then the LOAD signal can be fed the OR of all
the signals like "change_B".
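GATED_REG itself isn't shown in the post; presumably it is the load-enable mux plus flip-flops of Figure 4, wrapped in its own level of hierarchy. Under that assumption, the following hypothetical Python model shows the multi-source case: each source contributes a (change signal, value) pair, LOAD is the OR of the change signals, and the model insists that at most one of them is high in any cycle. The names `combine_sources` and `gated_reg_step` are invented for this sketch.

```python
from typing import Optional, Sequence, Tuple


def combine_sources(
        sources: Sequence[Tuple[int, Optional[int]]]) -> Tuple[int, Optional[int]]:
    """Drive a (hypothetical) GATED_REG from several multicycle sources.

    Each source is (change signal, value of its combinational result).
    LOAD is the OR of the change signals, which must be mutually
    exclusive; DATA_IN is the selected result, or None (don't care)
    when nothing loads.
    """
    active = [value for change, value in sources if change]
    if len(active) > 1:
        raise ValueError("change signals must be mutually exclusive")
    return (1, active[0]) if active else (0, None)


def gated_reg_step(q: Optional[int], data_in: Optional[int],
                   load: int) -> Optional[int]:
    """One rising clock edge: load DATA_IN when LOAD is high, else hold."""
    return data_in if load else q


if __name__ == "__main__":
    q = 0
    cycles = [
        [(0, None), (0, None)],  # both results still settling: hold
        [(1, 42), (0, None)],    # source 1's result is ready
        [(0, None), (1, 7)],     # source 2's result is ready
    ]
    for srcs in cycles:
        load, data = combine_sources(srcs)
        q = gated_reg_step(q, data, load)
    print(q)  # 7
```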
- Andy Fingerhut
Washington University