( ESNUG 480 Item 9 ) -------------------------------------------- [03/05/09]
From: Sridevi Warrier <sridevi.warrier=user domain=analog hot mom>
Subject: Engineer finds PrimeTime dmsa_fix_hold works but with caveats
Hi John,
We recently taped out a 65 nm, 5M-instance chip with a 27 mm² area. Its main
clock runs at 450 MHz. I wanted to share our hold-fixing story using PrimeTime
Distributed Multi-Scenario Analysis (DMSA) on our main CORE block and pass
along a few recommendations to other users.
Background
----------
CORE is our timing-critical block. It's hierarchical and contains the
sub-blocks MUL and ALU. Place and route for all three blocks was done in
IC Compiler. Although we finished P&R without any hold violations, our design
practice required that we build in +70 psec of extra hold margin. So we
needed to further improve hold slack, but couldn't afford to trash the
critical setup timing we had achieved in IC Compiler.
That's when we thought of trying PrimeTime's DMSA on the CORE block to
improve hold slack without compromising our setup timing.
Our Approach
------------
The flow we used is as follows:
1. Report the setup and hold timing and check for hold violations.
2. Run the dmsa_fix_hold utility to fix hold and generate an ECO command file.
3. Post-process the ECO file. This was necessary because PT runs flat,
while we needed separate ECO files for the three layout blocks.
4. Implement the ECOs in ICC, then re-run the DMSA script to analyze setup
and hold again in PT.
We ran through this flow (steps 1-4) four times, which resulted in a design
with all hold violations fixed and no degradation in setup timing. The
benefit was that this required no manual fixes on our part, and the setup
and hold fixes were handled across multiple corners and modes.
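Step 3 is essentially a partitioning problem: each command in the flat ECO file has to be routed to the layout block that owns the instance and rewritten relative to that block. The Python sketch below assumes a hypothetical one-command-per-line format (`insert_buffer <hierarchical_pin> <lib_cell>`) and hypothetical instance prefixes `u_mul/` and `u_alu/` for MUL and ALU; the real dmsa_fix_hold output format may differ.

```python
from collections import defaultdict

# Hypothetical mapping from the hierarchical instance prefix seen in the
# flat PrimeTime netlist to the layout block that owns those instances.
# The prefixes u_mul/ and u_alu/ are illustrative, not from the design.
BLOCK_PREFIXES = {"u_mul/": "MUL", "u_alu/": "ALU"}
TOP_BLOCK = "CORE"


def split_eco(lines, prefixes=BLOCK_PREFIXES, top=TOP_BLOCK):
    """Partition flat ECO commands into one command list per layout block.

    Assumes (hypothetically) one command per line of the form
    '<command> <hierarchical_pin> <lib_cell>'. The block prefix is stripped
    so each command is relative to that block's own top level.
    """
    per_block = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, pin, *rest = line.split()
        block, local_pin = top, pin  # default: instance lives in CORE itself
        for prefix, name in prefixes.items():
            if pin.startswith(prefix):
                block, local_pin = name, pin[len(prefix):]
                break
        per_block[block].append(" ".join([command, local_pin, *rest]))
    return dict(per_block)


if __name__ == "__main__":
    flat_eco = [
        "# generated ECO (illustrative)",
        "insert_buffer u_mul/prod_reg_3/D BUF_X1",
        "insert_buffer u_alu/acc_reg_0/D BUF_X2",
        "insert_buffer ctrl_reg_7/D BUF_X1",
    ]
    for block, cmds in split_eco(flat_eco).items():
        print(block, cmds)
```

Pins inside MUL or ALU lose their prefix so the commands can be sourced against that block's own netlist; everything else stays with CORE. A real flow would also have to handle nets and ports that cross block boundaries, which this sketch ignores.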
Results and Observations
------------------------
Before hold fixing, the design had -280 psec of setup slack and -140 psec of
hold slack, with 1,675 hold violations measured against a 0 psec hold
margin. While the negative setup slack was acceptable, our design flow
requires hold slack margins of at least +70 psec.
We ran dmsa_fix_hold in PrimeTime multi-scenario mode. We found that when
targeting a +70 psec slack threshold from the beginning, ICC's runtime was
too long. We got better results by raising the slack threshold incrementally
and doing multiple STA/layout iterations to bring the new buffers into the
design.
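The incremental approach can be sketched as a simple driver loop. This is a minimal Python sketch, not part of PrimeTime or ICC: `run_pass` is a hypothetical hook standing in for one full STA/ECO/layout iteration, and the early stop on setup degradation is an illustrative addition, not something the flow described here does.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PassResult:
    margin_ps: int       # hold margin targeted in this pass
    setup_slack_ps: int  # worst setup slack after the ECO is back in PT
    hold_violators: int  # endpoints still below the targeted margin


def ramp_hold_margin(
    targets_ps: List[int],
    run_pass: Callable[[int], PassResult],
    baseline_setup_ps: int,
    max_setup_loss_ps: int,
) -> List[PassResult]:
    """Raise the hold-margin target by one step per STA/layout pass.

    run_pass is a hypothetical hook for one full loop: report timing,
    dmsa_fix_hold, split the ECO, implement it in ICC, re-run STA.
    Ramping stops early if setup slack drops more than max_setup_loss_ps
    below the baseline.
    """
    history = []
    for margin in targets_ps:
        result = run_pass(margin)
        history.append(result)
        if result.setup_slack_ps < baseline_setup_ps - max_setup_loss_ps:
            break  # setup degraded too far; stop raising the margin
    return history


if __name__ == "__main__":
    # Stub that replays the per-iteration numbers reported in this post.
    reported = {0: (-290, 6), 30: (-290, 105), 50: (-280, 200), 70: (-280, 3100)}
    for r in ramp_hold_margin(
        [0, 30, 50, 70],
        lambda m: PassResult(m, *reported[m]),
        baseline_setup_ps=-280,
        max_setup_loss_ps=10,
    ):
        print(r)
```

With a 10 psec setup tolerance all four margin steps run; tightening the tolerance to 5 psec would stop after the first pass, since setup slack there was -290 psec.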
We obtained the following results over four iterations:
      Hold     Buffers   PT-only   Setup    Hold     Violators below
Iter  margin   inserted  runtime   slack    slack*   margin (before -> after)
----  ------   --------  -------   -----    ------   ------------------------
  1        0        982  1.2 hrs    -290      -10    1,675 ->     6
  2      +30      1,762  1.5 hrs    -290      -10    2,867 ->   105
  3      +50      3,651  2.0 hrs    -280      -10    4,675 ->   200
  4      +70      6,511  6.2 hrs    -280      -40    9,134 -> 3,100
All margins and slacks are in psec; runtimes are for PT only.
* Hold slack is relative to that iteration's target hold margin.
In the first run, with no additional hold margin, PrimeTime reported that
setup remained at -280 psec with 0 hold violations. After we implemented the
DMSA-generated ECOs in ICC and brought the new layout files back into
PrimeTime, setup slack was -290 psec (a 10 psec degradation) and 6 hold
violations remained.
As we increased the hold slack threshold to +30 psec, +50 psec, and finally
+70 psec, setup slack stayed within 10 psec of its original value. At the end
of these runs, we had fixed the hold violations up to +30 psec, with 3,100
"violators" remaining between +30 psec and +70 psec hold slack. Setup slack
returned to its original value of -280 psec by the end of the hold-fixing
process.
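The hold slack figures reported for each iteration above are relative to that iteration's target. Assuming relative slack means raw slack minus the target margin, the worst raw hold slack is just their sum. This small Python sketch does that arithmetic for the four iterations:

```python
# Per-iteration (target hold margin, worst hold slack relative to that
# margin), both in psec, as reported for each iteration in this post.
ITERATIONS = [(0, -10), (30, -10), (50, -10), (70, -40)]


def absolute_hold_slack(margin_ps: int, relative_slack_ps: int) -> int:
    """Worst raw hold slack, assuming relative slack = raw slack - margin."""
    return margin_ps + relative_slack_ps


if __name__ == "__main__":
    for i, (margin, rel) in enumerate(ITERATIONS, start=1):
        raw = absolute_hold_slack(margin, rel)
        print(f"iteration {i}: worst raw hold slack {raw:+d} psec")
```

Iteration 4 gives 70 + (-40) = +30 psec, which is where the "fixed up to +30 psec" figure comes from.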
We found that it is always better to fix the majority of the hold violations
first and then use DMSA, because the runtime increases with the number of
hold violations. The reasons are:
1. PT saves all the hold-critical pins and reports timing to them.
2. PT first inserts buffers on the hold-critical paths, then analyzes
the setup and hold violations at each of these buffers in an
iterative fashion.
3. The number of inserted buffers may be more or less than the number of
hold violations, depending on the magnitude of each hold violation, the
delay of the provided buffers, and the amount of shared logic. While
PrimeTime DMSA hold fixing analyzes setup and hold violations
separately, it reports the paths through all the inserted buffers in
each iteration. So as the number of violations increases, the number
of iterations increases, and so does the number of paths that must be
reported in each iteration, which drives up the runtime.
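To make the runtime argument concrete, here is a toy Python model. It is illustrative only and does not describe PrimeTime's internals: it assumes one buffer per violation, a fixed number of buffers inserted per iteration, and that every iteration re-reports paths through all buffers inserted so far.

```python
import math
from typing import Tuple


def toy_report_work(num_violations: int, buffers_per_iteration: int) -> Tuple[int, int]:
    """Toy model: return (iterations, total path reports).

    Illustrative assumptions, not PrimeTime internals: one buffer fixes one
    violation, a fixed number of buffers is inserted per iteration, and each
    iteration re-reports paths through every buffer inserted so far.
    """
    iterations = math.ceil(num_violations / buffers_per_iteration)
    inserted = 0
    total_reports = 0
    for _ in range(iterations):
        inserted = min(inserted + buffers_per_iteration, num_violations)
        total_reports += inserted  # paths through all buffers so far
    return iterations, total_reports


if __name__ == "__main__":
    for n in (1000, 2000):
        print(n, toy_report_work(n, 100))
```

Under these assumptions, 1,000 violations take 10 iterations and 5,500 path reports, while 2,000 take 20 iterations and 21,000: doubling the violations roughly quadruples the reporting work, which is why fixing most violations beforehand pays off.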
Limitations using DMSA
----------------------
1. The combined ICC/PT runtime increased as we kept more constraints in
the specific_data of the create_scenario command.
2. At the time we used DMSA, the script did not favor buffer insertion
points that would fix multiple hold-violating paths at once. We were
told that this was a bug in the older version, and that more recent
versions of the script perform this optimization properly.
3. Even if the DMSA run shows no change in setup slack, setup may still
violate once the hold-fixing buffers are placed and routed.
4. At this time, dmsa_fix_hold can only use buffer insertion for fixing.
It does not currently consider downsizing existing cells to improve
hold slack.
5. We saw long runtimes when targeting a slack threshold of +70 psec in
the first run, so we used multiple iterations. We are told that the
latest version of dmsa_fix_hold is significantly faster for large
numbers of hold violations. This could reduce the number of
iterations needed, and we'll check it out on our next design.
6. All file paths in DMSA scripts should be absolute. A DMSA run will hang
if the script uses relative paths that cannot be resolved through the
search path, because each scenario runs in its own subdirectory.
7. At 65 nm and below, there is a large difference in delay between slow
and fast corners. Because this difference grows with buffer size,
including bigger buffers in the buffer list can cause setup to violate
even before hold is fixed.
8. Use as many host machines as there are scenarios in your design.
Otherwise, your run will take a very long time due to image swapping
on the remote machines.
9. Keep as many of your constraints as possible in common_data, since this
can decrease PT runtime.
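Items 1, 6 and 9 all concern how scenarios are set up. The Python sketch below generates create_scenario command strings that make every file path absolute and move files loaded by every scenario into common_data. The option spelling (`-name`, `-common_data`, `-specific_data`), the scenario names and the file names are assumptions for illustration; check them against the create_scenario man page for your PrimeTime version.

```python
import os
from typing import Dict, List


def build_create_scenario_cmds(
    scenario_files: Dict[str, List[str]], base_dir: str
) -> List[str]:
    """Emit one create_scenario command string per scenario.

    Every path is made absolute (each scenario runs in its own
    subdirectory), and files loaded by every scenario go into
    -common_data; the remainder stay in -specific_data.
    """
    if not scenario_files:
        return []
    abs_files = {
        name: [os.path.abspath(os.path.join(base_dir, f)) for f in files]
        for name, files in scenario_files.items()
    }
    shared = set.intersection(*(set(files) for files in abs_files.values()))
    # Use one ordered common list (taken from the first scenario) for all.
    first = next(iter(abs_files.values()))
    common = [f for f in first if f in shared]
    cmds = []
    for name, files in abs_files.items():
        specific = [f for f in files if f not in shared]
        cmd = f"create_scenario -name {name} -common_data {{{' '.join(common)}}}"
        if specific:
            cmd += f" -specific_data {{{' '.join(specific)}}}"
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # Hypothetical scenario names and constraint files.
    scenarios = {
        "func_ss": ["common/clocks.tcl", "common/io.tcl", "modes/func.tcl"],
        "scan_ff": ["common/clocks.tcl", "common/io.tcl", "modes/scan.tcl"],
    }
    for cmd in build_create_scenario_cmds(scenarios, "/proj/sta"):
        print(cmd)
```

Deciding what is common by intersecting the per-scenario file lists keeps the split mechanical; constraints that differ between scenarios only by a variable value would need to be refactored by hand before they can move into common_data.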
PrimeTime DMSA was reasonably easy to use, and it fixed our hold violations
in a matter of hours without any setup degradation. This is better than the
semi-automated hold-fix approach we used on our previous projects. We're
looking forward to trying out the new version of dmsa_fix_hold on our next
project.
- Sridevi Warrier
Analog Devices, Inc. Bangalore, India