
( ESNUG 480 Item 9 ) -------------------------------------------- [03/05/09]

From: Sridevi Warrier <sridevi.warrier=user domain=analog hot mom>
Subject: Engineer finds PrimeTime dmsa_fix_hold works but with caveats

Hi John,

We recently taped out a 65 nm, 5M-instance chip with a 27 mm2 area.  Its
main clock runs at 450 MHz.  I wanted to share our hold-fixing story using
PrimeTime Distributed Multi-Scenario Analysis (DMSA) on our main CORE block
and pass along a few recommendations to other users.

Background
----------

CORE is our timing-critical block.  It's hierarchical and contains two
sub-blocks, MUL and ALU.  All three blocks were placed and routed in IC
Compiler.

We finished P&R without any hold violations, but our design practice
requires +70 psec of extra hold margin.  So we needed to improve hold slack
further without trashing the critical setup timing we had achieved in IC
Compiler.

That's when we decided to try PrimeTime's DMSA on the CORE block to improve
hold slack without compromising our setup timing.

Our Approach
------------

The flow we used is as follows:

 1. Report the setup and hold timing and check for hold violations.

 2. Run the dmsa_fix_hold utility to fix hold and generate an ECO command
    file.

 3. Post-process the ECO file.  This was necessary because PT runs on the
    flat design, while ICC needed a separate ECO file for each of the three
    layout blocks.

 4. Perform the ECOs in ICC, then re-run the DMSA script to analyze setup
    and hold again in PT.  We ran through this flow (steps 1-4) four times,
    which gave us a design with all hold violations fixed and no
    degradation in setup timing.  The benefit was that it required no
    manual fixes on our part, and the setup and hold fixes were handled
    across multiple corners and modes.
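Step 3 is mostly bookkeeping: PT sees one flat CORE netlist, but ICC needs one
ECO file per layout block.  A minimal Python sketch of that split follows.  It
assumes, hypothetically, that each ECO command line mentions hierarchical
paths such as `MUL/u_add/A`, and that anything not under MUL or ALU belongs to
the CORE top level.  The `insert_buffer [get_pins ...]` line shape used in the
demo is an illustrative assumption, not the documented dmsa_fix_hold output
format; only the prefix-stripping idea is the point.

```python
import re
from collections import defaultdict

# Sub-block instance names from this letter; everything else is CORE top level.
SUB_BLOCKS = ("MUL", "ALU")
TOP = "CORE"

# Matches a path component that starts with a sub-block name, e.g. "MUL/".
# The lookbehind keeps "XMUL/" or "a/MUL/" from matching. Assumes "/" separators.
_PREFIX = re.compile(r"(?<![\w/])(%s)/" % "|".join(SUB_BLOCKS))


def split_flat_eco(lines):
    """Split flat (CORE-level) ECO command lines into per-block lists.

    A line that mentions exactly one sub-block goes to that block, with the
    block prefix stripped so its paths are valid inside the sub-block.
    Lines that mention no sub-block, or two different ones (they cross a
    block boundary), stay at the CORE top level unchanged.
    """
    per_block = defaultdict(list)
    for line in lines:
        blocks = set(_PREFIX.findall(line))
        if len(blocks) == 1:
            per_block[blocks.pop()].append(_PREFIX.sub("", line))
        else:
            per_block[TOP].append(line)
    return dict(per_block)


if __name__ == "__main__":
    # Hypothetical ECO lines, for illustration only.
    eco = [
        "insert_buffer [get_pins MUL/u_add/A] BUF_X1",
        "insert_buffer [get_pins ALU/u_cmp/B] BUF_X2",
        "insert_buffer [get_pins u_top_ff/D] BUF_X1",
    ]
    for block, cmds in split_flat_eco(eco).items():
        print(block, cmds)
```

Keeping block-crossing lines at the top level is a conservative choice in this
sketch; a real flow would need a rule for nets that span MUL and ALU.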

Results and Observations
------------------------

Before hold fixing, the design had -280 psec of setup slack and -140 psec of
hold slack, with 1,675 hold violations at 0 hold margin.  While the negative
setup slack was acceptable, our design flow requires hold slack margins of
at least +70 psec.

We ran dmsa_fix_hold in PrimeTime multi-scenario mode.  We found that when
targeting a +70 psec slack threshold from the beginning, ICC's runtime was
too long.  We got better results by raising the slack threshold
incrementally and doing multiple STA/layout iterations to bring the new
buffers into the design.

We obtained the following results over four iterations:

  Iteration 1 (Using 0 psec hold margin)
     Inserted 982 buffers in 1.2 hours in PT only
     Setup slack: -290 psec
     Hold slack: -10 psec (relative to the target 0 psec hold margin)
     6 hold violations below the hold margin (down from 1,675)

  Iteration 2 (Using +30 psec hold margin)
     Inserted 1,762 buffers in 1.5 hours in PT only
     Setup slack: -290 psec
     Hold slack: -10 psec (relative to the +30 psec target hold margin)
     105 hold violations below the hold margin (down from 2,867)

  Iteration 3 (Using +50 psec hold margin)
     Inserted 3,651 buffers in 2 hours in PT only
     Setup slack: -280 psec
     Hold slack: -10 psec (relative to the +50 psec target hold margin)
     200 hold violations below the hold margin (down from 4,675)

  Iteration 4 (Using +70 psec hold margin)
     Inserted 6,511 buffers in 6.2 hours in PT only
     Setup slack: -280 psec
     Hold slack: -40 psec (relative to the +70 psec target hold margin)
     3,100 hold violations below the hold margin (down from 9,134)
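Two ratios make the diminishing returns in these four iterations easier to
see: the share of each iteration's starting violations left unfixed, and the
buffers inserted per violation fixed.  The short Python snippet below only
re-derives those ratios from the numbers reported above; it adds no new
measurements.

```python
# (target hold margin psec, buffers inserted, violations before, violations after)
# copied from the four iterations reported above.
ITERATIONS = [
    (0, 982, 1675, 6),
    (30, 1762, 2867, 105),
    (50, 3651, 4675, 200),
    (70, 6511, 9134, 3100),
]


def summarize(rows):
    """Return (margin, % of violations left unfixed, buffers per violation fixed)."""
    out = []
    for margin, buffers, before, after in rows:
        left_pct = 100.0 * after / before
        per_fix = buffers / (before - after)
        out.append((margin, round(left_pct, 1), round(per_fix, 2)))
    return out


for margin, left_pct, per_fix in summarize(ITERATIONS):
    print(f"+{margin:>2} psec: {left_pct:5.1f}% left, {per_fix:.2f} buffers per fix")
```

The cost per fixed violation rises from about 0.6 buffers in the first pass to
about 1.1 in the last, while the unfixed share jumps from under 5% to about a
third at +70 psec.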

In the first run, without any additional hold margin, PrimeTime reported
that setup remained at -280 psec with 0 hold violations.  After we
implemented the DMSA-generated ECOs in ICC and brought the new layout back
into PrimeTime, setup slack was -290 psec (10 psec worse) with 6 hold
violations remaining.

As we raised the hold slack threshold to +30 psec, +50 psec, and finally
+70 psec, setup slack stayed within 10 psec of its original value.  By the
end of these runs, every path had at least +30 psec of hold slack, with
3,100 "violators" between +30 psec and +70 psec.  Setup slack settled back
at its original -280 psec by the end of the hold-fixing process.

We found that it is always better to fix the majority of hold violations
first and use DMSA after that, because runtime increases with the number
of hold violations.  The reasons are:

  1. PT records all the hold-critical pins and reports timing to them.

  2. PT first inserts buffers on the hold-critical paths, then analyzes
     the setup and hold violations at each of these buffers iteratively.

  3. The number of inserted buffers may be more or less than the number of
     hold violations, depending on the magnitude of each violation, the
     delay of the available buffers, and the amount of shared logic.  While
     PrimeTime DMSA hold fixing analyzes the setup and hold violations
     separately, it reports the paths through all the inserted buffers in
     each iteration.  So as the number of violations increases, both the
     number of iterations and the number of paths reported per iteration
     grow, and the runtime grows with them.
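Point 3 is easier to see in a toy model.  The Python sketch below is a
deliberately simplified greedy hold fixer, not PrimeTime's dmsa_fix_hold
algorithm.  Each path is just a hold slack, a setup slack, and a set of pin
names; a buffer on a pin adds a fixed fast-corner delay to every path through
it and costs a fixed slow-corner delay against setup.  The `Path` class, the
`greedy_hold_fix` function, the pin names, and all delay numbers are
hypothetical, and `setup_floor` stands in for "don't degrade setup".

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """One timing path in a toy model (all values in psec)."""
    hold_slack: float   # fast-corner hold slack
    setup_slack: float  # slow-corner setup slack
    pins: set = field(default_factory=set)  # pins where a buffer could go


def greedy_hold_fix(paths, hold_target, buf_fast, buf_slow, setup_floor=0.0,
                    max_buffers=1000):
    """Toy greedy hold fixer; NOT PrimeTime's dmsa_fix_hold algorithm.

    A buffer on a pin adds buf_fast of hold slack and removes buf_slow of
    setup slack on every path through that pin.  Each step buffers the pin
    shared by the most still-violating paths, skipping any pin whose buffer
    would push some path through it below setup_floor.  Mutates `paths`.
    Returns (pins buffered in order, sorted names of paths still violating).
    """
    def setup_safe(pin):
        return all(p.setup_slack - buf_slow >= setup_floor
                   for p in paths.values() if pin in p.pins)

    inserted = []
    while len(inserted) < max_buffers:
        violating = [p for p in paths.values() if p.hold_slack < hold_target]
        if not violating:
            break
        # Score each pin by how many violating paths a buffer there would help.
        score = {}
        for p in violating:
            for pin in p.pins:
                score[pin] = score.get(pin, 0) + 1
        candidates = [pin for pin in sorted(score) if setup_safe(pin)]
        if not candidates:
            break  # what is left can't be fixed here without breaking setup
        best = max(candidates, key=score.get)  # ties -> alphabetically first
        for p in paths.values():
            if best in p.pins:
                p.hold_slack += buf_fast
                p.setup_slack -= buf_slow
        inserted.append(best)
    remaining = sorted(n for n, p in paths.items() if p.hold_slack < hold_target)
    return inserted, remaining


if __name__ == "__main__":
    demo = {
        "a": Path(-10, 100, {"s", "a1"}),  # a, b, c share pin "s"
        "b": Path(-10, 100, {"s", "b1"}),
        "c": Path(-10, 100, {"s", "c1"}),
        "d": Path(-50, 100, {"d1"}),       # one deep hold violation
        "e": Path(-10, 20, {"e1"}),        # too little setup slack to buffer
    }
    pins, left = greedy_hold_fix(demo, hold_target=0, buf_fast=15, buf_slow=25)
    print(pins, left)
```

In the demo, the three violations sharing pin `s` are fixed by a single
buffer, the -50 psec path needs four buffers, and path `e` is left alone
because buffering it would break setup.  So buffer count and violation count
need not match, and every added buffer is another point whose paths must be
re-timed.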

Limitations using DMSA
----------------------

  1. The combined ICC/PT runtime increased as we kept more constraints in
     the specific_data of the create_scenario command.

  2. At the time we used DMSA, the script did not select buffer insertions
     that would benefit multiple hold-violating paths.  We were told that
     this was a bug in the older version, and that more recent versions of
     the script perform this optimization properly.

  3. Even when the DMSA run shows no change in setup slack, setup may still
     violate once the ECOs are placed and routed after hold fixing.

  4. At this time, dmsa_fix_hold can only use buffer insertion for fixing.
     It does not currently consider downsizing existing cells to improve
     hold slack.

  5. We saw long runtimes when targeting a slack threshold of +70 psec in
     the first run, so we used multiple iterations.  We are told that the
     latest version of dmsa_fix_hold is significantly faster for large
     numbers of hold violations.  This could reduce the number of
     iterations needed, and we'll check this out on our next design.

  6. All file paths in DMSA scripts should be absolute.  A DMSA run will
     hang if the script gives relative paths that are not resolvable
     through the search path, because each scenario runs in its own
     scenario subdirectory.

  7. At 65 nm and below, there is a huge difference in delays between slow
     and fast corners.  Because this difference grows with buffer size,
     putting bigger buffers in the buffer list can cause setup to violate
     even before hold is fixed.

  8. You should use as many host machines as there are scenarios in your
     design.  Otherwise, your run will take a very long time because
     scenario images get swapped in and out on the remote machines.

  9. Keep as many of your constraints as possible in common_data, since
     this can decrease PT runtime.
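Limitation 7 comes down to two numbers per buffer: its fast-corner delay,
which is what fixes hold, and its slow-corner delay, which is what it costs in
setup.  The sketch below picks, for a single path, the buffer that covers a
hold deficit at the fast corner while staying within the available setup
slack at the slow corner.  The cell names and delay values in `LIB` are
hypothetical, not taken from any 65 nm library, and a real tool also accounts
for load, slew, and placement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    fast_ps: float  # delay at the fast (hold-critical) corner
    slow_ps: float  # delay at the slow (setup-critical) corner


# Hypothetical buffer list: slow-corner delay grows faster than fast-corner delay.
LIB = [
    Buffer("BUF_X1", 12, 30),
    Buffer("BUF_X2", 18, 48),
    Buffer("BUF_X4", 25, 75),
    Buffer("DLY_X1", 40, 110),
]


def pick_buffer(buffers, hold_deficit_ps, setup_slack_ps):
    """Pick the buffer that fixes a hold deficit with the least setup cost.

    A candidate must add at least hold_deficit_ps at the fast corner and
    must not consume more than setup_slack_ps at the slow corner.  Returns
    None if no single buffer satisfies both constraints.
    """
    ok = [b for b in buffers
          if b.fast_ps >= hold_deficit_ps and b.slow_ps <= setup_slack_ps]
    return min(ok, key=lambda b: (b.slow_ps, b.name), default=None)


if __name__ == "__main__":
    print(pick_buffer(LIB, 15, 60))  # a mid-size buffer fits
    print(pick_buffer(LIB, 30, 60))  # only the big delay cell fixes hold, and it breaks setup
```

With these made-up numbers, a 30 psec hold deficit on a path with 60 psec of
setup slack has no single-buffer fix: the only cell slow enough at the fast
corner costs 110 psec at the slow corner.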

PrimeTime DMSA was reasonably easy to use, and it fixed our hold violations
in a matter of hours without any setup degradation.

This is better than the semi-automated hold-fixing approach we used on our
previous projects.  We're looking forward to trying out the new version of
dmsa_fix_hold on our next project.

    - Sridevi Warrier
      Analog Devices, Inc.                       Bangalore, India

Copyright 1991-2024 John Cooley. All Rights Reserved.