
( ESNUG 331 Item 1 ) --------------------------------------------- [10/7/99]

From: "Sam Appleton" <sama@groovy.mti.sgi.com>
Subject: One User's Review Of The Synopsys FlexRoute ("Everest TLR") Tool

Dear John,

I know you like user reviews of tools.  But before I go into my experiences
using Everest TLR / Synopsys FlexRoute, I've got to explain the physical
landscape of the chip it was used on.  Our chip, Krypton, is 3 million
gates, 133 MHz, and it's an in-socket replacement for an older chip.  Our
design goals were to add more functionality, run 33 percent faster, and
therefore significantly improve our transistor density, by at least 20
percent -- hence our focus on the newer, supposedly better P&R tools.

Krypton had 23 major sub-blocks (with 7,000 nets between them), two
top-level 256-bit global buses, and four 128-bit global buses.

The total "RAM area" was perhaps 10-20% of the chip (I can't measure it
precisely just by eyeballing a plot), with 47 RAM/RF instances.  The bulk
of the RFs (register files) were Artisan 32x32s and 32x64s.

Krypton had 751,830 placeable instances to place-and-route.

The Five Big Flat Layout "Hells"
--------------------------------

I know it's very common for some companies to do layout on one big flat
design.  We considered going flat, but these five "hells" came up:

  o big flat designs are run-time hell

    We had five copies of Cadence Silicon Ensemble (rev 5.1).  Placing and
    routing a 3 Mgate ASIC flat would take a huge runtime (my guess is 3-4
    days or more), making iterations very difficult and time consuming.
    Block logic grouping and global wiring patterns become intractable.
    Timing convergence and signal integrity goals become a complete mess.

  o big flat designs are extraction hell

    Once a final layout GDS/DEF is obtained, the design must be extracted
    and passed through static timing to verify timing behavior.  Based on
    our experience with Avanti Star-RC rev 98.5, extracting our 3 Mgate
    ASIC flat was very difficult, and it typically took 1-2 weeks just to
    obtain a good SPF.  In addition, the 2 gigabyte file size limit imposed
    by Solaris 2.5 rules out the file sizes a 3 Mgate SPF requires, unless
    the design is extracted hierarchically.

  o big flat designs are back-annotation hell

    Once our full flat chip SPF has been generated, it must be accurately
    converted to SDF.  But even when a full flat SPF could be generated,
    generating an SDF from it was impossible with Avanti Star-DC rev 98.5,
    so we used Ultima Millennium rev 1.8 from <http://www.ultimatech.com>.
    We liked Ultima.  We could extract all the blocks in our chip
    individually, and then Ultima would merge all those SPFs into one big
    SDF.  We could then load that into PrimeTime and have a ball doing
    timing analysis.  Good stuff.

  o big flat designs are clock tree hell

    Inserting a clock into a flat design that is 11 x 11 mm is a bad joke
    from almost every perspective.  In a big flat design, the wires in the
    initial levels of the tree were very long, making the delay of the
    first clock stages very hard to predict.  Too many crosstalk problems,
    too.  The ability of the tool we had to use, Cadence's CTgen rev 3.3,
    to analyze and insert a clock network of 500K flip-flops over such an
    area sucked.  Overall, CTgen sucked.  For the final clock insertion, I
    had to write my own tool: a few thousand lines of Perl.

  o big flat designs are timing closure hell

    John, as your readers know, Design Compiler *estimates* wireloads.
    Once our flat design was routed, fixing the small errors caused by the
    differences between the *estimated* wireloads and the real design meant
    either IPO netlist iterations or redoing our whole design starting from
    the floorplan.  Either way, our full design still had to be
    re-extracted & re-annotated -- making a very long timing closure loop.
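Sam doesn't describe how his homebrew Perl clock-insertion tool worked, so what follows is only a hedged illustration of the clock tree problem above, not his method. One textbook approach is recursive geometric bisection: split the flip-flops at the median along the longer side of their bounding box until each cluster is small enough for one leaf buffer, and place each buffer at its cluster's centroid. The names (`build_tree`, `max_fanout`) and the toy 8 x 8 grid are hypothetical, and the sketch ignores wire delay, buffer sizing, and skew tuning.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    """One clock buffer, placed at the centroid of the flip-flops below it."""
    x: float
    y: float
    sinks: List[Point] = field(default_factory=list)      # flip-flops (leaf buffers only)
    children: List["Node"] = field(default_factory=list)  # sub-buffers (internal nodes only)


def centroid(pts: List[Point]) -> Point:
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def build_tree(flops: List[Point], max_fanout: int = 4) -> Node:
    """Hypothetical recursive-bisection clock tree; flops must be non-empty."""
    cx, cy = centroid(flops)
    if len(flops) <= max_fanout:
        return Node(cx, cy, sinks=list(flops))
    xs = [p[0] for p in flops]
    ys = [p[1] for p in flops]
    # Split across the longer side of the bounding box, at the median.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(flops, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(cx, cy, children=[build_tree(ordered[:mid], max_fanout),
                                  build_tree(ordered[mid:], max_fanout)])


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node: Node) -> int:
    """Number of buffer levels from the root to the deepest leaf buffer."""
    return 1 + max((depth(c) for c in node.children), default=0)


if __name__ == "__main__":
    # Toy example: 64 flip-flops on a hypothetical 8 x 8 grid.
    flops = [(float(i), float(j)) for i in range(8) for j in range(8)]
    root = build_tree(flops, max_fanout=4)
    print(len(leaves(root)), depth(root))  # 16 5
```

Because every split is at the median, sibling subtrees drive the same number of flip-flops (to within one), and running the same recursion over a smaller block's flip-flops gives a tree with fewer levels.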

In practical terms, with engineers here running around tweaking and pumping
netlists out of Design Compiler every day, some way to compartmentalize
their work is a MUST.

So, obviously John, we chose the hierarchical approach.

Choosing Routing Tools
----------------------

As we began developing our CAD flow, we looked at the grid-based routers.
We eliminated Avanti from this review because of the spotty support we've
had from them, both in the past and recently.  That left only Cadence
WarpRoute and Compass PathFinder to review.  We found these shortcomings:

  o no facilities for manual editing and route control
  o not tuned to block-based designs
  o major buses cannot be controlled or planned
  o special nets like clocks and noise-sensitive signals cannot be
    adequately controlled
  o variable width and spacing on nets is very limited

For top-level block-based routers, we evaluated Everest TLR (which is now
Synopsys FlexRoute) and Cadence IC Craftsman.  After using IC Craftsman
for a couple of weeks, we gave up on it very quickly once we saw Everest.
IC Craftsman was a pain: hard to get data in and out, hard to control
buses (or anything, for that matter), and slow.  Everest had an easier
GUI, it ran faster, and it covered all the gaps I listed above, plus:

  o block pin optimization and control
  o controllable bus routing
  o top-level floorplanning 
  o shielding and length balancing
  o easy net/bus prerouting

So we chose to use Everest TLR / Synopsys FlexRoute.

Our Experiences With FlexRoute
------------------------------

First off, the ASCII text format of FlexRoute's database made it very easy
to interface with other place-and-route tools and to keep under design
revision control.  No cryptic binaries or proprietary formats to "manage".

Second, FlexRoute's Perl interface let us easily automate many repetitive
functions, as well as add custom functionality unique to our environment.
The interface provides full access to the tool's database for
manipulation, a very handy feature for those wishing to customize the tool
for their particular flow.

On to the nitty-gritty...

Since FlexRoute is for hierarchical chip assembly, it fit into our overall
hierarchy-preferred approach quite well.  We split our design along sections
of the logical hierarchy into 23 top-level blocks, each of which was a
place-and-route "unit".  Some of these blocks also had sub-units (either
custom logic or other place-and-route units), giving a 3-level hierarchy in
parts of the design.  This enabled an incredible acceleration of our design
cycle:

  o smaller place-and-route units

    Although there were many blocks, the layout generation time for each
    was very small, ranging from 1 to 8 hours.  Our five Silicon Ensemble
    licenses let us turn many blocks around in parallel (impossible with a
    flat approach).  Design management for these blocks was much easier.
    Clock insertion (using our homebrew tool) on these smaller blocks gave
    excellent skew performance and a relatively shallow tree.  It also
    allowed better control of crosstalk on these critical signals.

  o hierarchical extraction got easy

    We set up our extraction tool, Avanti Star-RC, to extract the top-level
    layout only down to the interface "ports" of each block.  This
    massively improved top-level extraction time and also made basic
    chip-level LVS faster.  Extraction time went from 10 days (because
    Star-RC kept crashing) to about 4 hours when run hierarchically.

    Our delay calculator, Ultima Millennium, would merge the hierarchical
    extraction results and produce a full-chip SDF for back-annotation.
    Changing one part of the design required re-extracting only that part,
    rather than the full chip (a much faster and much less difficult task).

  o hierarchical verification got easy, too

    Each block was also run through Cadence Dracula3 DRC verification
    (rev 4.6) in parallel with layout extraction in Avanti Star-RC.  (The
    project got very CPU hungry then!)  We were able to verify each block
    as clean, and with the top-level layout clean, we only had to verify
    DRC issues arising from interactions between the top-level and
    block-level layouts.  This allowed us to tape out after only one DRC
    check on the completed GDS: it had no errors after layout generation,
    so there were no time-intensive DRC fix iterations.  To verify the
    final layout, we dropped the hierarchy and ran flat full-chip LVS
    (Avanti Hercules) to ensure no interface issues remained from assembly.

  o transistor density increased 25%

    Our transistor density increased 25% with FlexRoute because of the lack
    of significant routing channels and the clean, aligned routing that
    FlexRoute performed.  Nice.  We iterated hundreds of times on the
    top-level floorplan with pin changes and block size modifications, to
    optimize both top-level and block-level routing, as well as overall
    timing.

    FlexRoute pin optimization was used numerous times to optimize the
    top-level routing, resulting in a route with almost no channels and no
    "switchboxes" (created when pins are not aligned and routes need to
    be hooked up).  Pin optimization combined with ultra-fast global
    routing (approximately 30 seconds on our 23 block/7000+ net design)
    allowed us to very quickly evaluate the chip floorplan for routability
    and channel density issues before re-routing affected blocks.

    I guesstimate a 10-20% greater density could have been achieved had
    we pushed the floorplan harder.
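How FlexRoute's pin optimizer works internally isn't described here. As a rough sketch of why aligned pins avoid switchboxes, assume two abutting blocks share one vertical edge, pins sit on a uniform track grid, and every connection is a two-pin net. Each net is moved to the track nearest the midpoint of its two current pin positions, so both pins line up and the route becomes a straight segment; nets are assigned bottom to top so no two share a track or cross. `align_pins` and its parameters are hypothetical.

```python
from typing import Dict, Tuple


def align_pins(nets: Dict[str, Tuple[float, float]], pitch: float, tracks: int) -> Dict[str, int]:
    """Give each net one shared track index so its pins on both blocks line up.

    nets:   net name -> (y of pin on left block, y of pin on right block)
    pitch:  track spacing, in the same units as the y coordinates
    tracks: number of legal tracks (0 .. tracks-1) along the shared edge
    """
    if len(nets) > tracks:
        raise ValueError("more nets than tracks on the shared edge")
    # Process nets bottom to top by the midpoint of their two pins.
    order = sorted(nets, key=lambda n: sum(nets[n]) / 2)
    assignment: Dict[str, int] = {}
    next_free = 0
    for i, net in enumerate(order):
        want = round(sum(nets[net]) / 2 / pitch)
        # Highest track this net may take while leaving one track for each
        # net still to be placed above it.
        latest = tracks - (len(order) - i)
        track = min(max(want, next_free), latest)
        assignment[net] = track
        next_free = track + 1  # strictly increasing: no shared tracks, no crossings
    return assignment


if __name__ == "__main__":
    nets = {"a": (1.0, 3.0), "b": (2.0, 2.2), "c": (9.0, 9.0)}
    print(align_pins(nets, pitch=1.0, tracks=10))  # {'a': 2, 'b': 3, 'c': 9}
```

Nets "a" and "b" both want track 2, so "b" is pushed up to track 3; once every net has a single track, the router never has to jog between misaligned pins.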

Make, Timelines and Tapeout
---------------------------

Using this hierarchical design approach let us iterate the entire chip's
layout in just over a day.  Our steps were:

  1.) designer creates Verilog
  2.) synthesize with Design Compiler
  3.) P&R with Silicon Ensemble
  4.) layout extraction with Avanti Star-RC
  5.) delay calculation using Ultima
  6.) load it into PrimeTime
  7.) analyze critical paths, etc., then go back to step 1 or 2 as needed

Steps 3 through 6 were done automatically with a make file.  We didn't
have to touch or hand-hold anything.  This typically took between 2 and
16 hours, depending on the block size, with the largest block being
120 Kgates.
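The make file's key property, re-running the extraction steps only for blocks that actually changed, can be sketched in a few lines. This is a minimal sketch assuming each block's layout is identified by a content hash (make itself compares file timestamps); `refresh_extraction` and the `extract` callback are hypothetical stand-ins for the per-block Star-RC run, and the Ultima merge into a full-chip SDF is not modeled.

```python
import hashlib
from typing import Callable, Dict, List


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def refresh_extraction(
    layouts: Dict[str, bytes],       # block name -> current layout contents
    cache: Dict[str, str],           # block name -> hash of the layout last extracted
    extract: Callable[[str], None],  # hypothetical per-block extraction step
) -> List[str]:
    """Re-extract only blocks whose layout changed since the last run."""
    stale = [name for name, data in sorted(layouts.items())
             if cache.get(name) != digest(data)]
    for name in stale:
        extract(name)
        cache[name] = digest(layouts[name])  # remember what was extracted
    return stale


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    layouts = {"blk_a": b"rev1", "blk_b": b"rev1"}
    print(refresh_extraction(layouts, cache, lambda name: None))  # ['blk_a', 'blk_b']
    layouts["blk_b"] = b"rev2"
    print(refresh_extraction(layouts, cache, lambda name: None))  # ['blk_b']
```

On the second pass only the edited block is re-extracted, which is what kept a one-block netlist change from triggering a full-chip extraction.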

Going through steps 1 to 7, we could turn around the entire chip from new
block netlists to a top-level timing report in around 1.5 days.
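The parallelism that the five Silicon Ensemble licenses bought can be estimated with a simple scheduling model. This minimal sketch assumes a longest-job-first greedy assignment (a standard heuristic, not necessarily what the team did) and uses made-up runtimes for 23 blocks in the 1-8 hour P&R range Sam quotes; the real per-block numbers aren't given, and `schedule` is a hypothetical helper.

```python
import heapq
from typing import Dict, List, Tuple


def schedule(runtimes: Dict[str, float], licenses: int = 5) -> Tuple[float, List[List[str]]]:
    """Longest-processing-time-first greedy: each job goes to the least-loaded license.

    Returns the makespan (hours until the last job finishes) and each license's job list.
    """
    heap = [(0.0, i) for i in range(licenses)]  # (hours queued on license, license index)
    heapq.heapify(heap)
    queues: List[List[str]] = [[] for _ in range(licenses)]
    for block, hours in sorted(runtimes.items(), key=lambda kv: kv[1], reverse=True):
        busy, lic = heapq.heappop(heap)  # least-loaded license so far
        queues[lic].append(block)
        heapq.heappush(heap, (busy + hours, lic))
    makespan = max(busy for busy, _ in heap)
    return makespan, queues


if __name__ == "__main__":
    # Hypothetical runtimes: 23 blocks taking 1 to 8 hours each.
    runtimes = {f"blk{i:02d}": float(1 + i % 8) for i in range(23)}
    makespan, _ = schedule(runtimes, licenses=5)
    print(sum(runtimes.values()), makespan)  # 100.0 20.0
```

With these made-up runtimes, 100 hours of serial P&R finishes in 20 hours across five licenses, because every license ends up with exactly 20 hours of work.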

For FlexRoute itself, our 3 Mgate ASIC with 23 top-level blocks and over
7,000 nets had routing runtimes of 30 minutes (for preliminary top-level
routes during pre-tapeout timing checks) and 60 minutes (for full top-level
routing with all preroutes and design data for tapeout-quality layout,
including tuned clocks).  By comparison, our old Compass tools used to
take 24 hours just to route a smaller design!

This massive improvement allowed us to iterate many more times for optimal
timing and area utilization than we were able to do before.

We iterated on the top-level clock network approximately 50 times to get
optimal skew characteristics.  These changes were made in the FlexRoute
database (instead of in a layout database), so that new changes to the
netlist for timing and functionality did not disrupt the ongoing effort of
tuning the top-level clock network.

    - Sam Appleton
      SGI                                    Mountain View, CA

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.

