( ESNUG 366 Item 2 ) --------------------------------------------- [02/23/01]
Subject: ( ESNUG 344 #5 ) A Second Time User Of PhysOpt Tells His Story
> I liked Bob Prevett's review in ESNUG 335 #1 of PhysOpt. I see he used it
> in a predominantly Avanti backend tool flow. I'm about to leave Matrox
> and join a new start-up, so I thought I'd send you a review of what it was
> like to use PhysOpt in a mostly Cadence backend tool flow. ...
>
> Next time, my plan would be to use PhysOpt in a completely hierarchical
> fashion. First synthesize and place, and then route all of my modules.
> I'll then assemble it into a full chip using a real top level router. ...
>
> - David Romanauskas, Design Engineer
> Matrox Montreal, Canada
From: David Romanauskas <dromanau@hyperchip.com>
Hi John,
I wrote about my experience with Physical Compiler (PhysOpt) just over a
year ago (2/23/00), and I thought that I'd let you know how things have
progressed since then.
I am now at Hyperchip in Montreal, and we used Physical Compiler on a design
we called "The Matrix", which is an intelligent switch and will be used in
our petabit router system (which, by the way, will be a kick-ass system!)
:-) Some key points of the design are:
- IBM ASIC flow with placement handoff
- 0.18um SA27E technology
- 16.6mm x 16.6mm die
- 155MHz clock
- 4.3M gates
- 3M bits of memory
Most of the aspects of Physical Compiler that are important to me have
remained the same: I am still seeing good results without having to
provide much placement guidance to the tool. No detailed floorplanning is
necessary, and I ran it on fairly large blocks (250k gates + RAMs).
The IBM floorplanner supported a hierarchical flow, so we took advantage of
that since the design consisted of 16 identical hierarchical blocks.
First, we ran Physical Compiler on one of the 16 blocks and closed timing on
it. To help with this, we made sure that the inputs and outputs of the
block were flopped. Also, to help with top level routability, we only
permitted the blocks to use 3 layers of metal. PhysOpt was placing the
input/output pin flops far from the pins within the modules, creating timing
and max cap problems when integrated at the top level. A TCL script was
used to place the flops at the pins to get around this. A net weight
attribute within PhysOpt probably could have helped with this.
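The flop-at-pin workaround described above could be sketched roughly as
follows. This is a Python illustration only, not the TCL script we ran: the
data layout, the names (Point, snap_boundary_flops, max_dist) and the
Manhattan-distance threshold are all assumptions made for the example, and
placement legalization is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float  # microns (assumed unit)
    y: float


def manhattan(a: Point, b: Point) -> float:
    """Manhattan distance, a common rough proxy for wire length."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def snap_boundary_flops(pin_locs, flop_locs, pin_to_flop, max_dist):
    """Return a new flop placement in which every boundary flop that sits
    farther than max_dist from its block pin is moved to the pin location.

    pin_locs    : dict pin name  -> Point
    flop_locs   : dict flop name -> Point
    pin_to_flop : dict pin name  -> name of the flop registering that pin
    """
    placed = dict(flop_locs)  # copy, so the input placement is untouched
    for pin, flop in pin_to_flop.items():
        if manhattan(pin_locs[pin], flop_locs[flop]) > max_dist:
            # Hypothetical policy: put the flop right at the pin. A real
            # flow would still need to snap to a legal placement site.
            placed[flop] = pin_locs[pin]
    return placed
```

Flops already close to their pins are left where the placer put them, so
only the outliers that cause top-level timing and max cap trouble get moved.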
Physical Compiler completed on the block in about 15-20hrs, and the block
met the target timing specs and was routable in 3 layers of metal.
After completing the placement of the block we ran the tool at the top
level. We generated both physical and timing models of the block -- the
physical LEF model with a script that converted the IBM floorplan
netlist, and the timing model from PrimeTime (stamp model). During the top
level run, the blocks were considered as black-boxes and the rest of the
design between the blocks and the I/O's was optimized. Physical Compiler
handled the top level buffering quite well, even over a 16.6mm x 16.6mm die.
PhysOpt did have some difficulty with the logic near the main I/O's -- so we
grouped those with regions. PhysOpt still took the liberty of placing them
far outside the designated areas since it only considers regions as soft
boundaries, but it was much better than without the regions. Those areas
were meeting timing, but presented some trouble for clocking since they
spanned a much larger area than anticipated. According to Synopsys, this
should be resolved by using 'set_congestion_options -max_util', which I have
not yet tested.
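One way to see how far cells have drifted outside a soft region is to
compare the cell locations against the region rectangle after placement.
The sketch below is a hypothetical Python check, not a PhysOpt feature; the
(x0, y0, x1, y1) region format and the cell dictionary are assumptions made
for the example.

```python
def bounding_box(cells):
    """Bounding box (x0, y0, x1, y1) of all cell locations.

    cells : dict cell name -> (x, y)
    """
    xs = [x for x, _ in cells.values()]
    ys = [y for _, y in cells.values()]
    return min(xs), min(ys), max(xs), max(ys)


def cells_outside(region, cells):
    """Names of the cells placed outside the region rectangle, sorted."""
    x0, y0, x1, y1 = region
    return sorted(
        name
        for name, (x, y) in cells.items()
        if not (x0 <= x <= x1 and y0 <= y <= y1)
    )
```

Comparing the bounding box of a group with its intended region gives a
quick feel for how much extra area the clock tree will have to cover.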
In order to keep top-level run-time reasonable, we ran 'create_placement' and
then 'physopt -inc', which was all that was really necessary in order to
meet the requirements of this design. PhysOpt took about 24 hours to run
at the top-level.
In order to save time when running on a new netlist, both the block-level
and top-level placement/physopt runs were executed at the same time. The
top-level placement used the same timing model as previous runs since
PhysOpt was consistent in meeting timing requirements for the block.
Running these in parallel resulted in a 24 hour turnaround time from new
netlist to a full chip placement that was meeting timing.
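The turnaround arithmetic is simple enough to write down. The sketch below
uses the run times quoted above (up to 20 hours for the block, about 24
hours for the top level); the function name and its arguments are made up
for the illustration.

```python
def turnaround_hours(block_hours, top_hours, parallel):
    """Wall-clock time from new netlist to full chip placement.

    Run in parallel, the longer of the two jobs sets the turnaround.
    Run back to back, the two run times add up.
    """
    if parallel:
        return max(block_hours, top_hours)
    return block_hours + top_hours


print(turnaround_hours(20, 24, parallel=False))  # 44
print(turnaround_hours(20, 24, parallel=True))   # 24
```

The parallel case only holds while the block timing model from the previous
run is still valid for the new netlist.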
We used an IBM ASIC flow with placement handoff, and were able to meet all
the IBM requirements for release to layout (RTL). RTL is the last
checkpoint before sending the design to IBM for all the "fun" chip
finishing stuff including scan chain optimization, clock tree insertion,
detailed routing, etc. There was some optimization that was required at
IBM to meet timing and max-transition violations, but all that was
manageable.
Overall, we were able to use PhysOpt to quickly turn a new netlist into a
placement that would meet the IBM requirements. In the beginning, it did
take several iterations of finding the best method and constraints in which
to run PhysOpt, but once that was settled we were able to reliably place
the design in a short amount of time.
- David Romanauskas
Hyperchip Inc. Montreal, Quebec