( ESNUG 362 Item 1 ) --------------------------------------------- [11/30/00]
From: Vijay Angarai <angarai@lsil.com>
Subject: How I Almost Lost My Job At LSI Logic Doing Two PhysOpt Tapeouts
Hi John,
I almost lost my job because we used PhysOpt.
I'm a design engineer in the DSP division of LSI Logic here in Dallas. We
were working on two ZSP chip cores. (I can't tell you their project names,
so I'll just call them "Chip #1" and "Chip #2" for the purposes of this
letter.) The ZSP chip cores are licensable for our customers and they're
superscalar RISC processors designed to give 800 MIPS at 200 MHz. My job
was to tape them out in LSI's low power 0.18 (G12L) and high performance
(G12P) libraries. My goals were:
                 Chip #1                 Chip #2
             --------------        -------------------
     Goal    low power             high performance
     Lib     LSI G12L              LSI G12P
     Clock   10 nsec (gated)       6.25 nsec (non-gated)
     Metal   4 layer               5 layer
     Size    under 4 mm2           under 6 mm2
     Mem     64 K instruction      64 K instruction
             64 K data             64 K data
I started out using a standard tape-out flow. This is when my life started
to go downhill. Here was my flow:
RTL -> DC -> Gate Netlist -> "compile -scan" -> pre-layout netlist
It was nothing fancy. For layout, I did:
Prelayout netlist -> Avanti Placement -> Physical resynthesis (Saturn
and LSI upsize/downsize internal tool) -> Apollo CTS -> Apollo Route
-> RC Extraction -> PrimeTime
We called it our "Days of Hell" tape-out, and this miserable process
took us 3 months. It was like being part of a secret CIA experiment
to test the limits of human sanity. I would run a chip through Design
Compiler and using its conservative WLMs, DC would tell me it made timing.
Then, in the Avanti placement, we'd be 2 to 3 nsec off on timing. After
Avanti Saturn resynth and LSI resizing, the best we could get was 1 nsec
off spec. Go back to try to fix things. We looped like this forever. Our
fundamental problem was that we were working with three different timing
engines that didn't agree with each other: the Synopsys DC engine, the
Avanti Saturn engine, and our internal LSI engine. Finally, after 3
months of this we just decided to stop. Here were our results:
                      Chip #1                    Chip #2
                  --------------           -------------------
     # of Cells   79,000 instances         80,000 instances
     Size         4.2 mm2                  6.0 mm2
     Timing       11.0 nsec (1 nsec off)   7.0 nsec (0.75 nsec off)
My Brush With Unemployment
--------------------------
Since we were so far off after 3 months of timing Hell, we decided to give
Synopsys a call to see if their PhysOpt tool could save us. Since Chip #2
was the most recent chip I was running through in the old flow, I decided to
run Chip #2 in the new PhysOpt flow for our eval.
That was my mistake.
Our new flow changed only slightly. We just swapped PhysOpt for the Avanti
placement and Avanti Saturn steps in our old flow.
Prelayout netlist -> PhysOpt -> Netlist + PDEF -> Load into Apollo ->
Apollo CTS -> Apollo Route -> RC Extraction -> PrimeTime
We used a Perl script given to us by Synopsys to convert PhysOpt's PDEF
output into an Avanti SCHEME file. After we loaded the PhysOpt placement
into Apollo, we used an internal LSI tool to do synthesis and ramptime fixes
on the high fanout nets like "Reset" in our designs. PhysOpt doesn't know
how to build a balanced tree for such nets, so we had to fix them ourselves.
It's important to note that in PhysOpt these nets like "reset" must be set
off with set_ideal_net and set_false_path being "true". PhysOpt also can't
do Clock Tree Synthesis yet even though the higher ups in Synopsys give me
great assurances that it will soon. We used Apollo CTS.
The Chip #2 PhysOpt run only took 6 hours. It met our 6.25 nsec spec and it
correlated well with the other timing engines after we did the final Apollo
layout. Due to PhysOpt's good flip-flop clustering, we quickly got a very
low clock skew (~70 psec) in the Apollo CTS run. This was a dream compared
to our old flow (~150 psec) which required 2 or 3 rounds of buffer resizing
and hand tweaks to many of the buffers. The 70 psec skew gave us no hold
violations to fix in the Chip #2 design. That old 150 psec skew gave me
scads of hold violations that I had to chase down and fix!
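
To see why the lower skew mattered so much, here is a minimal sketch of a hold check. Everything in it except the two skew values (70 and 150 psec) is a made-up assumption for illustration: the short-path delays, the 40 psec hold requirement, and the function names are hypothetical and are not data from either chip. The simplified model is that worst-case hold slack is the shortest path delay minus the flop's hold time minus the clock skew.

```python
# Hypothetical illustration of how clock skew eats into hold margin.
# Only the two skew values (70 and 150 psec) come from the letter;
# the path delays and hold time below are invented for the example.

HOLD_TIME_PS = 40                       # assumed flop hold requirement
SHORT_PATHS_PS = [120, 160, 200, 260]   # assumed shortest flop-to-flop delays


def hold_slack_ps(min_path_delay_ps, hold_time_ps, skew_ps):
    """Worst-case hold slack when the capture clock arrives skew_ps late."""
    return min_path_delay_ps - hold_time_ps - skew_ps


def count_hold_violations(skew_ps):
    """Count the assumed short paths whose hold slack goes negative."""
    return sum(
        1
        for delay in SHORT_PATHS_PS
        if hold_slack_ps(delay, HOLD_TIME_PS, skew_ps) < 0
    )


if __name__ == "__main__":
    for skew in (70, 150):
        print(f"skew {skew} psec: {count_hold_violations(skew)} hold violations")
```

With these assumed numbers, every short path survives a 70 psec skew, while the two fastest paths fail at 150 psec. The real violation counts depend on the actual path delays in the design.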
Everything looked great. I recommended we buy PhysOpt.
Things started to fall apart when we began running Chip #1 through our newly
purchased PhysOpt flow. The first problem we ran into was that it took
PhysOpt 7 days to complete its run. (This is where I learned the difference
between Synopsys customer support and Avanti customer support. We were
early adopters of Avanti's Saturn tool at LSI. Saturn had a lot of crashes
and segmentation faults. Avanti didn't help us that much and I had the
impression that they didn't know what was really wrong with Saturn. It was
very unpleasant for us to do critical resynthesis with Saturn if it had
problems. In contrast, when PhysOpt gave us trouble, Synopsys planted a
full time AE on site with us as well as dedicating AEs to us at their own
site.) Anyway, despite the great customer support, it took us 7 days to do
that Chip #1 PhysOpt run and an additional 7 days for Synopsys to duplicate
this problem at their own site.
I had just recommended PhysOpt! I was afraid of losing my job. I wasn't
the sole eval engineer here at LSI, so my career was temporarily safe, but I
was on the hook to make PhysOpt work.
The first thing I noticed from the 7 day PhysOpt run was that Chip #1's area
had gone down 18 percent(!), so I tried setting max_area to a reasonable
value. PhysOpt seemed to ignore this and just tried to do its best.
My old "Days of Hell" memories came back when we discovered that Chip #1's
PhysOpt placement was not routeable because of the extra buffers inserted
by Apollo to achieve gated clock tree balancing. (Chip #1 had gated clocks;
Chip #2 didn't. Every 8 flops in Chip #1 had a clock gate to conserve
power. There were over 1,000 clock gating cells. Apollo CTS added more
than 1,000 buffers. Congestion went haywire because of the extra nets
with these buffers.)
We then resorted to using the netlist generated by PhysOpt but discarded the
PhysOpt placement. We used the Avanti placer and CTS engine to finally
close the design. Because PhysOpt generated a better physically aware
synthesized netlist than Design Compiler, we were still able to achieve our
target 10 nsec speed using the Avanti placer.
We later noticed that Chip #1 had only 4 metal layers and a low performance
lib with smaller, weaker cells (G12L). Chip #2 had 5 metal layers and
strong cells with G12P. The metal pitch was the same in both libs so Chip
#1's low power lib (G12L) had fewer available metal tracks to use, too. Also
Chip #1 had only one horizontal layer available (metal 3) because metal 1
was used up by the low power G12L lib. With Chip #2's G12P, we had metal 3
and metal 5 for horizontal routing.
Found The Way Out
-----------------
All these problems made Chip #1 very sensitive to congestion problems. We
were relieved to find this congestion can be alleviated by reading in the
netlist and PDEF from Avanti after CTS into PhysOpt and doing an overnight
incremental PhysOpt run to fix congestion problems. We didn't know about
this incremental PhysOpt run fix to solve that problem at the time so we
taped out Chip #1 not using the PhysOpt placement data. On a post-tape-out
rerun of the flow, the 7 day PhysOpt run dropped to 12 hours. The final
PhysOpt results (compared to the old flow) were:
                      Chip #1                    Chip #1
                    Old DC Flow              New PhysOpt Flow
                  --------------           -------------------
     # of Cells   79,000 instances         67,000 instances (15% reduction)
     Size         4.2 mm2                  3.9 mm2 (7% reduction)
     Timing       11.0 nsec (1 nsec off)   9.36 nsec (0.64 slack)

                      Chip #2                    Chip #2
                    Old DC Flow              New PhysOpt Flow
                  --------------           -------------------
     # of Cells   80,000 instances         80,000 instances
     Size         6.0 mm2                  6.0 mm2
     Timing       7.0 nsec (0.75 nsec off) 6.25 nsec (on spec)
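
The percentages and slack figures in these tables can be checked with a few lines of arithmetic. The sketch below uses only numbers from this letter (the clock targets of 10 nsec and 6.25 nsec, and the cell, size, and timing results); the function and variable names are just illustrative choices.

```python
# Cross-check of the results tables. All numbers come from the letter;
# the helper names are illustrative only.

def pct_reduction(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


def slack_ns(clock_target_ns, achieved_ns):
    """Positive means timing is met, negative means off spec."""
    return clock_target_ns - achieved_ns


CHIP1_CLOCK_NS = 10.0
CHIP2_CLOCK_NS = 6.25

if __name__ == "__main__":
    # Chip #1: old DC flow vs. new PhysOpt flow
    print(f"Chip #1 cells: {pct_reduction(79_000, 67_000):.1f}% fewer")
    print(f"Chip #1 size:  {pct_reduction(4.2, 3.9):.1f}% smaller")
    print(f"Chip #1 slack: {slack_ns(CHIP1_CLOCK_NS, 11.0):+.2f} nsec old, "
          f"{slack_ns(CHIP1_CLOCK_NS, 9.36):+.2f} nsec new")

    # Chip #2: old DC flow vs. new PhysOpt flow
    print(f"Chip #2 slack: {slack_ns(CHIP2_CLOCK_NS, 7.0):+.2f} nsec old, "
          f"{slack_ns(CHIP2_CLOCK_NS, 6.25):+.2f} nsec new")
```

The cell-count and size reductions come out at roughly 15.2% and 7.1%, matching the rounded 15% and 7% in the table.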
I've also been told by Synopsys that there was a bug in PhysOpt v 1.21 that
may have caused my 7 day runtime problem. They say it's been fixed in the
new PhysOpt 2.0 coming out next week.
In the end, we believe that clock tree synthesis capability is a must in
PhysOpt to avoid late surprises like the ones we experienced. I've had high
assurances from Synopsys that it's coming. We have other chips coming up and
we don't want CTS biting us like it did. Now, though, because we have this
Avanti CTS / PhysOpt incremental workaround, we can get these chips done
either way.
- Vijay Angarai
LSI Logic, DSP division Dallas, TX