Fusion Compiler benchmark

( ESNUG 587 Item 1 ) ---------------------------------------------- [05/24/19]

Subject: User benchmarks DC-ICC2 vs Fusion Compiler vs Genus-Innovus flows


AN RTL SYNTHESIS TIPPING POINT?: Roughly 17 years ago, a new business
book came out, "The Tipping Point", that emphasized how little things
now can have a large impact on future events.  It was sort of like the
old Sci-Fi "The Butterfly Effect" story; but for business. ...

What makes this interesting is since Innovus has been eating ICC2's lunch
in PnR over the past few years -- we might now be seeing a tipping point
happening in the RTL synthesis market, too.  From the many user comments
below, it appears that Cadence Genus RTL, when paired with Innovus PnR,
is now becoming a credible threat to Aart's Design Compiler monopoly.

    - from http://www.deepchip.com/items/dac17-04.html


From: [ Dr. Pepper ]

Hi, John,

For reasons of security and privacy please keep me anonymous.

In your DAC'17 #4 post you outlined what I and many others think is the key
issue with Synopsys digital implementation tools compared to what Cadence
R&D have been able to achieve. 

We have used Synopsys for a long time.  In the last 4 years we struggled
greatly to reach our project goals.  Our flow was:

   DC -> SNPS DFT insertion -> ICC2 -> PT/STAR-RC -> then Calibre/ICV

In the last 2 years my guys have attended many SNUG's, CDNlive's, and DAC's,
etc. in hopes of a solution to our issues.  Then 18 months ago we decided to
give Innovus a shot to see if it could meet PPA on short schedules, then
slowly try other Cadence tools in our flow to see if there was any benefit.

Other than using Calibre as our golden DRC/LVS signoff, my group doesn't
like to do piecemeal best-in-class point tools because of the inherent
support headaches a mixed flow brings.

For example, if we saw a mismatch between what Synopsys DC-Topo says and
Cadence Tempus STA, the Synopsys guys will say "it's a Cadence Tempus
problem" while the Cadence guys will say "it's a Synopsys DC problem" and
we're helplessly stuck in between them.  Our only exception to that rule is
using Calibre for golden sign-off.  If Calibre finds a problem, it must be
fixed upstream from Calibre.  For us, what Calibre says is gospel,
everything else is suspect.

We do a lot of different types of chips recently in TSMC 16FF, but our bread
and butter now are 3.2 Ghz multi-ARM core compute chips in TSMC 7FF.  Many
of my colleagues told us we were wasting time doing evaluations and to just
go with a full Cadence flow based on their experiences -- but our mgmt will
not let us make such drastic change without us doing a full eval ourselves.
Too much is at stake.  We were choosing tools for the next 3-4 years; not
just trying to solve short-term problems.

In our eval we took 5 high-performance blocks from Verilog RTL and inserted
the test compression/decompression logic in the RTL to create our 3.0 M inst
start block we named "Mongo".  Since we most often cannot partition blocks
(because it's too messy and impractical) we needed a big 3.0 M inst "Mongo"
design to find who has the best single big block PnR tool.

Then we compared power, performance, area, (PPA) and runtime through 5 flows
using the TSMC CLN7FF library:

 Tools                                Flow Name         Comment
 ----------------------------------   ---------------   ------------------
    DC-> test ins -> ICC2 -> PT       SNPS-All          our old SNPS flow
     SNPS Fusion Compiler -> PT       SNPS-New          new SNPS flow
 DC-> test ins -> Innovus -> PT       Innovus-PT        old Innovus flow
 DC-> test ins -> Innovus -> Tempus   Innovus-Tempus    old Innovus+Tempus
 Genus-> Modus -> Innovus -> Tempus   CDNS-All          CDNS only flow
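One way to see how the five flows relate is to write each tool chain as a list and diff neighboring flows.  This is only a sketch: the tool names are copied from the table above, but the `FLOWS` dict and the `swapped_tools` helper are hypothetical names made up for illustration, and the position-by-position comparison assumes both flows have the same number of stages (so it does not apply to SNPS-New).

```python
# Tool chains copied from the flow table; list order is the table's order.
FLOWS = {
    "SNPS-All":       ["DC", "test ins", "ICC2", "PT"],
    "SNPS-New":       ["Fusion Compiler", "PT"],
    "Innovus-PT":     ["DC", "test ins", "Innovus", "PT"],
    "Innovus-Tempus": ["DC", "test ins", "Innovus", "Tempus"],
    "CDNS-All":       ["Genus", "Modus", "Innovus", "Tempus"],
}


def swapped_tools(old_flow, new_flow):
    """Return (old_tool, new_tool) pairs where two same-length flows differ."""
    old, new = FLOWS[old_flow], FLOWS[new_flow]
    if len(old) != len(new):
        # Assumption: only flows with matching stage counts are compared.
        raise ValueError("flows have different stage counts; compare by hand")
    return [(a, b) for a, b in zip(old, new) if a != b]


if __name__ == "__main__":
    steps = ["SNPS-All", "Innovus-PT", "Innovus-Tempus", "CDNS-All"]
    for old_flow, new_flow in zip(steps, steps[1:]):
        print(f"{old_flow} -> {new_flow}: {swapped_tools(old_flow, new_flow)}")
```

Read this way, each step from SNPS-All to Innovus-PT to Innovus-Tempus changes one tool, and the last step to CDNS-All changes the two front-end tools.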

Our goal was to have "Mongo" with 3.0 M inst, a number of ARM cores and hard
macros, and some very tight power requirements reach 3.2 GHz (or better) in
TSMC CLN7FF.

                   Best Frequency   TNS left
 Flow Name         Achieved         on table    Total Power   TAT
 ---------------   --------------   ---------   -----------   ---------
 SNPS-All          2.87 Ghz          97 nsec    1,838 mW      14.7 days
 SNPS-New          2.67 Ghz         165 nsec    1,923 mW      12.4 days
 Innovus-PT        3.06 Ghz          44 nsec    1,720 mW      11.7 days
 Innovus-Tempus    3.12 Ghz          24 nsec    1,667 mW       9.8 days
 CDNS-All          3.22 Ghz           0 nsec    1,586 mW       8.2 days

What we found is the "CDNS-All" flow consistently gave us better PPA in the
shortest runtime of any of the flows. 
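To put the table in relative terms, here is a small sketch that computes each flow's percentage change against the old SNPS-All baseline.  The numbers are copied from the results above; the `RESULTS` dict and the helper names are hypothetical, and picking SNPS-All as the baseline is an assumption made here for illustration.

```python
# (frequency GHz, TNS nsec, total power mW, TAT days), copied from the results.
RESULTS = {
    "SNPS-All":       (2.87,  97, 1838, 14.7),
    "SNPS-New":       (2.67, 165, 1923, 12.4),
    "Innovus-PT":     (3.06,  44, 1720, 11.7),
    "Innovus-Tempus": (3.12,  24, 1667,  9.8),
    "CDNS-All":       (3.22,   0, 1586,  8.2),
}


def pct_change(new, old):
    """Percentage change of new relative to old."""
    return 100.0 * (new - old) / old


def versus_baseline(flow, baseline="SNPS-All"):
    """Frequency, power and TAT change of a flow vs. the baseline, in percent."""
    freq, _tns, power, tat = RESULTS[flow]
    b_freq, _b_tns, b_power, b_tat = RESULTS[baseline]
    return {
        "freq_pct": round(pct_change(freq, b_freq), 1),
        "power_pct": round(pct_change(power, b_power), 1),
        "tat_pct": round(pct_change(tat, b_tat), 1),
    }


if __name__ == "__main__":
    for name in RESULTS:
        print(name, versus_baseline(name))
```

By this arithmetic CDNS-All comes out roughly 12% faster, 14% lower in power, and 44% shorter in turnaround than SNPS-All, while SNPS-New is about 7% slower and 5% higher in power with a 16% shorter turnaround.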


Here are my notes on each stage of the eval we did:

SNPS-ALL: DC Topo -> test insertion -> ICC2 -> PT

This was our 4 year old baseline SNPS only flow.  Notice it has the 2nd worst
numbers overall.  In 14.7 days it got 2.87 Ghz and 1,838 mW.  We haven't
touched our DC flows for years as SNPS has always treated that area as their
personal franchise.

Our key issue with DC has been runtime.  They refuse to fix it claiming
something called "Descartes" is coming and it will apparently solve it. 

Our key issue with ICC2 has been PPA.  It just keeps churning and it shows
in its runtime.  We have to call the optimization multiple times to hit a
QOR that's at least reasonable for our frontend designers to iterate over.

Their Primetime flow is pretty much standard, but its ECO is iterative,
because PT-ECO is not physically savvy.  In one case it even inserted
buffers outside our block's boundary!!!  Our scripts can do much better. 


SNPS-New: Fusion Compiler -> PT

Synopsys knew we were unhappy with the PPA of their flow and told us to hold
tight because their next new thing would solve all our problems.  This is
where it got interesting.  We had been talking to SNPS about their synthesis
runtimes and the ICC2 QOR issues we were facing.  We had a meeting with SNPS
to discuss this and there were TWO teams from SNPS.

One SNPS team was handling the old DC, and another SNPS team handling their
new "Descartes" synthesis tool (also called DC2).  At one stage in the
meeting the two teams broke into a mini argument with each other about whose
tool was better! 

In the end they recommended "Descartes" + ICC2, now known as Fusion Compiler,
but said that only they would run Fusion Compiler, in taxicab mode, and
that we could not have access to the new tool until the evaluation concluded.
We agreed, but required that our engineers would run all the final timing
and power sign off checks on their final results.

In 12.4 days Fusion Compiler got 2.67 Ghz and 1,923 mW; which was 1st place
for worst numbers overall except for an improved runtime.


Innovus-PT: DC Topo -> test insertion -> Innovus -> PT

This was careful surgery in our SNPS flow to just replace Synopsys ICC2 with
Cadence Innovus in the PnR portion.  We thought it would be most difficult.
But it was surprisingly easy.  I guess a lot of people in the industry have
helped mature Innovus to be able to plug it in almost painlessly now. 

We were surprised by the placement quality and QOR we were getting at all
stages of PnR.  After some close work with the Innovus support team,
we implemented our complex clocks during the early stages of our placement
step.  That was the most impressive. 

Its routing worked fine, with good results.  And it was refreshing to run
Innovus ourselves instead of having to trust the taxicab mode numbers like
we had to with Fusion Compiler.

In 11.7 days Innovus-Primetime got 3.06 Ghz and 1,720 mW; and was 3rd place
for worst numbers overall.  Still not making the 3.2 Ghz goal!

(This is without any fancy stuff like distributed processing.  Just plain old
fashioned timing and power convergence.  We didn't know Innovus well enough
yet to turn on the fancy stuff.)


Innovus-Tempus: DC Topo -> test insertion -> Innovus -> Tempus

After TSMC certified Tempus at 7nm for final signoff, our CAD guys started
to look at it seriously.  They confirmed for themselves that Tempus was just
as accurate (or more so) than Primetime -- plus Tempus timing was within 3%
of Spectre SPICE runs on the same nets.
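A correlation check like the "within 3%" claim can be sketched as a worst-case relative error over a set of nets.  Everything in this example is hypothetical: the function name, the net names, and the delay values are made up for illustration and are not data from the eval, and the letter does not say how the CAD team actually ran the comparison.

```python
def worst_relative_error(sta_delays_ps, spice_delays_ps):
    """Largest |STA - SPICE| / SPICE over the nets present in the SPICE data."""
    worst = 0.0
    for net, spice in spice_delays_ps.items():
        err = abs(sta_delays_ps[net] - spice) / spice
        worst = max(worst, err)
    return worst


# Hypothetical per-net delays in picoseconds (NOT measurements from the eval).
spice = {"net_a": 100.0, "net_b": 250.0, "net_c": 80.0}
sta = {"net_a": 102.0, "net_b": 245.0, "net_c": 81.6}

if __name__ == "__main__":
    worst = worst_relative_error(sta, spice)
    print(f"worst relative error: {100.0 * worst:.1f}%")
    print("within 3%:", worst <= 0.03)
```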

Since we already had the Innovus cockpit installed, it was trivial to enable
Tempus as our sign off environment.  No files to be handed off etc.

And, no surprises, Tempus gave us timing numbers that aligned well with what
the internal Innovus timing engine was telling us.

Cadence promotes "common engines" throughout their digital flow.  Running
Tempus ECO with Innovus gives us a clearer idea of the brilliance in this.
We could start the Tempus ECO phase from the Innovus prompt.  It reduced
our power, area and fixed timing in PBA mode.  And when it was done, it
was DRC clean. 

There is a huge advantage to common engines and they have been talking about
it for years.  CDNS gets it.  I think finally SNPS is getting it. 

In 9.8 days Innovus-Tempus got 3.12 Ghz and 1,667 mW; and was in 2nd place
for *best* numbers overall.  It's just 0.08 Ghz shy of the 3.2 Ghz goal!
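For a sense of scale, the 0.08 Ghz shortfall can be converted into clock period.  This is plain arithmetic on the two frequencies quoted above (period = 1 / frequency); the function name is hypothetical and nothing here reflects how the timing was actually measured.

```python
def period_ps(freq_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


goal_ps = period_ps(3.2)       # period at the 3.2 GHz target
achieved_ps = period_ps(3.12)  # period at the Innovus-Tempus result
gap_ps = achieved_ps - goal_ps

if __name__ == "__main__":
    print(f"goal period:     {goal_ps:.1f} ps")
    print(f"achieved period: {achieved_ps:.1f} ps")
    print(f"gap:             {gap_ps:.1f} ps")
```

So the miss is on the order of 8 ps per cycle against a 312.5 ps target period.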


CDNS-All: Genus -> Modus -> Innovus -> Tempus

For test, we just used DFT Compiler and TestMAX because it was part of our
original all SNPS flow.  Our test nerds prefer Mentor Tessent, but we were
lazy and didn't involve them in this eval.

CDNS suggested we use their Modus for test insertion.  Since our compressor
and decompressor were in RTL, just using Modus to stitch the chains was easy;
we didn't have to do a lot of evaluation.  It passed our sanity checks with
our DFT Team.

The bigger change was in swapping DC-Topo with Genus for RTL synthesis.
Going to Genus gave us speed and the ability to do what we call true physical
synthesis.  In Genus+Innovus they do placement very early and then the RTL
synthesis decisions are made on that placement.  Our initial assessment
showed the combo did a lot better job on our datapaths and MUXes; which is
key to power improvement.  Impressive.

The final Genus to Innovus flow works great.  Again common engines make a
big difference and both tools have matured these over the last 3-4 years since
Cadence started talking about it.  With it, we could do early clock
implementations in our early synthesis stages so the tools could optimize
the impact of expected skews.

We also switched from Star-RC over to Quantus (QRC) for extraction here.
The results are pretty much the same, but we've found Tempus + Quantus
runs 1.5x to 2.0x faster than Primetime + Star-RC.

In 8.2 days CDNS-All got 3.22 Ghz and 1,586 mW and was the only flow that
had 0 TNS plus it beat the 3.2 Ghz goal by 0.02 Ghz!  It was 1st place.
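The placement claims can be checked mechanically with a simple dominance test: one flow "dominates" another if it is at least as good on every metric and strictly better on at least one.  This is a sketch under assumptions: the numbers are copied from the results table, the `RESULTS` dict and `dominates` helper are hypothetical names, and weighting all four metrics equally is a simplification made here for illustration.

```python
# (frequency GHz, TNS nsec, total power mW, TAT days), copied from the results.
RESULTS = {
    "SNPS-All":       (2.87,  97, 1838, 14.7),
    "SNPS-New":       (2.67, 165, 1923, 12.4),
    "Innovus-PT":     (3.06,  44, 1720, 11.7),
    "Innovus-Tempus": (3.12,  24, 1667,  9.8),
    "CDNS-All":       (3.22,   0, 1586,  8.2),
}


def dominates(a, b):
    """True if flow a is no worse than b everywhere and better somewhere.

    Higher frequency is better; lower TNS, power and TAT are better.
    """
    fa, ta, pa, da = RESULTS[a]
    fb, tb, pb, db = RESULTS[b]
    no_worse = fa >= fb and ta <= tb and pa <= pb and da <= db
    better = fa > fb or ta < tb or pa < pb or da < db
    return no_worse and better


if __name__ == "__main__":
    for other in RESULTS:
        if other != "CDNS-All":
            print(f"CDNS-All dominates {other}: {dominates('CDNS-All', other)}")
    print("SNPS-All dominates SNPS-New:", dominates("SNPS-All", "SNPS-New"))
```

CDNS-All dominates every other flow, while SNPS-All does not dominate SNPS-New, because SNPS-New's turnaround time is shorter even though its other three numbers are worse.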

        ----    ----    ----    ----    ----    ----    ----

Warnings & Caveats

  1. Even though we've switched over to an All-CDNS flow, we are still
     diehard Mentor Calibre users for our DRC/LVS work.  As I said
     earlier: If Calibre finds a problem, it must be fixed upstream
     from Calibre.  For us, what Calibre says is gospel, everything
     else is suspect.  We sometimes use ICV for quick checks because
     we have the licenses.

     We're not using Pegasus because we don't think it's ready yet,
     plus it's very difficult for our backend guys to trust anything
     other than Calibre. 

  2. This was a PPA eval.  We did not do any noise nor IR-drop runs in
     it because we didn't have the time.  Our hunch is since Voltus
     is in the common engine philosophy that CDNS R&D supports, it'll
     fare better than a SNPS Fusion Compiler / ANSS Redhawk flow which
     has different engines / different db's issues -- but that's just
     a hunch.  We don't know for sure.

  3. Collectively my company has at least 500 man-years invested in
     developing Synopsys DC, PrimeTime, ICC, ICC2 Tcl scripts.  It's
     going to be very painful to have to redo all this prior script
     work just to have Cadence scripts.

  4. As much as we like the All-CDNS flow, it's not documented 100%.
     Some of the PPA results we got came from tool switches that the
     Cadence AE's and R&D told us about --  which we didn't find much
     documentation on.

        ----    ----    ----    ----    ----    ----    ----

Summary

We have completed our production migration over to an "All-CDNS" digital
flow for our next 7nm project.  So far the deployment is going well, with
the extended team able to pick up the new tools rapidly.

The value of Fusion Compiler is still unclear:

  - It requires a completely new UI.  If I am translating all my existing
    SNPS Tcl scripts, I might as well move them to an all-CDNS flow.
 
  - They have 3 synthesis engines now. Old DC. Faster DC. And Descartes.

  - How can a weak backend (ICC2) cover for whatever happens in synthesis?
 
  - Though FC synthesis runtimes are faster, apparently they are skipping
    optimization and losing QOR.
 
One of our engineers started referring to it as "Confusion Compiler" during
the eval.  After their ICC2 disaster, why is Synopsys making this mistake a
second time?

    - [ Dr. Pepper ]


Copyright 1991-2024 John Cooley. All Rights Reserved.