( ESNUG 587 Item 1 ) ---------------------------------------------- [05/24/19]

Subject: User benchmarks DC-ICC2 vs Fusion Compiler vs Genus-Innovus flows

AN RTL SYNTHESIS TIPPING POINT?: Roughly 17 years ago, a new business
book came out, "The Tipping Point", that emphasized how little things
now can have a large impact on future events.  It was sort of like the
old Sci-Fi "The Butterfly Effect" story; but for business. ...

What makes this interesting is since Innovus has been eating ICC2's lunch
in PnR over the past few years -- we might now be seeing a tipping point
happening in the RTL synthesis market, too.  From the many user comments
below, it appears that Cadence Genus RTL, when paired with Innovus PnR,
is now becoming a credible threat to Aart's Design Compiler monopoly.

    - from http://www.deepchip.com/items/dac17-04.html


From: [ Dr. Pepper ]

Hi, John,

For reasons of security and privacy please keep me anonymous.

In your DAC'17 #4 post you outlined what I and many others think is the key
issue with Synopsys digital implementation tools compared to what Cadence
R&D have been able to achieve. 

We have used Synopsys for a long time.  In the last 4 years we struggled
greatly to reach our project goals.  Our flow was:

   DC -> SNPS DFT insertion -> ICC2 -> PT/STAR-RC -> then Calibre/ICV

In the last 2 years my guys have attended many SNUGs, CDNlives, DACs, etc.
in hopes of finding a solution to our issues.  Then 18 months ago we decided
to give Innovus a shot to see if it could meet PPA on short schedules, then
slowly try other Cadence tools in our flow to see if there was any benefit.

Other than using Calibre as our golden DRC/LVS signoff, my group doesn't
like to do piecemeal best-in-class point tools because of the inherent
support problems a mixed flow brings.

For example, if we see a mismatch between what Synopsys DC-Topo says and
Cadence Tempus STA, the Synopsys guys will say "it's a Cadence Tempus
problem" while the Cadence guys will say "it's a Synopsys DC problem" and
we're helplessly stuck in between them.  Our only exception to that rule is
using Calibre for golden signoff.  If Calibre finds a problem, it must be
fixed upstream from Calibre.  For us, what Calibre says is gospel,
everything else is suspect.

We have recently done a lot of different types of chips in TSMC 16FF, but
our bread and butter now is 3.2 GHz multi-ARM core compute chips in TSMC
7FF.  Many of my colleagues told us we were wasting time doing evaluations
and to just go with a full Cadence flow based on their experiences -- but
our mgmt will not let us make such a drastic change without us doing a full
eval ourselves.  Too much is at stake.  We were choosing tools for the next
3-4 years; not just trying to solve short-term problems.

In our eval we took 5 high-performance blocks from Verilog RTL and inserted
the test compression/decompression logic in the RTL to create our 3.0 M inst
start block we named "Mongo".  Since we most often cannot partition blocks
(because it's too messy and impractical) we needed a big 3.0 M inst "Mongo"
design to find who has the best single big block PnR tool.

Then we compared power, performance, area (PPA), and runtime across 5 flows
using the TSMC CLN7FF library:

 Tools                                Flow Name         Comment
 ----------------------------------   ---------------   ------------------
    DC-> test ins -> ICC2 -> PT       SNPS-All          our old SNPS flow
     SNPS Fusion Compiler -> PT       SNPS-New          new SNPS flow
 DC-> test ins -> Innovus -> PT       Innovus-PT        old Innovus flow
 DC-> test ins -> Innovus -> Tempus   Innovus-Tempus    old Innovus+Tempus
 Genus-> Modus -> Innovus -> Tempus   CDNS-All          CDNS only flow

Our goal was to have "Mongo" with 3.0 M inst, a number of ARM cores and hard
macros, and some very tight power requirements reach 3.2 GHz (or better) in
TSMC CLN7FF.

 Flows            Best Frequency   TNS left    Total Power   TAT
                  Achieved         on table
 --------------   --------------   ---------   -----------   ---------
 SNPS-All         2.87 GHz          97 nsec    1,838 mW      14.7 days
 SNPS-New         2.67 GHz         165 nsec    1,923 mW      12.4 days
 Innovus-PT       3.06 GHz          44 nsec    1,720 mW      11.7 days
 Innovus-Tempus   3.12 GHz          24 nsec    1,667 mW       9.8 days
 CDNS-All         3.22 GHz           0 nsec    1,586 mW       8.2 days

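As a rough illustration (not part of the original eval), the results table
above can be turned into deltas against the SNPS-All baseline and against
the 3.2 GHz goal.  The numbers are copied from the table; the names
`FlowResult`, `pct_change`, and `compare`, and the "meets goal" rule (at or
above 3.2 GHz with 0 TNS left), are assumptions made for this sketch.

```python
from dataclasses import dataclass

GOAL_GHZ = 3.2  # frequency goal stated in the post


@dataclass(frozen=True)
class FlowResult:
    name: str
    freq_ghz: float   # best frequency achieved
    tns_ns: float     # total negative slack left on the table
    power_mw: float   # total power
    tat_days: float   # turnaround time


# Values copied from the results table in the post.
RESULTS = [
    FlowResult("SNPS-All",       2.87,  97, 1838, 14.7),
    FlowResult("SNPS-New",       2.67, 165, 1923, 12.4),
    FlowResult("Innovus-PT",     3.06,  44, 1720, 11.7),
    FlowResult("Innovus-Tempus", 3.12,  24, 1667,  9.8),
    FlowResult("CDNS-All",       3.22,   0, 1586,  8.2),
]


def pct_change(new: float, base: float) -> float:
    """Signed percent change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


def compare(results, baseline_name="SNPS-All"):
    """Return one summary dict per flow, relative to the baseline flow."""
    base = next(r for r in results if r.name == baseline_name)
    return [
        {
            "flow": r.name,
            "gap_to_goal_ghz": round(r.freq_ghz - GOAL_GHZ, 2),
            "freq_pct": round(pct_change(r.freq_ghz, base.freq_ghz), 1),
            "power_pct": round(pct_change(r.power_mw, base.power_mw), 1),
            "tat_pct": round(pct_change(r.tat_days, base.tat_days), 1),
            # Assumption: "meets goal" = at or above 3.2 GHz with no TNS left.
            "meets_goal": r.freq_ghz >= GOAL_GHZ and r.tns_ns == 0,
        }
        for r in results
    ]


if __name__ == "__main__":
    for row in compare(RESULTS):
        print(row)
```

Negative `power_pct` and `tat_pct` values mean lower power and shorter
turnaround than the old SNPS-All flow.  Under the rule assumed here, CDNS-All
is the only flow flagged as meeting the goal.
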
What we found is that the "CDNS-All" flow consistently gave us better PPA in
the shortest runtime of any of the flows.  Here are my notes on each stage of
the eval we did:


SNPS-All: DC Topo -> test insertion -> ICC2 -> PT

This was our 4 year old baseline SNPS-only flow.  Notice it has the 2nd worst
numbers overall.  In 14.8 days it got 2.87 GHz and 1,838 mW.

We haven't touched our DC flows for years, as SNPS has always treated that
area as their personal franchise.  Our key issue with DC has been runtime.
They refuse to fix it, claiming something called "Descartes" is coming and
that it will apparently solve it.

Our key issue with ICC2 has been PPA.  It just keeps churning, and it shows
in its runtime.  We have to call the optimization multiple times to hit a QOR
that's at least reasonable for our frontend designers to iterate over.

Their PrimeTime flow is pretty much standard, but its ECO is iterative,
because PT-ECO is not physically savvy.  In one case it even inserted buffers
outside our block's boundary!!!  Our scripts can do much better.


SNPS-New: Fusion Compiler -> PT

Synopsys knew we were unhappy with the PPA of their flow and told us to hold
tight because their next new thing would solve all our problems.

This is where it got interesting.  We had been talking to SNPS about their
synthesis runtimes and the ICC2 QOR issues we were facing.  We had a meeting
with SNPS to discuss this and there were TWO teams from SNPS.  One SNPS team
was handling the old DC, and another SNPS team was handling their new
"Descartes" synthesis tool (also called DC2).  At one stage in the meeting
the two teams broke into a mini argument with each other about whose tool was
better!

In the end they recommended "Descartes" + ICC2, now known as Fusion Compiler,
but said that only they would run Fusion Compiler themselves, in taxicab mode,
and that we could not have access to the new tool until the evaluation
concluded.
We agreed, but required that our engineers run all the final timing and power
signoff checks on their final results.

In 12.4 days Fusion Compiler got 2.67 GHz and 1,923 mW, which was 1st place
for worst numbers overall, except for an improved runtime.


Innovus-PT: DC Topo -> test insertion -> Innovus -> PT

This was careful surgery on our SNPS flow to replace just Synopsys ICC2 with
Cadence Innovus in the PnR portion.  We thought it would be the most difficult
step, but it was surprisingly easy.  I guess a lot of people in the industry
have helped mature Innovus so that it can be plugged in almost painlessly now.

We were surprised by the placement quality and QOR we were getting at all
stages of PnR.  After some close work with the Innovus support team, we
implemented our complex clocks during the early stages of our placement step.
That was the most impressive part.  Its routing worked fine, with good
results.  And it was refreshing to run Innovus ourselves instead of having to
trust the taxicab mode numbers like we had to with Fusion Compiler.

In 11.7 days Innovus-PT got 3.06 GHz and 1,720 mW, and was 3rd place for
worst numbers overall.  Still not making the 3.2 GHz goal!

(This is without any fancy stuff like distributed.  Just plain old fashioned
timing and power convergence.  We didn't know enough about Innovus to turn on
the fancy stuff yet.)


Innovus-Tempus: DC Topo -> test insertion -> Innovus -> Tempus

After TSMC certified Tempus at 7nm for final signoff, our CAD guys started to
look at it seriously.  They confirmed for themselves that Tempus was at least
as accurate as PrimeTime -- plus Tempus timing was within 3% of Spectre SPICE
runs on the same nets.

Since we already had the Innovus cockpit installed, it was trivial to enable
Tempus as our signoff environment.  No files to be handed off, etc.  And, no
surprises, Tempus gave us timing numbers that aligned well with what the
internal Innovus timing engine was telling us.

Cadence promotes "common engines" throughout their digital flow.  Running
Tempus ECO with Innovus gave us a clearer idea of the brilliance in this.  We
could start the Tempus ECO phase from the Innovus prompt.  It reduced our
power and area, and fixed timing in PBA mode.  And when it was done, it was
DRC clean.  There is a huge advantage to common engines, and Cadence has been
talking about it for years.  CDNS gets it.  I think SNPS is finally getting
it.

In 9.8 days Innovus-Tempus got 3.12 GHz and 1,667 mW, and was in 2nd place
for *best* numbers overall.  It's just 0.08 GHz shy of the 3.2 GHz goal!


CDNS-All: Genus -> Modus -> Innovus -> Tempus

For test, we had just been using DFT Compiler and TestMAX because they were
part of our original all-SNPS flow.  Our test nerds prefer Mentor Tessent, but
we were lazy and didn't involve them in this eval.  CDNS suggested we use
their Modus for test insertion.  Since our compressor and decompressor were in
RTL, just using Modus to stitch the chains was easy; we didn't have to do a
lot of evaluation.  It passed our sanity checks with our DFT team.

The bigger change was in swapping out DC-Topo for Genus for RTL synthesis.
Going to Genus gave us speed and the ability to do what we call true physical
synthesis.  In Genus+Innovus they do placement very early, and then the RTL
synthesis decisions are made based on that placement.  Our initial assessment
showed the combo did a much better job on our datapaths and MUXes, which is
key to power improvement.  Impressive.

The final Genus to Innovus flow works great.  Again, common engines make a big
difference, and both tools have matured them over the last 3-4 years since
Cadence started talking about it.  With it, we could do early clock
implementation in our early synthesis stages so the tools could optimize for
the impact of expected skews.

We also switched from Star-RC over to Quantus (QRC) for extraction here.  The
results are pretty much the same, but we've found Tempus + Quantus runs 1.5x
to 2.0x faster than PrimeTime + Star-RC.

In 8.2 days CDNS-All got 3.22 GHz and 1,586 mW, and was the only flow that had
0 TNS -- plus it beat the 3.2 GHz goal by 0.02 GHz!  It was 1st place.

        ---- ---- ---- ---- ---- ---- ----

Warnings & Caveats

  1. Even though we've switched over to an All-CDNS flow, we are still
     diehard Mentor Calibre users for our DRC/LVS work.  As I said earlier:
     if Calibre finds a problem, it must be fixed upstream from Calibre.
     For us, what Calibre says is gospel, everything else is suspect.

     We sometimes use ICV for quick checks because we have the licenses.
     We're not using Pegasus because we don't think it's ready yet, plus
     it's very difficult for our backend guys to trust anything other than
     Calibre.

  2. This was a PPA eval.  We did not do any noise or IR-drop runs in it
     because we didn't have the time.  Our hunch is that since Voltus fits
     the common engine philosophy that CDNS R&D supports, it'll fare better
     than a SNPS Fusion Compiler / ANSS Redhawk flow, which has different
     engines / different db's issues -- but that's just a hunch.  We don't
     know for sure.

  3. Collectively my company has at least 500 man-years invested in
     developing Synopsys DC, PrimeTime, ICC, and ICC2 Tcl scripts.  It's
     going to be very painful to have to redo all this prior script work
     just to have Cadence scripts.

  4. As much as we like the All-CDNS flow, it's not 100% documented.  Some
     of the PPA results we got came from tool switches that the Cadence
     AEs and R&D told us about -- and which we couldn't find much
     documentation on.

        ---- ---- ---- ---- ---- ---- ----

Summary

We have completed our production migration over to an "All-CDNS" digital
flow for our next 7nm project.  So far the deployment is going well, with
the extended team able to pick up the new tools rapidly.

The value of Fusion Compiler is still unclear:

  - It requires a completely new UI.  If I am translating all my existing
    SNPS Tcl scripts anyway, I might as well move them to an all-CDNS flow.

  - They have 3 synthesis engines now.  Old DC.  Faster DC.  And Descartes.

  - How can a weak backend (ICC2) cover for whatever happens in synthesis?

  - Though FC synthesis runtimes are faster, apparently they are skipping
    optimization and losing QOR.

One of our engineers started referring to it as "Confusion Compiler" during
the eval.  After their ICC2 disaster, why is Synopsys making this mistake a
second time?

    - [ Dr. Pepper ]


Related Articles

    Genus RTL synthesis gaining traction vs. DC is #4 of Best of 2017
    After 16nm benchmark, 7nm user swaps out DC-Graphical for Genus-RTL
    ICC2 patch rev, Innovus penetration, and the 10nm layout problem
    Aart's SUE RIVALS policy backfires horribly on core SNPS patents

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley.  All Rights Reserved.

   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)