( ESNUG 587 Item 5 ) ---------------------------------------------- [07/22/19]

  EDITOR'S NOTE: I love love love deep technical "how to" and especially
  "how NOT to" user posts like this!  Now that's good DeepChip!  - John

Subject: 12 good and 4 bad switches in new Genus/Innovus/Tempus 19.1 flow

Then we compared power, performance, and area (PPA) plus runtime across
5 flows using the TSMC CLN7FF library:

 Tools                                Flow Name         Comment
 ----------------------------------   ---------------   ------------------
    DC-> test ins -> ICC2 -> PT       SNPS-All          our old SNPS flow
     SNPS Fusion Compiler -> PT       SNPS-New          new SNPS flow
 DC-> test ins -> Innovus -> PT       Innovus-PT        old Innovus flow
 DC-> test ins -> Innovus -> Tempus   Innovus-Tempus    old Innovus+Tempus
 Genus-> Modus -> Innovus -> Tempus   CDNS-All          CDNS only flow

Our goal was to have "Mongo" with 3.0 M inst, a number of ARM cores and
hard macros, and some very tight power requirements reach 3.2 GHz (or
better) in TSMC CLN7FF.
 Flows            Best Freq   TNS left    Total      TAT
                  Achieved    on table    Power
 --------------   ---------   ---------   ---------  ---------
 SNPS-All         2.87 GHz     97 nsec    1,838 mW   14.7 days
 SNPS-New         2.67 GHz    165 nsec    1,923 mW   12.4 days
 Innovus-PT       3.06 GHz     44 nsec    1,720 mW   11.7 days
 Innovus-Tempus   3.12 GHz     24 nsec    1,667 mW    9.8 days
 CDNS-All         3.22 GHz      0 nsec    1,586 mW    8.2 days
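As a quick sanity check of the table above (our own arithmetic, not a tool
output), here's a short sketch recomputing the relative deltas of the
"CDNS-All" flow vs. the old "SNPS-All" flow; all numbers are copied straight
from the table.

```python
# Back-of-the-envelope deltas, CDNS-All vs. SNPS-All (values from the table)
flows = {
    "SNPS-All": {"ghz": 2.87, "mw": 1838, "days": 14.7},
    "CDNS-All": {"ghz": 3.22, "mw": 1586, "days": 8.2},
}
base, best = flows["SNPS-All"], flows["CDNS-All"]
freq_gain_pct  = 100 * (best["ghz"] - base["ghz"]) / base["ghz"]   # ~12.2%
power_save_pct = 100 * (base["mw"] - best["mw"]) / base["mw"]      # ~13.7%
tat_speedup    = base["days"] / best["days"]                       # ~1.79x
print(f"{freq_gain_pct:.1f}% faster clock, {power_save_pct:.1f}% less power, "
      f"{tat_speedup:.2f}x shorter TAT")
```

So "better PPA in the shortest runtime" here means roughly a 12% frequency
gain and 14% power saving at almost half the turnaround time.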
What we found is the "CDNS-All" flow consistently gave us the best PPA in
the shortest runtime of any of the flows.

    - from "Benchmark of DC-ICC2 vs Fusion Compiler vs Genus-Innovus"

From: [Ralph, an adult Mutant Ninja Turtle]

Hi, John,

We saw similar trends when we went from DC/ICC2 over to Genus/Innovus 18.1.

Right now we've just switched over to Genus/Innovus/Tempus 19.1 and here's
how we made it work.  First, here are the four new parts of the CDNS 19.1
flow in a nutshell.

    - new physical restructuring with iSpatial
    - new Mux and Datapath Restructuring
    - new Machine Learning based optimization
    - new Tempus ECO that skips post-route optimization

Cadence has this idea of common engines.  They used to have common placement
and common routing engines across all the tools.  With CDNS 19.1, we get
"iSpatial", which is what Cadence marketing calls putting GigaOpt everywhere
in their flow so there's one common optimization engine everywhere, too.
(GigaOpt is the optimization engine originally in Innovus.  Now it's in
Genus and Tempus, too.)

With iSpatial we see a ~1.8x runtime speed up for our full 19.1 flow vs. the
old 18.1 full flow.  The new iSpatial predicted our area and our power
exactly (because it's GigaOpt moved upstream into Genus).  This better
timing data is used by our RTL team to tune the RTL.  So when this better
19.1 data is taken to Innovus, Innovus doesn't have to do placement
optimization again.  Innovus 19.1 goes purely incremental after that, which
is what gets us that 1.8x runtime speed up.

To enable iSpatial we needed to set the following switches:

    set_db limited_access_feature {ispatial 214480224}
    set_db opt_spatial_effort extreme
    syn_opt -spatial

This is what we're using now because it's what the Cadence folks told us to
do.  Are other users using different switches to turn on iSpatial?

        ----    ----    ----    ----    ----    ----    ----

On top of the 1.8x speed up, we also saw better QOR overall with iSpatial
from:

  - Early Clocking: since we now get GigaOpt within Genus, it now uses
    the early clocking flow in synthesis with useful skew.  The clock
    gates and skewed macros now start to show up early in synthesis.

  - Physical Restructuring: the Genus mapper now gets to use GigaOpt to
    make early (and clever) physical optimizations during RTL synthesis.

    Let's say our design has a cascade of adders 6 deep.  The old 18.1
    Genus can't see the whole cone of logic the cascade works out to.  But
    with 19.1, it can see the whole cone and restructure it optimally.

With this we are getting gains on QOR and, more importantly, on congestion.

                   18.1 Genus/Innovus   19.1 iSpatial flow
 ---------------   ------------------   ------------------
 Timing WNS/TNS    -88 psec/-265 nsec   -32 psec/-93 nsec
 Power             1,245 mW             1,173 mW
 TTR/TAT           8.12 days            4.78 days
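To put those gains in relative terms (again our own arithmetic, not tool
output), here's a quick sketch of the 18.1-to-19.1 improvements, with all
numbers copied from the table above.

```python
# 18.1 Genus/Innovus vs. 19.1 iSpatial flow (values from the table above)
old = {"tns_ns": 265, "mw": 1245, "days": 8.12}   # 18.1 flow
new = {"tns_ns": 93,  "mw": 1173, "days": 4.78}   # 19.1 iSpatial flow
tns_cut_pct   = 100 * (old["tns_ns"] - new["tns_ns"]) / old["tns_ns"]  # ~65%
power_cut_pct = 100 * (old["mw"] - new["mw"]) / old["mw"]              # ~5.8%
tat_speedup   = old["days"] / new["days"]                              # ~1.7x
print(f"TNS cut {tns_cut_pct:.0f}%, power cut {power_cut_pct:.1f}%, "
      f"TAT {tat_speedup:.2f}x faster")
```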
Here's the new switch pair to enable this restructuring:

    set_db limited_access_feature {ispatial_restructuring 439424160}
    set_db opt_spatial_restructuring true

        ----    ----    ----    ----    ----    ----    ----

IMPORTANT: DO NOT RERUN INNOVUS

This new 19.1 restructuring flow above puts Innovus in incremental mode.
But you might lose the TAT benefit if you re-start Innovus again.  So the
user has to watch out for that.  You have to make sure you switch your
Innovus place optimization to the much faster incremental version:

    setPlaceMode -place_global_exp_skip_gp true
    place_opt_design

Cadence says as long as the Genus dB is handed over to the Innovus backend
dB, we will get this incremental behavior in Innovus.  (This works because
the GigaOpt stage is already done and the dB's automatically sense that.)
But it must be a dB handoff!  If you DON'T do a dB handoff, then you MUST
use the two switches above.

        ----    ----    ----    ----    ----    ----    ----

NOW HAS MUX AND DATAPATH RESTRUCTURING IN RTL SYNTHESIS

We are using the 19.1 based Genus release.  Our previous 18.1 methodology
used a low-level RTL coding style.  This is pretty common and most
synthesis tools handle this well enough.

However, we are now doing chips with a lot of machine learning logic
content in them that requires the use of high-level RTL coded in
SystemVerilog.  For example:

    max = 0;
    for (int i = 0; i < 4; i++) begin
      if (max < array[i]) begin
        max = array[i];
      end
    end

A normal unrolling of the above loop would expand this out to a serial
chain of compare-and-select stages, each iteration's max feeding the next.
But the better solution is to expand it to a balanced tree of comparators.
For 4 levels, this saves us 1 level of logic.  Now imagine this with a
loop that's 16 levels deep.  It would save 11 logic levels.  This trade-off
between levels of logic, area, power, and timing is crucial for SV chips.
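That depth arithmetic checks out.  Here's a minimal sketch (our own model,
not the tool's algorithm): a naively unrolled max-loop builds a serial
chain of n-1 compare-select stages, while a balanced-tree restructuring
needs only ceil(log2(n)) levels.

```python
# Logic depth of a max-of-n reduction: serial chain vs. balanced tree
from math import ceil, log2

def serial_levels(n: int) -> int:
    return n - 1              # one compare-select per loop iteration

def tree_levels(n: int) -> int:
    return ceil(log2(n))      # pairwise reduction tree

for n in (4, 16):
    saved = serial_levels(n) - tree_levels(n)
    print(f"n={n}: chain={serial_levels(n)} tree={tree_levels(n)} saved={saved}")
```

For n=4 that saves 1 level, and for n=16 it saves 11 levels, matching the
numbers above.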

We turned this on in Genus 19.1 with the DP Turbo switch:

    set_attr dp_opt_turbo true

We checked that the decisions 19.1 made were correct, and it didn't need
any guidance to get the data path right.

Timing is important at 5nm, but our key QOR focus is on power.  Getting
the data path and MUX structure just right is important.  This DP Turbo
switch is new.  (I think Cadence calls this Compus.  I don't think they've
announced it yet.)  We like the fact that just one switch enables this,
without too many options.

Cadence gave us a way to measure path depths:

 Path Depth Range   # of Paths w/ 18.1   # of Paths w/ 19.1
                                         plus high-level opto
 ----------------   ------------------   --------------------
 0-4 levels              22,349               22,785
 5-9 levels              34,987               35,955
 10-14 levels            46,876               46,903
 15-19 levels            43,875               44,329
 20-24 levels            74,892               75,309
 25-29 levels            54,985               60,783
 30-34 levels            18,904               24,509
 35-39 levels             7,854                1,023
 40-44 levels             5,689                   23
 45-50 levels             1,208                    0
Notice with 19.1 how the path depth shrinks above 35 levels.

    Above 35 levels with 18.1 there are 7,854 + 5,689 + 1,208 == 14,751 paths
    Above 35 levels with 19.1 there are 1,023 +    23 +     0 ==  1,046 paths

What 19.1 is doing is MUX and datapath restructuring to shave off 15 levels.

We were using Genus for our high end CPU QOR.  With 19.1 we now have one
tool for both CPU style designs and other designs.  (As compared to SNPS,
which uses Fusion Compiler for CPU and DC-NXT for all the non-CPU designs;
and FC and DC-NXT have different UI's with different unique commands;
whereas CDNS 19.1 has the same commands throughout.)

        ----    ----    ----    ----    ----    ----    ----

CDNS 19.1 NOW HAS MACHINE LEARNING

We got new code from Cadence where they brag about having Machine Learning.
We found that it fixed their pre-route to post-route correlation problem.
It used to be, when we went from 16nm to 7nm, that the 18.1 pre-route
numbers came in 30% lower than post-route.  With ICC2, the correlation was
also 30% off.
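The tail counts above can be re-derived directly from the histogram (a
trivial check of our own, with the bin counts copied from the table):

```python
# Paths at 35+ levels, from the path depth histogram above
deep_bins_181 = [7854, 5689, 1208]   # 35-39, 40-44, 45-50 bins with 18.1
deep_bins_191 = [1023, 23, 0]        # same bins, 19.1 plus high-level opto
print(sum(deep_bins_181), sum(deep_bins_191))   # 14751 1046
```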
              19.1 without ML    19.1 with ML
 ----------   ----------------   -----------------
 WNS/TNS      -9 psec/-50 psec   -10 psec/-48 psec
 Power QOR    945 mW             911 mW
 TTR/TAT      2.3 days           2.2 days
With CDNS 19.1 in training, we got to within ~5% pre-route to post-route
alignment; and that was close enough it gave us another 3.5% power savings.
What's also good is all our ML training design data resides on our own
network and we feel secure with that.

We enabled it using the following:

    setMachineLearningMode -training net_cell_delay

    python <training_package_path>/run.py -mode train \
        -type net \
        -datadir <data_dir> \
        -outdir <model_dir> \
        -log train.log

    setMachineLearningMode -deployment

As you can see the numbers weren't all that different with or without ML;
but fixing that pre-route to post-route 30% lowball correlation problem is
why we like the new ML in 19.1.

        ----    ----    ----    ----    ----    ----    ----

NEW 19.1 TEMPUS ECO SKIPS POST-ROUTE OPTIMIZATION

Cadence 19.1 is proposing PBA based optimization.  (They're saying skip the
post-route optimization step altogether.)  Since Tempus is integrated into
Innovus, with Tempus ECO what we now get is PBA based optimization straight
inside Innovus.  Looks like they strengthened their timing driven routing.

With Machine Learning we get such good pre-route to post-route correlation,
we can directly move on to Tempus ECO power optimization as the final step
in our flow.  All PBA based.  The switch for it is:

    signoffOptDesign -setup -hold -leakage -dynamic -drv -area

The above command got us PBA based optimization and replaced the old one:

    optDesign -postRoute    <-- BAD OLD COMMAND!  DO NOT USE!!!

Important: it is a must that you specify the process node setting:

    setDesignMode -process 5

It's the one setting that makes sure 5nm is enabled properly in 19.1.

        ----    ----    ----    ----    ----    ----    ----

BROKEN/DANGEROUS STUFF IN 19.1

We tried some other stuff that didn't get us much gain and would like to
see if other users are able to make this work.

1.) We tried hold aware scan chain reorder after inserting clocks.

        setScanReorderMode -holdAware true
        scanReorder

    This didn't work well in reducing hold violations.  Cadence needs to
    fix this.  This is just not good right now.  It messed up our scan
    chains; it improved hold, but the design was unroutable.  On another
    design it didn't change hold much.

2.) We have some glitch power left on the design to be optimized.  We
    tried the Joules engine for glitch power optimization - it does a good
    job calculating and reporting it, but we need it to optimize it.  It
    literally cannot optimize it!  Cadence says an optimizing Joules is at
    least 3 to 6 months away.

3.) We had some functional ECOs that might touch the clock network, and we
    ended up fixing transition violations on those clock nets ourselves.
    Cadence told us about this command:

        ccopt_pro

    That command was not good.  It destroyed and touched our whole clock
    tree, whereas we just wanted part of the clock network optimized.
    STAY AWAY FROM THIS!

4.) We also tried to do a restricted metal-only ECO limited to 4 layers
    using the following:

        ecoRoute -modifyOnlyLayers 5-8

    It said it changed 4 layers, but it touched the vias between layer 4
    and layer 5, which cost us another layer of change costs.

        ----    ----    ----    ----    ----    ----    ----

WE HAVE NOT TRIED VIRTUS YET

We are a Tempus/Voltus signoff flow.  We are worried about true IR aware
signoff at 5nm and would like to be trying out Virtus in this rev of our
chip.  We would like to find out in vectorless analysis which are our real
sensitive paths.  (Apache isn't true vectorless yet.)  Fingers crossed.

        ----    ----    ----    ----    ----    ----    ----

CONCLUSION

Our best high level 19.1 recipe (right now) seems to involve

    - new physical restructuring with iSpatial
    - new Mux and Datapath Restructuring
    - new Machine Learning based optimization
    - new Tempus ECO that skips post-route optimization

We'd like Cadence to fix their hold aware scan chain reorder problem and
to get glitch power optimization working in Joules.
    - [Ralph, an adult Mutant Ninja Turtle]

        ----    ----    ----    ----    ----    ----    ----

Related Articles

    User benchmarks DC-ICC2 vs Fusion Compiler vs Genus-Innovus flows
    Genus RTL synthesis gaining traction vs. DC is #4 of Best of 2017
    After 16nm benchmark, 7nm user swaps out DC-Graphical for Genus-RTL
    ICC2 patch rev, Innovus penetration, and the 10nm layout problem
    Aart's SUE RIVALS policy backfires horribly on core SNPS patents
Copyright 1991-2024 John Cooley.  All Rights Reserved.