( ESNUG 561 Item 1 ) -------------------------------------------- [05/27/16]

Subject: A second CDNS Voltus-DP vs. ANSS Gear RedHawk-DMP user benchmark

WHAT I HEARD: In the end, the Voltus-DP die-package simulations matched up with real silicon perfectly. Yes, we looked at the Gear flavor of Ansys RedHawk and it crashed! Same for SNPS PrimeRail!



The only tool that could IR-drop our big ass chip was Voltus-DP. We benchmarked Voltus-DP and it trounced both Apache Gear RedHawk and SNPS PrimeRail in capacity. ("And your ~300 M inst estimate is waaaay off, you idiot, Cooley!! Nyah! Nyah!")

And Voltus-DP did this chip flat! Whoa, mama! Not hierarchical, but flat!

    - from http://www.deepchip.com/items/0560-03.html


From: [ Babar, the Elephant ]

Hello John,

Please keep me anonymous.

My group liked this CDNlive'16 Nvidia benchmark.  Posts like these are
why I read DeepChip.

I lead a physical design team responsible for designing and implementing
TSMC 16nm chips for networking and servers.  My team's responsibilities
include synthesis, PnR and sign-off.  Our design sizes are typically in
the range of 50M to 300M instances with a lot of IP content such as ARM
cores and DesignWare memory controllers.

Right now our design flow is:

    - First Encounter for floor planning
    - DC Topo for synthesis
    - a mix of Encounter/Innovus and ICC/ICC2 for PnR
    - PrimeTime for timing signoff
    - both Voltus and Redhawk for power signoff

We have evaluated both CDNS Voltus and ANSS Redhawk for power signoff.  I'd
like to share with you my team's experience on the two products.

For us, power signoff is critical on our TSMC 16nm designs because of issues
that come up during final SoC integration and supply voltage scaling.  (And
we're expecting it to be an even bigger headache for 10nm.)

 
MARKETING IMPRESSIONS GOING IN

The Ansys Apache marketing machine has been in overdrive.  They have a new
distributed version of Redhawk, Redhawk-DMP, that has big data algorithms,
elastic-compute, and Hadoop from Gear.  (See John Lee in ESNUG 554 #1.)  My
local Apache/Gear salesman also claims ANSS has new Gear RnD talent to breed
new life into the Redhawk product line.
  


  
Anirudh's Cadence team has also been hyper-active, claiming that they finally
have their own big data distributed tool called Voltus-DP.  They also claim
that it's integrated with Innovus and Tempus "for rapid design convergence".

Both Voltus and Redhawk claim dramatic improvements on runtime and capacity,
so we had to take a look at them ourselves to see who is right.

The design for our Voltus vs. Redhawk benchmarks:

   - TSMC 16FF+
   - SoC including Cortex-A53 CPU, Dolphin SRAM and other DW IP blocks
   - ~20 power domains
   - 160M instances 
   - SRAMs modeled down to lower metal level
   - long dynamic runs based on VCD vectors


FLAT RUN BASELINE:

First we ran the testcase on a single dedicated Linux server with 48 CPUs
and 1TB memory which is our biggest machine.  The single machine results:

                          Voltus            Redhawk
         ------------------------------------------------
         runtime           46 hours         133 hours
         peak memory      750 G             820 G
         disk usage       420 G             630 G 

The two tools gave us very similar hotspots, and their effective IR-drops
(IVD) were within +/-5% of each other.  The roughly 2.5X to 3X runtime
advantage and the 10% to 30% smaller memory and disk footprint of Voltus
over Redhawk are mostly in line with our past experience.
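
As a quick sanity check, the ratios behind that comparison can be recomputed
from the flat-run table.  This is only a sketch: the dictionary and function
names are made up for illustration, and the numbers are copied from the
table above.

```python
# Flat-run numbers copied from the table above (hours, GB, GB).
flat = {
    "Voltus":  {"runtime_h": 46,  "peak_mem_gb": 750, "disk_gb": 420},
    "Redhawk": {"runtime_h": 133, "peak_mem_gb": 820, "disk_gb": 630},
}

def ratio(metric):
    """Redhawk value divided by Voltus value for one metric."""
    return flat["Redhawk"][metric] / flat["Voltus"][metric]

def saving_pct(metric):
    """How much smaller the Voltus value is, as a percent of Redhawk's."""
    return 100.0 * (1.0 - flat["Voltus"][metric] / flat["Redhawk"][metric])

for m in ("runtime_h", "peak_mem_gb", "disk_gb"):
    print(f"{m:12s} Redhawk/Voltus = {ratio(m):.2f}x, "
          f"Voltus smaller by {saving_pct(m):.1f}%")
```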


DISTRIBUTED COMPARISON:

Now that we had our single machine baseline, we benchmarked the distributed
versions, Voltus-DP vs. Redhawk-DMP.  Both run massively parallel IR-drop
simulations across multiple machines through LSF.  Our expectation with
these IR-drop tools going distributed was that wall clock runtime should
drop dramatically.

                          Voltus-DP         Redhawk-DMP
         ------------------------------------------------
         runtime           10 hours          26 hours
         peak memory      120 G             330 G
         disk usage       400 G            1600 G

Voltus-DP was about 2.5x faster than Redhawk-DMP, which is similar to the
flat runs.  Voltus-DP used 1/4 to 1/3 of the memory and disk footprint of
Redhawk-DMP.  That was a far bigger advantage than in the flat runs.
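
For reference, the speedup each tool got from going distributed can be worked
out from the two runtime tables.  A minimal sketch, with variable and
function names invented for illustration; the hours are copied from the
tables above.

```python
# Wall clock runtimes in hours, copied from the flat and distributed tables.
flat_runtime_h = {"Voltus": 46, "Redhawk": 133}
dist_runtime_h = {"Voltus": 10, "Redhawk": 26}

def distributed_speedup(tool):
    """Flat runtime divided by distributed runtime for one tool."""
    return flat_runtime_h[tool] / dist_runtime_h[tool]

def tool_gap(runtimes):
    """Redhawk runtime divided by Voltus runtime."""
    return runtimes["Redhawk"] / runtimes["Voltus"]

for tool in ("Voltus", "Redhawk"):
    print(f"{tool}: {distributed_speedup(tool):.1f}x faster when distributed")
print(f"Redhawk/Voltus runtime, flat:        {tool_gap(flat_runtime_h):.1f}x")
print(f"Redhawk/Voltus runtime, distributed: {tool_gap(dist_runtime_h):.1f}x")
```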


DISTRIBUTED IR-DROP ACCURACY PROBLEMS:

Second, we looked at the accuracy of the two distributed tools.  We found
that the results by Redhawk-DMP vary a lot with the number of machines we
used in our runs.

We tried Redhawk-DMP on the same design with the same VCD dynamic run files
on 2/4/8 machines respectively and compared the IR-drops of all the runs
versus those of the flat Redhawk run.  (Our flat run was considered as the
baseline.)  We observed that the IR-drop of any instance in the testcase can
vary by up to 50mV across those runs.  The following table shows IR-drops of
four instances given by Redhawk-DMP runs using different numbers of machines.

         Redhawk-DMP  1-machine  2-machines  4-machines  8-machines
         ----------------------------------------------------------
          DFF4 93459    92mV        66mV        73mV        88mV
          DFF4 12103    63mV        82mV        33mV        75mV
          MUX4 04567    50mV        73mV        89mV        62mV
           INV 2341     32mV        20mV        12mV        43mV

This variation is unacceptable for our signoff criteria, where the worst
IR-drop allowed in TSMC 16FF+ designs is typically only around 100mV.
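
To put numbers on that variation, the spread per instance can be computed
straight from the Redhawk-DMP table.  This is a rough sketch: the names are
illustrative, and BUDGET_MV is just the "around 100mV" figure from the text,
not an exact signoff limit.

```python
# IR-drops in mV for 1/2/4/8 machines, copied from the Redhawk-DMP table.
redhawk_dmp_mv = {
    "DFF4 93459": [92, 66, 73, 88],
    "DFF4 12103": [63, 82, 33, 75],
    "MUX4 04567": [50, 73, 89, 62],
    "INV 2341":   [32, 20, 12, 43],
}
BUDGET_MV = 100  # assumption: the approximate worst-case figure quoted above

def spread_mv(values):
    """Largest minus smallest IR-drop across all machine counts."""
    return max(values) - min(values)

def max_dev_from_flat_mv(values):
    """Largest deviation of any distributed run from the 1-machine run."""
    flat = values[0]
    return max(abs(v - flat) for v in values[1:])

for inst, values in redhawk_dmp_mv.items():
    s = spread_mv(values)
    d = max_dev_from_flat_mv(values)
    print(f"{inst}: spread {s}mV ({100 * s / BUDGET_MV:.0f}% of budget), "
          f"max deviation from flat {d}mV")
```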

We think this accuracy loss of Redhawk-DMP is most likely caused by
a fundamental flaw in their distribution algorithm.  According to the
logfile, Apache/Gear partitions the design from the very beginning and
solves each of the partitions independently, the way extraction/DRC tools
normally do.  That approach will almost certainly get busted in the context
of power grid simulation due to the tightly coupled nature of power grids.
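
The coupling argument can be illustrated with a toy model.  The sketch below
is NOT how either tool works; it is a hypothetical 1-D power rail with made-up
loads and resistances, solved once as a whole and once as two halves that are
cut apart and solved independently (the cut is assumed to carry no current).
All names and values are assumptions for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def rail_drops(loads_a, r_seg_ohm, left_pad, right_pad):
    """IR-drop (V) at each node of a 1-D rail; supply pads are ideal (0 V drop)."""
    n = len(loads_a)
    g = 1.0 / r_seg_ohm
    lower, diag, upper = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        if i > 0:              # resistor to the left neighbour
            diag[i] += g
            lower[i] = -g
        elif left_pad:         # resistor to the left supply pad
            diag[i] += g
        if i < n - 1:          # resistor to the right neighbour
            diag[i] += g
            upper[i] = -g
        elif right_pad:        # resistor to the right supply pad
            diag[i] += g
    return solve_tridiagonal(lower, diag, upper, list(loads_a))

# Hypothetical rail: 8 nodes drawing 1 mA each, plus a 20 mA hotspot at node 3.
loads = [0.001] * 8
loads[3] = 0.020

flat = rail_drops(loads, 1.0, left_pad=True, right_pad=True)

# "Partitioned" solve: cut between nodes 3 and 4, each half sees only its own pad.
cut = 4
partitioned = (rail_drops(loads[:cut], 1.0, left_pad=True, right_pad=False)
               + rail_drops(loads[cut:], 1.0, left_pad=False, right_pad=True))

for i, (f, p) in enumerate(zip(flat, partitioned)):
    print(f"node {i}: flat {f * 1e3:6.1f} mV   partitioned {p * 1e3:6.1f} mV")
```

In this toy case the hotspot can no longer pull current from the far pad once
the rail is cut, so the half containing it reports a larger drop than the
whole-rail solve and the other half reports a smaller one.  Moving the cut
changes both answers.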

On the other hand, Voltus-DP was able to produce the same accuracy as its
flat run when scaling up to 8 machines.  We did not test Voltus-DP
beyond 8 machines.  Its variation across different runs was under 1%.

         Voltus-DP    1-machine  2-machines  4-machines  8-machines
         ----------------------------------------------------------
         DFF4 93459     87mV        87mV        87mV        87mV
         DFF4 12103     66mV        66mV        66mV        66mV             
         MUX4 04567     52mV        52mV        52mV        52mV
          INV 2341      30mV        30mV        30mV        30mV

To make our benchmark more complete, we tried the same run with the chip
package applied.  It's a must-have check we do for power signoffs as the
chip package itself can have a great impact on IR-drops.

We noticed again that Redhawk-DMP showed an even bigger variation on
IR-drops, while Voltus-DP didn't have any accuracy loss.

We also noticed that this variability can be even more visible looking at
resistor current results, making EM signoff using Redhawk-DMP impossible.


DISTRIBUTED MEMORY/DISK USE:

We also compared memory and disk use of Voltus-DP vs Redhawk-DMP.  We found:

                      1-machine  2-machines  4-machines  8-machines
         ----------------------------------------------------------
         Voltus-DP      750G        400G        210G        120G
         Redhawk-DMP    820G        620G        450G        330G

On a per-machine basis, Voltus-DP had nearly linear scalability on memory
usage up to 8 machines.  The disk usage of Voltus-DP is close to its flat
run value, and stays constant as we increase the number of machines.

Redhawk-DMP's memory scalability was sublinear, saturating at 4 machines.
The disk usage of Redhawk-DMP bloated as we used more machines.
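
One way to quantify "linear" here is a scaling efficiency: flat memory divided
by (machines x per-machine memory), where 1.0 would be perfectly linear.  A
minimal sketch using the table above; the metric and the names are my own
choice for illustration, not something either vendor reports.

```python
# Per-machine peak memory in GB for 1/2/4/8 machines, copied from the table.
machines = [1, 2, 4, 8]
peak_mem_gb = {
    "Voltus-DP":   [750, 400, 210, 120],
    "Redhawk-DMP": [820, 620, 450, 330],
}

def scaling_efficiency(tool):
    """flat / (n * per_machine) for each machine count; 1.0 is ideal."""
    flat = peak_mem_gb[tool][0]
    return [flat / (n * m) for n, m in zip(machines, peak_mem_gb[tool])]

for tool in peak_mem_gb:
    cells = "  ".join(f"{n}x: {e:.2f}"
                      for n, e in zip(machines, scaling_efficiency(tool)))
    print(f"{tool:12s} {cells}")
```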


WHY WE WENT TO VOLTUS-DP:
     
Although Redhawk-DMP is a Gear tool, we did not see any "big data" Hadoop
technology impacting our flat vs. distributed IR-drop benchmark.  Perhaps
it was behind the scenes?

Both IR-drop tools provide a significant runtime/capacity improvement
when distributed compared to their flat runs.  Compared to Redhawk, Voltus
had better overall accuracy and memory/disk usage scalability.

On the other hand, Voltus needs to improve its GUI loading time and
warning/error messages.  Cadence R&D said that they are working on fixing
those issues and will get back to us in Q3 of 2016.

We were also told by Cadence that they are working on integrating Voltus
with Innovus' placement engine so that we can automatically fix IR-drop
problems in the placement stage.  This sounds intriguing.  I will report
my experience if I get a chance to play with it.

As a result of our benchmark, we decided to use Voltus-DP as our sole power
signoff solution for the TSMC 16FF+ node.

    - [ Babar, the Elephant ]

        ----    ----    ----    ----    ----    ----   ----

Related Articles

    The Nvidia stealth benchmark of CDNS Voltus vs. ANSS Gear RedHawk

Copyright 1991-2024 John Cooley.  All Rights Reserved.