( ESNUG 561 Item 1 ) -------------------------------------------- [05/27/16]
Subject: A second CDNS Voltus-DP vs. ANSS Gear RedHawk-DMP user benchmark
WHAT I HEARD: In the end, the Voltus-DP die-package simulations matched up
with real silicon perfectly. Yes, we looked at the Gear flavor of Ansys
RedHawk and it crashed! Same for SNPS PrimeRail!

The only tool that could
IR-drop our big ass chip was Voltus-DP. We benchmarked Voltus-DP and it
trounced both Apache Gear RedHawk and SNPS PrimeRail in capacity. ("And
your ~300 M inst estimate is waaaay off, you idiot, Cooley!! Nyah! Nyah!")
And Voltus-DP did this chip flat! Whoa, mama! Not hierarchical, but flat!
- from http://www.deepchip.com/items/0560-03.html
From: [ Babar, the Elephant ]
Hello John,
Please keep me anonymous.
My group liked this CDNlive'16 Nvidia benchmark. Posts like these are
why I read DeepChip.
I lead a physical design team responsible for designing and implementing
TSMC 16nm chips for networking and servers. My team's responsibilities
include synthesis, PnR and sign-off. Our design sizes are typically in
the range of 50M to 300M instances with a lot of IP content such as ARM
cores and DesignWare memory controllers.
Right now our design flow is:
- First Encounter for floor planning
- DC Topo for synthesis
- a mix of Encounter/Innovus and ICC/ICC2 for PnR
- PrimeTime for timing signoff
- both Voltus and Redhawk for power signoff
We have evaluated both CDNS Voltus and ANSS Redhawk for power signoff. I'd
like to share with you my team's experience on the two products.
For us, power signoff is critical on our TSMC 16nm designs because of issues
that come up during final SoC integration and supply voltage scaling. (And
we're expecting it to be an even bigger headache for 10nm.)
MARKETING IMPRESSIONS GOING IN
The Ansys Apache marketing machine has been in overdrive. They have a new
distributed version of Redhawk, Redhawk-DMP, that has big data algorithms,
elastic-compute, and Hadoop from Gear. (See John Lee in ESNUG 554 #1.) My
local Apache/Gear salesman also claims ANSS has new Gear RnD talent to
breathe new life into the Redhawk product line.
Anirudh's Cadence team has also been hyper-active claiming that they finally
have their own big data distributed tool called Voltus-DP. They also claim
that it's integrated with Innovus and Tempus "for rapid design convergence".
Both Voltus and Redhawk claim dramatic improvements on runtime and capacity,
so we had to take a look at them ourselves to see who is right.
The design for our Voltus vs. Redhawk benchmarks:
- TSMC 16FF+
- SoC including Cortex-A53 CPU, Dolphin SRAM and other DW IP blocks
- ~20 power domains
- 160M instances
- SRAMs modeled down to lower metal level
- long dynamic runs based on VCD vectors
FLAT RUN BASELINE:
First we ran the testcase on a single dedicated Linux server with 48 CPUs
and 1TB of memory, which is our biggest machine. The single machine results:
                    Voltus          Redhawk
    ------------------------------------------------
    runtime         46 hours        133 hours
    peak memory     750 G           820 G
    disk usage      420 G           630 G
The two tools gave us very similar hotspots, and their effective IR-drops
(IVD) were within +/-5% of each other. The 2.5X to 3X runtime advantage
and the 10% to 30% smaller memory/disk footprint of Voltus over Redhawk
are mostly in line with our past experience.
DISTRIBUTED COMPARISON:
Now that we have our single machine baseline, we then benchmarked their
distributed Voltus-DP vs. Redhawk-DMP. Both do massive parallel IR-drop
simulations across multiple machines through LSF. Our expectation with
these IR-drop tools going distributed was that wall clock runtime should
drop dramatically.
                    Voltus-DP       Redhawk-DMP
    ------------------------------------------------
    runtime         10 hours        26 hours
    peak memory     120 G           330 G
    disk usage      400 G           1600 G
Voltus-DP was about 2.6x faster than Redhawk-DMP, which is similar to the
flat runs. Voltus-DP used 1/4 to 1/3 of the memory and disk footprint of
Redhawk-DMP. That is a far bigger gap than in the flat run footprints.
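The ratios behind these comparisons can be rechecked with a few lines of Python. This is only a sketch: the dictionary layout and the helper name `ratio` are made up for illustration, and the only real data are the numbers copied from the flat and distributed tables above.

```python
# Numbers copied from the flat and distributed tables (hours, GB).
flat = {
    "Voltus":  {"runtime_h": 46,  "mem_g": 750, "disk_g": 420},
    "Redhawk": {"runtime_h": 133, "mem_g": 820, "disk_g": 630},
}
dist = {
    "Voltus-DP":   {"runtime_h": 10, "mem_g": 120, "disk_g": 400},
    "Redhawk-DMP": {"runtime_h": 26, "mem_g": 330, "disk_g": 1600},
}

def ratio(table, big, small, key):
    """How many times larger tool `big` is than tool `small` for one metric."""
    return round(table[big][key] / table[small][key], 2)

# Redhawk relative to Voltus, flat and distributed.
flat_ratios = {k: ratio(flat, "Redhawk", "Voltus", k) for k in flat["Voltus"]}
dist_ratios = {k: ratio(dist, "Redhawk-DMP", "Voltus-DP", k)
               for k in dist["Voltus-DP"]}

# Wall clock speedup each tool got from going distributed.
voltus_speedup = round(flat["Voltus"]["runtime_h"] / dist["Voltus-DP"]["runtime_h"], 2)
redhawk_speedup = round(flat["Redhawk"]["runtime_h"] / dist["Redhawk-DMP"]["runtime_h"], 2)

print("flat        Redhawk/Voltus:", flat_ratios)
print("distributed Redhawk/Voltus:", dist_ratios)
print("flat -> distributed speedup:", voltus_speedup, redhawk_speedup)
```

With these inputs the runtime ratio comes out near 2.9x flat and 2.6x distributed, and both tools speed up by roughly 5x when distributed.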
DISTRIBUTED IR-DROP ACCURACY PROBLEMS:
Second, we looked at the accuracy of the two distributed tools. We found
that the results by Redhawk-DMP vary a lot with the number of machines we
used in our runs.
We tried Redhawk-DMP on the same design with the same VCD dynamic run files
on 2/4/8 machines respectively and compared the IR-drops of all the runs
versus those of the flat Redhawk run. (Our flat run was considered as the
baseline.) We observed that the IR-drop of any instance in the testcase can
vary by up to 50mV across those runs. The following table shows the IR-drops
of four instances given by Redhawk-DMP runs using different numbers of
machines.
    Redhawk-DMP   1-machine  2-machines  4-machines  8-machines
    ------------------------------------------------------------
    DFF4 93459      92mV       66mV        73mV        88mV
    DFF4 12103      63mV       82mV        33mV        75mV
    MUX4 04567      50mV       73mV        89mV        62mV
    INV 2341        32mV       20mV        12mV        43mV
This variation is unacceptable for our signoff criteria, where the allowed
worst IR-drop in TSMC 16FF+ designs is typically only around 100mV.
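To put the Redhawk-DMP numbers in perspective, the following Python sketch computes the per-instance spread across machine counts and the worst deviation from the flat (1-machine) run. The values are copied from the table; the helper names and the `SIGNOFF_LIMIT_MV` constant (taken from the "around 100mV" figure) are illustrative assumptions.

```python
# IR-drops in mV copied from the Redhawk-DMP table.
# Column order: 1, 2, 4, 8 machines; the 1-machine run is the flat baseline.
redhawk_dmp_mv = {
    "DFF4 93459": [92, 66, 73, 88],
    "DFF4 12103": [63, 82, 33, 75],
    "MUX4 04567": [50, 73, 89, 62],
    "INV 2341":   [32, 20, 12, 43],
}
SIGNOFF_LIMIT_MV = 100  # assumption: the "around 100mV" worst IR-drop figure

def spread_mv(drops):
    """Largest difference between any two runs of the same instance."""
    return max(drops) - min(drops)

def worst_dev_from_flat_mv(drops):
    """Largest absolute difference between a distributed run and the flat run."""
    flat = drops[0]
    return max(abs(d - flat) for d in drops[1:])

spreads = {inst: spread_mv(d) for inst, d in redhawk_dmp_mv.items()}
devs = {inst: worst_dev_from_flat_mv(d) for inst, d in redhawk_dmp_mv.items()}

for inst in redhawk_dmp_mv:
    pct = 100 * spreads[inst] / SIGNOFF_LIMIT_MV
    print(f"{inst:12s} spread {spreads[inst]:2d}mV ({pct:.0f}% of limit), "
          f"worst deviation from flat {devs[inst]:2d}mV")
```

For these four instances the spread ranges from 26mV to 49mV, which is a quarter to a half of a 100mV limit.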
We think this accuracy loss of Redhawk-DMP is most likely to be caused by
a fundamental flaw in their distribution algorithm. According to the
logfile, Apache/Gear partitions the design from the very beginning and
solves each of the partitions independently like what extraction/DRC people
normally do. That approach will almost certainly get busted in the context
of power grid simulation due to the tightly coupled nature of power grids.
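Why independent partitions hurt a coupled grid can be seen on a toy example. The Python sketch below is not either vendor's algorithm; it is a made-up 1-D resistive rail with ideal supply pads at both ends, arbitrary units, and a hypothetical helper `solve_chain`. It solves the rail once as a whole and once as two halves cut in the middle, where no current is allowed to cross the cut.

```python
def solve_chain(loads_a, r_ohm, left_pad, right_pad):
    """IR-drop at each node of a 1-D rail with equal segment resistance.

    left_pad / right_pad: True if that end is tied to an ideal supply pad,
    False if the rail is simply cut there. At least one end must be a pad.
    Node equation (KCL): sum over neighbors of (d_i - d_nbr) = r * I_i.
    """
    n = len(loads_a)
    diag = [2.0] * n                  # two neighbors per node by default
    if not left_pad:
        diag[0] -= 1.0                # cut end: one neighbor fewer
    if not right_pad:
        diag[-1] -= 1.0
    rhs = [r_ohm * i for i in loads_a]
    # Thomas algorithm for a tridiagonal system with off-diagonals of -1.
    for k in range(1, n):
        diag[k] -= 1.0 / diag[k - 1]
        rhs[k] += rhs[k - 1] / diag[k - 1]
    drops = [0.0] * n
    drops[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        drops[k] = (rhs[k] + drops[k + 1]) / diag[k]
    return drops

loads = [2.0, 2.0, 0.0, 0.0]          # heavy load on the left half only
flat = solve_chain(loads, 1.0, True, True)

# Naive partitioning: each half only sees its own pad and its own loads.
left = solve_chain(loads[:2], 1.0, True, False)
right = solve_chain(loads[2:], 1.0, False, True)
partitioned = left + right

errors = [p - f for p, f in zip(partitioned, flat)]
print("flat       :", [round(d, 3) for d in flat])
print("partitioned:", [round(d, 3) for d in partitioned])
print("error      :", [round(e, 3) for e in errors])
```

In the flat solve the right-hand pad helps feed the loaded left half, so cutting the rail overstates the drop on the left and understates it on the right. A real distributed solver has to exchange that boundary information somehow; how either tool does so is not visible from this benchmark.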
On the other hand, Voltus-DP produced the same accuracy as its flat run
when scaling up to 8 machines. We did not test Voltus-DP beyond 8
machines. Its variation across different runs was under 1%.
    Voltus-DP     1-machine  2-machines  4-machines  8-machines
    ------------------------------------------------------------
    DFF4 93459      87mV       87mV        87mV        87mV
    DFF4 12103      66mV       66mV        66mV        66mV
    MUX4 04567      52mV       52mV        52mV        52mV
    INV 2341        30mV       30mV        30mV        30mV
To make our benchmark more complete, we tried the same run with the chip
package applied. It's a must-have check we do for power signoffs as the
chip package itself can have a great impact on IR-drops.
We noticed again that Redhawk-DMP showed an even bigger variation in
IR-drops, while Voltus-DP didn't have any accuracy loss.
We also noticed that this variability can be even more visible looking at
resistor current results, making EM signoff using Redhawk-DMP impossible.
DISTRIBUTED MEMORY/DISK USE:
We also compared memory and disk use of Voltus-DP vs Redhawk-DMP. We found:
    peak memory   1-machine  2-machines  4-machines  8-machines
    ------------------------------------------------------------
    Voltus-DP       750G       400G        210G        120G
    Redhawk-DMP     820G       620G        450G        330G
On a per-machine basis, Voltus-DP had nearly linear scalability on memory
usage up to 8 machines. The disk usage of Voltus-DP was close to its flat
run value, and stayed constant as we increased the number of machines.
Redhawk-DMP's memory scalability was sublinear, saturating at 4 machines.
The disk usage of Redhawk-DMP bloated as we used more machines.
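One way to quantify "nearly linear" versus "sublinear" is a scaling efficiency of mem(1) / (N * mem(N)), where 1.0 means the per-machine memory drops exactly as 1/N. The Python sketch below applies that to the memory table. The metric and the function name are illustrative choices, and it assumes the table values are peak memory per machine.

```python
# Peak memory in GB copied from the memory table (assumed per machine).
machines = [1, 2, 4, 8]
peak_mem_g = {
    "Voltus-DP":   [750, 400, 210, 120],
    "Redhawk-DMP": [820, 620, 450, 330],
}

def scaling_efficiency(mem_per_machine, machine_counts):
    """mem(1) / (N * mem(N)) per machine count; 1.0 is ideal linear scaling."""
    base = mem_per_machine[0]
    return [round(base / (n * m), 2)
            for n, m in zip(machine_counts, mem_per_machine)]

efficiency = {tool: scaling_efficiency(mem, machines)
              for tool, mem in peak_mem_g.items()}

for tool, eff in efficiency.items():
    cells = "  ".join(f"{n}x: {e:.2f}" for n, e in zip(machines, eff))
    print(f"{tool:12s} {cells}")
```

By this measure Voltus-DP keeps about 78% efficiency at 8 machines, while Redhawk-DMP is down to about 31%.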
WHY WE WENT TO VOLTUS-DP:
Although Redhawk-DMP is a Gear tool, we did not see any "big data" Hadoop
technology impacting our flat vs. distributed IR-drop benchmark. Perhaps
it was behind the scenes?
Both IR-drop tools provide a significant runtime/capacity
improvement compared to their flat runs. Compared to Redhawk, Voltus had
better overall accuracy and memory/disk usage scalability.
On the other hand, Voltus needs to improve its GUI loading time and
warning/error messages. Cadence R&D said that they are working on fixing
those issues and will get back to us in Q3 of 2016.
We were also told by Cadence that they are working on integrating Voltus
with Innovus' placement engine so that we can automatically fix IR-drop
problems in the placement stage. This sounds intriguing. I will report
my experience if I get a chance to play with it.
As a result of our benchmark, we decided to use Voltus-DP as our sole power
signoff solution for the TSMC 16FF+ node.
- [ Babar, the Elephant ]
---- ---- ---- ---- ---- ---- ----
Related Articles
The Nvidia stealth benchmark of CDNS Voltus vs. ANSS Gear RedHawk