( ESNUG 523 Item 4 ) -------------------------------------------- [05/02/13]
From: [ Pine Yan of Nvidia ]
Subject: Nvidia gets 23X faster, 80% less memory with VCS plus Rocketick
Hi, John,
NVidia uses Rocketick's simulation accelerator "RocketSim" for full chip
gate simulation. We started evaluating Rocketick's gate-level simulation
accelerator 2.5 years ago, and purchased it ~1 year ago. I can share some
of our production data. Rocketick can also accelerate RTL simulations,
but it is still under evaluation by another group.
We use Rocketick's GPU-based simulator to increase our verification
throughput and debug efficiency. It needs a co-simulator to run the
behavioral stuff and can work with any simulator on the market. We
primarily use Synopsys VCS for final simulation sign-off, and the following
data are with VCS. Rocketick uses NVidia's GPU for acceleration, so
it's a double win for us.
SPEED
We use Rocketick in production, so my benchmark data is for real designs,
not test cases. Our largest design is more than 1 billion gates.
Example 1: Design 1 - 240 M gates, transition test
               Captures   Time (hours)   Time/capture   Speed-up
               --------   ------------   ------------   --------
   VCS              7         8.2          1.17 hrs        1X
   RocketSim        7         0.64         0.09 hrs       13X
   RocketSim      102         5.31         0.05 hrs       23X
As you can see in the first example, Rocketick is 10-20X faster than normal
software simulation. In the second example, VCS didn't run at all. All the
simulation run times here were with various ATPG patterns, where a capture
is a test segment including scan chain load/unload and capture sequences.
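The Time/capture and Speed-up columns of the Example 1 table can be rechecked with a few lines of Python. This is only a sketch of the arithmetic, using the capture counts and total hours from that table; the function names are made up for illustration, and the per-capture figures are simple averages that still include initialization time.

```python
# Figures copied from the Example 1 table: (tool, captures, total hours).
RUNS = [
    ("VCS", 7, 8.2),
    ("RocketSim", 7, 0.64),
    ("RocketSim", 102, 5.31),
]


def hours_per_capture(captures, total_hours):
    """Average wall-clock hours per capture, initialization included."""
    return total_hours / captures


def speedups(runs):
    """Speed-up of each run relative to the first run (the baseline)."""
    _, base_captures, base_hours = runs[0]
    baseline = hours_per_capture(base_captures, base_hours)
    return [baseline / hours_per_capture(c, h) for _, c, h in runs]


for (tool, captures, hours), s in zip(RUNS, speedups(RUNS)):
    per = hours_per_capture(captures, hours)
    print(f"{tool:10s} {captures:4d} captures  {per:.2f} hrs/capture  {s:.0f}X")
```

Note that the 23X figure compares a 102-capture RocketSim run against a 7-capture VCS run, so the fixed initialization cost is spread over many more captures on the RocketSim side.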
Example 2: Design 2 - 470 M gates, stuck-at test
               Captures   Time (hours)   Time/capture
               --------   ------------   ------------
   VCS              0         n/a        wouldn't run
   RocketSim     1000         33         0.03 hrs
Keep in mind that initialization time is constant regardless of the number
of captures. So a true per-capture runtime comparison is not available, since
we can't separate VCS's per-capture time from its initialization time.
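To illustrate the point about initialization: if total runtime is modeled as a fixed initialization cost plus a constant cost per capture, two runs of the same tool on the same design are enough to separate the two terms. The sketch below applies that to the two RocketSim rows of Example 1. The linear model and the assumption that the two runs are otherwise comparable are mine, not measured data, and the function name is hypothetical. VCS has only one data point, so the same split cannot be made for it.

```python
def fit_init_and_per_capture(run_a, run_b):
    """Solve total = init + captures * per_capture from two (captures, hours) runs."""
    (n_a, t_a), (n_b, t_b) = run_a, run_b
    if n_a == n_b:
        raise ValueError("need two runs with different capture counts")
    per_capture = (t_b - t_a) / (n_b - n_a)
    init = t_a - n_a * per_capture
    return init, per_capture


# RocketSim rows from Example 1: (captures, total hours).
init_hours, per_capture_hours = fit_init_and_per_capture((7, 0.64), (102, 5.31))
print(f"estimated initialization: {init_hours:.2f} hrs")
print(f"estimated marginal cost:  {per_capture_hours:.3f} hrs/capture")
```

Under these assumptions the estimate comes out at roughly 0.3 hours of initialization and about 0.05 hours per additional capture.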
Rocketick totally changes the picture -- it is fast enough to even make
interactive debug possible. We can run individual jobs in 30 min versus
7 to 8 hours with normal SW simulation. If I find a problem I can fix it,
then run Rocketick again and come back in an hour.
MEMORY UTILIZATION
Rocketick also has substantially lower memory utilization compared with VCS;
the example below shows it only requiring 1/5th of the memory of normal
simulation. The smaller memory footprint also improves RocketSim's capacity
because it doesn't have to constantly swap out memory; we can run full chip
simulation on larger chips.
Design: Design 3     Size: 1+ billion gates
                  Memory      Relative
   Product        Required    Memory Usage
   ----------     --------    ------------
   VCS            256 GB        1.0 X
   RocketSim       50 GB        0.2 X
We have lots of jobs competing for our memory machines. The GPU simulator
naturally eased the congestion, since GPUs can be added to any small-memory
server and off-load most of the memory need onto the GPU. With GPU servers
we can run 30 jobs in parallel in 80 hours without competing for resources.
If we can run one job on one GPU, and we have 2 GPUs, we can improve the
performance for a particular job or run multiple jobs across multiple GPUs.
RocketSim's distribution system automatically takes care of that.
In comparison, software simulators all use the same shared CPU memory and
run slower due to all the memory swapping back and forth.
SCALABILITY
We often need 10-12 GB of memory to run RocketSim for our large designs.
Our GPU's memory ranges from 4 GB to 6 GB. RocketSim is scalable, so we
can add 4-5 GPU cores in our servers, and RocketSim will automatically
partition the design into different blocks. It is completely parallel and
they do a very good job of splitting it.
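As a rough sanity check on the numbers above, the minimum GPU count needed just to hold a design in GPU memory can be estimated by dividing the design's memory need by the memory per GPU and rounding up. This is a back-of-the-envelope sketch only: it ignores any partitioning overhead or data duplicated across GPUs, it says nothing about how RocketSim actually partitions a design, and `min_gpus` is a made-up helper name.

```python
import math


def min_gpus(design_gb, gpu_gb):
    """Smallest number of GPUs whose combined memory covers the design."""
    if design_gb <= 0 or gpu_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return math.ceil(design_gb / gpu_gb)


# Ranges quoted in the text: 10-12 GB per design, 4-6 GB per GPU.
for design_gb in (10, 12):
    for gpu_gb in (4, 6):
        n = min_gpus(design_gb, gpu_gb)
        print(f"{design_gb} GB design on {gpu_gb} GB GPUs: at least {n} GPUs")
```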
COMPILE TIME
RocketSim's compilation time is shown below. It is a bit longer than VCS,
but this is not a concern for us, as long as its simulation is so fast,
especially because we don't do many compilations: each compile run for a
test is shared across multiple verification engineers. We may use a
concurrency of 1 or 12 with RocketSim; it depends on how many resources
I can get.
Case 1: Design 4, 260 M gates
                Compile time   Concurrency
                ------------   -----------
   VCS            2.0 hrs           1
   Rocketick      2.7 hrs           8
   Rocketick      6+ hrs*           1
Case 2: Design 1, 240 M gates
                Compile time   Concurrency
                ------------   -----------
   VCS            1.1 hrs           1
   Rocketick      2.1 hrs           8
* -- I no longer have the exact original number; we do not run at
concurrency 1 anymore.
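Design 1 appears both here (Case 2) and in the 7-capture rows of Example 1, so the longer compile can be weighed against the faster simulation. The sketch below adds one compile to one or more simulation runs. Pairing those compile and simulation figures is my assumption (the post does not say they come from the same setup), and the helper and dictionary names are hypothetical.

```python
def turnaround_hours(compile_hours, sim_hours, runs_per_compile=1):
    """Total hours for one compile followed by runs_per_compile simulation runs."""
    if runs_per_compile < 1:
        raise ValueError("runs_per_compile must be at least 1")
    return compile_hours + runs_per_compile * sim_hours


# Design 1 figures from the post: compile times from Case 2, simulation
# times from the 7-capture rows of Example 1. Pairing them is an assumption.
VCS = {"compile": 1.1, "sim": 8.2}
ROCKETSIM = {"compile": 2.1, "sim": 0.64}

for runs in (1, 10):
    vcs = turnaround_hours(VCS["compile"], VCS["sim"], runs)
    rs = turnaround_hours(ROCKETSIM["compile"], ROCKETSIM["sim"], runs)
    print(f"{runs:2d} run(s): VCS {vcs:.2f} hrs, RocketSim {rs:.2f} hrs, "
          f"ratio {vcs / rs:.1f}X")
```

With a single run per compile the end-to-end advantage is smaller than the simulation-only speed-up, and it grows as one compile is shared across more runs, which fits the remark that compiles are shared across several engineers.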
SET-UP TIME
Setting up RocketSim for a new project takes about 3-4 hours, which isn't
much more time than it takes us for VCS. We use the same file list that we
input to VCS for compilation, and run the simulations with the same
arguments we use for VCS. Rocketick's simulator interface and use model are
similar to VCS.
ACCURACY
We started out as a beta user of Rocketick in 2011. The sample test cases
we started with had mismatches all over. We ironed them out; it was a
gradual process.
As of 7-8 months ago, Rocketick had NO mismatches after many runs. We felt
the tool was reliable enough to use for our production projects, and now use
it for all our runs.
We mainly use RocketSim as a regression tool, in order to obtain quick
coverage between design releases. We still do sign-off runs with VCS - but
only using limited patterns, and then run the FULL set with RocketSim. As
our confidence and usage grow, I expect we will also gradually move toward
doing design sign-off with RocketSim.
DEBUG
1. Rocketick claims they are compliant with Verilog IEEE 1364-2001,
   1364-2005, VHDL, and SystemVerilog. We have not tested VHDL
   support.
2. Rocketick can simulate 4-state logic (0/1/X/Z) properly.
3. Rocketick claims support for VCD and FSDB. We run RocketSim
   with dump enabled, then run Siloti to view the waveforms.
4. Rocketick has a PLI-compliant interface to the native simulator,
   which means RocketSim can run with any logic simulation
   tool: Cadence, Mentor, etc.
5. RocketSim's "XRAY profiler" detects the level of activity (events)
   between clocks. It helps identify the potential for GPU
   acceleration in the various tests we run. We could also use it
   to help increase our tests' activity level to improve
   verification efficiency.
ROCKETBUILD
Before it accelerates a simulation, RocketSim runs a function called
"RocketBuild", which analyzes the combined design and testbench, and then
identifies which structures can be accelerated, and which structures, such
as behavioral code and non-synthesizable testbenches, cannot be accelerated.
The portions that can't be accelerated remain in VCS. So we still need a
simulator license to run RocketSim -- in our case we primarily use it with
VCS. Both tools run in sync with each other.
In our case, RocketSim can accelerate the vast majority:
   Typical portion of the design staying in Rocketick:   95%
   Testbench and behavioral code remaining in VCS:        5%
RocketSim also includes a "module verifier" function. The module verifier
automatically generates a testbench and stimuli for each targeted library
or design module, and then automatically runs exhaustive regression tests
on a module-by-module basis to validate that each module can simulate
properly in RocketSim. This greatly reduces our burden in figuring out
which modules should be kept in VCS, and allows Rocketick to enhance their
tool efficiently if a module should have been accelerated.
- Pine Yan
Nvidia Santa Clara, CA