( ESNUG 517 Item 6 ) -------------------------------------------- [01/17/13]
Subject: Cadence replies on why SNPS-EVE has zero impact on Palladium/RPP
> Q: After weeks of denial, Synopsys today officially announced
> it bought EVE and its ZeBu emulator. Rumor is Aart paid
> around $150 million for the deal.
>
> For your company, this Synopsys-EVE merger is (choose one)
> GOOD news, BAD news, NEUTRAL news because (say why):
From: [ Frank Schirrmeister of Cadence ]
Hi, John,
So what's new after Synopsys bought EVE ZeBu?
Nothing, really.
We at Cadence fully agree with Srikanth Muroor of Telegent in ESNUG 486 #1
that users need *both* emulation and FPGA-based emulation/prototyping.
Synopsys now actually has three FPGA-based verification engines:
- Synplicity HAPS, which came in a 2008 acquisition,
- ProDesign CHIPit, another acquisition a year later, and now
- EVE ZeBu.
Mentor's Veloce and Veloce 2 systems are based on custom FPGAs.
In contrast, Cadence provides processor-based emulation (Palladium) as well
as FPGA-based prototyping (Cadence Rapid Prototyping Platform - RPP).
BEST OF BOTH WORLDS FOR CADENCE USERS
We connected RPP and Palladium to combine the advantages of both: fast
turn-time and debug/visibility in Palladium plus FPGA speeds in RPP. The
same front-end allows designs running in Palladium to come up *much* faster
than normal in RPP. Bring-up is still weeks in RPP -- as opposed to hours in
Palladium once you have a new RTL drop -- but that is much faster than in a
homebrew custom FPGA prototyper (see ESNUG 486 #1).
Once a new bug is found in RPP, thanks to its higher speed, the same netlist
can be loaded into Palladium, where debug is much better -- just like in
RTL simulation. This way users get the best of both worlds.
---- ---- ---- ---- ---- ---- ----
What are the major differences between processor-based and FPGA-based
emulation/acceleration/prototyping?
The fundamental issue with FPGAs is that the internal capacity grows much
faster than the number of pins. Things are OK as long as the design fits
into one FPGA, although routing inside that FPGA gets more complex. Once a
design is too big to fit into one FPGA, it has to be partitioned.
That's where the troubles begin.
Partitioning and routing between FPGAs becomes very hard because of the
capacity-per-pin ratio.
 Year | Family        | Capacity           | Max I/O | Capacity
      |               | (gate equivalents) | (pins)  | per I/O
------+---------------+--------------------+---------+----------
 2003 | Virtex-II Pro |          1,230,000 |   1,040 |    1,183
 2004 | Virtex-4      |          2,490,000 |     960 |    2,594
 2006 | Virtex-5      |          3,320,000 |   1,200 |    2,767
 2009 | Virtex-6      |          9,105,000 |   1,200 |    7,588
 2011 | Virtex-7      |         14,000,000 |   1,200 |   11,667
By 2011, you have signals of 11,667 gates trying to get through 1 pin!
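The capacity-per-I/O column is simply gate capacity divided by pin count. A minimal Python sketch of that arithmetic follows, using the figures from the table above. It assumes 960 I/O pins for Virtex-4, which is the value that makes that row arithmetically consistent; the function and variable names are made up for illustration.

```python
# Gate capacity and max I/O per FPGA family, as listed in the table above.
# Assumption: Virtex-4 is taken as 960 I/O pins, the value that makes
# capacity / pins match the ratio given for that row.
FAMILIES = [
    (2003, "Virtex-II Pro", 1_230_000, 1_040),
    (2004, "Virtex-4", 2_490_000, 960),
    (2006, "Virtex-5", 3_320_000, 1_200),
    (2009, "Virtex-6", 9_105_000, 1_200),
    (2011, "Virtex-7", 14_000_000, 1_200),
]


def gates_per_pin(capacity, pins):
    """Gate equivalents that sit behind each I/O pin."""
    return capacity / pins


ratios = {name: gates_per_pin(cap, pins) for _, name, cap, pins in FAMILIES}

# How much the ratio grew between the first and last family in the table.
growth = ratios["Virtex-7"] / ratios["Virtex-II Pro"]

for year, name, cap, pins in FAMILIES:
    print(f"{year}  {name:<14} {ratios[name]:>9,.0f} gates per pin")
print(f"Growth 2003 -> 2011: {growth:.1f}x")
```

Capacity grows by more than 11x across the table while the pin count barely moves, so the gates-per-pin ratio ends up roughly 10x higher.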
At first sight, time-division multiplexing of your FPGA pins seems to be
the answer: have several signals share the bandwidth of one pin. The
less bandwidth each signal requires, the more signals per pin can be
used at the same execution speed.
However, in reality a small bandwidth per signal is hard to achieve in
FPGAs because internal routing delays are unpredictable, and the timing
constraints you add increase compilation time significantly.
With unpredictable FPGA timing, FPGA capacity does not really scale and
performance -- the main advantage of FPGAs -- degrades quickly.
Getting FPGA-based verification up and running essentially becomes a very
complicated place and route problem with timing closure issues.
In short, weeks to load, poor debug visibility, lightning simulation speed.
---- ---- ---- ---- ---- ---- ----
In processor-based emulation, instead of SoC-gates-mapping-to-FPGA-gates
and SoC-wires-to-FPGA-wires as in traditional FPGA-based emulation, Cadence
Palladium uses a full custom processor that deterministically models logic
functions as Boolean expressions, i.e. gates are abstracted to Booleans.
Connections between gates are then modeled as a data dependency graph, so
wires translate into communication links. We can schedule
communications between nodes and aren't limited by wire-level connectivity.
And for debug: because the fundamental structure is a computer, every node
in your SoC design is addressable. That gives full SW-simulation-like
visibility, with massive tracing capability built into our Palladium custom
processor structure.
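The general idea of evaluating gates as scheduled Boolean operations over a dependency graph can be sketched in a few lines of Python. This is a toy illustration of the concept only, not Palladium's actual architecture or compiler. The netlist, node names, and functions are hypothetical. It uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical 4-gate netlist: node name -> (Boolean function, input names).
# "a", "b", "c" are primary inputs; everything else is a gate output.
NETLIST = {
    "n1": (lambda a, b: a and b, ("a", "b")),    # AND
    "n2": (lambda b, c: b or c, ("b", "c")),     # OR
    "n3": (lambda x, y: x != y, ("n1", "n2")),   # XOR
    "out": (lambda x: not x, ("n3",)),           # NOT
}


def build_schedule(netlist):
    """Order the gates so each one runs after the nodes it depends on."""
    graph = {node: set(inputs) for node, (_, inputs) in netlist.items()}
    order = TopologicalSorter(graph).static_order()
    # static_order() also yields primary inputs; keep only the gates.
    return [node for node in order if node in netlist]


def evaluate(netlist, schedule, primary_inputs):
    """Run one cycle; every node value stays in a table addressable by name."""
    values = dict(primary_inputs)
    for node in schedule:
        fn, inputs = netlist[node]
        values[node] = bool(fn(*(values[i] for i in inputs)))
    return values


SCHEDULE = build_schedule(NETLIST)
state = evaluate(NETLIST, SCHEDULE, {"a": True, "b": True, "c": False})
print("schedule:", SCHEDULE)
print("n3 =", state["n3"], " out =", state["out"])
```

Because the schedule is fixed at compile time and every intermediate value lands in a lookup table, reading any internal node is just a table access, which is the visibility argument made above.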
NO signals of 11,667 gates trying to get through 1 pin with Palladium!
In short, hours to load, great debug visibility, slower simulation speed.
---- ---- ---- ---- ---- ---- ----
What to use when?
It all depends on the maturity of your RTL code. Users tell us that when
they still get several RTL drops per day, the turn-around time and debug
visibility drive them towards Palladium. When their RTL code gets stable
enough then the-need-for-speed takes over and Cadence RPP is best. Here's
a summary table:
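 User cares      | RTL SW          | Acceleration    | Processor-Based | FPGA-Based
 about:          | Simulation      | (RTL SW         | Emulation       | Prototyping
                 |                 | testbench with  |                 |
                 |                 | HW assistance)  |                 |
-----------------+-----------------+-----------------+-----------------+-----------------
 Speed           | Hz - kHz        | 100x to 1000x   | MHz             | 10's of MHz
                 | Too slow for SW | OK for some SW  | Good for SW     | Great for SW
-----------------+-----------------+-----------------+-----------------+-----------------
 Turn-Time       | Minutes         | Hours           | Hours           | Weeks to months
-----------------+-----------------+-----------------+-----------------+-----------------
 Debug and       | Great for HW    | Great for HW    | Great for HW    | Limited for HW,
 Visibility      |                 |                 | and SW          | great for SW
-----------------+-----------------+-----------------+-----------------+-----------------
 Connectivity to | Environment     | Environment     | Real hardware   | Real hardware
 real world      | needs to be     | needs to be     | with rate       | with rate
                 | modeled as      | modeled as      | adapters        | adapters
                 | testbench       | testbench       |                 |
-----------------+-----------------+-----------------+-----------------+-----------------
 Multi-user      | Limited by # of | Limited by      | Large number of | Small number of
 access          | simulation      | hardware access | users per       | users per
                 | licenses        |                 | system          | system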
I'm NOT concerned about Synopsys acquiring yet a 3rd FPGA-based system
with EVE ZeBu because they lack *any* processor-based system. This is a
serious hole in Aart's verification offering, and it is where Palladium excels:
- Palladium XP delivers predictable compile times of 50 M gates per hour,
fast turn-times and debug visibility. See ESNUG 486 #1.
- Palladium has best cost per design seat, great debug, and the ability
to serve variable payload sizes (4 MG to 2 BG in 4 MG increments for
512 users) as documented by Broadcom, Nvidia, AMD, Renesas-Mobile.
- Palladium has in-circuit and acceleration adapters based on SpeedBridges
and Accelerated Verification IPs as documented by Samsung, PMC-Sierra
and Marvell.
- Palladium’s built-in dynamic low power analysis (DPA) allows designers
to capture power switching activity for peak and average power analysis.
This is what 3rd generation processor-based emulation gets Cadence that
Synopsys is missing.
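For a feel of what the capacity and compile-time figures above imply, here is a rough Python sketch. It reads the quoted numbers as 4 MG allocation units with at most 512 of them (512 x 4 MG = 2,048 MG, about 2 BG), and it assumes compile time scales linearly at 50 M gates per hour. Both readings are assumptions made for illustration, and the function names are hypothetical.

```python
import math

# Figures quoted in the list above; how they combine is an assumption.
UNIT_MG = 4                # allocation increment, in million gates
MAX_UNITS = 512            # 512 x 4 MG = 2,048 MG, roughly the "2 BG" figure
COMPILE_MG_PER_HOUR = 50   # quoted compile throughput


def units_needed(design_mg):
    """Number of 4 MG increments a design of the given size would occupy."""
    units = math.ceil(design_mg / UNIT_MG)
    if units > MAX_UNITS:
        raise ValueError("design exceeds the 2 BG maximum payload")
    return units


def compile_hours(design_mg):
    """Estimated compile time, assuming strictly linear scaling."""
    return design_mg / COMPILE_MG_PER_HOUR


for size_mg in (30, 100, 600):
    print(f"{size_mg:>4} MG -> {units_needed(size_mg):>3} units, "
          f"~{compile_hours(size_mg):.1f} h compile")
```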
But bottom line, users need both. That's why we offer both Palladium and
the FPGA-based Rapid Prototyping Platform (RPP), connected through the same
front-end to speed up RPP bring-up and to allow efficient debug in
Palladium once bugs have been found in RPP at faster speed. And it's all
nicely connected to Incisive RTL SW and TLM SystemC, too.
Let's talk when SNPS catches up to us in 3rd gen processor-based emulation.
- Frank Schirrmeister
Cadence Design Systems, Inc. San Jose, CA