Cadence replies on why SNPS-EVE has zero impact on Palladium/RPP

( ESNUG 517 Item 6 ) -------------------------------------------- [01/17/13]

Subject: Cadence replies on why SNPS-EVE has zero impact on Palladium/RPP

>    Q: After weeks of denial, Synopsys today officially announced
>       it bought EVE and its ZeBu emulator.  Rumor is Aart paid
>       around $150 million for the deal.
>
>       For your company, this Synopsys-EVE merger is (choose one)
>       GOOD news, BAD news, NEUTRAL news because (say why):


From: [ Frank Schirrmeister of Cadence ]

Hi, John,

So what's new after Synopsys bought EVE ZeBu?

Nothing, really.

We at Cadence fully agree with Srikanth Muroor of Telegent in ESNUG 486 #1
that users need *both* emulation and FPGA-based emulation/prototyping.

Synopsys now actually has three FPGA-based verification engines:

   - Synplicity HAPS, that came in a 2008 acquisition,
   - ProDesign CHIPit, another acquisition a year later, and now
   - EVE ZeBu.

Mentor's Veloce and Veloce 2 systems are based on custom FPGAs.

In contrast, Cadence provides processor-based emulation (Palladium) as well
as FPGA based prototyping (Cadence Rapid Prototyping Platform - RPP).

BEST OF BOTH WORLDS FOR CADENCE USERS

We connected RPP and Palladium to combine the advantages of both: fast
turn-time and debug/visibility in Palladium plus FPGA speeds in RPP.  The
same front-end allows designs running in Palladium to come up *much* faster
than normal in RPP.  It's weeks in RPP -- as opposed to hours in Palladium
once you have a new RTL drop -- but much faster than in a homebrew custom
FPGA prototyper (see ESNUG 486 #1).

Once a new bug is found in RPP, using its higher speed, the same netlist
can be loaded and debugged in Palladium where debug is much better; just
like in RTL simulation.  This way users get the best of both worlds.

         ----    ----    ----    ----    ----    ----   ----

What are the major differences between processor-based and FPGA-based
emulation/acceleration/prototyping?

The fundamental issue with FPGAs is that the internal capacity grows much
faster than the number of pins.  The situation is OK when the design fits
into one FPGA, but routing-per-FPGA becomes more complex.  Once a design
is too complex to fit into one FPGA, the design has to be partitioned.

That's where the troubles begin.

Partitioning and routing between FPGAs becomes very hard because of the
capacity-per-pin ratio.

Year  Family         Capacity            Max I/O  Capacity
                     (gate equivalents)  (pins)   per I/O
----  -------------  ------------------  -------  --------
2003  Virtex-II Pro           1,230,000    1,040     1,183
2004  Virtex-4                2,490,000      960     2,594
2006  Virtex-5                3,320,000    1,200     2,767
2009  Virtex-6                9,105,000    1,200     7,588
2011  Virtex-7               14,000,000    1,200    11,667

By 2011, you have signals of 11,667 gates trying to get through 1 pin!
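
To make the arithmetic behind that number explicit, here is a minimal Python
sketch that recomputes capacity per I/O for the first and last rows of the
table (Virtex-II Pro and Virtex-7).  The function and variable names are
only illustrative.

```python
def gates_per_pin(gate_equivalents: int, io_pins: int) -> int:
    """Gate equivalents that must share one I/O pin, rounded to nearest."""
    return round(gate_equivalents / io_pins)

# Figures copied from the 2003 and 2011 rows of the table.
v2pro_gates, v2pro_pins = 1_230_000, 1_040
v7_gates, v7_pins = 14_000_000, 1_200

print("Virtex-II Pro:", gates_per_pin(v2pro_gates, v2pro_pins), "gates per pin")
print("Virtex-7:     ", gates_per_pin(v7_gates, v7_pins), "gates per pin")

# Capacity grew roughly 11x over those eight years; pin count grew about 1.15x.
print(f"capacity growth: {v7_gates / v2pro_gates:.1f}x")
print(f"pin growth:      {v7_pins / v2pro_pins:.2f}x")
```

The gap between the two growth factors is the whole problem: each pin has
to serve roughly ten times more logic than it did in 2003.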

At first sight, time-division-multiplexing of your FPGA pins seems to be
the answer.  Have several signals share the bandwidth of one pin: the less
bandwidth each signal requires, the more signals per pin can be used at the
same execution speed.

However, in reality a small bandwidth per signal is hard to achieve in
FPGAs because internal routing delays are unpredictable, and the timing
constraints you need increase compilation time significantly.

With unpredictable FPGA timing, FPGA capacity does not really scale and
performance -- the main advantage of FPGAs -- degrades quickly.
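
The trade-off can be put into numbers with a back-of-the-envelope sketch.
It assumes an idealized model in which one pin moves a fixed number of bits
per second and every multiplexed signal needs one bit per emulated clock
cycle.  The 1 Gbit/s pin rate and the mux ratios are hypothetical values
picked for illustration, not vendor figures; real systems lose additional
speed to synchronization overhead and routing delay.

```python
def max_emulation_clock_hz(pin_rate_bps: float, signals_per_pin: int) -> float:
    """Upper bound on the emulated clock when `signals_per_pin` signals
    time-share one physical pin (idealized: no framing or sync overhead)."""
    if signals_per_pin < 1:
        raise ValueError("need at least one signal per pin")
    return pin_rate_bps / signals_per_pin

PIN_RATE_BPS = 1e9  # hypothetical 1 Gbit/s per pin

for ratio in (1, 16, 64, 256):
    mhz = max_emulation_clock_hz(PIN_RATE_BPS, ratio) / 1e6
    print(f"{ratio:>4} signals per pin -> at most {mhz:8.2f} MHz")
```

Even in this best case, every doubling of the mux ratio halves the speed
ceiling, which is why multiplexing alone does not rescue FPGA performance.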

Getting FPGA-based verification up and running essentially becomes a very
complicated place and route problem with timing closure issues.

In short, weeks to load, poor debug visibility, lightning simulation speed.

         ----    ----    ----    ----    ----    ----   ----

In processor-based emulation, instead of SoC-gates-mapping-to-FPGA-gates
and SoC-wires-to-FPGA-wires as in traditional FPGA-based emulation, Cadence
Palladium uses a full custom processor that deterministically models logic
functions as Boolean expressions, i.e. gates are abstracted to Booleans.

Connections between gates are then modeled as a data dependency graph, so
wires translate into communication links.  We can schedule communications
between nodes and aren't limited by wire-level connectivity.
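
As a toy illustration of that idea (not Palladium's actual architecture or
instruction set, which this letter does not describe), the sketch below
treats each gate as a Boolean operation, derives an evaluation order from
the data dependencies, and then executes that schedule step by step.  The
netlist format and all names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical netlist: node name -> (Boolean op, input node names).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("XOR", ("n1", "c")),
    "out": ("NOT", ("n2",)),
}

OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "NOT": lambda x: 1 - x,
}

def build_schedule(netlist):
    """Order the gates so each one runs after the nodes it depends on."""
    deps = {node: set(inputs) for node, (_, inputs) in netlist.items()}
    order = TopologicalSorter(deps).static_order()
    return [node for node in order if node in netlist]  # drop primary inputs

def evaluate(netlist, schedule, primary_inputs):
    """Run the schedule once; every node value stays addressable by name."""
    values = dict(primary_inputs)
    for node in schedule:
        op, inputs = netlist[node]
        values[node] = OPS[op](*(values[i] for i in inputs))
    return values

schedule = build_schedule(NETLIST)
state = evaluate(NETLIST, schedule, {"a": 1, "b": 1, "c": 0})
print(schedule)
print(state)
```

In this model a "wire" is just a lookup of a previously computed value, so
the number of connections is bounded by scheduling, not by physical pins,
and any internal node can be read back by name after a step.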

And for debug - because the fundamental structure is a computer, every node
in your SoC design is addressable -- which gives superior visibility -- full
SW-simulation-like visibility with massive tracing capability built into our
Palladium custom processor structure.

NO signals of 11,667 gates trying to get through 1 pin with Palladium!

In short, hours to load, great debug visibility, slower simulation speed.

         ----    ----    ----    ----    ----    ----   ----

What to use when?

It all depends on the maturity of your RTL code.  Users tell us that when
they still get several RTL drops per day, the turn-around time and debug
visibility drive them towards Palladium.  When their RTL code gets stable
enough then the-need-for-speed takes over and Cadence RPP is best.  Here's
a summary table:

Users care     RTL SW          Acceleration    Processor-Based FPGA-Based
about:         Simulation      (RTL SW         Emulation       Prototyping
                               testbench with
                               HW assistance)
------------------------------------------------------------------------------
Speed          Hz - KHz        100x to 1000x   MHz             10's of MHz
               Too slow for SW OK for some SW  Good for SW     Great for SW

Turn-Time      Minutes         Hours           Hours           Weeks to months

Debug and      Great for HW    Great for HW    Great for HW    Limited for HW
Visibility                                     and SW          Great for SW

Connectivity   Environment     Environment     Real hardware   Real hardware
to real world  needs to be     needs to be     with rate       with rate
               modeled as      modeled as      adapters        adapters
               testbench       testbench

Multi-user     Limited by # of Limited by      Large numbers   Small number of
access         simulation      hardware access of users per    users per
               licenses                        system          system


I'm NOT concerned about Synopsys acquiring yet a 3rd FPGA-based system with
EVE ZeBu because they lack *any* processor-based system.  This is a serious
hole in Aart's verification offering, and it is exactly where Palladium
excels:

  - Palladium XP delivers predictable compiles at 50 M gates per hour,
    fast turn-times and debug visibility.  See ESNUG 486 #1.

  - Palladium has the best cost per design seat, great debug, and the ability
    to serve variable payload sizes (4 MG to 2 BG in 4 MG increments for
    512 users) as documented by Broadcom, Nvidia, AMD, Renesas-Mobile.

  - Palladium has in-circuit and acceleration adapters based on SpeedBridges
    and Accelerated Verification IPs as documented by Samsung, PMC-Sierra
    and Marvell.

  - Palladium's built-in dynamic low power analysis (DPA) allows designers
    to capture power switching activity for peak and average power analysis.
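
For a rough feel of what the first two bullets mean in practice, here is a
small sketch.  It assumes compile time scales linearly at the quoted 50 M
gates per hour and that capacity is handed out in whole 4 MG units up to
512 units; both are simplifications for illustration, and the 130 MG design
size is a made-up example.

```python
import math

COMPILE_RATE_MG_PER_HOUR = 50   # quoted Palladium XP figure
ALLOCATION_UNIT_MG = 4          # "4 MG increments"
MAX_CAPACITY_MG = 512 * 4       # assumed: 512 units x 4 MG, roughly 2 BG

def compile_hours(design_mg: float) -> float:
    """Estimated compile time, assuming linear scaling (a simplification)."""
    return design_mg / COMPILE_RATE_MG_PER_HOUR

def allocated_capacity_mg(design_mg: float) -> int:
    """Round a design size up to the next whole 4 MG allocation unit."""
    units = math.ceil(design_mg / ALLOCATION_UNIT_MG)
    capacity = units * ALLOCATION_UNIT_MG
    if capacity > MAX_CAPACITY_MG:
        raise ValueError("design exceeds the assumed maximum capacity")
    return capacity

design_mg = 130  # hypothetical design size, in millions of gates
print(f"compile estimate:  {compile_hours(design_mg):.1f} hours")
print(f"capacity reserved: {allocated_capacity_mg(design_mg)} MG")
```

Under these assumptions a small block and a full SoC can share one system,
each reserving only the capacity it needs.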

This is what 3rd generation processor-based emulation gets Cadence that
Synopsys is missing.

But bottom line, users need both.  That's why we offer both Palladium and
the FPGA-based Rapid Prototyping Platform (RPP), connected through the same
front-end to speed up RPP bring-up and to allow efficient debug in
Palladium once bugs have been found in RPP at faster speed.  And it's all
nicely connected to Incisive RTL SW and TLM SystemC, too.

Let's talk when SNPS catches up to us in 3rd gen processor-based emulation.

    - Frank Schirrmeister
      Cadence Design Systems, Inc.               San Jose, CA



