( ESNUG 567 Item 3 ) -------------------------------------------- [02/23/17]
Subject: A surly Jean-Marie sasses Cadence Palladium and Synopsys EVE Zebu
Just in time for the upcoming DVcon'17 conference 12 days from now, my spies
report that Wally & Greg's emulation guys are just about to launch a brand
new family of emulation boxes called "Veloce Strato" that's based on their
newly completed Crystal 3 microprocessor chip. ... And in a seesaw business
where historically the emulator with the newest uP chip wins, this new Strato
is a direct in-your-face threat to Lip-bu Tan's Palladium empire.
- John Cooley, DeepChip.com
SCOOP -- will new Veloce Crystal 3 chip crush Palladium?
From: [ Jean-Marie Brunet of Mentor Graphics ]
Hi John,
Nice write-up about the Mentor Veloce Strato launch. You've proven your
network of spies is effective. DeepChip beat everyone else by a full
day on our story. And you were the only one to catch the chip side of
our story -- which we didn't brief anyone on, and we made no mention of
its physical implementation -- which I won't discuss any further.
Here are some key points from our official announcement that I want to cover.
Point 1: Our Veloce Strato is the only scalable HW/SW emulation
box on the market.
When a company invests in Veloce they minimize their total cost by buying
into a HW/SW emulator that preserves their investment. That's good news
for customers on so many fronts.
Rather than steal my own thunder on this, at the upcoming Mentor U2U event
on April 4th next month Cavium, Palo Alto Networks, Samsung, Spreadtrum,
and Starblaze will all be there detailing what a difference this makes.
(Nice Mentor U2U plug, I know.)
Point 2: You are correct. Everything starts with the chip.
It is the secret sauce of every emulator. And because of it, not every
emulator is equal. Synopsys is using commercial Xilinx FPGA chips, so the
Zebu roadmap is Xilinx's roadmap. Aart has ZERO control over it.
Cadence is like us, they design their own chip. But their Palladium uP is
a Boolean processor tailored to their architecture. The challenge with a
Boolean processor approach is its clocking/capture scheme burns a lot of
power -- which means one needs to constantly make tradeoffs between power
and performance. The other key point when you design your own chip is that
you need to define an architecture that gives you a roadmap along which
you can scale. That is a problem for Palladium. (I will let Frank
respond so he can explain this matter better than me, and of course, he'll
"confirm" that Palladium has no roadmap/power issue.)
Our Crystal 3 chip is basically our secret sauce. It is like an FPGA, but
it does not suffer from the Zebu-esque visibility problems associated
with an FPGA. So think of Crystal 3 as a very large look up table with a
key (patented) innovation called "Virtual Wire" which gives us the distinct
competitive edge of screaming fast FPGA throughput plus visibility.
Now that we've addressed the chip, let's talk about the system.
Point 3: Veloce Strato is indeed a 2.5 billion gate emulator in
a single box.
We're scratching our heads trying to imagine how you'd do 2.5 billion gates
with either of the other two emulators. How would you put together the
Palladium Z1 boxes to achieve 2.5B gates of effective capacity? How many
single Z1s? 5? 6? And what is the total power consumption of such a Z1
configuration? Our competitive intelligence tells us that number is very
high. Tell me how that water cooling requirement works in a datacenter
environment? In fact, how does that piggybacked Z1 configuration really
perform when it comes to total throughput?
Some data on what we gathered (since I know how much you love data John):
The Palladium Z1 has 2 options -- 384M gates and 576M gates. So let's be
nice to our competitor and give them the 576M gate option. So to do
2.5B gates you will need at least 4 boxes. (With the 384M gate option you
will need 6 boxes.)
In both cases, the combined floor footprint is larger than Strato. So no
datacenter foot^2 floor use advantage with a rack-based approach.
In terms of power, our data shows that 4 to 6 Palladium Z1 boxes will
consume 250 kW to 300 kW -- about 5x to 6x more power than our Strato (with
Strato being at 50 kW). Palladium Z1s need water cooling for both the 384M
and 576M gate options. Our Veloce Strato is eco-friendly and does not require
water. Our customers love the fact that Veloces don't need water cooling.
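The 5x to 6x figure follows directly from the quoted numbers. A minimal sketch of that arithmetic, using only the power figures stated above (the function and constant names are illustrative, not from any vendor tool):

```python
# Power figures as quoted in this letter, in kilowatts.
STRATO_KW = 50
Z1_CLUSTER_KW = (250, 300)  # quoted range for 4 to 6 Palladium Z1 boxes


def power_ratio(competitor_kw, baseline_kw):
    """Return how many times more power the competitor configuration draws."""
    return competitor_kw / baseline_kw


low, high = (power_ratio(kw, STRATO_KW) for kw in Z1_CLUSTER_KW)
print(f"{low:.0f}x to {high:.0f}x more power than Strato")
```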
Moving on. How many Zebu Server 3 boxes does it take to reach 2.5 billion gates
of effective capacity? 15? 20? 25? More importantly, has anyone even
seen it done? Maybe it's not possible -- so it really doesn't make sense
to even talk about the hypothetical performance and hypothetical throughput
of a 2.5B gate Zebu Server 3 configuration that you can't create. (To
use a verification phrase: "The answer is unreachable".)
Some data. Zebu is using commercial FPGA chips, notoriously known for
compilation and visibility problems. With one single Zebu box, for a small
design, the compilation is not really deterministic and it will "sort of"
converge by magic if the design is less than 300M gates and you have an army
of AEs working on it.
The issue with the Zebu architecture is when you start adding boxes and
try to compile a large design across many boxes, the effective utilization
of the overall Zebu system becomes very poor. We hear less than 100M gates
per Zebu box. Around 30% utilization or less. (The designers out there who
handle FPGAs will sympathize.)
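To see how those two numbers relate: effective capacity is roughly nominal capacity times utilization. The sketch below backs out the nominal per-box capacity implied by the 100M gate and 30% figures above. The result is a back-of-envelope inference from this letter's own numbers, not a published Synopsys spec, and the function name is made up for illustration.

```python
def implied_nominal_gates(effective_gates, utilization):
    """Back out nominal capacity from effective capacity and utilization.

    Assumes the simple model: effective = nominal * utilization.
    """
    return effective_gates / utilization


# Figures quoted in the letter: ~100M effective gates at ~30% utilization.
nominal = implied_nominal_gates(100e6, 0.30)
print(f"Implied nominal capacity: about {nominal / 1e6:.0f}M gates per box")
```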
So let's do the math. To do 2.5B gates with about 100M gates per Zebu box, you
will need 25 boxes -- good luck in attempting this, and if you do so please
do send me a picture!
Even with the next generation of Zebu, assuming a 3X capacity improvement
per Zebu Server 4 box, again let's be nice to our competitor, they will
still need a lot of boxes to get to 2.5B gates. And any datacenter footprint
difference is gone.
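The box counts here are just a ceiling division of the target size by the effective gates per box. A small sketch under the letter's own assumptions: 100M effective gates per Zebu Server 3 box, and a hypothetical 3X improvement for the next generation applied to that same effective figure. The function name is illustrative.

```python
TARGET_GATES = 2_500_000_000          # the 2.5B gate design discussed above
ZEBU3_EFFECTIVE_GATES = 100_000_000   # effective gates per box, as quoted


def boxes_needed(target_gates, gates_per_box):
    """Smallest whole number of boxes whose combined capacity covers the target."""
    return -(-target_gates // gates_per_box)  # integer ceiling division


zebu3_boxes = boxes_needed(TARGET_GATES, ZEBU3_EFFECTIVE_GATES)
# Assumption: "3X capacity improvement" scales the effective per-box figure.
zebu4_boxes = boxes_needed(TARGET_GATES, 3 * ZEBU3_EFFECTIVE_GATES)

print(f"Zebu Server 3: {zebu3_boxes} boxes")
print(f"Next-gen Zebu at 3X: {zebu4_boxes} boxes")
```

Even with the generous 3X assumption, the count stays well above a single box.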
Point 4: Your engineers don't have to partition large chips for Strato.
Max capacity is the size of the biggest chip you can compile in one single
compilation in your emulator. The size where you don't need to partition
your very large chip design into two (or more) compiles. For example, one
Veloce Strato can single compile a full 2.5 billion gate design, with no
partitioning needed.
For an equivalent Palladium Z1 setup trying to do 2.5 billion gates, a 4-way
to a 6-way partition is needed, with 4 to 6 individual compiles. For
the Zebu Server 3 attempting 2.5 billion gates, it's an outrageous 25-way
partition, with 25 separate compiles needed.
  Mentor                  |  Cadence                   |  Synopsys
  Veloce Strato           |  Palladium Z1              |  Zebu Server 3
  ------------------------+----------------------------+---------------------------
  1 box for 2.5 B gates   |  6 boxes for 2.5 B gates   |  25 boxes for 2.5 B gates
                          |  (Plus they need H2O       |
                          |   for cooling!)            |

So today a network processor, a CPU, or a GPU design is around 1.5 billion
gates, and it's expected to jump 10X to 15 billion gates within 5 to 6 years.
This means one Veloce Strato box can carry you through those years nicely.
And now the trends say we are moving towards systems of systems.
They can be huge ICs or multi-die ICs or full systems. Our market is moving
towards handling big data, huge stuff at many different levels. These ever
larger designs need a true high capacity emulator that can do the job. Our
Veloce Strato can do this. Palladium Z1 and Zebu Server 3 can't.
You get where I'm going here, John. Exciting times ahead.
- Jean-Marie Brunet
Mentor Graphics Corp. Wilsonville, OR
---- ---- ---- ---- ---- ---- ----
Related Articles
SCOOP -- will the new MENT Veloce Crystal 3 chip crush Palladium?
The 14 metrics - plus their gotchas - used to select an emulator
Hogan compares Palladium, Veloce, EVE ZeBu, Aldec, Bluespec, Dini
MENT bigwigs say Veloce 2 will pass CDNS Palladium by end-of-2012