( DAC'19 Item 1b ) ------------------------------------------------ [12/19/19]
Subject: CDNS Palladium wins back user mindshare is #1b as the Best of 2019
PALLADIUM FLIPS: I can't speak of Palladium sales in 2016, '17, '18,
but on user perceptions I can. To be blunt, MENT's Jean-Marie Brunet had
ridden roughshod over the CDNS Palladium and SNPS Zebu rep with the users...
SCOOP -- Mentor Veloce to announce first HW emulator on the Cloud
Jean-Marie warns Synopsys Zebu needs 9X more "warm bodies" to run
A surly Jean-Marie sasses Cadence Palladium and Synopsys EVE Zebu
SCOOP -- will the new MENT Veloce Crystal 3 chip crush Palladium?
The only CDNS counter-marketing finally happened when Anirudh spoke out on
the DAC'17 Troublemaker Panel to defend his CDNS Palladium Z1 against these
user-perceived Mentor Veloce2 and Mentor Crystal3 gains.
Sawicki and Anirudh on Veloce2, Crystal 3, Palladium Z1, Zebu4
The problem was that non-emulation/prototyping news took up so much of the
EDA user mindshare during DAC'17 that the Veloce-is-beating-Palladium
perception was still sticking with the customer base.
And in 2018, the Palladium vs. Veloce war wasn't "hot" so those old default
perceptions were still sticking with the user base.
PROTIUM SPURS A COMEBACK: Out of nowhere, Protium, Cadence's FPGA-based
prototyper, got a TON of user attention with its super fast compile times
because Protium is very tightly integrated with Palladium -- so much so that
it got #1 Best of EDA for 2019! (See DAC'19 #1a)
ANOTHER LOOK AT PALLADIUM: that rubbed-off fame drew a fresh new look at
good old Palladium, and the users saw its processor approach boasts:
- Fast Bring up. Fast compile, and once it compiles, users see
it work predictably (vs FPGA-based emulators with messy FPGA
PnR iterations to deal with).
- Killer debug. Its "full vision" mode lets engineers capture
waves up and down the hierarchy of every net in the chip in
seconds to minutes. (You need to rerun PnR with FPGA-based
emulators to capture more signals or to add a logic/bug fix.)
- Expanding uses. Palladium *was* used for HW verification;
firmware verification; SW verification; HW/SW co-verification;
Architecture analysis; and post-Silicon validation. NOW it is
used earlier in design for top-level functional tests and smoke
tests, and for power analysis and dynamic power analysis.
As one user put it -- "Palladium is still the Ferrari of the emulators,
with the corresponding price point."
(UNEXPECTED FUN FACT -- In the comments you'll see a few engineers are still
maxing out their ROI by using 15 year old Palladiums, even as they upgrade
to the newer models!)
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
QUESTION ASKED:
Q: "What were the 3 or 4 most INTERESTING specific EDA tools
you've seen this year? WHY did they interest you?"
---- ---- ---- ---- ---- ---- ----
CADENCE PALLADIUM
We have the Cadence Palladium XP2, and use it for pre-silicon emulation
on our next generation baseband chips. We use it for HW debug, too.
We get about a 1 MHz operating speed -- which is approximately 1000x
faster than VCS, our Linux-based Verilog simulator.
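To put that 1000x in perspective, here is a back-of-the-envelope sketch in Python. The 1 MHz speed and the 1000x ratio come from the comment above; the 1-billion-cycle workload is an assumed example, not a number from this user.

```python
# Rough wall-clock comparison: emulator vs. software simulator.
EMULATOR_HZ = 1_000_000                 # ~1 MHz, as reported above
SPEEDUP = 1000                          # emulator is ~1000x faster
SIMULATOR_HZ = EMULATOR_HZ / SPEEDUP    # implies roughly 1 kHz in simulation


def wall_clock_seconds(design_cycles: int, cycles_per_second: float) -> float:
    """Seconds of wall-clock time needed to run `design_cycles` clock cycles."""
    return design_cycles / cycles_per_second


# Hypothetical workload: 1 billion design clock cycles (an assumption).
cycles = 1_000_000_000
emu_s = wall_clock_seconds(cycles, EMULATOR_HZ)
sim_s = wall_clock_seconds(cycles, SIMULATOR_HZ)

print(f"emulator:  {emu_s / 60:.1f} minutes")
print(f"simulator: {sim_s / 86400:.1f} days")
```

Under those assumptions, a run that fits in a coffee break on the emulator would tie up the simulator for well over a week.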
When you use Palladium as a standalone emulator, you can just compile
it and it runs. When you use it as target hardware, and connect devices
such as Ethernet, it takes additional steps to get started -- for us it
was about 3-4 weeks. Once it's working, it's easy to use.
Our process & methodology
We move to Palladium as soon as:
- We have a synthesizable release of Verilog RTL of our design,
with our analog IP black-boxed.
- We've already run some of our top-level functional tests.
- Our verification team gives the green light we have a build
of the chip and the testbench, and key elements such as the
RAM, processor are ready and verified.
I then port the SW testbench to Palladium as a smoke test, which usually
includes scrubbing the RTL for non-synthesizable syntax and many
iterations of compilation.
It typically takes me 3-4 weeks to get basic regressions from Palladium,
where we interface some real hardware or external virtual device models
to Palladium to get outside-world stimulus.
This is one of the biggest advantages of Palladium. We take the digital
samples that come out of our radio tuner and then use radio equipment to
feed them into the Palladium simulation of our baseband chip. By finding bugs
this way, Palladium has saved us the expense of an additional mask set
for our silicon -- or worse, having the bug show up in the field.
- We use multi-purpose physical interface boards that we connect
to by bringing I/O pins from our chip. Cadence has wide
cables that connect to high density connectors, that connect
directly to the general interface cards for access to the
pins.
It's a reduction of sorts -- and more elegant than a
custom-designed board.
- Palladium runs slower than the hardware, so when we use
it to debug our software before silicon is available, we
must ensure that whatever interface we connect to it can
provide a slow enough clock to run at Palladium's slower
data rates.
We address this by connecting a debug-JTAG pod (from Cadence),
so our team can plug Palladium into the USB port of the
controller we are programming.
Our main mode for using Palladium is for this hardware/software
co-verification. Another way we use Palladium is if you get a bug
in silicon, you can use it for post-silicon debugging.
PALLADIUM DEBUG
With Palladium, we can capture waves up and down the hierarchy of every
net in our chip. This is another really big advantage, as it would be
nearly impossible for us to debug otherwise. (VCS's too slow for this.)
With Palladium you get wave capture and tracing right out of the box.
Even though Xilinx and Altera have improved their signal tracing and
capture capabilities, in FPGA design debug, you have limitations on how
many signals you can capture in a trace and the time of that trace based
on the available memory. This kills Zebu, HAPS, and Veloce for us.
Also, with FPGAs, you must typically rerun your PnR to capture more
signals and/or to incorporate a logic or bug fix, so you must pick a
subset to pile in those probes. That can take from days to weeks.
Again, these looooong compiles kill Zebu, HAPS, and Veloce for us.
This applies to all FPGA-based emulators (i.e. Veloce and Synopsys
EVE/Zebu, plus Dini and ProDesign, etc.) even with the tools Mentor
and Synopsys have layered on to speed up prototyping.
USING PALLADIUM FOR STATIC & DYNAMIC POWER ANALYSIS
The big advantage is we can run lots of verification cycles, and then
leverage the data collection of the toggle rates on every net in the
design during long runtimes.
This is how Palladium Power Analysis works for us:
- We import the actual gate-level model used for tapeout. The
model consists of the netlist that is close to what we will
use for tapeout.
- We have our own proprietary OS, so we run a boot up of our
OS with the incoming data.
- All the clocks toggle and net switching activity is collected.
We write those toggle rates out to a file, with the data over
time as to how often a net changes state.
- Once we have captured all this statistical/deterministic data,
we pass it to our power analysis tool (ANSS Apache PowerArtist),
map it to an RC model, and get our power dissipation numbers.
It works as though we were switching with real silicon.
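The toggle-rate step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Palladium or PowerArtist flow or file format: the net names, the 1 fF capacitance, the 1.0 V supply, and the 1 MHz clock are all made-up assumptions, and the power formula is the usual textbook switching estimate rather than an RC-model analysis.

```python
from typing import Dict, List


def toggle_counts(samples: Dict[str, List[int]]) -> Dict[str, int]:
    """Count value changes (0->1 or 1->0) per net from per-cycle samples."""
    return {
        net: sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)
        for net, values in samples.items()
    }


def switching_power_watts(toggles: int, cycles: int, cap_farads: float,
                          vdd_volts: float, clock_hz: float) -> float:
    """Textbook estimate: each toggle dissipates 0.5 * C * Vdd^2 joules."""
    toggles_per_second = (toggles / cycles) * clock_hz
    return 0.5 * cap_farads * vdd_volts ** 2 * toggles_per_second


# Hypothetical per-cycle samples for two made-up nets.
samples = {
    "chip.clk_en":     [0, 0, 1, 1, 1, 0, 0, 0],
    "chip.core.data0": [0, 1, 0, 1, 1, 0, 1, 0],
}
counts = toggle_counts(samples)
print(counts)

# Assumed: 1 fF per net, 1.0 V supply, 1 MHz clock.
# 8 samples per net means 7 clock intervals were observed.
for net, n in counts.items():
    watts = switching_power_watts(n, cycles=7, cap_farads=1e-15,
                                  vdd_volts=1.0, clock_hz=1e6)
    print(f"{net}: {watts:.3e} W")
```

The point of running this on an emulator is the length of the sample lists: an OS boot gives many orders of magnitude more cycles per net than anything a simulator could produce in the same time.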
Dynamic power analysis flow with Palladium (DPA)
There are multiple players involved in finding a solution to dynamic
power issues, as any number of things can light up part of the chip with
hot spots. For example:
- The software may not have disabled the clocks on the
functional blocks not being used in the running application,
unnecessarily burning power.
- A section of the physical layout might need a module moved,
and/or changes made to the placement and routing.
To help with this, Cadence provides a lot of their own DPA post-process
analysis tools as part of the dynamic power analysis flow.
We use Cadence's post-process analysis on particular IP blocks to
see where the net toggling and switching is occurring in the module
hierarchy while the baseband software application is being run on the
Palladium.
Some of our modules have their own hierarchies that can be 10 levels
deep.
We first run the Palladium analysis to see what correction strategies
might work, i.e. a specific change in the software or the RTL, and
then run it through our transistor-level power analysis tool (Ansys Totem)
to get the power numbers. Finally, we fold the power numbers into the
power data to find out if this is the hot module.
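Hunting for the hot module in a deep hierarchy amounts to rolling up per-net activity by module path. Here is a minimal Python sketch of that roll-up, assuming dot-separated hierarchical net names and made-up toggle counts; it is not Cadence's DPA tooling, just the bookkeeping behind the idea.

```python
from collections import defaultdict
from typing import Dict


def rollup_by_module(net_toggles: Dict[str, int], depth: int) -> Dict[str, int]:
    """Sum per-net toggle counts under the first `depth` hierarchy levels."""
    totals: Dict[str, int] = defaultdict(int)
    for net, count in net_toggles.items():
        parts = net.split(".")
        # The last path element is the net name; the rest is the module path.
        module = ".".join(parts[:-1][:depth]) or "<top>"
        totals[module] += count
    return dict(totals)


def hottest(totals: Dict[str, int]) -> str:
    """Return the module path with the largest toggle total."""
    return max(totals, key=lambda module: totals[module])


# Hypothetical toggle counts for a made-up baseband hierarchy.
net_toggles = {
    "chip.modem.fft.d0": 900,
    "chip.modem.fft.d1": 850,
    "chip.modem.viterbi.s0": 120,
    "chip.cpu.alu.c": 300,
}

for depth in (2, 3):
    totals = rollup_by_module(net_toggles, depth)
    print(depth, totals, "hottest:", hottest(totals))
```

Walking the depth down one level at a time narrows the search from a subsystem to a specific block before any software or RTL change is tried.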
---- ---- ---- ---- ---- ---- ----
CADENCE PALLADIUM Z1
We decided to go with Palladium as our emulation platform a few years
ago. We knew it was a good fit for our large designs based on our team
members' positive experience with it in their past jobs.
Some of Palladium's biggest strengths are:
1. Smooth compile. It's not FPGA-based, so we don't have the
PnR difficulties you can get with an FPGA-based emulator,
such as Synopsys Zebu/HAPS or Mentor Veloce or ProDesign.
2. Debug. Palladium offers fast waveform generation, with
almost no impact on runtime performance. Palladium
essentially generates waveforms for your entire design while
it's running, giving you full debug visibility.
We saw no equivalent debug with Zebu/HAPS/Veloce.
3. Speed bridges. We can recreate realistic stimulus scenarios
and software development environments with the help of
Cadence's hardware SpeedBridges for the major interfaces in
our chips.
We currently use Cadence's PCIe and Ethernet physical speed
bridge adapters. We've used ~200 Ethernet ports.
(The Palladium-Z1 can handle 100s of I/Os.)
Compile Time and Performance
Because Palladium doesn't require the traditional PnR that an FPGA-based
emulator does, we can just compile our Verilog RTL for our Palladium
configuration and then run it.
Caveat: Trying to squeeze a larger design into a smaller capacity
Palladium and doing a brute force compile *will* adversely impact the
performance you get -- and/or even lead to compile failure if your
design simply cannot be fit into the target capacity.
On the flip side, you can also do things to improve the performance,
such as reducing your design size based on your Palladium capacity.
Because most of our designs are larger than the 400 million gate
Palladium capacity that we purchased, we first do some exploration and
then partition our design to fit in the Palladium box. Even our
smallest chips take up the full 400 M gates.
Our results:
- We get a 600-800 kHz speed on our main clock.
- Our initial compile with Palladium on a new 400 M gate design:
- Takes us close to 2 weeks including exploration of various
design reduction scenarios.
- It would likely only take 2-4 days if our designs were
under 400 M gates and we did not explore subset partitions
for emulation.
- Recompiling our design for incremental design changes and bug
fixes takes us about 1/2 day.
- We've had up to 5-6 users doing parallel tests of design
subsets.
Overall, we liked Palladium's quick database compile, quick debug
turnaround, and the ability to partition a design between users.
---- ---- ---- ---- ---- ---- ----
CADENCE PALLADIUM
We've now had the Cadence Palladium emulator for more than two years.
We've used it for hardware verification, software development, HW/SW
co-verification, architecture analysis, and post-silicon validation.
We've had 5 engineers running Palladium in parallel -- in batch mode
and through a grid. We also run regressions on it.
When we evaluated emulators, we chose Palladium over Mentor Veloce,
mostly because we could get more user recommendations on Palladium,
which has a longer track record in the industry. (I've never
actually used Veloce.)
We've run both our design submodules and our entire design on Palladium.
We have 32 domains of Palladium-XP and have run designs as large as 14
Palladium-XP domains.
- Palladium's compilation only takes a few hours, and it always
works.
- We get an operating speed of 500 KHz to 1 MHz.
- We use Cadence's PCIe SpeedBridge heavily and with good
success.
Palladium's debug is very convenient as we can record many signals and
for many cycles. We do most of our hardware debug offline by looking
at traces.
It takes us a few minutes to get the waveforms.
I highly recommend Palladium. It is a great tool and has helped us a
lot with testing our architectural assumptions, verifying our logic,
and developing and validating our overall system.
A recent result I can share is that it took us only 3 days (instead of
1-2 months) to bring-up our software after we got silicon back. And we
have found no bugs in our silicon to date.
---- ---- ---- ---- ---- ---- ----
PALLADIUM Z1
We purchased the Cadence Palladium Z1 emulation platform for firmware
bring up and HW debug purposes. We've now used it for a year.
Palladium's best advantages over Mentor Veloce and Synopsys Zebu are:
1. Its fast turnaround time.
2. Palladium has given me the best debug experience I've had to
date. It has great visibility into the whole design and
ad-hoc signals sampling.
- It's efficient, user friendly, and fast -- it only takes
3 minutes to get the waveforms.
- I can get all the nets that I need and just choose what
to see. This is the biggest debug advantage and the most
useful feature we use.
Compilation time and predictability: We have 4 boards, and our biggest
design occupies 2 boards. Palladium's compilation time is about 3
hours. (We are not using the incremental flow or parallel compile.)
We've never had a design that did not compile successfully
the first time.
We use Cadence's PCI SpeedBridge and Memory IO card.
- The PCI SpeedBridge connects Palladium and our design to a
host PC, which then controls the design via PCI.
- The memory IO card gives us the ability to connect a JTAG
probe to the design and debug the microprocessors we have
inside.
We currently have 2 engineers using Palladium in parallel -- and it
works fine. We typically get a 0.5 MHz operating speed.
I highly recommend Palladium as an emulation platform for early
firmware development.
---- ---- ---- ---- ---- ---- ----
It's not EDA SW, but I'd say Palladium for this year's best tool.
We port our ASIC design into the Palladium box and then test our design
with an external host with real world interfaces such as PCIe, NAND
devices we use in our SSDs, and models for DRAM, SPI Flash, etc.
My information below is mostly on Palladium, though I can make a few
general comparisons with Zebu/Veloce FPGA based systems.
-Speed/Capacity/Accuracy-
- Speed. Palladium has ~1 MHz speed. The speed for any
particular design will vary depending on design factors.
- Capacity. Palladium offers significant capacity especially
given its recent advances with the Z1 platform. We can put
4-6 million gates into each domain, and there are 8 domains in
a board. We can also connect multiple boards in a rack to
support emulation of 100 Ms of gates -- close to a billion.
- Accuracy. We map our ASIC design "as is" to the Palladium
box, including all test logic. Our intent is to emulate the
real silicon in all its modes.
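The capacity numbers above make for easy sizing arithmetic. The sketch below uses the 4-6 million gates per domain and 8 domains per board quoted in this comment; the 100 M gate design is an assumed example, and the real fit depends on the design and the compile, so treat this as a first guess only.

```python
import math

DOMAINS_PER_BOARD = 8            # from the comment above
GATES_PER_DOMAIN_LOW = 4_000_000   # low end of the quoted 4-6 M range
GATES_PER_DOMAIN_HIGH = 6_000_000  # high end of the quoted 4-6 M range


def domains_needed(design_gates: int, gates_per_domain: int) -> int:
    """Smallest number of domains that can hold the design."""
    return math.ceil(design_gates / gates_per_domain)


def boards_needed(design_gates: int, gates_per_domain: int) -> int:
    """Smallest number of boards that provide enough domains."""
    domains = domains_needed(design_gates, gates_per_domain)
    return math.ceil(domains / DOMAINS_PER_BOARD)


# Hypothetical 100 M gate design (an assumption for illustration).
design = 100_000_000
for per_domain in (GATES_PER_DOMAIN_LOW, GATES_PER_DOMAIN_HIGH):
    print(per_domain,
          domains_needed(design, per_domain),
          boards_needed(design, per_domain))
```

Sizing against the low end of the range is the conservative choice; the high end shows how much headroom a well-packed design might recover.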
-Bring up-
Bringing up a new design on Palladium took us about 4 weeks, which is
quite fast. Similar bring-up on a Zebu/Veloce FPGA based system can
take several months.
-Debug-
Debug is probably Palladium's biggest advantage as it offers complete
visibility of our design. When compared with Zebu/Veloce/HAPS,
Palladium is more mature, and its debug features are better.
- We look at Palladium's strength from the viewpoint of platform
usability and turnaround. How many bugs can we find quickly
and how fast can we fix them? It does hours vs. weeks.
- Once we know there is a bug, and have the right trigger to get
the waveforms, then it is very easy for our designers to find
the root-cause for the bug.
Palladium offers compelling value here. We get full visibility, as
if we were debugging the RTL waveforms.
-Compilation-
Palladium compilations are pretty fast -- generally within 2 hours. We
did not have compile issues other than occasional SystemVerilog parsing
errors.
Palladium's compile methodology is much superior compared to FPGA-based
systems -- where it takes a few days to weeks to complete RTL FPGA
synthesis, FPGA place & route, and the FPGA timing optimizations.
We use Palladium for two primary purposes:
- SOC level validation - to make sure we stress the hardware and
verify several corner cases which are difficult to catch
during simulation.
- Product firmware development - to catch system issues related
to firmware and hardware interaction by exercising end-to-end
testing (Host <-> NAND).
Palladium is also very useful when we have an issue with our silicon.
- We make efforts to reproduce the failure on Palladium.
- Once we reproduce it, it is faster to debug and find the root
cause as there is full visibility in Palladium.
- Once we have a design fix for the problem, we bring it back to
Palladium and verify that the fix works before we roll the fix
into a new version of the silicon.
In general, we want to start emulation as early as possible in our
projects. This depends on where we are in the product firmware
development and maturity of the ASIC in simulations.
Emulation fits into our typical ASIC development cycle as follows:
1. As part of our verification strategy, we first verify the
modules, then the subsystems, followed by the SoC level. We
expect to run significant simulation cycles at the SoC level,
before we embark on emulation so that we don't waste time
finding basic bugs in emulation.
2. Our validation team starts emulation during the subsystem and
SoC testing phases. As an example of sub-system validation,
we emulate our Flash controller sub-system with real NAND
devices before SOC level validation.
Other Palladium features & functions
- Multiple users. You can enable as many Palladium users at
the same time as there are designs in the box. The
granularity of a design can be a domain -- limitations come
into play when you have finite cabling for the interfaces
required.
- Physical Speed bridge. We used Palladium's physical Speed
bridge to connect with an external host supporting PCIe Gen4.
Even though the emulator runs slower, we can still exercise
the full PCIe Gen 4 protocol with this approach at slow link
speeds.
- Save/Restore. Palladium offers the capability to save the state
of the design and restore it later for another use. This
feature is particularly useful when a firmware boot of the
SOC takes 2-3 hours. Once boot is complete, the state can be
saved. This state can be later restored for another user,
saving downtime due to boot.
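The Save/Restore payoff is simple arithmetic. Here is a rough Python sketch, assuming only the first run boots and every later run restores the saved state. The 2.5 hour boot is the midpoint of the 2-3 hours quoted above; the restore time and the run count are assumptions, since the comment gives neither.

```python
def boot_hours_saved(runs: int, boot_hours: float, restore_hours: float) -> float:
    """Hours saved when only the first run boots and the rest restore a saved state."""
    if runs < 1:
        return 0.0
    always_boot = runs * boot_hours
    boot_once = boot_hours + (runs - 1) * restore_hours
    return always_boot - boot_once


# Assumed: 10 test runs, 2.5 h boot, 0.1 h restore (restore time is a guess).
saved = boot_hours_saved(runs=10, boot_hours=2.5, restore_hours=0.1)
print(f"emulator hours saved: {saved:.1f}")
```

The saving scales with the number of users and tests that start from the same booted state, which is why it matters most on a shared box.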
Palladium is a good platform during SoC and firmware development to
identify and fix HW/FW interface bugs, and possibly architecture issues.
---- ---- ---- ---- ---- ---- ----
We've used Cadence Palladium XP for about 4 years now; with it we get
a 1 MHz operating speed.
Our main purpose for Palladium is for our software development bring up
and HW debug -- we've had up to 4 engineers using it at the same time.
Palladium's biggest advantages are:
- Compilation speed -- typically only 20 minutes for our designs.
- Debug. We can debug a lot of signals quickly.
It only takes seconds for us to see the waveforms.
Palladium always works for us the first time after we compile our design
database. (It needs none of the PnR that FPGA-based emulators require.)
Palladium is the Ferrari of the industry -- with the corresponding price
point. So, we also use Protium for extra capacity (and speed) at a
much lower cost.
---- ---- ---- ---- ---- ---- ----
We use Cadence Palladium Emulators for our IP development. The latest
version we have is the Palladium Z1, with 2 billion gates of capacity.
We've had up to 35 engineers doing parallel validation at one time,
with 50-100 people having access to it.
- Our Palladium users include: Architects, Verification
engineers, QA engineers, Application engineers (for customer
tickets), software engineers, and designers.
- We use it for: hardware verification; software verification;
HW/SW co-verification; architecture analysis; post-silicon
validation; and power verification.
To drive Palladium, we've developed our own testbench infrastructure,
both virtual and ICE-based.
My feedback on Palladium.
1. Compilation Time
The largest design we've run through Palladium was 200-300 M
gates, which took 5 hours to compile. Smaller cores, such
as 10 M gates, compile in under an hour.
Palladium's compile time is the same for incremental RTL
changes. We'd love Cadence to give us the ability to
optimize for small RTL changes. Also, 99% of the time
Palladium compiles, it runs. We expect it to work, and it's
a rare surprise when we have an issue.
2. Operating Speed
We target a 1 MHz operating speed; the maximum we've
achieved is 1.3 MHz.
We run automatic checks on the frequency; if it's under a
specified target, an engineer will step in to ensure that
our designs are running fast. We want to make good use of
the expensive equipment.
3. Palladium Debug
Palladium debug is very strong for running complicated test
vectors on a complex design.
We've been using Cadence's selective SDL feature to help our
users to always capture the right waves without having to
rerun the emulator.
- We've built our own intelligent system on top of
Cadence's stuff to trigger when the test has failed
and to capture the right failure point window.
- This combination massively increases our debug
throughput and reduces our cost.
- Between capturing the right failure point and
Palladium's "full vision" debug, we have all we need
for debug.
4. Waveform turnaround time.
For a 100 M gate design, it takes about 20 min to get the raw
database onto our network, then another 20 min to generate
the right waveforms from that database.
5. Dynamic low-power analysis
We run our benchmark test stimuli through our hardware to
generate power data at least weekly. We compare the details
to our power targets.
We put Unified Power Format (UPF) code around our IP,
which then gets loaded into Palladium, for when we run tests
to check the power logic.
We do other things for power also, but this quick testing
is an important element for us.
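The automatic frequency check described under Operating Speed can be sketched as a simple filter. This is a guess at the shape of such a check, not this team's actual infrastructure: the job names and measured speeds are made up, and only the 1 MHz target comes from the comment.

```python
from typing import Dict, List

TARGET_HZ = 1_000_000   # the 1 MHz target quoted above


def slow_jobs(job_speeds_hz: Dict[str, float],
              target_hz: float = TARGET_HZ) -> List[str]:
    """Return names of jobs running under the target clock, slowest first."""
    below = [(hz, name) for name, hz in job_speeds_hz.items() if hz < target_hz]
    return [name for hz, name in sorted(below)]


# Hypothetical measured emulation speeds per job, in Hz.
measured = {
    "core_a": 1_300_000,
    "core_b": 750_000,
    "soc_top": 980_000,
}

for name in slow_jobs(measured):
    print(f"{name}: {measured[name] / 1e6:.2f} MHz is under target, needs a look")
```

Sorting slowest first puts the job wasting the most emulator time at the top of the engineer's list.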
We've been using the various Palladium emulators from Cadence for at
least 15 years now. We are even still happily using the 2nd box we got.
---- ---- ---- ---- ---- ---- ----
From my day-to-day use, it's Cadence Palladium + Protium.
Here's my Palladium use before Protium.
We've used Palladium for more than a decade. We keep up on what its
Mentor and Synopsys competitors are doing, but Palladium continues to
meet our requirements.
We use Palladium in two modes:
- In Circuit Emulation -- Palladium runs in a logic analyzer
mode, so our entire design must be synthesizable and fit in
the Palladium box.
- IXCom -- simulation acceleration mode.
We use it for:
- hardware-firmware architectural exploration
- hardware verification
- early firmware development
We have nearly a 1 billion gate capacity with Palladium. With that
the compile times we see are:
Design size Palladium total compile time
------------ -----------------------------
20 M gates 60 minutes
400 M gates 90-120 minutes
Cadence's fast compile speed, even for large designs, is partly due to
them using a parallel compile process. Additionally, we keep our
compile time fast by doing incremental builds for our bigger designs.
First we build and test the subsystems. Next we integrate them and
run the system level test.
Other details:
- Palladium's speed ranges from 600 KHz to 1.4 MHz.
- multiple users running Palladium in parallel. We've easily
had more than 10 users on a single rack.
- when it compiles, 99.9% of the time it will just work. The
0.1 percent of the time when it doesn't work could be due
to faulty resources that we fix by running a diagnosis and
then excluding any hardware resources that are faulty.
This hassle-free easy compile gives us great peace of mind.
- We use Cadence's physical speed bridges for PCIe, Ethernet,
and SATA. Their speed bridges are plug-and-play physical
boards that we connect to a Palladium T-Pod and then to the
devices. They save us a lot of time.
One of Palladium's biggest assets is debug. Its wave capture is super
easy -- we get good wave depth and can use infinite trace mode to
capture an endless wave, where the only limit is our hard drive disk
capacity.
- Getting the raw data from the emulator takes only about
1 minute for 15 million nets.
- Palladium also has a 'full vision' mode, where we can capture
any net of our design. Translating the raw data into the
Palladium SimVision wave format for viewing takes only
1-2 minutes using multiple hosts.
Palladium's interface is very close to what we use for our Xcelium
(Rocketick) simulator, so we can easily share the results with our
simulation team. (Tell Avi that I said "hi"!)
My only gripe is we'd like Cadence to close the gap between traditional
Verilog SW simulation tools (like VCS, Questa, and Xcelium) and
emulation even further, so we can go back and forth between
the two worlds in a seamless way.
Overall, Palladium really allows us to improve coverage and stress
the design in a realistic environment. It's very efficient.
---- ---- ---- ---- ---- ---- ----
Started a Veloce Strato vs. Palladium Z1 eval this week.
Too early to say which will win. I'll let you know later.
---- ---- ---- ---- ---- ---- ----
Related Articles
CDNS Protium crazy fast "Palladium-compiles" #1a for Best of 2019
CDNS Palladium wins back user mindshare is #1b as the Best of 2019
MENT Veloce Strato, Virtual Lab, Hycon makes #1c for Best of 2019
SNPS Zebu Intel shipments slipping 2 quarters is #1d Best of 2019