( ESNUG 550 Item 1 ) -------------------------------------------- [05/01/15]
Subject: User benchmarks new CDNS Innovus vs. SNPS ICC/ICC2 workaround
> ... So now 6 months after the "Project Novus" leak, I was not surprised
> to see CDNS announce "Innovus". Its key features:
>
> - ARM guys are on record saying Innovus got the best implementation
> PPA with their Cortex-A72 -- saying but not directly saying that
> Innovus had beat Aart's IC Compiler 2 ARM Cortex-A72 PPA results.
>
> - Innovus claims closing faster because Tempus/Innovus/Voltus/Quantus
> all work on a single database -- and same single C++ codebase --
> while DC/PrimeTime/ICC/ICC2 uses different databases and codebases.
>
> - Innovus claims it does 5-10M inst blocks, while ICC/ICC2 are still
> stuck in the 1-2M inst block size limit -- which has all sorts of
> ramifications on congestion, timing closure, partitions, etc.
>
> - Innovus claims multi-threading and distributed processing, which
> means MCMM scenario acceleration.
>
> - Innovus 5-10X TAT vs. ICC/EDI claim backed by user benchmark data
> was quite impressive. It means significantly faster iterations.
>
> - finally the biggest is Innovus gets 20% better PPA than ICC/ICC2.
>
> Taken altogether means Innovus/Tempus does noticeably faster digital STA
> and PnR with much bigger 5-10 M inst blocks that need fewer iterations to
> get signoff level 20% better PPA vs. Aart's PrimeTime/ICC/ICC2 flow.
>
> Competition is good.
>
> - from http://www.deepchip.com/items/0548-01.html
From: [ My Grandson Likes Dr. Seuss ]
Hi, John,
Please keep me anonymous.
With all this brouhaha about the new CDNS Innovus and SNPS IC Compiler 2,
I want to add my 2 cents having used both tools.
Networking requires us to be very efficient with runtime and capacity. The
China market requires a very fast time to market. Our chips are on the
larger side, so for PnR each is partitioned into many different logical
subpartitions using First Encounter. We use DC-Ultra and DC-Graphical for
RTL-to-gates synthesis.
Our current project is to jump from TSMC 28nm straight to 16FF+. We have
pretty good in-house expertise with ICC. Our engineers know how to use it.
But ICC takes 6-8 days to get results. A bit painful. And that's at 28nm.
We were afraid for 16nm. Unless something changed drastically, we were
worried that it would take forever to do designs.
---- ---- ---- ---- ---- ---- ----
FINALLY GOT ICC2 IN HOUSE
Our Synopsys support has talked to us about ICC2 for a long time without
giving access to the tool. Finally they let us use ICC2. We are getting
to see some results, but not at all close to the 10X range as advertised.
Also, convergence of our design is taking much longer in ICC2 than we
anticipated, given that it is claimed to have been built on a different
data model. Scripting is different. Some essential features required for
a full flow are missing in ICC2. Impressions:
- ICC2 has a new database that seems to be the heart of the change.
- The ICC2 placer is the same as the ICC placer. Not sure how much
improvement this is.
- ICC2 optimization engine is claimed to be new with multi-scenario.
- ICC2 clock building is new. Not sure if it can match Azuro.
- ICC2 router is the same. It was good enough to get the job done.
The new ICC2 optimizer seems to be just about OK. It has some convergence
issues. But I think that, without fundamentally changing the core
technology, its design convergence is not going to improve. Fast runtimes
without design convergence are not much use to us.
End of last year, Cadence gave us updates on their new Innovus. It seemed
to be a similar story - fast runtime, but naturally the Cadence sales guy
claimed Innovus QOR improvements were much better.
---- ---- ---- ---- ---- ---- ----
IC COMPILER II VS. INNOVUS
We did an internal comparison not known to either Cadence or Synopsys - and
hence I want you to keep us anonymous please.
Design details:
- TSMC 16FF+
- 1M to 3M instance block sizes.
- power, area, routability and runtimes are key. Of course
timing has to be met. Power is the key though, given the
thermal runaway problems we anticipate.
- We close timing across more than 20-30 corners.
- most blocks run in the 1 GHz to 2 GHz range of operation
We selected two of our critical PITA blocks ("Thing 1" and "Thing 2") from
different projects at 16nm FinFET. (My grandson really likes Dr. Seuss.)
These PITA blocks are known to stress PnR for timing closure and routability
simultaneously. The two synthesized netlists for both Thing 1 and Thing 2
were generated using DC-Ultra. We did not use CDNS synthesis at all.
    Block      # of Instances     Timing Scenarios
    --------   ----------------   -------------------
    Thing 1        1.4 M          8 (2 setup 6 hold)
    Thing 2        1.8 M          8 (2 setup 6 hold)
What is important to us is how long it takes to finish:
- from the time tool reads in the netlist,
- places and optimizes placement,
- does CTS,
- routes and optimizes route,
and finally closes. All this with DRC closure, meets the timing spec, and
keeps power down.
                                                     ICC2 plus ICC
                             Innovus    ICC2 alone     workaround
    Block      # of Insts    runtime     runtime        runtime
    --------   ----------   ---------   ----------   --------------
    Thing 1      1.4 M      29.5 hrs     34.6 hrs       91.4 hrs
    Thing 2      1.8 M      34.0 hrs     52.1 hrs      121.3 hrs
ICC2 alone run by itself looks comparable to Innovus, but the crap ICC2 QOR
forced us to switch to the ICC2/ICC workaround. Those runtimes looked bad
compared to Innovus.
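
For anyone who wants the ratios behind those runtime numbers, here is a minimal Python sketch. The hours are copied from the runtime table above; the dictionary layout and the helper name `ratio_vs_innovus` are made up for illustration only.

```python
# Runtimes in hours, copied from the runtime table above.
runtimes = {
    "Thing 1": {"innovus": 29.5, "icc2_alone": 34.6, "icc2_plus_icc": 91.4},
    "Thing 2": {"innovus": 34.0, "icc2_alone": 52.1, "icc2_plus_icc": 121.3},
}


def ratio_vs_innovus(block, flow):
    """Return how many times longer `flow` ran than Innovus on `block`."""
    r = runtimes[block]
    return r[flow] / r["innovus"]


for block in runtimes:
    alone = ratio_vs_innovus(block, "icc2_alone")
    combo = ratio_vs_innovus(block, "icc2_plus_icc")
    print(f"{block}: ICC2 alone {alone:.2f}x, ICC2+ICC {combo:.2f}x the Innovus runtime")
```

On these numbers ICC2 alone lands at roughly 1.2X to 1.5X the Innovus runtime, and the ICC2+ICC workaround at roughly 3.1X to 3.6X.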
---- ---- ---- ---- ---- ---- ----
ICC2 NEEDS ICC AS A WORKAROUND
ICC2 full flow QOR was pretty bad and not comparable to their own older ICC
flow. Our Synopsys FAEs had us use a mix of ICC2 and ICC as a makeshift
solution. They do placement and clocks in ICC2 and then do a massive round
of optimization in ICC to finish the flow. All routing and post-route
optimization is still done in ICC.
This ICC2+ICC workaround flow slows down runtime by 3X to 4X, but only then
is its QOR acceptable.
                                                        ICC2 plus ICC
                              Innovus     ICC2 alone      workaround
    Block      # of Insts       TNS          TNS             TNS
    --------   ----------   ----------   -----------   --------------
    Thing 1      1.4 M      -0.03 nsec   -116.5 nsec     -12.8 nsec
    Thing 2      1.8 M      -3.20 nsec   -223.0 nsec     -25.2 nsec
Innovus clearly got better TNS on the two PITA blocks.
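
As a reminder of what the TNS column measures: total negative slack is the sum of the slack over every timing endpoint that fails, so it is zero when timing is fully met and more negative the worse things are. The sketch below shows that definition in Python. The slack values are hypothetical, chosen only for illustration, and are not taken from Thing 1 or Thing 2.

```python
def tns(slacks_ns):
    """Total negative slack: sum of the slacks of all violating endpoints."""
    return sum((s for s in slacks_ns if s < 0), 0.0)


def violating_endpoints(slacks_ns):
    """Count the endpoints whose slack is negative."""
    return sum(1 for s in slacks_ns if s < 0)


# Hypothetical per-endpoint slacks in nsec (illustration only).
example_slacks = [0.12, -0.01, -0.02, 0.30]

print(f"TNS = {tns(example_slacks):.2f} nsec "
      f"over {violating_endpoints(example_slacks)} violating endpoints")
```

A small TNS can come from a handful of endpoints that barely miss, while a TNS in the hundreds of nsec usually means many endpoints are failing.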
The data for power.
                                                     ICC2 plus ICC
                             Innovus    ICC2 alone     workaround
    Block      # of Insts     power       power          power
    --------   ----------   ---------   ----------   --------------
    Thing 1      1.4 M       83.4 mW     96.8 mW        91.0 mW
    Thing 2      1.8 M      151.7 mW    177.4 mW       166.1 mW
Innovus was only ~5% better than ICC2+ICC, which was OK. Not a wow. We
might look at Ansys PowerArtist to see if that can be improved.
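
A quick cross-check of the power table, as a minimal Python sketch. It assumes "better" means the percent reduction relative to the other flow's number; that baseline, and the helper name `pct_saving`, are assumptions made here and not something the benchmark states.

```python
def pct_saving(innovus_mw, other_mw):
    """Percent power reduction of Innovus relative to the other flow."""
    return 100.0 * (other_mw - innovus_mw) / other_mw


# Power in mW, copied from the power table above:
# (Innovus, ICC2 alone, ICC2 plus ICC workaround)
power = {
    "Thing 1": (83.4, 96.8, 91.0),
    "Thing 2": (151.7, 177.4, 166.1),
}

for block, (innovus, icc2_alone, icc2_plus_icc) in power.items():
    print(f"{block}: {pct_saving(innovus, icc2_alone):.1f}% vs ICC2 alone, "
          f"{pct_saving(innovus, icc2_plus_icc):.1f}% vs ICC2+ICC")
```

With that baseline the table works out to roughly 8% to 9% versus ICC2+ICC and about 14% versus ICC2 alone, so the ~5% quoted above may be using a different baseline or rounding.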
---- ---- ---- ---- ---- ---- ----
IMPRESSIONS OF INNOVUS
Dissecting the Innovus runs for QOR and runtimes further on one of our
older designs, we figured out that Cadence has significantly changed
several internals of the old EDI flow -- and yet the scripting and
interfaces have remained stable. Innovus appears to have:
- New placer called GigaPlace. (Is Marketing working overtime?)
Seems to give better timing convergence and faster runtimes.
- CTS (Azuro CCOpt) previously was a separate step, but now
seems to be natively integrated. Their support for clock mesh
is a key requirement for our designs and it's impressive.
- Several new power optimization tricks which seem OK.
- We like that Innovus is tightly integrated with Tempus and
that they have the same timing engine. (ICC/ICC2 and PrimeTime
have different timing engines.) One timing engine saves a lot
of pain.
- We are banking that timing ECO loops will be 50% to 60% faster
based on what we've seen from our initial runs.
Overall - Innovus is impressive with its readiness, runtimes and better
convergence. Cadence, though, needs to improve their documentation and
online support. It was the CDNS AE support which helped us navigate
through this lack of documentation. Without them we would've been lost.
- [ My Grandson Likes Dr. Seuss ]
---- ---- ---- ---- ---- ---- ----
Related Articles:
CDNS bigwig launches Innovus with 44 jabs at PrimeTime/ICC/ICC2
Readers on ICC II, ATOP, CDNS EDI, upcharges, Z-Route, 24 months
Engineering comments point to SNPS vs. CDNS PNR shakeout at Apple