User benchmarks new CDNS Innovus vs. SNPS ICC/ICC2 workaround

( ESNUG 550 Item 1 ) -------------------------------------------- [05/01/15]

Subject: User benchmarks new CDNS Innovus vs. SNPS ICC/ICC2 workaround

> ... So now 6 months after the "Project Novus" leak, I was not surprised
> to see CDNS announce "Innovus".  Its key features:
> 
>     - ARM guys are on record saying Innovus got the best implementation
>       PPA with their Cortex-A72 -- saying but not directly saying that
>       Innovus had beat Aart's IC Compiler 2 ARM Cortex-A72 PPA results.
> 
>     - Innovus claims closing faster because Tempus/Innovus/Voltus/Quantus
>       all work on a single database -- and same single C++ codebase --
>       while DC/PrimeTime/ICC/ICC2 uses different databases and codebases.
>       
>     - Innovus claims it does 5-10M inst blocks, while ICC/ICC2 are still
>       stuck in the 1-2M inst block size limit -- which has all sorts of
>       ramifications on congestion, timing closure, partitions, etc.
> 
>     - Innovus claims multi-threading and distributed processing, which
>       means MCMM scenario acceleration.
> 
>     - Innovus 5-10X TAT vs. ICC/EDI claim backed by user benchmark data
>       was quite impressive.  It means significantly faster iterations.
> 
>     - finally the biggest is Innovus gets 20% better PPA than ICC/ICC2.
> 
> Taken altogether, this means Innovus/Tempus does noticeably faster digital
> STA and PnR with much bigger 5-10 M inst blocks that need fewer iterations
> to get signoff-level 20% better PPA vs. Aart's PrimeTime/ICC/ICC2 flow.
> 
> Competition is good.
> 
>     - from http://www.deepchip.com/items/0548-01.html


From: [ My Grandson Likes Dr. Seuss ]

Hi, John,

Please keep me anonymous.

With all this brouhaha about the new CDNS Innovus and SNPS IC Compiler 2,
I want to add my 2 cents having used both tools.

Networking requires us to be very efficient with runtime and capacity.  The
China market requires a very fast time to market.  Our chips are on the
larger side, so for PnR each is partitioned into many different logical
subpartitions using First Encounter.  We use DC-Ultra and DC-Graphical for
RTL-to-gates synthesis.

Our current project is to jump from TSMC 28nm straight to 16FF+.  We have
pretty good in-house expertise with ICC.  Our engineers know how to use it.
But ICC takes 6-8 days to get results.  A bit painful.  And that's at 28nm.
We were afraid for 16nm.  If something didn't change drastically, we were
worried that it would take forever to do designs.

         ----    ----    ----    ----    ----    ----   ----

FINALLY GOT ICC2 IN HOUSE

Our Synopsys support has talked to us about ICC2 for a long time without
giving access to the tool.  Finally they let us use ICC2.  We are getting
to see some results, but not at all close to the 10X range as advertised.

Also, convergence of our design is taking much longer in ICC2 than we
anticipated, given that it is claimed to have been built on a different
data model.  Scripting is different.  Some essential features that a full
flow requires are missing in ICC2.  Impressions:

  - ICC2 has a new database that seems to be the heart of the change. 
  - The ICC2 placer is the same as the ICC placer.  Not sure how much
    improvement this is. 
  - ICC2 optimization engine is claimed to be new with multi-scenario. 
  - ICC2 clock building is new.  Not sure if it can match Azuro. 
  - ICC2 router is the same.  It was good enough to get the job done. 

The new ICC2 optimizer seems to be just about OK.  It has some convergence
issues.  But I think that, without fundamentally changing the core
technology, its design convergence is not improving.  Fast runtimes without
design convergence are not much use to us.

End of last year, Cadence gave us updates on their new Innovus.  It seemed
to be a similar story - fast runtime, but naturally the Cadence sales guy
claimed Innovus QOR improvements were much better. 

         ----    ----    ----    ----    ----    ----   ----

IC COMPILER II VS. INNOVUS

We did an internal comparison not known to either Cadence or Synopsys - and
hence I want you to keep us anonymous please.

Design details:

        - TSMC 16FF+
        - 1M to 3M instance block sizes. 
        - power, area, routability and runtimes are key.  Of course
          timing has to be met.  Power is the key though, given the
          thermal runaway problems we anticipate. 
        - We close timing across more than 20-30 corners. 
        - most blocks run in the 1 GHz to 2 GHz range of operation

We selected two of our critical PITA blocks ("Thing 1" and "Thing 2") from
different projects at 16nm FinFET.  (My grandson really likes Dr. Seuss.)

These PITA blocks are known to stress PnR for timing closure and routability
simultaneously.  The two synthesized netlists for both Thing 1 and Thing 2
were generated using DC-Ultra.  We did not use CDNS synthesis at all.

      Block        # of Instances           Timing Scenarios
      --------    ----------------         -------------------
      Thing 1          1.4 M               8 (2 setup 6 hold)
      Thing 2          1.8 M               8 (2 setup 6 hold)

What is important to us is how long it takes to finish:

    - from the time the tool reads in the netlist,
    - places and optimizes placement,
    - does CTS,
    - routes and optimizes route,

and finally closes.  All this while closing DRC, meeting the timing spec,
and keeping power down.
                                                        ICC2 plus ICC
                               Innovus     ICC2 alone   workaround
      Block      # of Insts    runtime     runtime      runtime
      --------   ----------   ---------    ----------   --------------
      Thing 1      1.4 M       29.5 hrs    34.6 hrs      91.4 hrs
      Thing 2      1.8 M       34.0 hrs    52.1 hrs     121.3 hrs

ICC2 run by itself looks comparable to Innovus, but the crap ICC2 QOR
forced us to switch to the ICC2/ICC workaround.  Those runtimes looked bad
compared to Innovus.
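
To put numbers on "comparable" and "bad", here is a minimal sketch that turns the runtime table above into slowdown ratios.  The dictionary layout and the function name `slowdown` are just illustrative choices; only the hour figures come from our runs.

```python
# Runtimes in hours, copied from the runtime table above.
RUNTIME_HRS = {
    "Thing 1": {"Innovus": 29.5, "ICC2 alone": 34.6, "ICC2+ICC": 91.4},
    "Thing 2": {"Innovus": 34.0, "ICC2 alone": 52.1, "ICC2+ICC": 121.3},
}


def slowdown(block, flow, baseline="Innovus"):
    """Return how many times longer `flow` ran than `baseline` on `block`."""
    runs = RUNTIME_HRS[block]
    return runs[flow] / runs[baseline]


if __name__ == "__main__":
    for block in RUNTIME_HRS:
        for flow in ("ICC2 alone", "ICC2+ICC"):
            ratio = slowdown(block, flow)
            print(f"{block}: {flow} took {ratio:.2f}X the Innovus runtime")
```

On these figures the ICC2+ICC workaround ran about 3.1X (Thing 1) and 3.6X (Thing 2) longer than Innovus, and about 2.6X and 2.3X longer than ICC2 alone when `baseline="ICC2 alone"` is passed.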

         ----    ----    ----    ----    ----    ----   ----

ICC2 NEEDS ICC AS A WORKAROUND

ICC2 full flow QOR was pretty bad and not comparable to their own older ICC
flow.  Our Synopsys FAEs had us use a mix of ICC2 and ICC as a makeshift
solution.  They do placement and clocks in ICC2 and then do a massive round
of optimization in ICC to finish the flow.  All routing and post-route
optimization is still done in ICC.

This ICC2+ICC workaround flow slows down runtime by 3X to 4X, but only
then is its QOR acceptable.
                                                        ICC2 plus ICC
                               Innovus     ICC2 alone   workaround
      Block      # of Insts    TNS         TNS          TNS
      --------   ----------   ---------    ----------   --------------
      Thing 1      1.4 M      -0.03 nsec   -116.5 nsec   -12.8 nsec
      Thing 2      1.8 M      -3.20 nsec   -223.0 nsec   -25.2 nsec

Innovus clearly got better TNS on the two PITA blocks.
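
A small sketch of how the TNS table above can be compared, assuming the usual reading of TNS as the sum of all negative endpoint slacks (so closer to zero is better).  The helper names `rank_flows` and `tns_recovered` are made up for this example; the nsec values are from the table.

```python
# Total negative slack in nsec, copied from the TNS table above.
TNS_NS = {
    "Thing 1": {"Innovus": -0.03, "ICC2 alone": -116.5, "ICC2+ICC": -12.8},
    "Thing 2": {"Innovus": -3.20, "ICC2 alone": -223.0, "ICC2+ICC": -25.2},
}


def rank_flows(block):
    """Return flow names ordered from best (closest to zero) to worst TNS."""
    tns = TNS_NS[block]
    return sorted(tns, key=lambda flow: abs(tns[flow]))


def tns_recovered(block, before="ICC2 alone", after="ICC2+ICC"):
    """Fraction of the `before` flow's negative slack removed by `after`."""
    tns = TNS_NS[block]
    return 1.0 - tns[after] / tns[before]


if __name__ == "__main__":
    for block in TNS_NS:
        print(block, "ranking:", " > ".join(rank_flows(block)))
        print(f"  workaround removes {tns_recovered(block):.0%} of ICC2-alone TNS")
```

On both blocks the ICC workaround removes roughly 89% of the ICC2-alone TNS, yet the ranking still comes out Innovus first, ICC2+ICC second, ICC2 alone last.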

The data for power:
                                                        ICC2 plus ICC
                               Innovus     ICC2 alone   workaround
      Block      # of Insts    power       power        power
      --------   ----------   ---------    ----------   --------------
      Thing 1      1.4 M       83.4 mW      96.8 mW       91.0 mW
      Thing 2      1.8 M      151.7 mW     177.4 mW      166.1 mW

Innovus was only ~8% better than ICC2+ICC, which was OK.  Not a wow.  We
might look at Ansys PowerArtist to see if that can be improved.
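
For anyone who wants to redo the power arithmetic, a minimal sketch follows.  The function name `power_saving_pct` and the choice of baseline column are assumptions for illustration; the mW values are from the power table above.

```python
# Power in mW, copied from the power table above.
POWER_MW = {
    "Thing 1": {"Innovus": 83.4, "ICC2 alone": 96.8, "ICC2+ICC": 91.0},
    "Thing 2": {"Innovus": 151.7, "ICC2 alone": 177.4, "ICC2+ICC": 166.1},
}


def power_saving_pct(block, flow="Innovus", baseline="ICC2+ICC"):
    """Percent power reduction of `flow` relative to `baseline` on `block`."""
    power = POWER_MW[block]
    return 100.0 * (power[baseline] - power[flow]) / power[baseline]


if __name__ == "__main__":
    for block in POWER_MW:
        for baseline in ("ICC2+ICC", "ICC2 alone"):
            pct = power_saving_pct(block, baseline=baseline)
            print(f"{block}: Innovus is {pct:.1f}% lower than {baseline}")
```

On the table values this gives about 8.4% (Thing 1) and 8.7% (Thing 2) against ICC2+ICC, and about 13.8% and 14.5% against ICC2 alone.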

         ----    ----    ----    ----    ----    ----   ----

IMPRESSIONS OF INNOVUS

Dissecting the Innovus runs for QOR and runtimes further on one of our
older designs, we figured out that Cadence has significantly changed
several internals of the old EDI flow -- and yet the scripting and
interfaces have remained stable.  Innovus appears to have:

  - New placer called GigaPlace. (Is Marketing working overtime?)
    Seems to give better timing convergence and faster runtimes. 
  - CTS (Azuro CCOpt) previously was a separate step, but now
    seems to be natively integrated.  Their support for clock mesh
    is a key requirement for our designs and it's impressive.
  - Several new power optimization tricks which seem OK.
  - We like that Innovus is tightly integrated with Tempus and
    that they have the same timing engine.  (ICC/ICC2 and PrimeTime
    have different timing engines.)  One timing engine saves a lot
    of pain.
  - We are banking that timing ECO loops will be 50% to 60% faster
    based on what we've seen from our initial runs.

Overall - Innovus is impressive with its readiness, runtimes and better
convergence.  Cadence, though, needs to improve their documentation and
online support.  It was the CDNS AE support which helped us navigate
through this lack of documentation.  Without them we would've been lost.

    - [ My Grandson Likes Dr. Seuss ]

Copyright 1991-2024 John Cooley. All Rights Reserved.