( ESNUG 568 Item 4 ) -------------------------------------------- [03/21/17]
Subject: SCOOP -- Spies report Aart to launch DC2 at SNUG'17 tomorrow
SCOOP II: Multiple spies report that on Monday, Aart de Geus is
going to announce "Project Newton" IC Compiler II (ICC II) in
his upcoming keynote at SNUG'14 in Santa Clara.
From what I've heard, "Project Newton" was a 5 year undertaking
involving 80 SNPS R&D engineers re-engineering ICC for problems
unique to sub-20 nm P&R -- but it ran into organizational issues
I can't get a good fix on. Rumor was ICC II was to be launched
at DAC'13 in Austin, but it wasn't ready then. Apparently it is
now. From what I've heard, ICC II has a:
- new placer. Runs very fast for initial placement. Runs of
2 or 3 hours now take 1 hour. But initial placement is only
~10% of overall P&R runtime, so not much overall gain here.
Amdahl's law.
- new data model. Old ICC had two separate internal data
models, pre-CTS and post-CTS -- basically PhyOpt plus Astro
inside. New ICC II has a single layer that unifies the two
data models -- meaning a bit less memory use, and a minor
speed-up because data isn't duplicated.
- new MCMM timer. The old ICC timer was based on Astro; the
new ICC II timer adds MCMM awareness. It's not used much
because its memory footprint is too restrictive. Instead,
most MCMM work is done at ECO time for everybody -- ICC,
ICC II, EDI, Atoptech, and Olympus -- so tweaks here only
minorly increase throughput.
- new internal optimizer. Does buffering, resizing, moving
objects, replacing instances, etc. It's rewritten from the
ground up to be multi-CPU (unsure if it's multi-threaded,
distributed, or both). The reports are it's REALLY fast
vs. old ICC, but only up to CTS. Post-CTS it's still slow.
Before, ICC could do 1 or 2 scenarios in a short time; with
4 scenarios ICC became dog slow. Now ICC II does 4 or 5
scenarios within a reasonable time. Again, the MCMM heavy
lifting is done by external ECO tools for all PnR tools.
- old CTS. SNPS R&D is working on a 2nd gen CTS, but it's
not ready yet to be in ICC II. Maybe in 12 to 24 months???
- old Z-Route. Still the same and will be kept -- but
doesn't Z-Route use an old/different routing model??? It
still starts over with a fresh route instead of honoring
what was assumed in pre-route optimization. Convergence is
still an issue.
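The Amdahl's-law remark in the placer bullet above is easy to sanity-check
with back-of-the-envelope arithmetic. This sketch just plugs in the rumored
numbers (initial placement ~10% of total P&R runtime, sped up roughly 2.5X);
the `amdahl_speedup` helper name is mine, not anything in the tools:

```python
# Amdahl's-law sanity check on the rumored ICC II placer speedup.
# Numbers are straight from the rumor above: initial placement is ~10%
# of total P&R runtime, and the new placer runs it roughly 2.5X faster.

def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when only `fraction` of the runtime is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

overall = amdahl_speedup(fraction=0.10, local_speedup=2.5)
print(f"overall P&R speedup: {overall:.2f}X")  # prints "overall P&R speedup: 1.06X"
```

A ~6% total-flow gain from a 2.5X faster placer is exactly why the spies
shrug it off as "not much gain here."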
If this ICC II rumor is true, what's interesting is it creates a
mass call for rebenchmarking of all P&R tools. That is, if your
engineers have to spend time requalifying ICC II, why not also
benchmark current revs of Atoptech, CDNS EDI, & MENT Olympus-SoC,
too, since your people will all be in benchmarking mode anyway?
- http://www.deepchip.com/items/0537-10.html
From: [ John Cooley of DeepChip.com ]
I'm proud to say that three years ago I accurately scooped Aart's launch
of IC Compiler II (ICC2) before SNUG'14.
And in keeping with tradition, I'm (hopefully) scooping the news of Aart's
launch of Design Compiler II (DC2) at SNUG'17 tomorrow. :)
Here's what my spies tell me:
At SNUG next Wednesday Synopsys is expected to launch a new synthesis tool
replacing Design Compiler.
The internal SNPS product code name is "Descartes" (after that French guy
who said: "I think, therefore I am"). It may not have an official name yet,
but knowing the depths of Synopsys Corporate Marketing's creativity it is
expected to be called "DC2". It's a 3-year engineering effort to increase
DC's runtime speed and capacity on the ICC2 database.
My spies claim that DC2 has been in "taxicab mode" for the last few months
at Nvidia, HiSilicon, and Juniper -- with dedicated Synopsys AEs as the
"taxi drivers", working with these customers to test the new tool.
---- ---- ---- ---- ---- ---- ----
DC2 SPEED: word is that for blocks under 3 million instances, the new DC2
is showing a 2X to 3X speedup over the old standard DC Graphical/Ultra --
making the new DC2 roughly match CDNS Genus RTL synthesis, and run at
1/3rd of the MENT Oasys speeds.
I've heard at sizes over 3 million instances, Genus ballparks 50% faster
than DC2. For Oasys, a 3 million instance design takes ~14 hours to get
to placed gates. The same in DC-Graphical takes ~3.5 days. That makes
Oasys 6X faster vs. the old DC-Graphical; thus Oasys is 3X faster than
Aart's new DC2.
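Those rumored ratios hang together arithmetically. A quick sketch (all
figures come from the paragraphs above; I'm using the low end of the
claimed 2X-3X DC2 speedup, so treat everything as ballpark):

```python
# Cross-check of the rumored synthesis runtimes for a ~3M-instance design.
# All numbers are from the rumor above; ballpark only.

dc_graphical_hours = 3.5 * 24      # old DC-Graphical: ~3.5 days
oasys_hours        = 14.0          # MENT Oasys, RTL to placed gates
dc2_speedup_vs_dc  = 2.0           # DC2 rumored at 2X-3X over old DC (low end)

dc2_hours = dc_graphical_hours / dc2_speedup_vs_dc

print(f"Oasys vs old DC-Graphical: {dc_graphical_hours / oasys_hours:.0f}X")  # prints "6X"
print(f"Oasys vs new DC2:          {dc2_hours / oasys_hours:.0f}X")           # prints "3X"
```

So the "Oasys 6X faster than DC-Graphical, 3X faster than DC2" claim is
internally consistent with the 14-hour and 3.5-day data points.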
---- ---- ---- ---- ---- ---- ----
DC2 CAPACITY: For capacity, Oasys still rules the roost. ImgTec reported
at the recent DAC'16 that they did a 30 million instance GPU flat. But
that was at 28nm planar.
On 10/7nm FinFET designs, olde DC struggles on any blocks sized over 3 M
instances because you need a colored flow down there. Both Oasys and
Genus do "color-aware modeling", but you don't have to put colored wires
in synthesis like you have to do in PnR. One of my spies says that "if
DC2 is truly in the ICC2 data model, it'll give them roughly a 2X block
capacity boost, with blocks up to 5 million instances."
Not bad. But not great.
---- ---- ---- ---- ---- ---- ----
Aart's other problem, I'm hearing, is that his Project Descartes isn't
getting passable PPA -- the performance, power, and area of its output is
not good enough for widespread use. That is, Aart's launching DC2 a
little early anyway just to rain on Anirudh's Innovus success -- not just
to repair database mismatch problems between his old DC and newer ICC2.
Here's what I've heard is in the new Design Compiler II (DC2).
- DC2 has a new data model. This seems to be the key to its 2X-3X faster
runtimes vs. DC-Ultra/Graphical. The new DC2 uses the ICC2 data model
as its underlying framework. It's not the old PhyOpt data model nor
Milkyway. It's a new data model built off of the ICC2 PnR environment.
- DC2 has a new physical placer. DC2 has to have a new placement scheme
because it's using the new ICC2 data model. And no, it's not just the
ICC2 placer stitched in because RTL synthesis has to be much faster.
(This new DC2 placer trades accuracy for speed.)
- DC2 gate-level physical optimization should be good. I admit that I
don't have the G2 on this, but old DC has a fantastic record with doing
stuff like buffering, sizing, pin-swapping, etc. Genus and Oasys are
playing catch up here because Aart's had 25 years to perfect it.
- DC2 doesn't do color flows (yet): from what I've heard, DC2's routing
topology models and placement don't have color awareness -- which is
"nice" to have at 16/14nm, and then becomes REQUIRED at 10nm on down.
I wrote about this color problem in ESNUG 552-06. What I've heard is
that Aart's R&D are working on "color-aware modeling" just like what
Genus and Oasys have, but it's nowhere near ready for production yet.
- DC2 uses new Tcl scripts. This goes hand-in-hand with the new ICC2
data model, which is organized in a very different way from the old
DC. Changing data models is more than just changing individual
commands -- it makes it impossible to fully automate converting
those millions of lines of old DC Tcl scripts into new DC2 Tcl
scripts. Why? The old DC data model and new DC2 data model are
3rd cousins of each other -- related, but very different. This
means the old Tcl procedures you had doing attribute queries on
your design (ex: cell type, net traces, timing arcs, etc.) will
need to be rewritten for DC2 by the users. A line-by-line
DC-to-DC2 command translator will get you 90% there, but you will
need human intervention to clean up your final DC2 scripts.
- DC2 uses old DC elaboration and mapping. Spies say that Synopsys R&D
is mostly reusing the original front-end code from DC. The only
changes are elaborating and mapping to the new DC2 data model. That
is, DC2 does not bring physical optimization to the RTL level,
unlike what Genus and Oasys claim to be doing.
- DC2 keeps the original DC flow. DC2 will follow the old original
DC Topo/Graphical flow -- start with logic-only synthesis, followed
by a placement/optimization pass. (Again, none of this silly
"physical optimization at the early RTL stage" that Anirudh & Shankar
love to wax philosophical about...)
- DC2 has no distributed processing. DC2 is only doing multi-threaded
runs on single machines right now. In practical terms that means it
maxes out at 16 threads, because boxes with more than 16 CPUs start
getting very pricey. In contrast, both the Oasys and Genus guys claim
they have distributed processing where your design is auto-partitioned
and fed to multiple boxes -- with numbers like 32/64/128 CPUs chewing
your source RTL into roughly placed gates. (For both Genus & Oasys,
I hear beyond 64 CPUs is the diminishing returns point for now.)
So in a nutshell, this new DC2 launch is a Windows 10 maneuver because
Microsoft has to compete against Apple -- or die in the OS business.
Or in this case, Aart has to catch up with Anirudh -- or die in the
synthesis/PnR business. Have a fun SNUG'17 everyone! :)
- John Cooley
DeepChip.com Holliston, MA
P.S. Keep in mind, this is just what I'm hearing. I could be wrong. :(
---- ---- ---- ---- ---- ---- ----
Related Articles
Anirudh and Sawicki on IC Compiler II, Innovus, Nitro-SoC, Antun
The untold parts of that IMEC "world's first 5nm tapeout" story
ICC2 patch rev, Innovus penetration, and the 10nm layout problem