( ESNUG 568 Item 4 ) -------------------------------------------- [03/21/17]
Subject: SCOOP -- Spies report Aart to launch DC2 at SNUG'17 tomorrow
SCOOP II: Multiple spies report that on Monday, Aart de Geus is
going to announce "Project Newton" IC Compiler II (ICC II) in
his upcoming keynote at SNUG'14 in Santa Clara.
From what I've heard, "Project Newton" was a 5 year undertaking
involving 80 SNPS R&D engineers re-engineering ICC for problems
unique to sub-20 nm P&R -- but it ran into organizational issues
I can't get a good fix on. Rumor was ICC II was to be launched
at DAC'13 in Austin, but it wasn't ready then. Apparently it is
now. From what I've heard, ICC II has a:
- new placer. Runs very fast for initial placement. Runs of
2 or 3 hours now take 1 hour. But initial placement is only
~10% of overall P&R runtime, so not much overall gain here.
Amdahl's law.
- new data model. Old ICC had two separate internal data
models, pre-CTS and post-CTS -- basically PhyOpt plus Astro
inside. New ICC II has a single layer that unifies the two
data models -- meaning a bit less memory use, and a minor
speed-up because data isn't duplicated.
- new MCMM timer. The old ICC timer was based on Astro; the
new ICC II timer adds MCMM awareness. It's not used much
because its memory footprint is too restrictive. Instead,
most MCMM work is done at ECO time for everybody -- ICC,
ICC II, EDI, Atoptech, and Olympus -- so tweaks here only
minorly increase throughput.
- new internal optimizer. Does buffering, resizing, moving
objects, replacing instances, etc. It's rewritten from the
ground up to be multi-CPU (unsure if it's multi-threaded,
distributed, or both). The reports are it's REALLY fast
vs. old ICC, but only up to CTS. Post-CTS it's still slow.
Before, ICC could do 1 or 2 scenarios in a short time; with
4 scenarios ICC became dog slow. Now ICC II does 4 or 5
scenarios within a reasonable time. Again, the MCMM heavy
lifting is done by external ECO tools for all PnR tools.
- old CTS. SNPS R&D is working on a 2nd gen CTS, but it's
not ready yet to be in ICC II. Maybe in 12 to 24 months???
- old Z-Route. Still the same and will be kept -- but
doesn't Z-Route use an old/different routing model??? It
still starts over with a fresh route instead of honoring
what was assumed in pre-route optimization. Convergence is
still an issue.
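The Amdahl's-law remark in the placer bullet above is easy to sanity-check
with back-of-the-envelope arithmetic. This sketch just plugs in the rumored
numbers (initial placement ~10% of total P&R runtime, sped up roughly 2.5X);
the `amdahl_speedup` helper name is mine, not anything in the tools:

```python
# Amdahl's-law sanity check on the rumored ICC II placer speedup.
# Numbers are straight from the rumor above: initial placement is ~10%
# of total P&R runtime, and the new placer runs it roughly 2.5X faster.

def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when only `fraction` of the runtime is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

overall = amdahl_speedup(fraction=0.10, local_speedup=2.5)
print(f"overall P&R speedup: {overall:.2f}X")  # prints "overall P&R speedup: 1.06X"
```

A ~6% total-flow gain from a 2.5X faster placer is exactly why the spies
shrug it off as "not much gain here."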
If this ICC II rumor is true, what's interesting is it creates a
mass call for rebenchmarking of all P&R tools. That is, if your
engineers have to spend time requalifying ICC II, why not also
benchmark current revs of Atoptech, CDNS EDI, & MENT Olympus-SoC,
too, since your people will all be in benchmarking mode anyway?
- http://www.deepchip.com/items/0537-10.html
From: [ John Cooley of DeepChip.com ]
I'm proud to say that three years ago I accurately scooped Aart's launch
of IC Compiler II (ICC2) before SNUG'14.
And in keeping with tradition, I'm (hopefully) scooping the news of Aart's
launch of Design Compiler II (DC2) at SNUG'17 tomorrow. :)
Here's what my spies tell me:
At SNUG next Wednesday Synopsys is expected to launch a new synthesis tool
replacing Design Compiler.
The internal SNPS product code name is "Descartes" (after that French guy
who said: "I think, therefore I am"). It may not have an official name yet,
but knowing the depths of Synopsys Corporate Marketing's creativity it is
expected to be called "DC2". It's a 3-year engineering effort to increase
DC's runtime speed and capacity on the ICC2 database.
My spies claim that DC2 has been in "taxicab mode" for the last few months
at Nvidia, HiSilicon, and Juniper -- with dedicated Synopsys AEs as the
"taxi drivers", working with these customers to test the new tool.
---- ---- ---- ---- ---- ---- ----
DC2 SPEED: word is that for blocks under 3 million instances, the new DC2
is showing a 2X to 3X speedup over the old standard DC Graphical/Ultra --
making the new DC2 roughly match CDNS Genus RTL synthesis, and run at
1/3rd of the MENT Oasys speeds.
I've heard at sizes over 3 million instances, Genus ballparks 50% faster
than DC2. For Oasys, a 3 million instance design takes ~14 hours to get
to placed gates. The same in DC-Graphical takes ~3.5 days. That makes
Oasys 6X faster vs. the old DC-Graphical; thus Oasys is 3X faster than
Aart's new DC2.
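Those rumored ratios hang together arithmetically. A quick sketch (all
figures come from the paragraphs above; I'm using the low end of the
claimed 2X-3X DC2 speedup, so treat everything as ballpark):

```python
# Cross-check of the rumored synthesis runtimes for a ~3M-instance design.
# All numbers are from the rumor above; ballpark only.

dc_graphical_hours = 3.5 * 24      # old DC-Graphical: ~3.5 days
oasys_hours        = 14.0          # MENT Oasys, RTL to placed gates
dc2_speedup_vs_dc  = 2.0           # DC2 rumored at 2X-3X over old DC (low end)

dc2_hours = dc_graphical_hours / dc2_speedup_vs_dc

print(f"Oasys vs old DC-Graphical: {dc_graphical_hours / oasys_hours:.0f}X")  # prints "6X"
print(f"Oasys vs new DC2:          {dc2_hours / oasys_hours:.0f}X")           # prints "3X"
```

So the "Oasys 6X faster than DC-Graphical, 3X faster than DC2" claim is
internally consistent with the 14-hour and 3.5-day data points.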
---- ---- ---- ---- ---- ---- ----
DC2 CAPACITY: For capacity, Oasys still rules the roost. ImgTec reported
at the recent DAC'16 that they did a 30 million instance GPU flat. But
that was at 28nm planar.
On 10/7nm FinFET designs, olde DC struggles on any blocks sized over 3 M
instances because you need a colored flow down there. Both Oasys and
Genus do "color-aware modeling", but you don't have to put colored wires
in synthesis like you have to do in PnR. One of my spies says that "if
DC2 is truly in the ICC2 data model, it'll give them roughly a 2X block
capacity boost, with blocks up to 5 million instances."
Not bad. But not great.
---- ---- ---- ---- ---- ---- ----
Aart's other problem, I'm hearing, is that his Project Descartes isn't
getting passable PPA -- the performance, power, and area of its output is
not good enough for widespread use. That is, Aart's launching DC2 a
little early anyway just to rain on Anirudh's Innovus success -- not just
to repair database mismatch problems between his old DC and newer ICC2.
Here's what I've heard is in the new Design Compiler II (DC2).
- DC2 has a new data model. This seems to be the key to its 2X-3X faster
runtimes vs. DC-Ultra/Graphical. The new DC2 uses the ICC2 data model
as its underlying framework. It's not the old PhyOpt data model nor
Milkyway. It's a new data model built off of the ICC2 PnR environment.
- DC2 has a new physical placer. DC2 has to have a new placement scheme
because it's using the new ICC2 data model. And no, it's not just the
ICC2 placer stitched in because RTL synthesis has to be much faster.
(This new DC2 placer trades accuracy for speed.)
- DC2 gate-level physical optimization should be good. I admit that I
don't have the G2 on this, but old DC has a fantastic record with doing
stuff like buffering, sizing, pin-swapping, etc. Genus and Oasys are
playing catch up here because Aart's had 25 years to perfect it.
- DC2 doesn't do color flows (yet): from what I've heard, DC2's routing
topology models and placement don't have color awareness -- which is
"nice" to have at 16/14nm, and then becomes REQUIRED at 10nm on down.
I wrote about this color problem in ESNUG 552-06. What I've heard is
that Aart's R&D are working on "color-aware modeling" just like what
Genus and Oasys have, but it's nowhere near ready for production yet.
- DC2 uses new Tcl scripts. This goes hand-in-hand with the new ICC2
data model, which is organized in a very different way from the old
DC. Changing data models is more than just changing individual
commands -- it makes it impossible to fully automate converting
those millions of lines of old DC Tcl scripts into new DC2 Tcl
scripts. Why? The old DC data model and new DC2 data model are
3rd cousins of each other -- related, but very different. This
means the old Tcl procedures you had doing attribute queries on
your design (ex: cell type, net traces, timing arcs, etc.) will
need to be rewritten for DC2 by the users. A line-by-line
DC-to-DC2 command translator will get you 90% there, but you will
need human intervention to clean up your final DC2 scripts.
- DC2 uses old DC elaboration and mapping. Spies say that Synopsys R&D
is mostly reusing the original front-end code from DC. The only
changes are elaborating and mapping to the new DC2 data model. That
is, DC2 does not bring physical optimization to the RTL level,
unlike what Genus and Oasys claim to be doing.
- DC2 keeps the original DC flow. DC2 will follow the old original
DC Topo/Graphical flow -- start with logic-only synthesis, followed
by a placement/optimization pass. (Again, none of this silly
"physical optimization at the early RTL stage" that Anirudh & Shankar
love to wax philosophical about...)
- DC2 has no distributed processing. DC2 is only doing multi-threaded
runs on single machines right now. In practical terms that means it
maxes out at 16 threads, because boxes with more than 16 CPUs start
getting very pricey. In contrast, both the Oasys and Genus guys claim
they have distributed processing where your design is auto-partitioned
and fed to multiple boxes -- with numbers like 32/64/128 CPUs chewing
your source RTL into roughly placed gates. (For both Genus & Oasys,
I hear beyond 64 CPUs is the diminishing returns point for now.)
So in a nutshell, this new DC2 launch is a Windows 10 maneuver because
Microsoft has to compete against Apple -- or die in the OS business.
Or in this case, Aart has to catch up with Anirudh -- or die in the
synthesis/PnR business. Have a fun SNUG'17 everyone! :)
- John Cooley
DeepChip.com Holliston, MA
P.S. Keep in mind, this is just what I'm hearing. I could be wrong. :(
---- ---- ---- ---- ---- ---- ----
Related Articles
Anirudh and Sawicki on IC Compiler II, Innovus, Nitro-SoC, Antun
The untold parts of that IMEC "world's first 5nm tapeout" story
ICC2 patch rev, Innovus penetration, and the 10nm layout problem