( ESNUG 485 Item 2 ) -------------------------------------------- [06/08/10]
Subject: (ESNUG 473 #1) A follow up eval of Atoptech's Apogee floorplanner
> Recently my group migrated our P&R flow to Atoptech. To give you a little
> background, our team has been working in 65 nm for about 2 years. We have
> a fairly wide range of chips and IP ranging from high speed (500+ MHz)
> semi-custom DSPs and CPUs to SOCs with a wide range of IP from small, high
> speed blocks (100k inst, 800+ MHz) to larger, slower speed (700k instance,
> 200+ MHz) cores. Our P&R flow is netlist-to-GDS, and our focus was in
> this area rather than a full suite from RTL-to-final-verification.
>
> - [ Iron Man ]
From: [ Iron Man ]
Hi John,
I'm now writing to share our team's experience using Atoptech's Apogee top
level floorplanning and assembly environment. We've been using Atoptech's
block PNR tool for several years and we felt the move to Apogee was a
natural transition as it would allow us to keep all of our physical design
in a single database/toolset. We started our migration roughly a year ago,
and I'm summarizing our experience to date.
Our legacy chip assembly and floorplanning environment is a mix of
home-brewed scripts and partnerships with a vendor to get native features.
Our belief is the vendor should provide any features which are general
purpose in nature (i.e. usable by other customers), and provide us an API
to code features which are specific to our needs. In the past, our top level tool
usage has focused primarily on basic floorplanning and assembly features,
with an emphasis on K.I.S.S. principles. Our hope in transitioning to
Apogee was to get beyond baseline functionality into a more automated chip
level environment.
TWO CHIPS:
After our initial tool Q/A of Atoptech, we were ready for a production field
trial. In our first two Apogee chips we focused on bringing up baseline
functionality similar to what we had in our past environment and on
out of the box QOR. The first two chips we did were:
SOC#1 - 65 nm, 5.1 M std-instances; 814 macros; 28 PnR blocks
(excludes hard-macros, IP); 56K+ block pins
SOC#2 - 65 nm, 4.8 M std-instances; 910 macros; 30 PnR blocks
(excludes hard-macros, IP); 53K+ block pins
Both chips taped out on time using only Apogee for floorplanning & assembly
and have working silicon.
With regard to runtime and capacity, the following stats reflect SOC#1:
- 25 minutes/13.0 G to (detail) merge all blocks into the chip level
database.
- an additional 12 minutes/4.0 G to read in SDC, extract RC, and get
the full chip timing updated.
- top level buffering (for design rule violations) based on groute
took roughly one hour.
- final top level routing and cleanup took 90 minutes.
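For a rough sense of turnaround, the four SOC#1 steps above can simply be added up. This is only a back-of-envelope sketch: it assumes the steps run back to back, counts nothing beyond the bullets listed, and treats "roughly one hour" as 60 minutes. The step labels are shorthand, not tool commands.

```python
# Wall-clock minutes for the SOC#1 steps listed above.
# The buffering step is "roughly one hour", so 60 is an approximation.
steps = {
    "merge blocks into chip level database": 25,
    "read SDC, extract RC, update full chip timing": 12,
    "top level buffering (groute based)": 60,
    "final top level routing and cleanup": 90,
}

total_minutes = sum(steps.values())
hours, minutes = divmod(total_minutes, 60)
print(f"{total_minutes} minutes (about {hours} h {minutes} min)")
```

In other words, the listed steps come to a little over three hours for a 5.1 M instance chip.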
Our general experience with Apogee from these two chips:
1. Apogee has a unique hierarchical database which is accessible
via TCL. Traditionally floorplanners model the data either in a
pseudo-flat fashion (where hierarchy is somewhat available) or in an
abstracted fashion where you cannot access level N+1. In
Atoptech's database, all levels of hierarchy are available and
you can access data in a very natural form and with filters to
restrict or allow hierarchical traversing. You can access timing,
logic, and physical data throughout the entire hierarchy of your
design, with read/write access.
There are no inherent two-level hierarchy restrictions or flattening
involved.
Since hierarchy is maintained, it allows the tool to automatically
abstract timing information if necessary, rather than forcing the user
to decide whether they wish to see everything (bad runtime) or the top
only (incomplete data). Furthermore, you can easily push down/pull up
data across the hierarchy and also selectively flatten blocks.
2. At first you couldn't manipulate your hierarchy with Apogee if your
physical and logical partitioning were different. This was fixed in
April '10 but we haven't had time to test this.
3. Apogee's padring router can route complex multi-domain padrings with no
manual intervention and produce no DRC or LVS errors. It could place
and route both single and dual ring structures as well as flipchip and
wirebond structures. (Most tools we've seen still have the model that
a padring is a single ring of homogeneous pads... something we haven't
had for a long time.) Atoptech's padring constraints format allows us
to accurately describe how our padrings look.
4. Apogee's GUI let us manipulate complex floorplans including rectilinear
block shapes, macro block and pad placement, custom (analog) net
routing, etc. It scrolls and views with hierarchy fairly efficiently.
Base layers are viewable by merging in the GDS views... but it would be
nice if they could have 2 separate palettes for layers... one for
routing layers and one for base layers. Most times the user is only
concerned about routing layers, and it's cumbersome to have to view all
layers on a single palette (although it seems most tools go this way.)
5. Apogee generates OK pin assignments out of the box. Some hand tweaking
is still currently required.
6. Apogee lets you deal with top level clock/signal buffering, and its
QOR is good. It reduced our overall clock insertion delay and OCV
impact on timing.
7. Top level routing works. We have hundreds of long clock nets at top
level and Apogee was able to route them all with extremely low jitter
in a very tight channel automatically. (In our previous tool, we had
to either manually route the clock nets or go through quite a few
iterations to fix clock jitter violations -- despite setting nondefault
routing rules -- for clock balancing. This added a lot to the full
chip timing closure cycle.) In Apogee, this was pretty much done in
one shot, no iterations. Results are also predictable (i.e. they do
not vary from run to run, which is very important for clock
balancing). We ran into one issue regarding proper extraction of
non-preferred routing, but I believe this is resolved now.
8. Top level LVS and DRC correlated very well with Calibre, and the top
level router produced few DRVs that needed hand cleanup.
9. Our chips required a lot of custom routing at the top level. Custom
(param) route features (PG route, padring route, differential route,
etc.) were pretty good overall. Some enhancements are needed. The
tool still in some cases produces DRC violations on custom routed nets,
more so than on detail routed nets.
10. Hierarchical floorplan interaction (passing floorplan constraints
between levels) was smoothly handled in Apogee's system, with Atoptech
tailoring handoff user-options to our needs.
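To make item 1 above a bit more concrete, here is a minimal sketch of what "all levels of hierarchy available, with filters to restrict or allow traversal" can look like. It is a toy Python model, not Apogee's TCL API; the `Node` class, the `walk` function, and the instance names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One level of design hierarchy (hypothetical toy model)."""
    name: str
    kind: str  # "chip", "block", or "cell"
    children: List["Node"] = field(default_factory=list)


def walk(
    node: Node,
    max_depth: Optional[int] = None,
    keep: Callable[[Node], bool] = lambda n: True,
    _depth: int = 0,
    _path: str = "",
) -> Iterator[Tuple[str, Node]]:
    """Yield (hierarchical path, node); max_depth limits how far we descend."""
    path = f"{_path}/{node.name}" if _path else node.name
    if keep(node):
        yield path, node
    if max_depth is None or _depth < max_depth:
        for child in node.children:
            yield from walk(child, max_depth, keep, _depth + 1, path)


chip = Node("soc", "chip", [
    Node("cpu", "block", [Node("u_alu", "cell"), Node("u_regfile", "cell")]),
    Node("dsp", "block", [Node("u_mac", "cell")]),
])

# Restrict traversal to the top level and its direct blocks...
top_only = [p for p, _ in walk(chip, max_depth=1)]
# ...or traverse everything and filter for leaf cells at any depth.
all_cells = [p for p, _ in walk(chip, keep=lambda n: n.kind == "cell")]
print(top_only)
print(all_cells)
```

The point is that the same query can stop at level N or reach level N+1 and below, rather than the data model deciding that for you.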
The negative experiences (cons) during our first two chips:
1. Interactive route editing needed improvement. We had issues with custom
wire editing; it could not drop vias properly and it would sometimes create
DRC violations. There are still some issues here, but they're minor. They
tweaked it so our interactive cleanup experience was easier on the 2nd
chip tapeout, but we look forward to more GUI-friendly automation.
2. Database reference library management was not as flexible as we'd
prefer. Loading a full hierarchical database (for example, reading a
Verilog netlist for a block with all of its .libs and LEFs) and then
doing an incremental update when those files change is a mess. You
have to track each .lib and LEF individually -- same as the other
P&R tools. We're hoping they'll fix this.
Some nice timing-related features:
- Native SDC 1.7 support
- Multi-threaded RC extraction, tight correlation to Star-RCXT
at both 65 nm and 40 nm nodes
- Tight timing/SI correlation to our signoff STA tool. We have
tested correlation with PT-SI, PT/CeltIC and Goldtime flows
both at 65 nm and 40 nm.
- CCS (noise + delay) models as well as OCV and AOCV timing
- MCMM timing analysis, either multi-threaded or distributed
- Database natively supports hierarchical design
- PnR block databases can be merged into the chip level
database very quickly
- User chooses how each sub PnR block database is merged at chip level:
hard_hier mode: sub PnR block is in read-only mode
flat mode: sub PnR block is in read/write mode.
This means the tool can do full chip timing optimization.
- Tool will use physical abstract model from each sub PnR block
database for top level PnR.
- Tool won't need to re-extract the RC for sub PnR blocks.
- Tool can automatically identify the interface logic in each
sub PnR block during the full chip timing update and thus timing
update is very quick.
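The last bullet is the reason the full chip timing update is fast, so here is a minimal sketch of the general idea behind identifying interface logic. It assumes the usual definition (logic between block input ports and the first registers, and between the last registers and block output ports, plus any port-to-port feedthrough logic) and is not Atoptech's algorithm. The function name, node kinds, and the tiny netlist are hypothetical, and real interface models also keep things this sketch ignores, such as clock paths to the interface registers.

```python
from collections import deque


def interface_logic(kinds, fanout):
    """Return cells on port-to-first-register and last-register-to-port paths.

    kinds:  node name -> "in", "out", "comb", or "reg"
    fanout: driver name -> list of load names
    """
    fanin = {}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)

    def sweep(starts, edges):
        seen, queue = set(), deque(starts)
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                # Keep going through combinational logic; stop at registers/ports.
                if kinds[nxt] == "comb":
                    queue.append(nxt)
        return seen

    ins = [n for n, k in kinds.items() if k == "in"]
    outs = [n for n, k in kinds.items() if k == "out"]
    found = sweep(ins, fanout) | sweep(outs, fanin)
    return {n for n in found if kinds[n] in ("comb", "reg")}


# A -> g1 -> r1 -> g2 -> r2 -> g3 -> Z, plus a feedthrough B -> g4 -> Y
kinds = {"A": "in", "B": "in", "Z": "out", "Y": "out",
         "g1": "comb", "g2": "comb", "g3": "comb", "g4": "comb",
         "r1": "reg", "r2": "reg"}
fanout = {"A": ["g1"], "g1": ["r1"], "r1": ["g2"], "g2": ["r2"],
          "r2": ["g3"], "g3": ["Z"], "B": ["g4"], "g4": ["Y"]}

ilm_cells = interface_logic(kinds, fanout)
print(sorted(ilm_cells))
```

Here `g2`, the register-to-register logic inside the block, is left out, which is what keeps the top level timing graph small.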
TWO MORE CHIPS:
Our second round of Apogee chips focused on transition to 40 nm and code
stability. From a 40 nm standpoint, the main points we focused on:
- Correlation to signoff timing and extraction (Goldtime/Star-RCXT)
- Low power functionality including voltage islands and power down.
- 40 nm DRC correlation with Calibre (and more importantly the
ability to not create DRCs in 40 nm.)
Correlation to signoff at 40 nm was excellent and both chips successfully
taped out. We additionally worked with Atoptech on developing native RDL
(signal + PG) routing abilities rather than using home brew scripts. Our
initial goal was to have these two testchips use Apogee's RDL router;
however, the solution was not delivered in time for these two chips, so we
hope to deploy it on the next round. We do have working beta code
that we are using in-house, so they are fairly close.
AND EVEN TWO MORE CHIPS:
Our third round of Apogee chips focused more on cleaning up the issues we
had in our first round. Again, 2 larger 65 nm SOCs were used. There's
nothing really earthshaking to report here, as a lot of this focus was to
continue to improve overall QOR and baseline features. A lot of the effort
was removing hacks or workarounds we had in our first round of tapeouts...
to transition to a flow which is production worthy and can be leveraged
across all of our chips.
We are now done with what we consider Phase 1 of our deployment of Apogee.
We have proven the ability to get baseline functionality of the tool up,
with good out of the box QOR at both 65 nm and 40 nm across a variety of
chips. Apogee will now be the default tool for floorplanning and chip
assembly in our business unit at 40 nm and we have phased out support for
our legacy tool (although other groups in our company are still using it.)
We are now in Phase 2 of our deployment plans. This includes supporting
wide-scale deployment within our group and, from an R&D standpoint, a focus
on advanced Apogee features:
- Feedthru (over the block routing) - Our designs use a semi-abutted
methodology using some homebrew scripts. We have been working with
Atoptech to use native physically driven feedthru insertion. While
we chose not to use this on our initial chips, another group
internally has been working with Atoptech on this functionality and
has shown that it is production worthy, providing solid QOR for
standard scenarios as well as our corner case scenarios. We plan to
leverage their results and will be migrating to Apogee's native
solution for this function going forward.
- Timing Budgeting - In our methodology, our IP providers generate the
constraints for their blocks, and work with the SOC team to derive
the chip-level constraints. This leaves a gap on IO timing budgeting.
Our first approach towards chip timing closure involves coming up with
a methodology to propagate IO constraints to the PNR blocks. In the
past we have used some homebrew scripts to derive these constraints
(with mixed results). We have been Q/A'ing Apogee's native timing
budgeting features and the results, while not production worthy for
our needs yet, look promising, and we plan to deploy them on a trial
chip in the next few months.
- Timing Closure - Even with good timing budgets, some timing issues will
still exist at the top level. Apogee has a very neat hierarchical
database which allows for a naturally hierarchical full chip
timing closure flow. Their tool can automatically extract interface
logic models on the fly to reduce runtime and avoid costly
incremental updates. We have already correlated their full chip timing
to our signoff full chip STA and we are approaching the accuracy we
feel necessary to deploy this feature in production (this includes
abstract generation, timing graph and exception interpretation,
extraction correlation, timing correlation using CCS noise/delay and
AOCV in 40 nm). We plan to deploy this feature on a tapeout happening
in Q3 of this year (with initial work starting in late Q1).
- RDL routing - We've been working with Atoptech's R&D to provide an
optimized RDL signal router to replace our in-house solution. Although
this Apogee solution wasn't utilized in the last round of tapeouts, we
have been provided code and continue to work with them to handle all of
our corner cases. We feel it should be production worthy in our next
round of tapeouts.
- Top-level CTS - This is still a work in progress but something we hope
they deploy for our Q3 tapeouts.
- Hierarchy manipulation - Our chips are designed bottom-up based on the
IP we are provided. As our chips continue to grow in size, we need
the ability to implement blocks that combine numerous IP blocks, rather
than requiring that each IP be its own physical partition. We have worked
with Atoptech to specify the baseline functionality needed for this
capability (from a floorplanning, partitioning and constraint
generation standpoint) and are in the process of testing this out with
a hope to deploy it on a trial basis on one of our next round of chips.
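On the timing budgeting item above, the gap being described is how to split one clock period across a path that leaves one PNR block, crosses the top level, and enters another block. A minimal sketch of one common approach, proportional budgeting from estimated segment delays, is below. It is not Apogee's budgeting feature or our homebrew scripts; the function, the segment names, the 5.0 ns period, and the delay estimates are all made-up illustration values.

```python
def proportional_budget(period_ns, segment_delays_ns):
    """Split a clock period across path segments in proportion to their
    estimated delays (hypothetical helper, for illustration only)."""
    total = sum(segment_delays_ns.values())
    if total <= 0:
        raise ValueError("estimated delays must sum to a positive value")
    return {name: period_ns * d / total
            for name, d in segment_delays_ns.items()}


period = 5.0  # ns, assumed for the example
estimates = {          # estimated delays in ns, assumed for the example
    "blk_a_output": 1.0,   # last register in block A to its output pin
    "top_route": 0.5,      # top level wire/buffer delay between blocks
    "blk_b_input": 2.5,    # block B input pin to its first register
}

budget = proportional_budget(period, estimates)

# What block B sees as external input delay is the part of the period
# that has been budgeted to everything outside block B.
blk_b_input_delay = period - budget["blk_b_input"]
print(budget)
print(blk_b_input_delay)
```

The budgets always add back up to the full period, so any slack (or violation) on the estimated path is spread across the blocks rather than landing on one IP owner.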
So in total, our business unit has done 10 tapeouts using Apogee, both at
the 40 nm and 65 nm nodes, with working silicon back. We've completed our
initial transition to Apogee and it's our default floorplanning/assembly
environment for 40 nm and below.
- [ Iron Man ]