( ESNUG 486 Item 4 ) -------------------------------------------- [10/08/10]
From: Thierry Sejourne <thierry.sejourne=user domain=st got calm>
Subject: A first look at Atrenta SpyGlass Physical for pre-RTL/RTL design
Hi, John,
I know you like hearing from users about new tools, so I thought I'd
write about Atrenta's new SpyGlass Physical which was announced this June.
We've used it for the past 18 months to help us define the best physical
architecture for our 32 nm, 15 M instance design (in both the pre-RTL and
RTL stages) that we're currently taping out. This design has an on-chip
bus that connects all of its IP blocks.
It's tough to define a physical architecture at the pre-RTL/RTL stage
because we don't have any real physical information then. Yet different
architectural implementations have a major impact on our design specs;
and some architectures are simply not suitable for P&R. This is
particularly true for our on-chip bus design.
We used SpyGlass Physical to define the best physical architecture for our
design at RTL or Pre-RTL instead of at the netlist level. By using it at
these stages we were able to avoid costly respins at the end of our design
process (RTL and later). SpyGlass Physical effectively gives us an early
top-level floorplan at the RTL level that includes big block assignment and
pad location. It can provide preliminary design results based on SOC
abstraction (blackbox based, with many ways to define the timing of the
blackbox). SpyGlass Physical helped us optimize how we:
- group all our IP blocks into defined Physical Units.
(e.g. 4 or 8 physical units.)
- define our on-chip bus physical architecture to allow
the split into the defined Physical Units.
SPYGLASS PHYSICAL'S KEY FUNCTIONALITY:
1. Predict performance using a higher level of abstraction.
- We can use incomplete RTL for SpyGlass Physical as long as our
individual modules can link.
- Timing constraints can be restricted to only the places we want to
verify. For example, SpyGlass Physical didn't go inside our
design's black boxes, which were represented with physical
envelopes and timing models above the wire level.
- We had the option to expand our RTL netlist, for example for the
on-chip bus.
2. A "physical toolbox" which is a graphical environment that can
generate tables, bar charts, graphs and text reports for making
implementation trade-offs:
- Clock browser tool for exploring clock distribution (driver pins,
slack observed, period, latency, and the cluster that is fed by
the clock domain).
- Physical Unit browser (can list instance count, the number of
hard macros, the number of IOs of each Physical Unit and the way
they are connected to other Physical Units).
3. We used its physical toolbox to analyze our floorplan partitioning
options based on our own metrics:
- Timing; in our case, high-speed clocks were the dominant drivers
- Congestion; managing the number of nets crossing our 'Physical
Units' (grouping of IP) to be able to reduce congestion later
during the place and route stage
- Area (linked to instance count)
- Power (you can localize power domains, but this wasn't a factor
for our set-top box)
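The congestion metric above (nets crossing Physical Units) can be sketched
in a few lines of Python. This is only an illustration of the idea, not how
SpyGlass Physical computes it; the block names, net names and the
`unit_of` mapping are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def crossing_nets(nets, unit_of):
    """Count, per pair of Physical Units, the nets that touch both units.

    nets:    dict of net name -> list of IP block names on that net
    unit_of: dict of IP block name -> Physical Unit name
    """
    pair_counts = Counter()
    for blocks in nets.values():
        units = sorted({unit_of[b] for b in blocks})
        for pair in combinations(units, 2):
            pair_counts[pair] += 1
    return pair_counts


def total_cut(nets, unit_of):
    """Number of nets that span more than one Physical Unit."""
    return sum(1 for blocks in nets.values()
               if len({unit_of[b] for b in blocks}) > 1)


# Hypothetical example data (not from the design described in this post).
nets = {
    "n1": ["cpu", "bus"],
    "n2": ["bus", "ddr"],
    "n3": ["cpu", "l2"],
    "n4": ["bus", "video", "ddr"],
}
unit_of = {"cpu": "PU0", "l2": "PU0", "bus": "PU1", "ddr": "PU1",
           "video": "PU2"}

print(total_cut(nets, unit_of))
print(dict(crossing_nets(nets, unit_of)))
```

Comparing `total_cut` across candidate groupings gives a rough way to rank
them before any placement is run; a real flow would also weight nets by bus
width and timing criticality.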
INPUTS AND OUTPUTS
Inputs: SpyGlass Physical can work with incomplete RTL or constraints
(good black-box handling, deals with incomplete SDC and so on).
Outputs: SpyGlass Physical reports congestion, timing, grouping between the
blocks. Since this was our first use of the tool, we used the text reports
as input and still manually replicate the grouping in the Cadence environment.
TOOL RUNTIMES
We used Synopsys DC/DC-Topo for synthesis. We used Cadence SOC Encounter or
Mentor Olympus for place and route, as well as Synopsys IC Compiler for some
macros. The maximum size of our defined Physical Units (block groupings)
was about 2 M instances to allow for reasonable P&R runtimes.
The benchmark data below is based on the following design:
Size: 7 M placed instances, 209 M Transistors, 500 signal pads
Clocks: 230 total clocks (80 created, 150 generated)
Blocks: 53 RTL IPs, 40 RTL blocks/glues and 160 Hard IPs,
8 different IO libraries, 477 memory cuts
Our SpyGlass Physical runtimes were as follows:
- Data prep (tech pre-compilation done once): 4 hours
- RTL floorplanning & prototyping: 4 hours per run
- SOC Encounter netlist floorplanning & prototyping: 8 hours per run
- SOC Encounter or Olympus P&R to timing: 1 week per Physical Unit
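The 2 M instance cap per Physical Unit and the 7 M instance benchmark design
above imply a lower bound on the number of Physical Units. The sketch below
computes that bound and then packs a set of blocks into units with a simple
first-fit-decreasing heuristic. The block names and sizes are hypothetical,
and the heuristic looks only at instance count; it ignores the timing and
connectivity metrics that actually drove our grouping.

```python
import math

MAX_UNIT_INSTANCES = 2_000_000  # cap quoted in this post


def min_units(total_instances, cap=MAX_UNIT_INSTANCES):
    """Lower bound on the number of Physical Units for a given design size."""
    return math.ceil(total_instances / cap)


def first_fit_decreasing(block_sizes, cap=MAX_UNIT_INSTANCES):
    """Pack blocks into units by instance count only (largest block first)."""
    units = []  # each unit: {"blocks": [names], "size": total instances}
    for name, size in sorted(block_sizes.items(), key=lambda kv: -kv[1]):
        if size > cap:
            raise ValueError(f"block {name} exceeds the per-unit cap")
        for unit in units:
            if unit["size"] + size <= cap:
                unit["blocks"].append(name)
                unit["size"] += size
                break
        else:
            # No existing unit has room, so open a new one.
            units.append({"blocks": [name], "size": size})
    return units


print(min_units(7_000_000))

# Hypothetical block sizes, for illustration only.
blocks = {"a": 1_500_000, "b": 1_200_000, "c": 800_000, "d": 500_000}
for unit in first_fit_decreasing(blocks):
    print(unit["blocks"], unit["size"])
```

For the 7 M instance design this gives a minimum of 4 units, consistent with
the "4 or 8 physical units" mentioned earlier.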
DESIGN FLOW AND PROJECT TIME COMPARISONS WITH AND WITHOUT SPYGLASS PHYSICAL
Methodology A: (no SpyGlass Physical)
No prototyping/partitioning: 37 weeks
1. RTL Design
2. Synthesize design, analyze results and iterate at RTL level
3. Run place and route, analyze and iterate multiple times
(potential RTL changes discovered at the end of the P&R).
May never converge if physical architecture is wrong.
Methodology B: (no SpyGlass Physical)
Prototyping/partitioning at gate-level only: 28 weeks
1. RTL design
2. Synthesize design, analyze results and iterate at RTL level
3. Use SOC Encounter for gate level floorplanning and partitioning
choices. Analyze the results, modify the design and iterate
(potential RTL changes).
4. Implement the design using 1 set of place and route runs (though
it can take a second set).
Methodology C: (using SpyGlass Physical)
Prototyping/partitioning at pre-RTL/RTL level: 22 weeks
1. RTL design
2. Optimize physical architecture with SpyGlass Physical at pre-RTL
through full RTL level. Stabilize the architecture, e.g. split
the on-chip bus and define Physical Units (groups of blocks).
3. Run SOC Encounter floorplanning. Analyze the results and refine
the partitioning changes.
4. Implement the design using 1 set of place and route runs (though
it can take a second set).
We estimated that our total project time savings using SpyGlass Physical
were:
- 6 weeks or 20% vs gate level partitioning & prototyping.
- 15 weeks or 40% vs pre-partitioning with no physical information.
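The savings figures follow directly from the three schedule estimates
(37, 28 and 22 weeks). A small Python check of the arithmetic, using only
the numbers quoted in this post:

```python
# Project durations in weeks, as quoted for each methodology.
weeks = {"A": 37, "B": 28, "C": 22}


def savings(baseline_weeks, improved_weeks):
    """Return (weeks saved, percent saved relative to the baseline)."""
    saved = baseline_weeks - improved_weeks
    return saved, 100.0 * saved / baseline_weeks


for baseline in ("B", "A"):
    saved, pct = savings(weeks[baseline], weeks["C"])
    print(f"C vs {baseline}: {saved} weeks, {pct:.1f}%")
```

The exact ratios are about 21.4% and 40.5%, which the list above rounds to
20% and 40%.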
However, the typical savings, while important, were not nearly as important
as our schedule predictability. It was important to be confident early
on that we would have a successful implementation. It's extremely hard to
make changes late in the design cycle.
The complexity of our design made doing this pre-RTL/RTL optimization
with SpyGlass Physical a must. For 3-5 M instances, you think you can
understand your chip with your brain and diagrams. Above that level, SW
automation is needed to take care of your corner cases. You can hit the
wall because what you physically implemented simply may never work.
SPYGLASS PHYSICAL'S LIMITATIONS
1. We had to go through a learning curve to use the tool; it wasn't
magic at first. We got better results with time.
2. We feel users must guide the tool rather than let it find the optimal
solution, so we always had to refine its initial placement (which can
always be improved by user guidance). Because of this, we want the tool
to do incremental placement for the refinements rather than starting
from scratch each time -- which is less predictable. This is not yet
available in SpyGlass Physical.
3. Because it lacks an incremental process, when we add new physical
constraints, the results from SpyGlass Physical can be unpredictable.
4. Atrenta uses a cluster model for placement with non-rectilinear shapes
for black boxes -- which gives optimistic or pessimistic results. A
workaround exists by allowing black-box overlaps, but this means manual
control.
5. There were some situations where SpyGlass Physical hadn't optimized the
RTL as well as Design Compiler had. Its estimates were noticeably off
in these few circumstances.
With SpyGlass Physical we are able to move our physical partitioning decisions up to
the RTL/Pre-RTL level instead of the netlist level. Its two big advantages
were the ability to direct our partitioning choice by our own defined
metrics with downstream predictability, and its runtime was only 4 hours.
Additionally, Atrenta R&D support was good; we had a local AE and typically
had fixes within 3 days. We've done 2 tapeouts with SpyGlass Physical, including
one in 32 nm. We are now deploying it in other groups of our division.
- Thierry Séjourné
STMicroelectronics Grenoble, France