( ESNUG 550 Item 6 ) -------------------------------------------- [05/22/15]
Subject: BRCM engineer evals Ausdia Timevision for STA constraints cleanup
> Our management decided we had to clean up and streamline our flow. They
> required us to benchmark our 7 SDC constraints tools in order to pare it
> down to 1 or possibly 2 tools. ... After those first cuts, we decided to
> proceed benchmarking Fishtail, Ausdia, Excellicon and Atrenta. ...
>
> - "User evals Fishtail, Ausdia, Atrenta, Excellicon, Blue Pearl"
> http://www.deepchip.com/items/0537-01.html
From: [ Mehul Mistry of Broadcom ]
Hi, John,
I work for Broadcom in chip integration in our Network Switching group.
I wanted to provide user feedback on how we use Ausdia Timevision for STA
and constraints cleanup at the fullchip RTL level.
BACKGROUND
Our chips are massive and complex. Typically the designs we implement are
100 M+ insts (and close to 1 B gates), have 1000+ clocks & generated clocks,
many complex IP blocks delivered by globally spread out design teams, and
multiple different modes of operation.
Our fullchip STA work involves two major tasks:
1. Constraints Development
2. Timing Closure
The constraints development phase is very challenging for SoCs of our size
and complexity. It takes time to grasp the various clock domains, inter-IP
interactions, design mode settings, etc. to develop a robust, clean set of
signoff constraints. And if the constraints are lacking, that directly hurts
our timing closure phase.
In our PrimeTime flow, fullchip STA work used to start only when a fullchip
gate-level database (netlists or ETMs + SPEFs) was released and available.
And when this happened, the primary focus of the project team & management
immediately shifted to the "Timing Closure" phase -- somehow we had to
achieve timing closure and tape out this chip!
This workflow created a lot of pressure on the fullchip STA team to:
- quickly analyze and root-cause timing paths with negative slack.
- provide feedback to the backend and IP teams.
- look into inter-IP timing paths etc.
During this chaotic, fast-moving process, the fullchip STA team was often
identifying and fixing constraint issues on the fly -- totally in a
"reactive" mode. There was little or no time to focus solely on
constraints development.
This led to multiple incremental constraint releases, resulting in numerous
P&R and STA iterations that used more compute servers and licenses. Overall,
it hurt our schedule, resources, and QoR.
WASTING 2 TO 3 MONTHS
After several brutal fullchip timing closure & tapeout cycles, we decided to
take a step back, and think about how to improve our constraints development
process. What we realized is that as soon as a fullchip RTL snapshot is
released, the functional verification team gets going right away. However,
the fullchip STA/integration team can't start working until a gate-level
netlist is available which links correctly in PrimeTime. This typically
takes 2 to 3 months, since it involves putting together a fullchip gate
database, and trying to link it cleanly with several IP blocks which have to
go through synthesis, etc.
In effect, 2 to 3 months are wasted after an RTL snapshot is released, when
nothing meaningful is done for fullchip STA and constraints.
OUR AUSDIA TIMEVISION FLOW
We wanted to start "constraints development" right after an RTL snapshot is
released, instead of wasting those 2 to 3 months.
The basic "Timevision" fullchip RTL flow we deployed was:
1. Load fullchip RTL from a flist. This flist already exists and is
used by the verification teams.
2. Load .lib models for all non-synthesizable logic (IO pads, memory
macros, PHYs, etc.).
3. Elaborate & Link the fullchip design
4. Read basic top-level constraints, such as primary clocks defined at
top-level ports, PLL outputs, etc.
5. Read block-level / IP constraints if they were available. We do
a "current_instance" for each block instantiation, and read the
block / IP constraints used for synthesis/P&R (with the block instance
prefix applied wherever applicable). Primary clocks etc. are ignored,
since they come from the top-level constraints. Our block / IP
constraints have if/else constructs which ensure that, depending
on some variable settings, these constraints can be read either
in the block/IP context or in the fullchip context.
6. Build the Fullchip Timing Graph
7. Use Timevision to:
a. Verify the correctness of any constraints that we provided
b. Identify missing constraints
c. Debug and analyze the details of 7(a) and 7(b)
d. Fix/add constraints, or apply waivers for don't-care issues
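A minimal Python sketch of what step 5 does to the block/IP constraints, assuming constraints can be modeled as simple (command, object path) tuples. The function name, the `context` switch (standing in for the Tcl if/else variable settings) and the instance names are hypothetical; the real flow sources the block's Tcl under "current_instance" rather than rewriting tuples.

```python
from typing import List, Tuple

# Hypothetical model: (SDC command, object path relative to the block).
Constraint = Tuple[str, str]

# Commands assumed to be owned by the top-level constraints in fullchip context.
TOP_OWNED_COMMANDS = {"create_clock"}


def pull_up_block_constraints(
    block_constraints: List[Constraint],
    instance_paths: List[str],
    context: str = "fullchip",
) -> List[Constraint]:
    """Re-root block constraints under each instance of the block.

    In "block" context the constraints are returned unchanged. In "fullchip"
    context primary clocks are skipped (the top-level constraints define them)
    and every object path gets the instance prefix, mimicking current_instance.
    """
    if context == "block":
        return list(block_constraints)
    pulled: List[Constraint] = []
    for inst in instance_paths:
        for command, obj in block_constraints:
            if command in TOP_OWNED_COMMANDS:
                continue
            pulled.append((command, f"{inst}/{obj}"))
    return pulled


block = [("create_clock", "clk"), ("set_false_path", "sync_reg/D")]
print(pull_up_block_constraints(block, ["u_core0", "u_core1"]))
# [('set_false_path', 'u_core0/sync_reg/D'), ('set_false_path', 'u_core1/sync_reg/D')]
```

The same block constraints serve both contexts, which is the point of the if/else guards: nothing is maintained twice.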
Compared with our fullchip gate/netlist STA flow, the main change for RTL
constraints verification is Step 1: "read_hdl" (read the design RTL) instead
of "read_verilog" (read the gate netlist). Hence, the basic fullchip RTL flow
was up and running in less than a week. We spent another week or two getting
the proper RTL flist/database and .libs to make sure there were no
blackboxes after linking the design. This was a first-time setup; we have
since automated the RTL flist/database and .lib finding mechanism for the
Timevision flow.
WHAT WE FOUND
Next, I will go over the specific pluses and minuses we found:
- Capacity, Memory and Runtime: We had no interest in hierarchical,
grey-box or ILM based flows, which involve model approximations and
cumbersome flow setup, and introduce inaccuracies. Our goal was to
do STA constraints development work on a fullchip, flat RTL database.
Timevision was able to do that with a reasonable memory and runtime
footprint on this ~100 M instance design. Here is a breakdown of
runtime/memory for each step:
Step                             Runtime       Memory
Read .lib files                  5 min         0.1 GB
Read & Elaborate fullchip RTL    8 min         22.5 GB
Link fullchip design             38 min        110.0 GB
Load Tcl constraints             4 hr 24 min   4.2 GB
Build timing graph               1 hr          44.0 GB
Verify constraints               3 hr 15 min   7.5 GB
Within 9.5 hours from when the latest fullchip RTL flist was available,
we were at the "tv_shell" prompt with complete constraints verification
done, and ready for debug & analysis on fullchip STA constraints
work. Memory was 188 GB, which is ~20% more than what
Timevision uses when running at the gate/netlist level for the same
design. This is because after RTL elaboration, Timevision does not
remove unused registers/logic, unlike Design Compiler. We've asked
Ausdia to correct this and reduce memory further, and they
are working on it.
- Compatibility with gate/netlist PrimeTime signoff: Our fullchip and
block/IP constraints used for signoff STA are coded in complex Tcl,
with several custom procedures etc. Reading these Tcl constraints
into Timevision fullchip RTL flow was plug-n-play.
Initially, there were some issues related to pin mapping, but they
were quickly resolved. For example, some constraints referred to
mapped flop pin names, such as a register clock pin called "phi" in
our .libs. During RTL-level elaboration, however, Timevision names
the flop clock pins "CP" by default, so constraints referring to
"phi" pins were not read properly. Ausdia gave us variables to name
the clock, data, reset, set, and output pins of flops anything of our
choice, which resolved all gate/RTL constraint mapping issues.
There were some other issues related to delimiters used for generate
loops in RTL -- also resolved using variable settings.
Bottom line is that there was 100% alignment between the constraints used
at the fullchip RTL vs fullchip gate/netlist level. This also meant
that all constraints that were fixed, added, modified, and verified
using Timevision at the fullchip RTL level went directly into the
fullchip STA constraints that we eventually use for signoff. There
was no repeat work required.
- Constraint Verification Run: After loading the fullchip RTL design
into Timevision, we ran an exhaustive set of constraints verification
checks with "check_constraints". This one command verifies
150 constraint rules in several categories: clocks, inter-clock, mode
settings, IO constraints, coverage, and exception checks.
We deactivated rules that are not of interest to us. For the most serious
rules, we set the severity level to "FATAL" and made sure they all
either passed or were carefully reviewed and waived. The entire
"check_constraints" flow is very configurable. We had to spend some
initial time to review and understand all the rules and configure our
run properly. In the very early RTL snapshots, for example, we disabled
all rules except clocks and inter-clock, since our focus was to get these
constraints streamlined first. As the RTL and design matured,
we turned on more and more checks.
- Constraint Verification Results: As expected, Timevision did identify
several constraint issues. Some specific examples:
Missing clocks: Timevision found several primary and generated clock
sources on which no clocks were defined in the constraints.
The majority originated from IP blocks, and were sent to their IP
teams for review. A detailed debug report from Timevision was also
provided to each IP team, showing how the reported
clock source traced through buf/inv/mux logic to the leaf clock pins
of registers. If these clocks were relevant for the IP configuration
on our chip, the IP teams fixed the constraints. Otherwise, we added
waivers. What was interesting here was that almost all the leaf pins
reported by Timevision for these missing clock sources were getting
some other clock -- maybe through some other part of the clock mux
logic. But just because every register is clocked in the design does
not mean that all clocks/generated clocks are defined.
Missing interclock exceptions: Timevision reported several categories
of missing interclock "set_clock_groups" exceptions. The ones we really
focused on were "logically_exclusive" clock pairs, and asynchronous
clock pairs with "non-harmonic" periods. Debug commands in Timevision
showed the relationship between a reported clock pair: where they
converge, how many timing paths exist between the pair in
unique/common domains, etc.
Phase re-convergent clocks: Timevision showed how/where the negative
and positive edges of the same clock converged. This would cause
half-cycle timing paths, so we reviewed whether any case analysis was
missing that was allowing this phase re-convergence.
IO constraints: Timevision checks to see if "set_input_delay" and
"set_output_delay" constraints on the chip IO ports were completely
missing, incorrect, or incomplete. A sample timing path from/to
these IO ports was available through a debug command. We reviewed
and fixed some of these, since it was important for accurate
interface timing closure.
Unclocked Registers: This is one of the most basic checks. Timevision
splits this reporting into several rules: for example, register
clock pins that are completely floating, register clock pins that are
tied off, register clock pins driven by logic but with no clock reaching
them, etc. This helps us focus on the important rules -- and ignore known
issues upfront. For example, we have 1000's of spare flops with their
clocks tied off intentionally, but since tied clock pins are reported
under a different rule, we could completely ignore them.
Exception Conflicts: Timevision found several exception conflicts
in our constraints. The root cause was discrepancies between block/IP
and top constraints. For example, we found several cases where an
identical set of timing paths was covered by a "set_multicycle_path"
block/IP exception we pulled up -- but those paths were also covered
by a "set_false_path" exception in the top constraints. It's critical
to resolve these and make sure both the blocks/IP and top are using
identical constraints. Timevision identified several classes of
exception conflicts such as fp_vs_mcp, mcp exceptions w/ conflicting
values, fp_vs_max_delay, and so on.
- Debug: Timevision provides a good set of Tcl debug commands/attributes
to root-cause reported violations. Ausdia has some stuff that's
not found in PrimeTime. For example, we can trace all the logic
from a constant at a port/pin to any leaf clock pin. Or we can
identify all timing startpoint/endpoint pairs covered by an exception.
We had to spend some time getting our feet wet with all these debugging
capabilities, and the Ausdia team supported/helped us a lot. They
told us about "check_constraints", which covers 150 constraint rules.
We will try this soon, and expect it to make debugging easier for
designers who are not Tcl shell/command experts.
- Waivers: Timevision has a waiver mechanism, which "waives" specific
violations after the first set of reviews. This helps reduce noise in
subsequent runs. The designers can review these waivers at any
time. Applying waivers for some rules required specifying several objects
(pins, clocks, etc.), and we found the waivers cumbersome to write. Based
on our feedback, Ausdia added a feature where the designer can simply
add a waiver by specifying the violation rather than manually writing
object names. When the waivers are written out, all object details
are automatically filled in by the tool. It simplifies adding waivers.
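As a sanity check on the runtime/memory table in the Capacity, Memory and Runtime item, a few lines of Python (plain arithmetic, no tool involved) confirm that the per-step runtimes add up to the quoted 9.5 hours, and that the 188 GB figure corresponds to the sum of the per-step memory column.

```python
# Per-step (runtime in minutes, memory in GB), copied from the table.
steps = {
    "Read .lib files": (5, 0.1),
    "Read & Elaborate fullchip RTL": (8, 22.5),
    "Link fullchip design": (38, 110.0),
    "Load Tcl constraints": (4 * 60 + 24, 4.2),
    "Build timing graph": (60, 44.0),
    "Verify constraints": (3 * 60 + 15, 7.5),
}

total_minutes = sum(minutes for minutes, _ in steps.values())
total_gb = sum(gb for _, gb in steps.values())

print(f"{total_minutes / 60:.1f} h, {total_gb:.1f} GB")  # 9.5 h, 188.3 GB
```

Loading the Tcl constraints (4 hr 24 min) and verifying them (3 hr 15 min) account for most of the wall-clock time, while linking accounts for most of the memory.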
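The "phi" vs "CP" mismatch in the Compatibility item can be illustrated with a small Python model: a constraint pin reference only resolves if the elaborated flop exposes a pin with that name, and overriding the clock-pin name (the role Ausdia's naming variables play) makes it resolve. Only the "CP" default and the "phi" name come from the text; the other default pin names, the function names and the instance paths are assumptions, and this is not Timevision's API.

```python
# Hypothetical default names for inferred flop pins after RTL elaboration.
# "CP" for the clock pin is from the text; "D" and "Q" are placeholders.
DEFAULT_FLOP_PINS = {"clock": "CP", "data": "D", "output": "Q"}


def flop_pin_names(overrides=None):
    """Return the pin-role -> pin-name table, applying user overrides."""
    names = dict(DEFAULT_FLOP_PINS)  # copy so the defaults stay untouched
    names.update(overrides or {})
    return names


def unresolved_pin_refs(pin_refs, flop_paths, pin_names):
    """Return constraint pin references that match no pin on any elaborated flop."""
    existing = {f"{flop}/{pin}" for flop in flop_paths for pin in pin_names.values()}
    return [ref for ref in pin_refs if ref not in existing]


refs = ["u_core/state_reg/phi"]  # signoff constraint written against the .lib pin name
flops = ["u_core/state_reg"]

print(unresolved_pin_refs(refs, flops, flop_pin_names()))                  # ['u_core/state_reg/phi']
print(unresolved_pin_refs(refs, flops, flop_pin_names({"clock": "phi"})))  # []
```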
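For the "non-harmonic" category in the Missing interclock exceptions item, one simple way to screen clock pairs is sketched below in Python. It assumes a pair is harmonic when the longer period is an exact integer multiple of the shorter one (a tool may use a different criterion), and that pairs already covered by set_clock_groups are supplied as a set; the clock names and periods are made up. A flagged pair is only a candidate for review, not automatically an asynchronous pair.

```python
from fractions import Fraction


def is_harmonic(period_a_ns: float, period_b_ns: float) -> bool:
    """Assumed criterion: the longer period is an exact integer multiple of the shorter."""
    # Fraction(str(...)) keeps decimal periods such as 1.25 ns exact.
    a, b = Fraction(str(period_a_ns)), Fraction(str(period_b_ns))
    ratio = max(a, b) / min(a, b)
    return ratio.denominator == 1


def non_harmonic_pairs(clock_periods, grouped_pairs):
    """List clock pairs with non-harmonic periods not yet covered by set_clock_groups."""
    names = sorted(clock_periods)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if frozenset((a, b)) in grouped_pairs:
                continue  # already covered by a set_clock_groups exception
            if not is_harmonic(clock_periods[a], clock_periods[b]):
                flagged.append((a, b))
    return flagged


clocks = {"core_clk": 2.5, "pcie_clk": 4.0, "ddr_clk": 1.25, "mgmt_clk": 10.0}
grouped = {frozenset(("core_clk", "pcie_clk"))}
print(non_harmonic_pairs(clocks, grouped))
# [('ddr_clk', 'pcie_clk'), ('mgmt_clk', 'pcie_clk')]
```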
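The fp_vs_mcp and fp_vs_max_delay classes from the Exception Conflicts item can be sketched in Python by grouping exceptions on their (from, to) endpoints. This is a simplification under stated assumptions: exceptions arrive as already-expanded (source, kind, from, to, value) tuples with exact endpoint matches, whereas a real checker resolves wildcards, -through options and actual timing paths. The "mcp_value_mismatch" label and all object names are hypothetical.

```python
from collections import defaultdict


def exception_conflicts(exceptions):
    """Flag (from, to) endpoint pairs covered by conflicting timing exceptions.

    exceptions: (source, kind, from_pin, to_pin, value) tuples, where kind is
    "fp" (set_false_path), "mcp" (set_multicycle_path) or "max_delay".
    """
    by_path = defaultdict(list)
    for source, kind, frm, to, value in exceptions:
        by_path[(frm, to)].append((source, kind, value))

    conflicts = []
    for path, entries in sorted(by_path.items()):
        kinds = {kind for _, kind, _ in entries}
        mcp_values = {value for _, kind, value in entries if kind == "mcp"}
        if {"fp", "mcp"} <= kinds:
            conflicts.append((path, "fp_vs_mcp"))
        if len(mcp_values) > 1:
            conflicts.append((path, "mcp_value_mismatch"))
        if {"fp", "max_delay"} <= kinds:
            conflicts.append((path, "fp_vs_max_delay"))
    return conflicts


exceptions = [
    ("block", "mcp", "u_blk/a_reg/CP", "u_blk/b_reg/D", 2),  # pulled up from the block
    ("top", "fp", "u_blk/a_reg/CP", "u_blk/b_reg/D", None),  # top-level false path
    ("block", "mcp", "u_blk/c_reg/CP", "u_blk/d_reg/D", 2),
    ("top", "mcp", "u_blk/c_reg/CP", "u_blk/d_reg/D", 3),
]
for path, conflict in exception_conflicts(exceptions):
    print(conflict, path)
```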
NOISE AND OTHER AREAS TO IMPROVE
IP vs Top level: As I mentioned before, a large portion of our chip consists
of IP modules, which are owned by different design teams. So we wanted to
focus only on "top" violations, and hand over our "IP" violations to the IP
design teams. We had to spend a lot of time separating "top" constraint
violations from "IP" constraint violations, since they were all mixed up,
which made the results noisy. As a result, Ausdia implemented a new IP marker
scheme, where we can define IP modules upfront, and the violations are
automatically binned for each IP and the top level. We are looking forward
to using this. It should help save a lot of time.
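The binning idea behind the IP marker scheme can be illustrated with a short Python sketch that assigns each violation to the longest matching IP instance prefix and sends everything else to "top". It keys IPs by instance path rather than module name (assuming the module-to-instance mapping is already known), and the rule and instance names are hypothetical; it is not Ausdia's implementation.

```python
def bin_violations(violations, ip_instances):
    """Bin (rule, object path) violations by the IP instance that contains them.

    ip_instances maps an instance path prefix to an IP name; objects that fall
    outside every declared IP instance go to the "top" bin.
    """
    bins = {}
    # Try longer prefixes first so a nested IP instance wins over its parent.
    prefixes = sorted(ip_instances, key=len, reverse=True)
    for rule, path in violations:
        owner = "top"
        for prefix in prefixes:
            if path.startswith(prefix + "/"):
                owner = ip_instances[prefix]
                break
        bins.setdefault(owner, []).append((rule, path))
    return bins


ip_instances = {"u_serdes0": "serdes", "u_serdes1": "serdes", "u_pcie": "pcie"}
violations = [
    ("missing_clock", "u_serdes0/pll/clk_out"),
    ("unclocked_reg", "u_ctrl/state_reg/CP"),
    ("missing_clock", "u_serdes1/pll/clk_out"),
]
for owner, items in bin_violations(violations, ip_instances).items():
    print(owner, len(items))
# serdes 2
# top 1
```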
IP results repeated: Most IPs we use are instantiated several times, maybe
20 times. This caused one actual IP violation to be repeated 20 times, once
for each instance. That was noisy and cumbersome to waive. The Ausdia team is
addressing this issue as well, and will provide an option to report an internal
IP violation just once.
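Reporting an internal IP violation once can be sketched by keying each violation on (rule, IP module, path relative to the instance) and collecting the instances that share that key. The Python below assumes non-nested IP instances and a known instance-to-module mapping; names such as "mac" and "u_port0" are made up for illustration.

```python
def collapse_ip_repeats(violations, instance_to_module):
    """Report each IP-internal violation once per module rather than per instance.

    Returns ({(rule, module, relative path): [instances]}, [non-IP violations]).
    Assumes IP instances are not nested inside one another.
    """
    collapsed = {}
    others = []
    for rule, path in violations:
        for inst, module in instance_to_module.items():
            if path.startswith(inst + "/"):
                key = (rule, module, path[len(inst) + 1:])
                collapsed.setdefault(key, []).append(inst)
                break
        else:  # no IP instance contains this object
            others.append((rule, path))
    return collapsed, others


instances = {f"u_port{i}": "mac" for i in range(20)}
violations = [("missing_clock", f"u_port{i}/rx_div/clk_out") for i in range(20)]
violations.append(("unclocked_reg", "u_ctrl/cfg_reg/CP"))

collapsed, others = collapse_ip_repeats(violations, instances)
for (rule, module, rel_path), insts in collapsed.items():
    print(rule, module, rel_path, f"({len(insts)} instances)")
# missing_clock mac rx_div/clk_out (20 instances)
```

Keeping the instance list on each collapsed entry means a single waiver decision can still be traced back to every affected instance.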
Promotion of IP waivers: We have the teams creating our lower-level blocks
and IP run Timevision themselves. They fix or waive all their issues at
their level. The problem is that when we run Timevision at the fullchip
level, it re-reports the exact same issues that were already dealt with by
the lower-level block/IP owners. Ausdia needs a way to promote these
lower-level fixes and waivers up to the fullchip level.
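One possible shape for waiver promotion, sketched in Python: block-level waivers stored relative to the block are re-rooted under every fullchip instance of that block and then filtered out of the fullchip report. This is not an existing Ausdia feature; it assumes waivers are exact (rule, relative object path) pairs, and all module, instance and rule names are hypothetical.

```python
def promote_waivers(block_waivers, instance_to_module):
    """Re-root block-level waivers under every fullchip instance of that block.

    block_waivers: module name -> list of (rule, path relative to the block).
    instance_to_module: fullchip instance path -> module name.
    """
    promoted = set()
    for inst, module in instance_to_module.items():
        for rule, rel_path in block_waivers.get(module, []):
            promoted.add((rule, f"{inst}/{rel_path}"))
    return promoted


def unwaived(violations, waivers):
    """Keep only fullchip violations that no promoted waiver covers."""
    return [v for v in violations if v not in waivers]


block_waivers = {"mac": [("missing_clock", "rx_div/clk_out")]}
instances = {"u_port0": "mac", "u_port1": "mac", "u_ctrl": "ctrl"}
fullchip = [
    ("missing_clock", "u_port0/rx_div/clk_out"),
    ("missing_clock", "u_port1/rx_div/clk_out"),
    ("unclocked_reg", "u_ctrl/cfg_reg/CP"),
]

waivers = promote_waivers(block_waivers, instances)
print(unwaived(fullchip, waivers))  # [('unclocked_reg', 'u_ctrl/cfg_reg/CP')]
```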
---- ---- ---- ---- ---- ---- ----
We like using constraints cleanup tools like Ausdia Timevision. A lot can
and should be done on fullchip STA much earlier in the design cycle, without
depending solely on PrimeTime, Tempus, or other signoff STA tools.
- Mehul Mistry
Broadcom, Inc. San Jose, CA
Editor's Note: With 14 years of chip design experience at both Seagate
and Broadcom, Mehul has deep experience with PrimeTime, Design Compiler,
ARM cores, CTS, chip integration, VHDL, Actel, Tcl and Perl. - John
---- ---- ---- ---- ---- ---- ----
Related Articles:
User benchmarks Fishtail, Ausdia, Atrenta, Excellicon, Blue Pearl
Atrenta frustrated by user's flawed eval of 7 constraints tools