( ESNUG 550 Item 6 ) -------------------------------------------- [05/22/15]

Subject: BRCM engineer evals Ausdia Timevision for STA constraints cleanup

> Our management decided we had to clean up and streamline our flow.  They
> required us to benchmark our 7 SDC constraints tools in order to pare it
> down to 1 or possibly 2 tools. ...  After those first cuts, we decided to
> proceed benchmarking Fishtail, Ausdia, Excellicon and Atrenta. ...
>
>     - "User evals Fishtail, Ausdia, Atrenta, Excellicon, Blue Pearl"
>        http://www.deepchip.com/items/0537-01.html


From: [ Mehul Mistry of Broadcom ]

Hi, John,

I work for Broadcom in chip integration in our Network Switching group.

I wanted to provide user feedback on how we use Ausdia Timevision for STA
and constraints clean up at the fullchip RTL level.


BACKGROUND

Our chips are massive and complex.  Typically the designs we implement are
100 M+ insts (and close to 1 B gates), have 1000+ clocks & generated clocks,
many complex IP blocks delivered by globally spread out design teams, and
multiple different modes of operation.

Our fullchip STA work involves two major tasks:

  1. Constraints Development
  2. Timing Closure

The constraints development phase is very challenging for SoC's of our size
and complexity.  It takes time to grasp the various clock domains, inter-IP
interactions, design mode settings etc. to develop a robust, clean set of
signoff constraints.  And if this is lacking, it directly has a negative
impact on our timing closure phase.

In our PrimeTime flow, fullchip STA work used to start when a fullchip
gate-level database (netlists or ETM's + spef's) was released and available.
And when this happened, the primary focus of the project team & management
immediately shifted to the "Timing Closure" phase -- somehow we had to
achieve timing closure and tapeout this chip!

This work flow created a lot of pressure on the fullchip STA team to:

  - quickly analyze root-cause issues on timing paths with negative slack.
  - provide feedback to the backend and IP teams.
  - look into inter-IP timing paths etc.

During this chaotic, fast moving process, the fullchip STA team was often
identifying and fixing constraint issues on the fly -- being totally in a
"reactive" mode.  There was no time or very little time available to really
focus solely on constraints development.

This led to multiple, incremental constraint releases, resulting in numerous
P&R and STA iterations, using more compute servers/licenses.  Overall, it
caused a negative impact on schedule, resources, and QoR. 


WASTING 2 TO 3 MONTHS

After several brutal fullchip timing closure & tapeout cycles, we decided to
take a step back, and think about how to improve our constraints development
process.  What we realized is that as soon as a fullchip RTL snapshot is
released, the functional verification team gets going right away.  However,
the fullchip STA/integration team can't start working until a gate-level
netlist is available which links correctly in PrimeTime.  This typically
takes 2 to 3 months, since it involves putting together a fullchip gate
database, and trying to link it cleanly with several IP blocks which have to
go through synthesis, etc.

In effect, 2 to 3 months are wasted after a RTL snapshot is released, when
nothing meaningful is done for fullchip STA and constraints. 


OUR AUSDIA TIMEVISION FLOW

We wanted to start "constraints development" right after an RTL snapshot is
released, instead of wasting those 2 to 3 months.

The basic "Timevision" fullchip RTL flow we deployed was:

    1. Load fullchip RTL from a flist.  This already exists and is used by
       verification teams.

    2. Load .lib models for all non-synthesizable logic (IO pads, memory
       macros, phys etc.).

    3. Elaborate & Link the fullchip design

    4. Read basic toplevel constraints, such as primary clock defined at
       top level ports, PLL output etc. 

    5. Read block-level / IP constraints if they are available.  We do
       a "current_instance" for each block instantiation, and read the
       Block / IP constraints used for synthesis/P&R (the block instance
       prefix is used wherever applicable).  Primary clocks etc. are
       ignored, since they come from the top level constraints.  Our
       block / IP constraints have if/else constructs, controlled by
       variable settings, so the same constraints can be read in either
       the block/IP context or the fullchip context.

    6. Build the Fullchip Timing Graph

    7. Use Timevision to:

        a. Verify the correctness of any constraints that we provided
        b. Identify missing constraints
        c. Debug and analyze the details of 7(a) and 7(b)
        d. Fix/Add constraints, or apply waivers for don't care issues 

The only difference from our fullchip gate/netlist STA flow is Step 1: for
RTL constraints verification it is "read_hdl" (read the design RTL) instead
of "read_verilog" (read the gate netlist).  Hence, the basic fullchip RTL
flow was up and running in less than a week.  We spent another week or two
getting the proper RTL flist/database and .lib's to make sure that there
were no blackboxes after linking the design.  This was a first time setup,
and we have now automated the RTL flist/database and .lib finding mechanism
for the Timevision flow.


WHAT WE FOUND

Next, I will go over the specific pluses and minuses we found:

    - Capacity, Memory and Runtime: We had no interest in hierarchical,
      grey-box or ILM based flows which involve model approximations,
      cumbersome flow setup, and induce inaccuracies.  Our goal was to
      do STA constraints development work on a fullchip, flat RTL database.
      Timevision was able to do that with a reasonable memory and runtime
      footprint on this ~100 M instance design.  Here is a break-up of
      runtime/memory for each step:

           Step                              Runtime           Memory
           Read .lib files                :  5 min             0.1 GB
           Read & Elaborate fullchip RTL  :  8 min            22.5 GB
           Link fullchip design           :  38 min          110.0 GB
           Load Tcl constraints           :  4 hr 24 min       4.2 GB
           Build timing graph             :  1 hr             44.0 GB
           Verify constraints             :  3 hr 15 min       7.5 GB
 
      Within 9.5 hours from when the latest fullchip RTL flist was
      available, we were at the "tv_shell" prompt with complete constraints
      verification done, and ready for debug & analysis on full chip STA
      constraints work.  Memory was 188 GB, which is around 20% more than
      what Timevision uses when running at the gate/netlist-level for the
      same design.  This was because after RTL elaboration, Timevision does
      not remove unused registers/logic, unlike Design Compiler.  We've told
      Ausdia to get this corrected and improve memory further, and they
      are working on it.
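
      As a quick sanity check on the totals quoted above, the per-step
      numbers can be summed in a few lines of Python.  This is only an
      arithmetic sketch; it assumes the memory column lists the increase
      per step (an assumption, though the sum does land on the quoted
      188 GB).

```python
# Per-step runtime (minutes) and memory (GB), copied from the table above.
steps = {
    "Read .lib files": (5, 0.1),
    "Read & Elaborate fullchip RTL": (8, 22.5),
    "Link fullchip design": (38, 110.0),
    "Load Tcl constraints": (4 * 60 + 24, 4.2),
    "Build timing graph": (60, 44.0),
    "Verify constraints": (3 * 60 + 15, 7.5),
}

total_minutes = sum(minutes for minutes, _ in steps.values())
# Assumption: memory figures are per-step increments, so they add up.
total_memory_gb = sum(gb for _, gb in steps.values())

print(f"total runtime: {total_minutes / 60:.1f} hours")
print(f"total memory : {total_memory_gb:.1f} GB")
```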


    - Compatibility with gate/netlist PrimeTime signoff: Our fullchip and
      block/IP constraints used for signoff STA are coded in complex Tcl,
      with several custom procedures etc.  Reading these Tcl constraints
      into Timevision fullchip RTL flow was plug-n-play.

      Initially, there were some issues related to pin mapping, but those
      were quickly resolved.  For example, some constraints referred to
      mapped flop pin names, such as a register clock pin called "phi" in
      our .libs.  During RTL level elaboration however, Timevision names
      the flop clock pins as "CP" by default.  So constraints referring to
      "phi" pins were not read properly.  Ausdia gave us variables to name
      the clock, data, reset, set, flop output pins to any name of our
      choice, which resolved all gates/RTL constraints mapping issues.
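
      The "phi" vs "CP" mismatch can be illustrated with a small Python
      sketch.  It is not Timevision code: the instance names are made up,
      and glob matching via fnmatch merely stands in for how a constraint
      pattern selects pins.

```python
from fnmatch import fnmatchcase


def match_pins(pattern, pins):
    """Return the pins whose full hierarchical name matches a glob pattern."""
    return [p for p in pins if fnmatchcase(p, pattern)]


# Hypothetical register clock pins as named in the gate-level netlist.
gate_pins = ["core/u_fifo/wr_ptr_reg_0/phi", "core/u_fifo/wr_ptr_reg_1/phi"]

# Same registers after RTL elaboration with the default clock pin name "CP".
rtl_pins_default = [p.rsplit("/", 1)[0] + "/CP" for p in gate_pins]

# Same registers once the generic clock pin is renamed to the library name.
rtl_pins_renamed = [p.rsplit("/", 1)[0] + "/phi" for p in rtl_pins_default]

pattern = "core/u_fifo/wr_ptr_reg_*/phi"

print(match_pins(pattern, gate_pins))         # both pins match
print(match_pins(pattern, rtl_pins_default))  # nothing matches
print(match_pins(pattern, rtl_pins_renamed))  # both pins match again
```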

      There were some other issues related to delimiters used for generate
      loops in RTL -- also resolved using variable settings.

      Bottom line is that there was 100% alignment between constraints used
      at the fullchip RTL vs fullchip gate/netlist level.  This also meant
      that all constraints that were fixed, added, modified, and verified
      using Timevision at the fullchip RTL level directly went into our
      fullchip STA constraints that we eventually use for signoff.  There
      was no repeat work required.


    - Constraint Verification Run: After loading the fullchip RTL design
      into Timevision, we ran an exhaustive set of constraints verification
      checks in it with "check_constraints".  This one command verifies
      150 constraint rules in several categories: clocks, inter-clock, mode
      settings, IO constraints, coverage, and exception checks.

      We inactivated rules that were not of interest to us.  For the most
      serious rules, we set the severity level to "FATAL" and made sure
      they all passed or were carefully reviewed and waived.  The entire
      "check_constraints" flow is very configurable.  We had to spend some
      initial time to review and understand all the rules and configure
      our run properly.

      In the very early RTL snapshots for example, we disabled all rules
      except clocks and inter-clock, since our focus was to get these
      constraints streamlined first.  And as the RTL and design matured,
      we turned on more and more checks.


    - Constraint Verification Results: As expected, Timevision did identify
      several constraint issues.  Some specific examples:
 
      Missing clocks: Timevision found several primary and generated clock
      sources on which there were no clocks defined in the constraints.
      The majority originated from IP blocks, and were sent to their IP
      teams for review.  A detailed debug report from Timevision was also
      provided to the IP team, which showed a trace of how the reported
      clock source was tracing through buf/inv/mux logic to leaf clock pins
      of registers.  If these clocks were relevant for the IP configuration
      on our chip, the IP teams fixed the constraints.  Otherwise, we added
      waivers.  What was interesting here was that almost all the leaf pins
      reported by Timevision for these missing clock sources were getting
      some other clock -- maybe through some other part of the clock mux
      logic.  But just because every register is clocked in the design does
      not mean that all clocks/generated clocks are defined.

      Missing interclock exceptions: Timevision reported several categories
      of missing interclock "set_clock_groups" exceptions.  Ones we really
      focused on were "logically_exclusive" clock pairs, and asynchronous
      clock pairs with "non-harmonic" periods.  Debug commands in Timevision
      justified relationships between a reported clock pair, where they
      converge, and how many timing paths exist between the clock pair in
      unique/common domains, etc.  
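
      To make the "non-harmonic" idea concrete, here is a minimal Python
      sketch.  The definition used (two periods are harmonic when the
      longer one is an integer multiple of the shorter one) is an
      assumption for illustration, not Timevision's documented rule, and
      the period values are made up.

```python
from fractions import Fraction


def is_harmonic(period_a, period_b):
    """True if the longer period is an integer multiple of the shorter one.

    Periods are passed as decimal strings so that Fraction keeps them
    exact and float rounding cannot affect the result.
    """
    a, b = Fraction(period_a), Fraction(period_b)
    if a <= 0 or b <= 0:
        raise ValueError("periods must be positive")
    longer, shorter = max(a, b), min(a, b)
    return (longer / shorter).denominator == 1


print(is_harmonic("2.0", "4.0"))  # True:  4.0 is exactly 2 x 2.0
print(is_harmonic("2.5", "4.0"))  # False: 4.0 / 2.5 = 1.6
```

      A clock pair that fails such a test and has no "set_clock_groups"
      relationship is the kind of candidate that would deserve review.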

      Phase re-convergent clocks: Timevision found how/where the negative
      and positive edges of the same clock converged.  This would cause
      1/2 cycle timing paths, and so we reviewed if any case analysis was
      missing that was allowing this phase re-convergence.

      IO constraints: Timevision checks to see if "set_input_delay" and
      "set_output_delay" constraints on the chip IO ports were completely
      missing, incorrect, or incomplete.  A sample timing path from/to
      these IO ports was available through a debug command.  We reviewed
      and fixed some of these, since it was important for accurate
      interface timing closure.

      Unclocked Registers: This is one of the most basic checks.  Timevision
      splits this reporting into several rules.  For example: register
      clock pins that are completely floating, register clock pins that are
      tied-off, register clock pins driven by logic but with no clock
      reaching them, etc.
      This helps us focus on the important rules -- and ignore known issues
      upfront.  For example we have 1000's of spare flops with their clocks
      tied-off intentionally, but since the tied clock pins are reported
      under a different rule we could completely ignore them.

      Exception Conflicts: Timevision found several exception conflicts
      in our constraints.  Root cause was discrepancies between Blocks/IP's
      and top constraints.  For example, we found several cases where an
      identical set of timing paths were covered by a "set_multicycle_path"
      block/IP exception we pulled up -- but those paths were also covered
      by a "set_false_path" exception in the top constraints.  It's critical
      to resolve these and make sure both the blocks/IP and top are using
      identical constraints.  Timevision identified several classes of
      exceptions conflicts such as fp_vs_mcp, mcp exceptions w/ conflicting
      values, fp_vs_max_delay, and so on.
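
      The conflict check described above boils down to finding
      startpoint/endpoint pairs that are covered by exceptions which
      disagree.  The Python sketch below shows that idea only; the data
      layout, the "source"/"kind" labels and the register names are all
      hypothetical, and it says nothing about how Timevision implements
      the check.

```python
def find_conflicts(exceptions):
    """Return {path: [sources]} for paths hit by disagreeing exceptions.

    Two exceptions disagree when they differ in kind (e.g. false path vs
    multicycle path) or carry different values (e.g. MCP 2 vs MCP 3).
    """
    by_path = {}
    for exc in exceptions:
        for path in exc["paths"]:
            by_path.setdefault(path, []).append(exc)

    conflicts = {}
    for path, hits in by_path.items():
        kinds = {(e["kind"], e.get("value")) for e in hits}
        if len(kinds) > 1:
            conflicts[path] = sorted(f'{e["source"]}:{e["kind"]}' for e in hits)
    return conflicts


# Hypothetical exceptions; each path is a (startpoint, endpoint) pair.
exceptions = [
    {"source": "top", "kind": "false_path",
     "paths": {("u_ip/cfg_reg", "u_ip/stat_reg")}},
    {"source": "ip", "kind": "multicycle_path", "value": 2,
     "paths": {("u_ip/cfg_reg", "u_ip/stat_reg"),
               ("u_ip/a_reg", "u_ip/b_reg")}},
]

conflicts = find_conflicts(exceptions)
for path, sources in conflicts.items():
    print(path, "->", sources)
```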


    - Debug: Timevision provides a good set of Tcl debug commands/attributes
      to find the root cause of reported violations.  Ausdia has some stuff
      that's not found in PrimeTime.  For example, we can trace the entire
      logic from a constant at a port/pin to any leaf clock pin.  Or we can
      identify all timing startpoint/endpoint pairs covered by an exception.

      We had to spend some time to get our feet wet with all these debugging
      capabilities, and the Ausdia team supported/helped us a lot.  They
      told us about "check_constraints", which does 150 constraint rules.
      We will try this soon, and expect this to make debugging easier for
      designers who are not Tcl shell/command experts.


    - Waivers: Timevision has a waiver mechanism, which "waives" specific
      violations after the first set of reviews.  This helps reduce noise in
      the subsequent runs.  The designers can review these waivers at any
      time.  Applying waivers for some rules required several objects (pins,
      clocks, etc.) and we found it cumbersome to write the waivers.  Based
      on our feedback, Ausdia has a feature where the designer can simply
      add a waiver by specifying the violation rather than manually writing
      object names.  When the waivers are written out, all object details
      are automatically updated by the tool.  It simplifies adding waivers.


NOISE AND OTHER AREAS TO IMPROVE

IP vs Top level: As I mentioned before, a large portion of our chip consists
of IP modules, which are owned by different design teams.  So we wanted to
only focus on "top" violations, and hand over our "IP" violations to the IP
design teams.  We had to spend a lot of time disambiguating "top" constraint
violations vs "IP" constraint violations, since it was all mixed up and
became a bit noisy.  As a result, Ausdia implemented a new IP marker scheme,
where we can define IP modules upfront, and the violations are then binned
automatically for each IP and the top level.  We are looking forward to
using this.  It should help save a lot of time.
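
The binning idea can be sketched in a tool-agnostic way: declare the IP
instances up front, then assign each violation to the IP whose hierarchy
path prefixes the violating object.  The Python below is only such a
sketch; the rule names and instance names are hypothetical, and it does not
reflect Ausdia's actual IP marker implementation.

```python
def bin_violations(violations, ip_instances):
    """Assign each violation to the IP instance containing its object,
    or to "top" if no declared IP instance is a hierarchical prefix."""
    bins = {"top": []}
    # Longest prefix first, so a nested IP instance wins over its parent.
    ordered = sorted(ip_instances, key=len, reverse=True)
    for rule, obj in violations:
        owner = next((ip for ip in ordered if obj.startswith(ip + "/")), "top")
        bins.setdefault(owner, []).append((rule, obj))
    return bins


# Hypothetical IP instances and violations (rule name, object path).
ip_instances = ["u_serdes0", "u_serdes1", "u_mac"]
violations = [
    ("missing_clock", "u_serdes0/pll/clkout"),
    ("unclocked_reg", "u_glue/sync_reg/CP"),
    ("missing_clock", "u_serdes1/pll/clkout"),
]

bins = bin_violations(violations, ip_instances)
for owner, items in bins.items():
    print(owner, items)
```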

IP results repeated: Most IP's we use are instantiated several times, maybe
20 times.  This caused one actual IP violation to be repeated 20 times; once
for each instance.  That was noisy and cumbersome to waive.  The Ausdia team
is addressing this issue also, and will provide an option to report an
internal IP violation just once.
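
Reporting a repeated IP violation once amounts to keying it on the module
and the object path inside the module, rather than on the full instance
path.  A minimal Python sketch of that follows; it assumes each IP is
instantiated directly under the top level, and all names are hypothetical.

```python
from collections import defaultdict


def collapse_repeats(violations, instance_to_module):
    """Group violations by (rule, module, module-local object) and keep
    the list of instances in which each one was seen."""
    grouped = defaultdict(list)
    for rule, obj in violations:
        inst, _, local = obj.partition("/")
        module = instance_to_module.get(inst)
        if module is None:
            # Not inside a known IP instance: keep it as a top violation.
            grouped[(rule, "top", obj)].append("top")
        else:
            grouped[(rule, module, local)].append(inst)
    return dict(grouped)


# Hypothetical: one IP module instantiated 20 times, same issue in each.
instance_to_module = {f"u_serdes{i}": "serdes" for i in range(20)}
violations = [("missing_clock", f"u_serdes{i}/pll/clkout") for i in range(20)]

collapsed = collapse_repeats(violations, instance_to_module)
for key, instances in collapsed.items():
    print(key, "seen in", len(instances), "instances")
```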

Promotion of IP waivers: We have the teams creating our lower level blocks
and IP run Timevision themselves.  They fix or waive all their issues at
their level.  The problem is that when we run Timevision at fullchip level,
it re-reports these exact same issues that were already dealt with by the
lower level block/IP owners.  Ausdia needs a way to promote these lower
level fixes and waivers up to fullchip level.
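
Conceptually, promoting a block-level waiver means rewriting its object
names into fullchip scope, once per instance of the block.  The Python
below sketches only that idea; the waiver tuple format, rule names and
instance names are hypothetical and are not Timevision's waiver syntax.

```python
def promote_waivers(block_waivers, instances):
    """Rewrite block-level waivers into fullchip scope by prefixing each
    waived object with every instance path of that block."""
    promoted = []
    for inst in instances:
        for rule, obj, reason in block_waivers:
            promoted.append((rule, f"{inst}/{obj}", reason))
    return promoted


# Hypothetical block-level waiver: (rule, block-local object, reason).
block_waivers = [("unclocked_reg", "spare_reg_0/CP", "spare flop, clock tied off")]
instances = ["u_mac0", "u_mac1"]

fullchip_waivers = promote_waivers(block_waivers, instances)
for waiver in fullchip_waivers:
    print(waiver)
```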

        ----    ----    ----    ----    ----    ----    ----

We like using constraints cleanup tools like Ausdia Timevision.  A lot can
and should be done on fullchip STA much earlier in the design cycle, without
solely depending on PrimeTime, Tempus, or other signoff STA tools.

    - Mehul Mistry
      Broadcom, Inc.                             San Jose, CA


  Editor's Note: With 14 years of chip design experience at both Seagate
  and Broadcom, Mehul has deep experience with PrimeTime, Design Compiler,
  ARM cores, CTS, chip integration, VHDL, Actel, Tcl and Perl. - John

        ----    ----    ----    ----    ----    ----    ----

Related Articles:

    User benchmarks Fishtail, Ausdia, Atrenta, Excellicon, Blue Pearl
    Atrenta frustrated by user's flawed eval of 7 constraints tools

Copyright 1991-2025 John Cooley.  All Rights Reserved.