Dan Joyce's 29 tips for gate-level simulation

( ESNUG 569 Item 2 ) -------------------------------------------- [03/31/17]

From: "Dan Joyce" <user=danj domain=correctdesigns not calm>
Subject: Dan Joyce's 29 cost-effective gate-level simulation tips (pt 2)

        ----    ----    ----    ----    ----    ----    ----

 7. Running GLS SDF with Timing Checks
    This is where GLS pays off.  SDF gatesims target four critical types of
    bugs that cannot be found with any other method.

    a) Incorrect/Incomplete Timing Constraints in Physical Design
       In synthesis, your timing constraints tell the tool how to time
       all the paths in the design, and how fast they need to be.  But
       those same constraints also are used by Static Timing Analysis
       (STA) to check that those same paths were timed correctly.  So
       it's a false check.  This is why there are massive engineering
       reviews of timing constraints.  GLS SDF finds these errors.

    b) GLS SDF and Clock Glitches
       Your chip's clocks must be extremely clean.  DFT, Backend place
       and route, gate-level fixes, power gate insertion, BIST, BISR can
       all introduce deadly glitches -- that GLS SDF can catch.

    c) CDC, Asynchronous Clocks, and GLS SDF
       Signals and buses crossing from one clock to another asynchronously
       are always critical because bugs of this type are unlikely to be
       fixed without a respin, and because simple 0-delay RTL and simple
       GLS testing will not find most of them.  Although even GLS with SDF
       is not guaranteed to find all asynchronous clock crossing issues,
       it does tend to find many that get through all other checks.

    d) Multi-Cycle Paths, Assertions, and GLS SDF
       Chip designers use multi-cycle paths to make timing more easily
       without requiring more gates or power.  Unfortunately MCPs done
       incorrectly can cause metastability or incorrect functionality
       that cannot be detected in 0-delay RTL or simple GLS.  You must use
       assertions to check for both set-up & hold multi-cycle bugs.  Doing
       GLS SDF catches MCP bugs.  WARNING: you're putting your chip at risk
       if you skip GLS SDF and, being human, miss just one assertion on any
       of your MCP source or destination DFFs.

    Again, notice how GLS SDF catches these four types of timing bugs above.
    Two other tips:

    - Metastability Propagates X's from Qout to Fail Tests
      Timing violations are flagged when the D-input of a DFF changes inside
      the setup or hold window around the clock edge.  Warning messages are
      printed to the log files, and the outputs of the DFF are driven to X
      for the entire clock period that the DFF could be metastable.

      Usually this X will propagate to the entire design and fail the test.
      If so, this is a real issue and must be debugged.

      If the X doesn't propagate, it still doesn't hurt to manually see if
      your design's X reduction logic is working or if you just got lucky.

    - X's are Easier to Debug
      It is quite easy to debug these failures because you don't have to
      trace back to the origin of the issue. The timing violation warning
      in the log file takes you to the source.  Even if there are too many
      warnings, X-tracing is much easier to debug in waves than trying to
      compare a passing to a failing gatesim without X's.

        ----    ----    ----    ----    ----    ----    ----

 8. You MUST Hand Review ALL the Timing Violations in your Log Files
    I know it can be mind-numbing, but before tapeout your engineer must
    sort through and review ALL timing violation warnings, even when the
    tests pass, to find the subtle timing problems that did not propagate
    to fail a test.  Create a script to filter out timing violations
    *before* reset is released, and to filter out repeats of the same
    endpoint.

    You must pull up each endpoint in a wave showing the timing violation
    to figure out why the test warned about it.  Identify the startpoint
    and endpoint and review with STA to see why that endpoint should not
    have violated timing and why it actually did.

    Even when the X generated by a timing violation does not propagate to
    the rest of the design to fail a test, there may be a real problem.
    Gatesim regressions are extremely limited in coverage.  It is very
    likely that a timing violation detected by your timing checks will not
    propagate into any of your small GLS regression tests due to limited
    coverage.  This timing violation could still cause a fatal chip failure
    in silicon in a mode not tested in GLS.
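
    The filtering script described above could be sketched roughly like
    this.  It is only a sketch: the one-line log format, the name
    filter_violations, and the reset_release_ps argument are all made up
    for illustration, since every simulator prints timing violations its
    own way.  Keying the repeat filter on endpoint plus check type (rather
    than endpoint alone) is also just one possible choice.

```python
import re

# Hypothetical one-line log format, assumed for this sketch only:
#   TIMING_VIOLATION time=<ps> check=<setup|hold> endpoint=<hier.path>
VIOLATION_RE = re.compile(
    r"TIMING_VIOLATION\s+time=(\d+)\s+check=(\w+)\s+endpoint=(\S+)"
)


def filter_violations(log_lines, reset_release_ps):
    """Return the first post-reset violation for each (endpoint, check) pair."""
    seen = set()
    kept = []
    for line in log_lines:
        m = VIOLATION_RE.search(line)
        if m is None:
            continue  # not a timing-violation line
        time_ps, check, endpoint = int(m.group(1)), m.group(2), m.group(3)
        if time_ps < reset_release_ps:
            continue  # happened before reset was released: filtered out
        key = (endpoint, check)
        if key in seen:
            continue  # repeat of an endpoint already on the review list
        seen.add(key)
        kept.append((time_ps, check, endpoint))
    return kept
```

    What comes out is the short list of endpoints that still has to be
    pulled up in waves and reviewed by hand; the script only cuts the
    noise, it does not replace the review.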

          Notice how only SDF TimingChecks finds Set-up Violations

          Notice how only SDF TimingChecks finds Hold Violations

    These 2 diagrams above show the difference between SDF annotated timing
    simulations with timing checks enabled and disabled.  Both apply delays
    to the signals in your design, but notice how only enabling timing
    checks causes the Timing Violation warning to print to the log file
    and the Qout to go X.
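
    For readers without the pictures, here is a toy model of one DFF at one
    rising clock edge.  It is a simplified sketch, not how any simulator
    is implemented; the function name, arguments, and the exact boundary
    conventions of the setup/hold window are assumptions.

```python
def dff_sample(d_changes, clk_edge, setup, hold, d_initial, timing_checks=True):
    """Return (q, violation) for one rising clock edge.

    d_changes: list of (time, new_value) for the D input, sorted by time.
    q is 0, 1, or "x"; violation is None, "setup", or "hold".
    """
    # Value of D as seen at the clock edge (changes at or before the edge).
    d_at_edge = d_initial
    for t, v in d_changes:
        if t <= clk_edge:
            d_at_edge = v
    if not timing_checks:
        # Delays still apply, but nothing is flagged and Q never goes X.
        return d_at_edge, None
    for t, _ in d_changes:
        if clk_edge - setup < t <= clk_edge:
            return "x", "setup"  # D moved inside the setup window
        if clk_edge < t < clk_edge + hold:
            return "x", "hold"   # D moved inside the hold window
    return d_at_edge, None
```

    With the same D waveform, the only difference between the two modes is
    whether the violation is reported and Q is driven to X.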

        ----    ----    ----    ----    ----    ----    ----

 9. One Mindful Shortcut for GLS Performance
    Most chips will have blocks that require very long reset initialization
    procedures - DDR initialization, PLL stabilization, BISR run, BIST run,
    Boot Code Load, etc.  These only need to be run in one GLS test.  The
    bulk of your tests can shortcut these reset initialization procedures
    and instead do a backdoor boot load to get to the "interesting" part of
    your GLS tests quicker.

        ----    ----    ----    ----    ----    ----    ----

10. Lint, LEC, STA, and GLS
    Please!  Don't do painful GLS debug with netlists that are known broken
    due to earlier timing or functional equivalence failures -- unless you
    just want the practice.

    - Do your Lint and LEC before 0-delay Gates
      Make sure your gate-level netlist is created from an RTL design that
      passes Lint checks and is LEC clean *before* running 0-delay gatesims.
      Netlists are likely to have functional differences before Lint and
      LEC are complete.

    - Do your STA as Hold-fixed before GLS SDF
      Do not run GLS with back-annotated SDF delays until your netlist is
      Hold clean, otherwise the tests will not pass at any clock speed.

    - Do your STA Set-up Clean before GLS SDF at Full Speed
      Set-up violations are OK as long as your GLS simulations are run with
      a slow enough clock.  Run slow first, then full speed when netlist
      is ready.

        ----    ----    ----    ----    ----    ----    ----

11. Using "Fake" Hold Times for GLS with SDF
    It takes a lot of engineering man-hours to get SDF annotated GLS
    working.  And this work must be started *before* your netlist is
    anywhere near hold clean.  But since SDF Timing GLS cannot be
    run without a hold clean netlist, one trick is to make a script to
    fake-fix hold on your netlist.  This script uses your STA timing
    report to list the DFFs that fail hold time.  The script then adds
    just enough delay at the D-input of those DFFs to fix hold for
    those paths.
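
    The delay arithmetic in such a script might look like the sketch
    below.  It assumes the STA report has already been parsed into a dict
    of hold slacks (that parsing is tool-specific and not shown); the name
    fake_hold_fixes and the margin_ps and step_ps knobs are hypothetical.

```python
def fake_hold_fixes(hold_slacks_ps, margin_ps=0, step_ps=10):
    """Map each hold-failing DFF endpoint to the D-input delay to add, in ps.

    hold_slacks_ps: {endpoint: hold slack in ps}; negative slack = failing.
    The delay is rounded up to a multiple of step_ps so it never under-fixes.
    """
    fixes = {}
    for endpoint, slack in hold_slacks_ps.items():
        if slack >= 0:
            continue  # already hold clean, leave this DFF alone
        needed = -slack + margin_ps
        # Integer ceiling division: round up to the next whole step.
        fixes[endpoint] = -(-needed // step_ps) * step_ps
    return fixes
```

    For example, a DFF with -123 psec of hold slack would get 130 psec of
    added delay with the default 10 psec step.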

    - Verilog GLS SDF will Swallow Short Pulses
      The tools that do Verilog SDF simulations will not propagate a
      pulse through a cell that's shorter than the propagation delay
      of that cell.  For example, with a chip that has buffers that
      have 5 nsec delay, any 4 nsec pulse will disappear in your
      GLS SDF run.  To balance an unfinished clock tree, sometimes
      designers will just put a

            assign #20 clock_666_in = clock_666__delayed_out;

      thinking they just added 20 nsec to a branch of their clock tree.
      What this actually does is keep any pulse shorter than 20 nsec
      from propagating through!  So to get around this pulse
      swallowing behavior in your fake fix hold script, your added
      delay must be done with multiple cascaded smaller delays instead
      of one big delay.  Otherwise your SDF timing misses pulses.
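
      The pulse-swallowing effect can be reproduced with a toy inertial
      delay model.  This is a simplified sketch (real simulators have
      their own pulse-filtering rules and options); the function names
      and the event-list representation are assumptions.

```python
def inertial_delay(events, delay, initial=0):
    """Pass a list of (time, value) input changes through one inertial delay.

    An input change that is followed by another change sooner than `delay`
    is cancelled, so pulses shorter than the delay never reach the output.
    """
    out = []
    current = initial
    for i, (t, v) in enumerate(events):
        next_t = events[i + 1][0] if i + 1 < len(events) else None
        if next_t is not None and next_t - t < delay:
            continue  # pending change is cancelled by the next input change
        if v != current:
            out.append((t + delay, v))
            current = v
    return out


def cascade(events, stage_delay, stages, initial=0):
    """Chain several identical inertial delays back to back."""
    for _ in range(stages):
        events = inertial_delay(events, stage_delay, initial)
    return events
```

      A 10 nsec pulse, [(100, 1), (110, 0)], vanishes through a single
      20 nsec delay but survives four cascaded 5 nsec delays, arriving
      20 nsec late with its width intact.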

        ----    ----    ----    ----    ----    ----    ----

12. Create Slow-Clock GLS Tests First
    On every project, I've found that the gate netlists only make their
    timing specs very late in the design process.  They are usually not
    "set-up clean" until just before tapeout.  But it is necessary to
    be able to run SDF simulations earlier.  The workaround is to create
    a knob in your tests that causes all your gatesim tests to run at
    half-speed or quarter-speed clock frequencies.  This gives your
    project a head start on those early GLS bugs, yet you can go full
    speed near the end of your project when it's needed.
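
    One way to pick the knob setting is from the worst set-up slack in the
    current STA run.  The sketch below assumes simple single-cycle paths,
    where slowing the clock adds the extra period directly to the set-up
    slack; the function name and the divider choices are hypothetical.
    Hold slack does not improve with a slower clock, which is why the
    netlist still has to be hold clean (or fake-fixed) first.

```python
def pick_clock_divider(period_ps, worst_setup_slack_ps, dividers=(1, 2, 4)):
    """Pick the smallest clock divider that covers the worst set-up slack."""
    for div in dividers:
        extra_ps = period_ps * div - period_ps  # time gained by slowing down
        if worst_setup_slack_ps + extra_ps >= 0:
            return div
    return None  # even the slowest option is not enough
```

    With a 1000 psec clock and -300 psec of worst slack, half speed is
    enough; at -1500 psec you would need quarter speed.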

        ----    ----    ----    ----    ----    ----    ----

13. Find those Gate Bugs easier and sooner in RTL Simulation
    You can use X-propagation in RTL simulation to find GLS bugs.  Anything
    that helps to find bugs in RTL instead of GLS is a win.  GLS finds bugs
    late in the design process -- where they are much more expensive to
    debug and fix at that time.

    A) Finding Reset Initialization Bugs in RTL Simulation
       Many designs reduce size and power by using non-resettable DFFs in
       their data paths.  Occasionally non-resettable DFFs are accidentally
       used for control logic -- and since in the real silicon these DFFs
       come out of power-on into either 0 or 1 randomly, these reset-init
       bugs can really be chip-killer bugs.

       WARNING: RTL simulations treat X's very optimistically, while gates
       treat them very pessimistically.  Lately simulators have added an
       X-propagation mode that causes the RTL simulation to propagate X's
       more like gates, with the benefit of running at RTL speeds on much
       easier to debug RTL code.  Use these new X-pessimistic RTL sims
       wherever you can.  They'll run 6x to 24x faster than GLS.

       RTL simulations gobble X's from uninitialized SRAMs as well.  Since
       most SRAMs do not auto-initialize, bugs of this type need to be
       identified, and this can be done before GLS with x-propagation.
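
       The optimism/pessimism difference shows up on something as small as
       a 2-to-1 mux.  The three-valued model below is only a sketch: "x"
       stands for unknown, and rtl_if_xprop shows one possible merge
       policy, not the exact behavior of any particular simulator's
       X-propagation mode.

```python
X = "x"


def v_and(a, b):
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def v_or(a, b):
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def v_not(a):
    return X if a == X else 1 - a


def rtl_if(sel, a, b):
    """Plain RTL `if (sel) q = a; else q = b;`: an X select takes the else."""
    return a if sel == 1 else b


def gate_mux(sel, a, b):
    """AND-OR mux built from gates: an X select can put X on the output."""
    return v_or(v_and(sel, a), v_and(v_not(sel), b))


def rtl_if_xprop(sel, a, b):
    """One X-propagation policy: X select gives X unless both branches agree."""
    if sel == X:
        return a if a == b else X
    return a if sel == 1 else b
```

       With sel unknown, a=1 and b=0, plain RTL quietly returns 0 while
       both the gate mux and the X-propagation version return X.  With
       a=b=1 the gate mux still returns X, which is the gate-level
       pessimism everyone complains about.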

    B) CaseX or CaseZ Bugs in RTL Simulation
       Occasionally a designer needs to optimize code for speed and uses
       a casex to tell the synthesis tool to not check some of the inputs.
       For this to work, the case items must be non-overlapping.  If there
       is an overlap, synthesis can produce multiple different results,
       while RTL simulation treats the case statement as a priority
       encoder and always picks just the first match.  That mismatch is
       the bug: it lets synthesis pick a netlist that is never simulated
       until GLS.  SystemVerilog now has "unique case" to help avoid this
       if used in the synthesizable RTL.
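
       Checking a casex for overlap is mechanical.  A minimal sketch,
       assuming the case items have already been pulled out of the RTL as
       equal-width bit strings (the function names are hypothetical, and
       a real lint tool does far more than this):

```python
from itertools import combinations

WILDCARDS = set("xXzZ?")


def patterns_overlap(p, q):
    """True if some input value matches both casex item patterns."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same width")
    # Two items overlap when no bit position forces them apart.
    return all(a == b or a in WILDCARDS or b in WILDCARDS for a, b in zip(p, q))


def find_overlaps(patterns):
    """Return every pair of case items that can match the same input."""
    return [(p, q) for p, q in combinations(patterns, 2) if patterns_overlap(p, q)]
```

       For the items 1??, ?1? and 001, the first two overlap (an input of
       110 matches both), so that pair is what gets flagged for review.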

        ----    ----    ----    ----    ----    ----    ----

14. Delta-Delay Races aren't Usually Seen in RTL Simulation
    When your Verilog simulator does not simulate a DFF correctly it's called
    a delta-delay race.  Instead of taking the value on the D-input *before*
    the clock edge, it accidentally uses the D-input value *after* the clock
    edge.  It looks like a fast-path hold failure in simulation where the
    Din and Qout change at the same time.
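
    A toy two-DFF shift register makes the symptom concrete.  This is only
    an illustration of the before/after sampling difference, not a model
    of a simulator's event scheduler; the function names and the `race`
    flag are made up for the sketch.

```python
def clock_edge(d_in, q1, q2, race=False):
    """Advance a two-DFF shift register (d_in -> q1 -> q2) by one clock edge."""
    new_q1 = d_in
    if race:
        new_q2 = new_q1  # ff2 wrongly samples ff1's value *after* the edge
    else:
        new_q2 = q1      # ff2 correctly samples ff1's value *before* the edge
    return new_q1, new_q2


def run(bits, race=False):
    """Clock a bit stream through the shift register and record (q1, q2)."""
    q1 = q2 = 0
    trace = []
    for b in bits:
        q1, q2 = clock_edge(b, q1, q2, race)
        trace.append((q1, q2))
    return trace
```

    In the correct run q2 lags q1 by one clock; in the race run q2 always
    equals q1, as if the second DFF were not there at all.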

    This is a silent chip-killer if it happens in your RTL simulation;
    because your RTL functionality is different from the functionality of
    your synthesized logic.  Lint doesn't catch this.  LEC doesn't catch
    this.  The only 100% sure way to catch this is through GLS SDF runs.

    One fix is your design team could place an assertion on every DFF in
    their design, but that would be a huge maintenance issue.  A more
    palatable solution would be to have simulators add an option to
    automatically place an assertion on each DFF in your design to check
    for delta-delays dynamically.

    This is overkill for such a rare event, so it's probably OK to just wait
    for your full GLS SDF run.  Your RTL sim won't see it.

        ----    ----    ----    ----    ----    ----    ----

15. Handling X-Propagation with GLS
    The biggest complaint I hear about GLS is having to deal with its
    pessimistic propagation of X's.  Each project has to go through the
    tedious search for all X sources -- and eliminating them in a way
    that does not invalidate the GLS task.  My advice:

    - Add Pull-ups or Pull-downs on All Floating Inputs
      A surprising number of testbenches do not drive all the inputs to
      their DUT.  And a Z quickly becomes an X in GLS.

    - Initialize all Non-Resettable Elements to All-0s, All-1s, and Random
      Use an initialization mode in your simulator if available.  This
      sets all registers in your design at time 0 to all-0's, all-1's, or
      random.  Pass in a known random seed so the random pattern will be
      different for each test run -- yet repeatable for you if you need
      to repeat it.  Then run all regressions with all-0's once, all-1's
      once, and random many times for each test.
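
      The seed bookkeeping might look like the sketch below.  Everything
      here is hypothetical (the mode names, init_plan, init_values); how
      the mode and seed actually get passed to your simulator depends on
      the tool.

```python
import random


def init_plan(test_name, base_seed, random_runs=3):
    """List the (mode, seed) runs for one test: all-0s, all-1s, then random."""
    # Derive per-test seeds from the base seed so every run can be repeated.
    rng = random.Random(f"{base_seed}:{test_name}")
    plan = [("all0", None), ("all1", None)]
    plan += [("random", rng.randrange(2**32)) for _ in range(random_runs)]
    return plan


def init_values(mode, seed, num_regs):
    """Time-0 values for num_regs non-resettable registers."""
    if mode == "all0":
        return [0] * num_regs
    if mode == "all1":
        return [1] * num_regs
    rng = random.Random(seed)  # same seed gives the same pattern every time
    return [rng.randint(0, 1) for _ in range(num_regs)]
```

      Logging the (mode, seed) pair with each run is what makes a failing
      random pattern repeatable later.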

    - Treat X-gobblers as Sketchy
      Engineers like to put X-gobblers on their gate simulation models like
      RAMs, fuses, and PLLs because the RAM model authors love to drive X's
      out of their RAM.  This is OK in RTL simulation, but with GLS it
      causes everything to go X.

      Only add X-gobbler code to RAM models that you know are causing X's
      to propagate incorrectly.  Your X-gobblers should be coded to *only*
      gobble X's on gatesims.  And have the plus-arg created by your
      simulator initialization command also control how the X-gobblers in
      these models convert X's to all-0s, all-1s or random.

      Note: X-gobblers in GLS do not just swallow X's like RTL sims.  They
      change the X's over to 0's, 1's, or random -- so that tests which
      correctly check for incorrect behavior will fail.  Another method
      includes depositing random values on the RAMs at time 1 psec.

    - Disable those 0-time Checkers for GLS SDF
      Some library models have checks that fail at time 0 when inputs to the
      model are X.  On SDF timing simulations the inputs to your chip don't
      make it to any modules deeper into the design at time 0 -- so those
      inputs are X until the signals propagate through the delays.  You must
      edit your models to disable those checks until the signals have time
      to propagate; typically around 50 psec into your GLS run.

    - Wave Debuggers with Good X-tracing Don't Exist
      Most GLS debugging is performed by debugging wave files.  Many tools
      exist on the market, and they all have X-tracing capabilities.  I have
      spent too much of my life tracing X's.  I'm pretty good at it, but I
      would be willing to give it up for good if someone could create a good
      X-trace feature in a wave debug tool.  CAVEAT EMPTOR: while your EDA
      salesman will claim their wave debug tool has x-trace ability, it
      doesn't. 

      Wave debugger X-tracing fails because waves are usually not dumped
      with the full hierarchy in GLS because it is too slow.

      When I am tracing an X and drop below the level that is dumped, I move
      up a level until I have waves again, then drop all the inputs to that
      module into the viewer and look for an X coming in at about the right
      time, and continue. If there is no X on an input, then I need to rerun
      with a deeper dump on that module.

      A tool that could automate this simple algorithm for X-tracing with a
      partial dump would be a huge win for GLS debug.
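
      One step of that manual algorithm could be sketched as follows.  The
      data structures are invented for illustration (a parent map for the
      hierarchy, a set of dumped modules, and per-module input X arrival
      times pulled from the waves); no existing wave tool API is implied.

```python
def trace_step(module, x_time, parent, dumped, input_x_times, window):
    """One step of X-tracing with a partial wave dump.

    Returns ("follow", module, input_name) when an input of the nearest
    dumped ancestor carries an X near x_time, or ("redump", module) when
    no input does and that module needs a deeper dump.
    """
    # Move up the hierarchy until we reach a module that has waves.
    while module is not None and module not in dumped:
        module = parent.get(module)
    if module is None:
        raise ValueError("no dumped module on the path to the top")
    # Look for an X arriving on any input at about the right time.
    for name, t in sorted(input_x_times.get(module, {}).items()):
        if t is not None and x_time - window <= t <= x_time:
            return ("follow", module, name)
    return ("redump", module)
```

      A "follow" result means keep tracing from whatever drives that
      input; a "redump" result means rerun the test with a deeper dump on
      that module and try again.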

        ----    ----    ----    ----    ----    ----    ----

16. Tricky Delta-Delay Races in 0-delay GLS runs
    As I explained in Tip #14 (above), delta-delay races occur when your
    simulator does not simulate a DFF correctly.  Instead of taking the
    value on the D-input *before* the clock edge, it uses the value *after*
    the clock edge -- and looks like a "fast-path" hold failure in sim.

    These issues often occur in 0-delay GLS because with 0-delay, all the
    functional signals are changing at the same time as the clock edges.
    Simulators deal with this by putting the rising edge of the clock in
    the blocking portion of the zero time window (early) while the
    functional signals are restricted to the NBA (non-blocking assign, or
    late) half of the zero time window.  The way the simulator knows how
    to do this is by looking at how the signal is generated.  Clocks are
    supposed to be generated by a blocking assign (=), while functional
    logic Qout's are generated by a non-blocking assign (<=).

    Unfortunately since most clocks today are generated by a clock divider
    which uses DFF's, the clocks are generated by the Qout of a DFF.  And
    these are modeled in gates with a DFF UDP (User Defined Primitive).
    Originally UDP DFF models defaulted to blocking assigns, but a few years
    ago, the Verilog LRM changed the default to non-blocking assign.

    Either way, having the clocks and functional DFFs driven by the same DFF
    simulation model leaves a problem.  How do you tell the simulator which
    is a clock and which is not?  Simulators typically get it right most of
    the time.  Which makes me suspect the tool is identifying clocks during
    the compile of the design, and making sure those signals are always in
    the Blocking Assign region.

    But there always seem to be a few cases -- especially with hard macros,
    or weirdly coded RAM models, where 0-delay GLS fails due to delta-delay
    races.  Typically when these fails are found, they can be fixed with a
    *separate* model for DFFs driving clocks.   Another fix is to put a
    #delay on the input of the D-in, but the *separate* model approach will
    fix all DFFs on clock nets.  WISH LIST: It would be nice to have a
    compile option that would automatically place an assertion on all the
    DFFs in the RTL design that would flag any delta-delay failures
    dynamically during simulation run time. Having to do this manually
    is a maintenance headache.

        ----    ----    ----    ----    ----    ----    ----

Copyright 1991-2024 John Cooley. All Rights Reserved.