Ciena user reviews Calypto Catapult HLS design and verification

( ESNUG 541 Item 4 ) -------------------------------------------- [05/23/14]

From: [ Boris Hristov of Ciena ]
Subject: Ciena user reviews Calypto Catapult HLS design and verification

Hi, John,

I am an ASIC designer at Ciena and also responsible for digital ASIC
methodology.  For the DSP portion of our ASICs, which accounts for about
1/2 of the digital core of our recent WaveLogic 3 (WL3) ASIC, we used the
Calypto Catapult High Level Synthesis (HLS) tool in our production flow.

I wanted to share with your readers our methodology/feedback on Calypto.

For confidentiality reasons, I cannot provide the usual timing/area/power
numbers for our blocks.  However, since Ciena's WL3 chips are coherent
optical processors used in networking, I can say performance and power are
critical for us.  Our target bit-rates are 100 Gbit/sec or higher.

          ----    ----    ----    ----    ----    ----    ----

INITIAL EVAL -- CATAPULT VS HAND CODED RTL

I ran our initial Catapult evaluation about 2 years ago, against a block
that I had previously hand-coded in RTL.  

I picked this particular block because it is representative of the types of
DSP designs we have and I was very familiar with it.  Catapult has a
scheduler to help visualize the loop and change parameters for latency and
throughput.  I wanted to be able to explore various implementations and get
quick feedback on actual implementation area, timing and power.  

Block A

Description: 

    Single loop, highly parallel pipeline design with 48 total
    operations: 32 multipliers, 16 adders.
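
For readers new to this kind of block, here is a minimal sketch of one loop
shape that yields the same operation count.  The actual Block A code is
confidential and is not shown; the function name, data widths and the
arithmetic itself are assumptions chosen only so that 16 iterations of
2 multiplies and 1 add give 32 multipliers and 16 adders when the loop is
fully unrolled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a block of this shape; not the actual design.
constexpr int kLanes = 16;

// Each iteration does 2 multiplies and 1 add. Fully unrolled, the loop
// corresponds to 32 multipliers and 16 adders working in parallel.
std::array<std::int64_t, kLanes> dual_product_sum(
    const std::array<std::int16_t, kLanes>& a,
    const std::array<std::int16_t, kLanes>& b,
    const std::array<std::int16_t, kLanes>& c,
    const std::array<std::int16_t, kLanes>& d) {
  std::array<std::int64_t, kLanes> out{};
  for (int i = 0; i < kLanes; ++i) {
    // Widen before multiplying so the sum of products cannot overflow.
    out[i] = std::int64_t(a[i]) * b[i] + std::int64_t(c[i]) * d[i];
  }
  return out;
}
```

An HLS tool decides how much of such a loop to unroll or pipeline from the
constraints you give it; the C++ itself stays untimed.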

Catapult's QoR Results: 

It's important to note that Catapult gives feedback based on your target
ASIC libraries.

    - Timing - Catapult's RTL matched my RTL code timing for throughput
      and latency.

    - Area - Catapult met my area requirements, and was within 10% of my
      hand-coded RTL.

    - Power - Catapult does power estimation, and we used it to compare
      relative power between different implementations.

Time savings:

    - RTL hand-coding: Took me 1 week to hand code this one RTL block.

    - Catapult HLS: It took me 2 days to set up Catapult the first time
      (with the help of their FAE), then only minutes to generate several
      implementation cases for the block.  

Catapult passed our evaluation and we purchased it.  Its major benefits to
us were: 1) a fast design exploration of the possible solution space, 2) the
visualization from the scheduler and 3) the potential verification time
savings -- more on this when I discuss our production flow.

          ----    ----    ----    ----    ----    ----    ----

CATAPULT PRODUCTION DESIGN RESULTS

Next, we used Catapult for two types of designs in our production blocks.

All instances of variants of these two blocks accounted for about 50% of our
total production ASIC gate count.

Block B & Block C

Descriptions:

    - Block B was an FFT (Fast Fourier Transform)

    - Block C was a Digital Filter

    - The two blocks had between ~80 and 5,000 operations for various
      implementations

    - In most cases blocks were integrated into the design as a triplet,
      but were each designed separately with their own parameters.  

    - We had 16 total instances across our design, with different
      parameters for each instance.
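
To illustrate the idea of one C++ source serving many instances, here is a
minimal sketch of a parameterized digital filter.  It is not our production
code: the class name, the choice of a direct-form FIR, and the tap counts
and data types are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical parameterized FIR filter; names and widths are illustrative.
template <std::size_t Taps, typename Sample, typename Acc>
class Fir {
  static_assert(Taps > 0, "a filter needs at least one tap");

 public:
  explicit Fir(const std::array<Sample, Taps>& coeffs) : coeffs_(coeffs) {}

  // Shift in one sample and return the new filter output.
  Acc step(Sample x) {
    for (std::size_t i = Taps - 1; i > 0; --i) {
      delay_[i] = delay_[i - 1];
    }
    delay_[0] = x;
    Acc acc = 0;
    for (std::size_t i = 0; i < Taps; ++i) {
      acc += Acc(coeffs_[i]) * Acc(delay_[i]);
    }
    return acc;
  }

 private:
  std::array<Sample, Taps> coeffs_;
  std::array<Sample, Taps> delay_{};  // starts zeroed
};

// Two instances of the same source with different parameters.
using SmallFir = Fir<4, std::int16_t, std::int32_t>;
using LargeFir = Fir<64, std::int16_t, std::int64_t>;
```

Feeding a single 1 followed by zeros into SmallFir returns the coefficients
in order, which is a quick sanity check for any variant.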

Catapult's QoR Results: 

    - Timing - Met tight performance targets for all instances of
      each block.

    - Area - Met our objectives for all instances of each block.

    - Power - We focused on sub-chip (sub-system) level power targets
      and once we had selected the block implementation we did not use
      Catapult power numbers later in the flow.  

Time savings:

    - Using Catapult saved us substantial time here.  We created
      only a few C++ models and synthesized RTL variants by changing
      our Catapult parameters.

    - Time to synthesize each RTL block: Varied from 20 min to 2 hours.  

    - We did not RTL hand-code these blocks; however based on experience,
      I expect hand-coding each instance according to the specified
      parameters would have taken approximately 1 week for each variant.  

Verification:

    - We still simulated Catapult's RTL output using a digital simulator
      running in a System Verilog test bench.  However, because it was
      machine-generated rather than hand-coded RTL, we focused on
      integration simulations.  We still ran C-to-RTL equivalency
      simulations as part of the System Verilog test plan.

    - We primarily used scheduler visualization for throughput/latency
      and resource sharing analysis and to synthesize our SystemC to RTL.
      As a result of the synthesis we got timing, area and power numbers
      that were very useful for comparing implementations.

[Figure: Catapult schedule snapshot, low latency Inverse FFT implementation]

[Figure: Catapult schedule snapshot, high latency Inverse FFT implementation]

Using the schedule visualizer we can see each operation's use and resource
sharing -- depending on the particular implementation parameters.  From the
Catapult snapshots above it was immediately visible that our Low Latency
implementation had many more parallel operations while our High Latency
implementation had much more resource reuse -- hence a smaller area design.

This kind of information is crucial for selecting our best architecture on
a case-by-case basis.
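
The trade-off can be put into rough numbers.  The sketch below is a
simplified model, not Catapult's scheduler: it assumes independent
operations of one type, each taking one cycle on a shared unit, so the cycle
count is ceil(ops / units).  The function name and the example figures are
assumptions.

```cpp
#include <cassert>

// Simplified model (an assumption, not Catapult's scheduler): `ops`
// independent operations of one type are shared across `units` hardware
// resources, and each unit completes one operation per cycle.
// The cycle count is then ceil(ops / units).
int cycles_needed(int ops, int units) {
  assert(ops >= 0 && units > 0);
  return (ops + units - 1) / units;
}
```

Under this model, 32 multiplies on 32 multipliers take 1 cycle (low latency,
large area), while the same 32 multiplies on 4 shared multipliers take
8 cycles (high latency, small area).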

There is a "sc_verify" flow for C-to-RTL equivalency simulations that we
did not try since we used our traditional System Verilog flow.  Mentor
Questa was running underneath "sc_verify" for all our RTL, so we could
use its code-coverage and functional coverage.  (Questa is built into
the Catapult tool suite.)

I also liked its interface synthesis.  Once you select the type of interface
in Catapult, the RTL interface for our DSP block is also synthesized, taking
the throughput pragmas into account.

          ----    ----    ----    ----    ----    ----    ----

OUR HLS/RTL/VERIFICATION PRODUCTION DESIGN METHODOLOGY

We have modified our design methodology since we added Catapult to our
production flow.  Below is our current approach:

    1. All our DSP models are written in C++.

    2. System designers work on our untimed system algorithms in C++.
       These high level designs are primarily architectural.  They use
       MATLAB and C testbenches to test our C++ models.  

    3. Our DSP development group modifies the C++ for the DSP sub-models which
       get implemented inside our ASIC.  These C++ models are more
       detailed, and a bit closer to hardware than the system models,
       but are still untimed and implementation generic.  It is
       useful to have a single C++ model for our system simulations
       and Catapult implementation whenever possible.

    4. We simulate our system-level C++ models using an in-house
       simulator.

    5. We use Calypto's Catapult high level synthesis to synthesize
       our C++ into RTL (Verilog).

    6. We verify the Verilog RTL code using standard simulators and a
       System Verilog testbench.

    7. We lint the RTL that Catapult produces as we do for any RTL code.

    8. We run the linted and functionally verified RTL through our
       standard RTL synthesis flow.

    9. We are a fabless semiconductor company, so we then send our
       gate-level design to our ASIC vendor for physical implementation.
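
One way to keep a single C++ model for both system simulation and synthesis,
as step 3 describes, is to make the model generic over its numeric type.
The sketch below is only an illustration of that idea; the function name is
hypothetical, and plain int stands in for whatever bit-accurate type a real
flow would use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical untimed model, generic over its numeric type. The same
// source can be instantiated with double for system-level simulation or
// with an integer/fixed-point type for a hardware-oriented run.
template <typename T, std::size_t N>
T dot(const T (&x)[N], const T (&h)[N]) {
  T acc = T(0);
  for (std::size_t i = 0; i < N; ++i) {
    acc += x[i] * h[i];
  }
  return acc;
}
```

Calling dot() on double arrays exercises the algorithm at full precision;
calling it on integer arrays exercises the same code path with
hardware-like arithmetic.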

          ----    ----    ----    ----    ----    ----    ----

CHANGES IN HOW WE DESIGN AND VERIFY

Below is how we've been evolving our methodology to include C-to-Verilog
high level synthesis:

    - Our RTL designers now get involved earlier.  Our RTL folks work
      closely with the system C++ team to make sure that the design will
      be implementable.  As we get into the interfaces and data
      structure, our RTL designers also need to make sure that the
      models are testable.

We are actually seeing RTL guys modifying C code as they get it from DSP
guys, but it is still a collaborative effort between the DSP and RTL person
on each block.  Ultimately the DSP engineer is responsible for the algorithm
and the RTL engineer for the gate-level implementation.

    - We get our RTL designed faster.  In general, for a well understood
      block that would take 1 week to develop by hand-coding RTL, it will
      take Catapult 1 day to finalize RTL implementation, including all
      the design exploration.  

    - We do more design exploration.  For example, two major parameters
      that Catapult can control during scheduling are throughput and
      latency.  Catapult can graphically represent the logic for every
      tick of the loop, so we can analyze the resource sharing of the
      algorithms.  Every time we change one of the parameters, we get a
      new design with estimated timing, area, and power, based on our
      target library.  

We raised the level of abstraction for the bulk of our designs with C even
before we used Catapult.  With Catapult we can now get to RTL implementation
much faster and explore various implementation characteristics much more
easily.  

From what I'm hearing about their SLEC, it'll let us move our verification
efforts up.  Our RTL test benches are not used for C-to-RTL equivalency
testing any more, but for sub-system integration testing.

          ----    ----    ----    ----    ----    ----    ----

TESTBENCHES, SLEC, ASSERTIONS, POWER

Automated C Testbenches and Optimal Stimulus Generation

Catapult has a "sc_verify" flow to aid the DSP block C verification.  It's
unusual to not use it for C-to-RTL equivalency (formal methods are more
effective for proving equivalency) -- but instead to use it for C block
verification with Mentor inFact graph-based stimulus generation.

InFact allows for graphical representation of this stimulus and the various
interactions to be investigated -- with a minimal set of test cases created
to fully cover your desired stimulus using an automatically generated
Catapult "sc_verify" SystemC testbench.

The key value of inFact is it can generate random stimulus for C/SystemC-
based verification, using constraints in a similar way to SV constrained
random -- with the increased efficiency of its coverage closure algorithms,
and then to report the coverage achieved.

In addition, the same graph used for the C models in the "sc_verify" flow
can be targeted at the RTL if desired, with consistent stimulus applied in
each case.  Even with formal methods used for equivalency the same graph-
based stimulus could still be useful for debugging differences if detected.
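
For readers who have not seen constrained stimulus with coverage tracking at
the C level, here is a plain C++ sketch of the basic loop.  It is not the
inFact API and does not use graph-based coverage closure; it draws uniform
random values inside a constraint and stops once every coverage bin has been
hit.  All names, the value range and the four bins are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Plain C++ sketch of constrained-random stimulus with bin coverage.
// This is not the inFact API; all names here are hypothetical.
struct CoverageRun {
  std::array<bool, 4> bin_hit{};  // one flag per amplitude bin
  int samples = 0;                // stimulus values generated
};

// Draw values constrained to [-128, 127] until every bin is hit or
// max_samples is reached. The bins split the range into four quarters.
CoverageRun run_until_covered(unsigned seed, int max_samples) {
  assert(max_samples > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> dist(-128, 127);
  CoverageRun run;
  std::size_t hits = 0;
  while (hits < run.bin_hit.size() && run.samples < max_samples) {
    const int value = dist(rng);
    const std::size_t bin = static_cast<std::size_t>((value + 128) / 64);
    ++run.samples;
    if (!run.bin_hit[bin]) {
      run.bin_hit[bin] = true;
      ++hits;
    }
  }
  return run;
}
```

Pure random drawing like this revisits bins it has already covered; avoiding
that wasted stimulus is the efficiency gain claimed for coverage closure
algorithms.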

C-to-RTL formal equivalency

Formal equivalency checking is a great productivity enhancer for both
hand-written and synthesized RTL.  If one can formally verify the blocks at C
level and check formal equivalency with RTL, there is no more need to create
individual block-level testbenches.  This could greatly reduce the amount of
System Verilog co-simulations required.

Calypto SLEC is particularly interesting because of its obvious tight hooks
with Catapult.  

    - Catapult automatically produces the entire set up for SLEC, such
      as interface properties, latency/throughput, loop-related
      information, intermediate equivalence points.  

    - We put pragmas in our C++ code that Catapult synthesizes, with
      specific instructions to guide the synthesis.  SLEC understands
      these pragmas.  

There will always be some RTL that doesn't have a C model, so RTL simulation
will still be required, but mostly for SoC/block integration.  

C-level assertion-based verification

Over the past decade, RTL verification has shifted from primarily directed
tests and random simulation to also include assertion-based verification.

As we shift our design focus to C++, we want to use the same rigor as was
used on the RTL.  We began using assertion-based verification at the C level
to increase our confidence in the C++ code.

After you code your assertions/properties and coverage points into your
C model, HLS tools can synthesize and propagate them all into your RTL in
standard SVA or OVL formats.  You can then not only use these properties at
the C level for formal proof or dynamic simulation, but can benefit from
having them automatically embedded in the RTL, rather than having to create
them manually.  
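
As a small illustration of what a C-level assertion can look like, here is a
sketch using the standard assert() macro on a saturating adder.  The function
and the two properties are hypothetical examples; how a given HLS tool
carries such checks into SVA or OVL is tool-specific and is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical saturating adder with C-level assertions. How an HLS tool
// carries such checks into RTL is tool-specific and not shown here.
std::int16_t saturating_add(std::int16_t a, std::int16_t b) {
  const std::int32_t wide = std::int32_t(a) + std::int32_t(b);
  std::int32_t clipped = wide;
  if (clipped > INT16_MAX) clipped = INT16_MAX;
  if (clipped < INT16_MIN) clipped = INT16_MIN;
  // Property: the result always fits in 16 bits.
  assert(clipped >= INT16_MIN && clipped <= INT16_MAX);
  // Property: saturation never changes a sum that was already in range.
  assert((wide < INT16_MIN || wide > INT16_MAX) || clipped == wide);
  return static_cast<std::int16_t>(clipped);
}
```

The same properties hold for every RTL implementation of the function, which
is why they can travel with the block.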

With this flow, our DSP blocks are IP that has two functionally identical
"views": C and RTL, where a single C view can have multiple functionally
equivalent RTL implementations.

The assertions and coverage points increase the quality for our IP block.
Plus the assertions travel with the IP and provide white box visibility into
the IP during system verification -- regardless of what "view" is used for
the particular testing.

C level power optimization

Ciena is looking into using C-level power optimization tools for
optimizations beyond algorithm and bit-width changes, ones that are
propagated to the RTL in the form of clock-gating enables and memory access
optimizations.

          ----    ----    ----    ----    ----    ----    ----

WHAT'S MISSING FROM CATAPULT

When you have a lot of test cases, assertions, and functional coverage, it
can be hard to visualize them -- or measure and track coverage.  I would
love to see a graphical representation of the merged functional coverage
at the C level.  You can do it now by synthesizing RTL and co-simulating in
a "sc_verify" flow and collecting coverage in Questa, but I'd like to have
a pure "sc_verify" C functional coverage flow.

Although Catapult can currently tackle most designs using a hierarchical
approach, I'd be happier if it could handle even larger flat designs.  

          ----    ----    ----    ----    ----    ----    ----

CONCLUSION

It is hard to quantify exactly, but I estimate Catapult cut our overall
development effort by 25% for our DSP block level designs.

In addition, I expect some further savings in verification time with SLEC,
though it's too early for me to predict that.

Looking forward, we hope to expand our HLS use to synthesize almost all of
our DSP block level designs.  We are also looking into using TLM interfaces
to build even deeper hierarchy in C.

    - Boris Hristov
      Ciena Corp.                                Ottawa, Canada
