( ESNUG 478 Item 8 ) -------------------------------------------- [12/18/08]

Subject: ( ESNUG 477 #3 ) Power Opto and Linting in CatapultC and Spyglass

> - Tradeoff between performance and area.  CatapultC gives us timing and
>   area estimates, then later we run our Verilog RTL through Synopsys 
>   Design Compiler for the final result.


From: Dale Pollek <dalep=user domain=atrenta not calm>

Hi John,

CatC users can do power tradeoffs and optimization using our Atrenta
SpyGlass-Power tool.  (ST is doing this right now.)  It is only a few
steps within the Catapult environment to get all the data you want.

To explain this, I have an example block I'll call "Beetlejuice" - it's a
simple FIR filter circuit block that is used many times in a much larger
design.  It has 65 registers and roughly 3 K gates.  It can use 2 to 8
multipliers, depending on how you implement it.  It uses 18 bit busses
and does integer math.
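
As a rough behavioural picture of what such a block computes, here is a
minimal Python sketch of an integer FIR filter.  It is not the actual
Beetlejuice C++ source.  The tap count of 8 (one multiplier per tap when
fully unrolled), the 18-bit accumulator and the names wrap_signed and
fir_step are all assumptions made for illustration.

```python
TAPS = 8    # assumption: one multiplier per tap when fully unrolled
WIDTH = 18  # bus width stated in the post


def wrap_signed(value, width=WIDTH):
    """Wrap an integer into the two's-complement range of `width` bits."""
    mask = (1 << width) - 1
    value &= mask
    if value & (1 << (width - 1)):
        return value - (1 << width)
    return value


def fir_step(delay_line, coeffs, sample):
    """Shift in one sample and return (new_delay_line, output)."""
    new_line = [wrap_signed(sample)] + delay_line[:-1]
    acc = 0
    # This multiply-accumulate loop is what gets unrolled onto 2, 4 or 8
    # hardware multipliers in the different architectures.
    for x, c in zip(new_line, coeffs):
        acc = wrap_signed(acc + x * c)  # assumed 18-bit accumulator
    return new_line, acc


# Step response with all-ones coefficients: the output ramps up by one
# per sample until the delay line is full.
line = [0] * TAPS
coeffs = [1] * TAPS
outputs = []
for _ in range(TAPS):
    line, y = fir_step(line, coeffs, 1)
    outputs.append(y)
print(outputs)
```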

This Beetlejuice block is written in a few lines of C++ code, and I'll go
through 5 detailed iterations to find the tradeoffs between speed, area,
architecture, and power.  My two primary design goals are:

                       Sample time < 20 nsec
                             Power < 2 mWatts

in a generic 90 nm process technology.

SETUP

To enable SpyGlass within Catapult, make sure to tell it where the SpyGlass
Compiler, libraries and setup information reside.  Once within the Catapult
environment everything else is a push-button operation (or a TCL command)
within the GUI to create and view power results.

LOADING AND PREPARING THE DESIGN

Loading the design and readying it for iteration takes only a few clicks in
the CatC environment before synthesizing the C++ code into RTL:

 1. Add the design file and testbench
 2. Enable the SystemC (SCVerify) flow.  This allows the C++ testbench to
    be used to drive all variants of the C design that are turned into RTL,
    irrespective of timing.
 3. Enable the SpyGlass-Power Flow
 4. Enable the Spyglass Linting Flow
 5. Pick the target process technology in Catapult (for example, the
    built in 90nm library)
 6. Set the target operating frequency of the design (e.g. 50 MHz)
 7. Apply architectural constraints (full unrolling in the first case,
    Initiation Interval=1 in all cases).
 8. Generate the RTL and view performance and area score
 9. Use Spyglass RTL lint/optimization
10. Verify the design using SCVerify and the testbench
11. Use SpyGlass-Power to get the power score for each architecture
12. Repeat steps 6 to 11 two more times for different design architectures
    at faster frequencies of 100 and 200 MHz, unrolled to 4 multipliers
    and 2 multipliers respectively.

Below is a more detailed description for each of the steps listed above:

1. Add the design and testbench

   - Start Catapult
   - Click the "Set Working Directory" icon in the Task Bar
   - Set the working directory.  This is where the C++ design and
     testbench are located.
   - Click the "Add Input Files" icon in the Task Bar
   - Select both "Beetlejuice.cpp" and "testbench_Beetlejuice.cpp", and
     mark the latter as excluded from compilation.  The testbench is
     excluded because it is not part of the design to be synthesized
     into RTL.  Catapult will only use it for verification.  More
     details in step 10 below.
   - From this point the design is ready for setting up user selected
     alternatives.  It is important to note that the multiple different
     resulting RTL designs that Catapult will generate from this point
     forward are all generated from the very same C++ description.

2. Enable the SystemC Verify (SCVerify) flow

   - In the Catapult Flow Manager window, double click on "SCVerify"
   - The "Flow Package Enable" dialog box will open.  Click on "Yes"
   - The "Flow Package Properties" dialog box will open.  Click "OK"

3. Enable the Atrenta SpyGlass-Power Flow

   - Double click on "SpyglassPower" in the Flow Manager
   - The "Flow Package Enable" dialog box will open.  Click on "Yes"
   - The "Flow Package Properties" dialog box will open.  Click "OK"

4. Enable the Atrenta Spyglass Linting Flow

   - Double click on "SpyglassLint" in the Flow Manager
   - The "Flow Package Enable" dialog box will open.  Select the
     appropriate rules and policies (Spyglass Rule Deck) and then click
     on "Yes" to enable the flow
   - The "Flow Package Properties" dialog box will open.  Click "OK"

5. Pick the target process technology in Catapult

   - In the Constraint Editor tabbed window, select "Design Compiler"
     as the synthesis tool
   - Select a target process technology library such as the built in
     90 nm.  Typically, a user would first characterize their own
     libraries using another tool called Catapult Library Builder which
     creates a process technology library that can be used for C synthesis
     from the process library files provided by the foundry.  For
     production designs, it is highly recommended to do this.  However,
     for design exploration and prototyping, Catapult does offer sample
     libraries to quickly try out a design.  For the purposes of this
     description, the user can select "Sample -> 90 nm" in the Synthesis
     Tool window.
   - If any specific memories are needed for the design, this is where
     these are enabled, but for this simple design, we didn't use any.

6. Set the target operating frequency of the design

   - While still in the Synthesis Tool window, set the design frequency.
     For this example, we will set it to 50 MHz to start (that gives one
     20 nsec clock period).
   - Next, click on "Interface Control" from the sidebar, select
     "Handshake" and check the "Transaction Done Signal" box.

7. Apply architectural constraints

   - Select the "Architecture Constraints" icon in the Task Bar.  From
     here, the user can decide whether to fully unroll the loop, partially
     unroll it, or pipeline the design.  For the first iteration we unroll
     to 8 and hold pipeline to 1 with an Initiation Interval=1.

8. Generate the RTL and view performance and area score

   - Click the "Generate RTL" icon in the Task Bar.  Catapult will now run,
     automatically assigning resources and scheduling based upon the
     defaults given for memory assignment and available technology libs.
   - The user can see the "performance" and "area" scores for the design
     in the "Table" window.

9. Use Spyglass for basic RTL linting rules and optimization

   - Run linting by double-clicking on the SpyGlassLint makefile.
   - Results of the linting are viewed in the Catapult "Message" window.

10. Verify the design using SCVerify and the testbench

  - Simulate the RTL by double-clicking on the "Verification->Modelsim"
    directory.  This will compile all the RTL files, create and compile
    the necessary C++ test infrastructure and launch Questa.
  - When Questa is launched, type "run -all" at the Questa command line.
    This will run the simulation in Questa.
  - From here, the user can view the waveforms of the signals.
  - Once the user is satisfied with the simulation-based verification,
    they can exit Questa and return to Catapult.
  - Note that Questa is able to simulate the RTL using the original C++
    testbench.  You can also run this in batch to just get a "Simulation
    Passed" message.

11. Use SpyGlass-Power for power estimate of current design architecture

  - Run Power Analysis by double-clicking on the SpyGlass-Power makefile.
    This calculates the power of the RTL generated by Catapult.
  - The power estimation values appear in the Catapult GUI table view.

12. Repeat steps 6 to 11 to try different design architectures and select
    the best implementation for a target specification

  - To get the three different RTL iterations of Beetlejuice that meet the
    same system-level performance, repeat steps 6 to 11 two more times,
    doubling the frequency and halving the unroll parameter each time.
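
The reason doubling the frequency while halving the unroll keeps the
system-level performance constant can be sketched with a little arithmetic.
This is a simplified model, not something Catapult reports: it assumes 8
multiply-accumulates per output sample and that each multiplier does one of
them per clock.  The function name sample_time_ns is made up for
illustration.

```python
import math

TAPS = 8  # assumption: 8 multiply-accumulates per output sample


def sample_time_ns(freq_mhz, multipliers, taps=TAPS):
    """Clock cycles needed per sample times the clock period, in ns."""
    cycles = math.ceil(taps / multipliers)
    period_ns = 1000.0 / freq_mhz
    return cycles * period_ns


# The three architectures from steps 6 to 12: (frequency in MHz, multipliers)
for freq, mults in [(50, 8), (100, 4), (200, 2)]:
    print(f"{freq:>3} MHz, {mults} multipliers -> "
          f"{sample_time_ns(freq, mults):g} ns per sample")
```

Under these assumptions all three configurations come out at the same
sample time, which is why they can be compared purely on area and power.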

After going through the above steps on Beetlejuice, we have the results of
the 3 different solutions with tradeoffs in area and power for the same
system-level performance.

For Beetlejuice, the first iteration (see table below) has room to improve
performance while still meeting the power requirement, so I repeated steps
6 through 11 twice more, changing only the frequency: 200 MHz, then 400 MHz.

RESULTS SUMMARY OF DESIGN ITERATIONS

In a matter of minutes we have 5 design implementation iterations of
Beetlejuice with the first three iterations focusing on attaining the same
system performance to see the impacts of area and power.  The fourth and
fifth are used to investigate what area and power is required when
increasing the performance (frequency).  The results found from remaining
completely inside the Catapult environment are:

   Fmax       Multipliers    Throughput   Total Power     Area Score
    50 MHz      8              20 ns         0.61 mW        13,402
   100 MHz      4              20 ns         2.43 mW         8,390
   200 MHz      2              20 ns         3.58 mW         5,162
   200 MHz      8               5 ns         2.30 mW        13,402
   400 MHz      8             2.5 ns         4.54 mW        15,714

From these results I selected the fourth iteration, as it is closest to the
target goals.  It will now go through further RTL-specific power reduction
with SpyGlass-Power and more detailed linting before implementation.
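
One way to make "closest to target goals" concrete is to check each row of
the results table against the two design goals.  The sketch below reads
both goals as strict inequalities (sample time < 20 nsec, power < 2 mW),
which is an interpretation on my part.  The helper name check and the
dictionary keys are invented for illustration; the numbers are copied from
the table.

```python
GOAL_SAMPLE_NS = 20.0
GOAL_POWER_MW = 2.0

# (Fmax MHz, multipliers, sample time ns, total power mW, area score)
ITERATIONS = [
    (50, 8, 20.0, 0.61, 13402),
    (100, 4, 20.0, 2.43, 8390),
    (200, 2, 20.0, 3.58, 5162),
    (200, 8, 5.0, 2.30, 13402),
    (400, 8, 2.5, 4.54, 15714),
]


def check(row):
    """Compare one table row against both goals."""
    fmax, mults, sample_ns, power_mw, _area = row
    return {
        "fmax_mhz": fmax,
        "multipliers": mults,
        "meets_timing": sample_ns < GOAL_SAMPLE_NS,
        "meets_power": power_mw < GOAL_POWER_MW,
        "power_over_mw": round(max(0.0, power_mw - GOAL_POWER_MW), 2),
    }


report = [check(row) for row in ITERATIONS]
for entry in report:
    print(entry)

# Among the rows that strictly beat the sample-time goal, pick the one
# with the smallest power overshoot.
timing_ok = [entry for entry in report if entry["meets_timing"]]
closest = min(timing_ok, key=lambda entry: entry["power_over_mw"])
print("closest:", closest["fmax_mhz"], "MHz,", closest["multipliers"], "mults")
```

With this reading, no row meets both goals outright; the fourth iteration
beats the sample-time goal and has the smallest power overshoot among the
rows that do.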

NOTE: The RTL for the 5 nsec result was essentially the same as the 50 MHz
solution, with everything occurring in one clock cycle.  The 400 MHz
implementation needed faster multipliers, which pushed the area up as
Design Compiler will use faster DesignWare multiplier parts.

FINISHING THE DESIGN - LINTING AND OPTIMIZING THE CHOSEN RTL

To complete the design process and ready the RTL for implementation, we
start from the fourth iteration (the 200 MHz, 5 nsec architecture), which
already exceeds the performance criteria and has power very close to the
design goal.  The further lint checking and power optimization, done
without impacting performance, will use the same SpyGlass-Power used above.

Only basic linting was applied in step 9 above for pre-simulation and
synthesis during the Catapult architecture selection.  Now that we have
selected the specific RTL, it needs to be further optimized and linted
before implementing.  Now we will verify and clean the design using the
SpyGlass-GUI that can be invoked either from within CatC or outside of it:

1. In SpyGlass-GUI, at the top left corner, select the "..." option next to
   Templates.

2. This should show the different methodologies like New_RTL, Detailed_RTL,
   etc. in a new sub-window.  As the RTL has just been created, open New_RTL
   by clicking the "+" next to it and select the two goals under

                 Ensure_RTL_Block_is_simulation_ready

   by selecting the radio buttons next to them.  These goals make sure the
   blocks are simulation ready.

3. Press the Okay button to close the sub-window, then click on the "Run"
   button to perform the analysis.

4. At the bottom of the GUI, select the "Message Tree" tab to view different
   messages generated by SpyGlass.  Running Beetlejuice through this
   process produces the following message.  In this case, SpyGlass is
   complaining about "possible assignment overflow":

      Possible assignment overflow: lhs width 13 (Expr: 'acc_n13_l33_itm')
      should be greater than rhs width 13 (Expr: '(conc_n15_l33_5_itm +
      xor_n15_l33_5_itm)') to accommodate carry/borrow bit, [Hierarchy':
      fir_filter:fir_filter_fir_filter_proc_1@fir_filter_fir_filter_proc'],
      rtl.v, 155

5. Double-clicking on this message will take you to the relevant RTL code to
   debug.  Here, the bit-width of both addition operands is 13 bits, which
   is the same as the LHS bit width, and SpyGlass is complaining about a
   carry overflow in the addition operation.  In this Beetlejuice example,
   this is an acceptable condition, so it does not require any more effort.
   With a few more clicks, Spyglass can catch many such lint related issues,
   so that the user can clean the RTL before handing it off to synthesis.
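
   To see why the tool flags this, here is a small sketch of the
   arithmetic behind the warning.  It treats the operands as unsigned
   13-bit values, which is an assumption (the message does not say whether
   the RTL signals are signed), and the function name add_truncated is
   made up for illustration.

```python
WIDTH = 13  # width of both operands and of the LHS in the lint message


def add_truncated(a, b, lhs_width=WIDTH):
    """Return (value stored in an lhs_width-bit LHS, full-precision sum)."""
    full = a + b
    stored = full & ((1 << lhs_width) - 1)  # the carry bit is dropped
    return stored, full


max_operand = (1 << WIDTH) - 1  # largest unsigned 13-bit value
stored, full = add_truncated(max_operand, max_operand)
print("full sum:", full, "needs", full.bit_length(), "bits")
print("stored in 13 bits:", stored)

# Small operands fit, so nothing is lost.
small_stored, small_full = add_truncated(100, 200)
print("small case:", small_stored, small_full)
```

   The warning is only about the worst case.  If the operand ranges in the
   design never produce a carry out of bit 13, the truncation is harmless,
   which is the kind of situation where the message can be waived.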


ADDITIONAL POWER REDUCTION

Now that we have verified that the RTL is clean, the final step is to use
SpyGlass-Power to find added power saving opportunities that lie within this
RTL code.  SpyGlass-Power looks for them using simple structural reduction
techniques as well as advanced and complex sequential/formal power reduction
techniques.

The following steps will determine how much additional power can be
saved on an RTL design:

1. Selecting the "..." button next to Templates brings up a sub-window
   showing the goals for the different phases of the design flow.
2. Unselect the goals that were selected earlier and open the goals under
   the Detail_RTL phase by clicking the "+" sign next to it.
3. Select the Review_power_reduction_opportunities design goal and select
   the radio button next to Power-Reduction.
4. Add the VCD (or FSDB) file that was used for power analysis, for
   accurate power reduction suggestions.  Click on the constraints tab,
   select spyglass_setup.sgdc and press the <E> key on your keyboard to
   edit the constraints file.  Add the VCD file using the following
   command:

       activity_file -format vcd -file <vcd-filename>
5. Click on Run to find different power reduction opportunities.
6. Click on the Message Tree tab and open the Info folder to view
   different messages.
7. On the Beetlejuice fourth-iteration RTL, the SpyGlass-Power run
   identified about a dozen different power reduction opportunities,
   for ~28% potential power savings.
8. For each power reduction suggestion, it computes the 'differential gain',
   i.e. the potential power saving from that one suggestion.  This helps the
   user determine whether the design change gives adequate ROI.  To review
   an opportunity, double-click the message and press the <I> key on your
   keyboard.  This brings up the incremental schematic sub-window, which
   helps in debugging the reduction suggestion.  The user selects which
   changes to make.
9. On the incremental schematic, you can see the register that was not
   gated.  Gating this register reduces the power number, as the register
   is not highly active according to the simulation vectors.

Taken together, the suggestions offered a total of 28% further power
reduction, but I only needed to make half of the suggested changes to hit
our 2 mW goal.
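
As a sanity check on those numbers, here is a back-of-envelope sketch.  It
starts from the 2.30 mW of the fourth iteration and the ~28% potential
saving, and it assumes that doing half of the changes yields about half of
the saving.  That is a simplification: the suggestions each have their own
differential gain, and those per-suggestion values are not listed here.
The function name power_after is made up for illustration.

```python
BASELINE_MW = 2.30       # fourth iteration, from the results table
GOAL_MW = 2.0            # power goal
POTENTIAL_SAVING = 0.28  # ~28% potential saving from SpyGlass-Power


def power_after(fraction_applied):
    """Power after applying a fraction of the total potential saving."""
    return BASELINE_MW * (1.0 - POTENTIAL_SAVING * fraction_applied)


# Relative saving needed just to reach the 2 mW goal.
required_saving = 1.0 - GOAL_MW / BASELINE_MW

print(f"required saving:        {required_saving:.1%}")
print(f"all suggestions:        {power_after(1.0):.3f} mW")
print(f"half of the saving:     {power_after(0.5):.3f} mW")
```

Under that assumption, half of the saving already lands just under the
2 mW goal, which is consistent with not needing every suggested change.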

    - Dale Pollek
      Atrenta, Inc.                              San Jose, CA