( ESNUG 478 Item 8 ) -------------------------------------------- [12/18/08]
Subject: ( ESNUG 477 #3 ) Power Opto and Linting in CatapultC and Spyglass
> - Tradeoff between performance and area. CatapultC gives us timing and
> area estimates, then later we run our Verilog RTL through Synopsys
> Design Compiler for the final result.
From: Dale Pollek <dalep=user domain=atrenta not calm>
Hi John,
CatC users can do power tradeoffs and optimization using our Atrenta
SpyGlass-Power tool. (ST is doing this right now.) It is only a few
steps within the Catapult environment to get all the data you want.
To explain this, I have an example block I'll call "Beetlejuice" - it's a
simple FIR filter circuit block that is used many times in a much larger
design. It has 65 registers and roughly 3 K gates. It can use 2 to 8
multipliers, depending on how you implement it. It uses 18 bit busses
and does integer math.
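For readers who want something concrete, the following is a minimal behavioral sketch of a block of this kind, written in Python for brevity. It is not the Beetlejuice source (which is C++ and is not shown in this post). The tap count of 8, the coefficient values, and the wrap-to-18-bit behavior are assumptions chosen only for illustration.

```python
# Illustrative integer FIR filter (NOT the actual Beetlejuice source).
# Assumptions: 8 taps, made-up coefficients, values wrapped to 18-bit signed.

WIDTH = 18                          # bus width mentioned in the text
COEFFS = [1, 2, 3, 4, 4, 3, 2, 1]   # hypothetical coefficients, 8 taps assumed


def wrap_signed(value, width=WIDTH):
    """Wrap an integer into the two's-complement range of the given width."""
    value &= (1 << width) - 1
    if value >= 1 << (width - 1):
        value -= 1 << width
    return value


def fir_filter(samples, coeffs=COEFFS):
    """Return one output per input sample using a shift-register delay line."""
    delay = [0] * len(coeffs)
    outputs = []
    for x in samples:
        delay = [wrap_signed(x)] + delay[:-1]   # shift in the newest sample
        acc = 0
        for c, d in zip(coeffs, delay):         # the loop a C synthesis tool would unroll
            acc += c * d
        outputs.append(wrap_signed(acc))
    return outputs


if __name__ == "__main__":
    # Feeding a unit impulse returns the coefficients in order.
    print(fir_filter([1, 0, 0, 0, 0, 0, 0, 0]))
```

The inner multiply-accumulate loop is the part that the unroll setting affects: fully unrolled it needs one multiplier per tap, partially unrolled it shares fewer multipliers over more clock cycles.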
This Beetlejuice block is written in a few lines of C++ code and I'll go
through 5 detailed iterations to find the tradeoffs between speed, area,
architectures, and power. My two primary design goals are:
Sample time < 20 nsec
Power < 2 mWatts
in a generic 90 nm process technology.
SETUP
To enable SpyGlass within Catapult, make sure to tell it where the SpyGlass
Compiler, libraries and setup information reside. Once within the Catapult
environment everything else is a push-button operation (or a TCL command)
within the GUI to create and view power results.
LOADING AND PREPARING THE DESIGN
Loading and readying the design for iteration takes only a few clicks in
the CatC environment before synthesizing the C++ code into RTL.
1. Add the design file and testbench
2. Enable the SystemC (SCVerify) flow. This allows the C++ testbench to
be used to drive all variants of the C design that are turned into RTL,
irrespective of timing.
3. Enable the SpyGlass-Power Flow
4. Enable the Spyglass Linting Flow
5. Pick the target process technology in Catapult (for example, the
built in 90nm library)
6. Set the target operating frequency of the design (e.g. 50 MHz)
7. Apply architectural constraints (full unrolling in the first case,
Initiation Interval=1 in all cases).
8. Generate the RTL and view performance and area score
9. Use Spyglass RTL lint/optimization
10. Verify the design using SCVerify and the testbench
11. Use SpyGlass-Power to get the power score for each architecture
12. Repeat steps 6 to 11 two more times for different design architectures:
at 100 MHz unrolled to 4 multipliers, and at 200 MHz unrolled to
2 multipliers.
Below is a more detailed description for each of the steps listed above:
1. Add the design and testbench
- Start Catapult
- Click the "Set Working Directory" icon in the Task Bar
- Set the working directory. This is where the C++ design and
testbench are located.
- Click the "Add Input Files" icon in the Task Bar
- Select both "Beetlejuice.cpp" and "testbench_Beetlejuice.cpp", and
mark the latter as excluded from compilation. The testbench
is excluded because it is not part of the design to be synthesized
into RTL. Catapult will only use it for verification. More
details in step 10 below.
- From this point the design is ready for setting up user selected
alternatives. It is important to note that the multiple different
resulting RTL designs that Catapult will generate from this point
forward are all generated from the very same C++ description.
2. Enable the SystemC Verify (SCVerify) flow
- In Catapult Flow Manager window, double click on "SCVerify"
- The "Flow Package Enable" dialog box will open. Click on "Yes"
- The "Flow Package Properties" dialog box will open. Click "OK"
3. Enable the Atrenta SpyGlass-Power Flow
- Double click on "SpyglassPower" in Flow Manager
- The "Flow Package Enable" dialog box will open. Click on "Yes"
- The "Flow Package Properties" dialog box will open. Click "OK"
4. Enable the Atrenta Spyglass Linting Flow
- Double click on "SpyglassLint" in the Flow Manager
- The "Flow Package Enable" dialog box will open. Select the
appropriate rules and policies (Spyglass Rule Deck) and then click
on "Yes" to enable the flow
- The "Flow Package Properties" dialog box will open. Click "OK"
5. Pick the target process technology in Catapult
- In the Constraint Editor tabbed window, select "Design Compiler"
as the synthesis tool
- Select a target process technology library such as the built in
90 nm. Typically, a user would first characterize their own
libraries using another tool called Catapult Library Builder which
creates a process technology library that can be used for C synthesis
from the process library files provided by the foundry. For
production designs, it is highly recommended to do this. However,
for design exploration and prototyping, Catapult does offer sample
libraries to quickly try out a design. For the purposes of this
description, the user can select "Sample -> 90 nm" in the Synthesis
Tool window.
- If any specific memories are needed for the design, this is where
these are enabled, but for this simple design, we didn't use any.
6. Set the target operating frequency of the design
- While still in the Synthesis Tool window, set the design frequency.
For this example, we will set it to 50 MHz to start (that gives one
20 nsec clock period).
- Next, click on "Interface Control" from the sidebar, select
"Handshake" and check the "Transaction Done Signal" box.
7. Apply architectural constraints
- Select the "Architecture Constraints" icon in the Task Bar. From here,
the user can decide whether to fully unroll the loop, partially unroll
it, or pipeline the design. For the first iteration we unroll by 8 (full
unrolling) and pipeline with an Initiation Interval=1.
8. Generate the RTL and view performance and area score
- Click the "Generate RTL" icon in the Task Bar. Catapult will now run,
automatically assigning resources and scheduling based upon the
defaults given for memory assignment and available technology libs.
- The user can see the "performance" and "area" scores for the design
in the "Table" window.
9. Use Spyglass for basic RTL linting rules and optimization
- Run linting by double-clicking on the SpyGlassLint makefile.
- Results of the linting are viewed in the Catapult "Message" window.
10. Verify the design using SCVerify and the testbench
- Simulate the RTL by double-clicking on the "Verification->Modelsim"
directory. This will compile all the RTL files, create and compile
the necessary C++ test infrastructure and launch Questa.
- When Questa is launched, type "run -all" at the Questa command line.
This will run the simulation in Questa.
- From here, the user can view the waveforms of the signals.
- Once the user is satisfied with the simulation-based verification,
they can exit Questa and return to Catapult.
- Note that Questa is able to simulate the RTL using the original C++
testbench. You can also run this in batch to just get a "Simulation
Passed" message.
11. Use SpyGlass-Power for power estimate of current design architecture
- Run Power Analysis by double-clicking on the SpyGlass-Power makefile
that will calculate the power on the RTL generated by Catapult.
- The power estimation values are in the Catapult GUI table view
12. Repeat steps 6 to 11 to try different design architectures and select
the best implementation for a target specification
- To get the three different RTL iterations of Beetlejuice that meet the
same system-level performance, repeat these steps two more times,
doubling the frequency and halving the unroll parameter each time.
After going through the above steps on Beetlejuice, we have the results of
the 3 different solutions with tradeoffs in area and power for the same
system-level performance.
For Beetlejuice, the first iteration has room to improve performance (see
table below) and still meet power requirements, so repeat the above
steps 6 through 11 and only modify the frequency, to 200 MHz and then
400 MHz.
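The relationship between clock frequency, number of multipliers, and sample time in these five iterations can be sketched with a few lines of arithmetic. The sketch below assumes an 8-tap filter in which each multiplier handles one tap per clock cycle. That model is an assumption, not something Catapult reports, but it reproduces the sample times in the results summary.

```python
# Sample time as a function of clock frequency and number of multipliers.
# Assumption: 8 taps, and each multiplier processes one tap per clock cycle.

TAPS = 8  # assumed tap count


def sample_time_ns(freq_mhz, multipliers, taps=TAPS):
    """Clock cycles needed per sample times the clock period, in nanoseconds."""
    cycles = -(-taps // multipliers)   # ceiling division
    period_ns = 1000.0 / freq_mhz
    return cycles * period_ns


if __name__ == "__main__":
    for freq, mults in [(50, 8), (100, 4), (200, 2), (200, 8), (400, 8)]:
        print(f"{freq} MHz, {mults} multipliers: {sample_time_ns(freq, mults)} ns")
```

Under this model, doubling the clock while halving the multipliers leaves the sample time unchanged, which is why the first three iterations all land at 20 ns.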
RESULTS SUMMARY OF DESIGN ITERATIONS
In a matter of minutes we have 5 design implementation iterations of
Beetlejuice with the first three iterations focusing on attaining the same
system performance to see the impacts on area and power. The fourth and
fifth are used to investigate what area and power are required when
increasing the performance (frequency). The results, obtained while
remaining completely inside the Catapult environment, are:
    Fmax      Multipliers   Throughput   Total Power   Area Score
    50 MHz    8             20 ns        0.61 mW       13,402
    100 MHz   4             20 ns        2.43 mW        8,390
    200 MHz   2             20 ns        3.58 mW        5,162
    200 MHz   8             5 ns         2.30 mW       13,402
    400 MHz   8             2.5 ns       4.54 mW       15,714
From the results, the fourth iteration is selected as it is closest to the
target goals. It will now go through further RTL-specific power reduction
with SpyGlass-Power and more detailed lint before implementation.
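One way to read that choice is sketched below. The data is copied from the results table. The selection rule is an assumption: it treats the sample-time goal as a strict inequality (so the three 20 ns solutions do not qualify) and then takes the lowest-power candidate that remains.

```python
# Pick an iteration against the two stated design goals.
# Data is from the results table; the selection rule itself is an assumption.

RESULTS = [
    # (iteration, clock MHz, multipliers, sample time ns, power mW, area score)
    (1, 50, 8, 20.0, 0.61, 13402),
    (2, 100, 4, 20.0, 2.43, 8390),
    (3, 200, 2, 20.0, 3.58, 5162),
    (4, 200, 8, 5.0, 2.30, 13402),
    (5, 400, 8, 2.5, 4.54, 15714),
]

MAX_SAMPLE_NS = 20.0  # goal: sample time < 20 nsec
MAX_POWER_MW = 2.0    # goal: power < 2 mW


def closest_to_goals(results):
    """Among rows strictly under the sample-time goal, return the lowest-power row."""
    fast_enough = [row for row in results if row[3] < MAX_SAMPLE_NS]
    return min(fast_enough, key=lambda row: row[4])


if __name__ == "__main__":
    best = closest_to_goals(RESULTS)
    print("iteration", best[0], "power", best[4], "mW, goal", MAX_POWER_MW, "mW")
```

Under this reading the fourth iteration still misses the power goal, which is what the RTL-level power reduction pass is meant to close.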
NOTE: The RTL for the 5 nsec result was essentially the same as the 50 MHz
solution, with everything occurring in one clock cycle. The 400 MHz
implementation needed faster multipliers which pushed the area up as
Design Compiler will use faster DesignWare multiplier parts.
FINISHING THE DESIGN - LINTING AND OPTIMIZING THE CHOSEN RTL
To complete the design process and ready the RTL for implementation, we start
from the fourth iteration (the 200 MHz, 5 nsec architecture), which already
exceeds the performance criteria and whose power is very close to the design
goal. The further lint checking and power optimization, done without
impacting performance, will use the same SpyGlass-Power as used above.
Only basic linting was applied in step 9 above for pre-simulation and
synthesis during the Catapult architecture selection. Now that we have
selected the specific RTL, it needs to be further optimized and linted
before implementing. Now we will verify and clean the design using the
SpyGlass-GUI that can be invoked either from within CatC or outside of it:
1. In SpyGlass-GUI, at the top left corner, select the "..." option next to
Templates.
2. This should show the different methodologies like New_RTL, Detailed_RTL,
etc. in a new sub-window. As the RTL has just been created, open New_RTL
by clicking the "+" next to it and select the two goals under
Ensure_RTL_Block_is_simulation_ready
by selecting the radio buttons next to them. The motivation for these
goals is to make sure these blocks are simulation ready.
3. Press the OK button to close the sub-window, then click on the "Run"
button to perform the analysis.
4. At the bottom of the GUI, select the "Message Tree" tab to view different
messages generated by SpyGlass. Running Beetlejuice through this
process produces the following message. In this case, SpyGlass is
complaining about "possible assignment overflow":
Possible assignment overflow: lhs width 13 (Expr: 'acc_n13_l33_itm')
should be greater than rhs width 13 (Expr: '(conc_n15_l33_5_itm +
xor_n15_l33_5_itm)') to accommodate carry/borrow bit, [Hierarchy':
fir_filter:fir_filter_fir_filter_proc_1@fir_filter_fir_filter_proc'],
rtl.v, 155
5. Double-clicking on this message will take you to the relevant RTL code to
debug. Here, the bit-width of both addition operands is 13 bits, which
is the same as the LHS bit width, and SpyGlass is complaining about a
carry overflow in the addition operation. In this Beetlejuice example,
this is an acceptable condition, so it does not require any more effort.
With a few more clicks, Spyglass can catch many such lint related issues,
so that the user can clean the RTL before handing it off to synthesis.
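The width rule behind the overflow message can be illustrated with a small generic example. It is not derived from the Beetlejuice RTL; the operand values are simply the largest that fit in 13 bits, treated as unsigned.

```python
# Why adding two 13-bit values can overflow a 13-bit destination.
# Generic illustration; the values are not taken from the Beetlejuice RTL.


def bits_needed_for_sum(width_a, width_b):
    """An unsigned addition needs one bit more than the wider operand."""
    return max(width_a, width_b) + 1


def truncate(value, width):
    """Keep only the low `width` bits, as a fixed-width assignment would."""
    return value & ((1 << width) - 1)


if __name__ == "__main__":
    a = b = (1 << 13) - 1        # largest 13-bit unsigned value
    full = a + b                 # true sum, which needs a 14th bit
    print(bits_needed_for_sum(13, 13), full.bit_length(), truncate(full, 13))
```

Whether the dropped carry matters depends on the value ranges the design actually produces, which is why a warning like this can be reviewed and accepted.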
ADDITIONAL POWER REDUCTION
Now that we have verified that the RTL is clean, the final step is to use
SpyGlass-Power to find added power saving opportunities that lie within this
RTL code. SpyGlass-Power will determine where added power reduction
opportunities can be found with simple structural reduction techniques as
well as advanced and complex sequential/formal power reduction techniques.
The following steps will determine how much additional power can be
saved on an RTL design:
1. Selecting the "..." button next to Templates will bring up a sub-window
showing all the different goals related to different phases of the design
flow.
2. Unselect the goals that were selected and open the goals under
Detail_RTL phase by clicking the "+" sign next to it.
3. Select the Review_power_reduction_opportunities design goal and select
the radio button next to Power-Reduction.
4. Add the VCD (or FSDB) file that was used for power analysis, for
accurate power reduction suggestions. Click on the constraints tab,
select spyglass_setup.sgdc and press the <E> key on your keyboard to
edit the constraints file. Add the VCD file using the following
command: activity_file -format vcd -file <vcd-filename>
5. Click on Run to find different power reduction opportunities.
6. Click on the Message Tree tab and open the Info folder to view
different messages.
7. The SpyGlass-Power run on Beetlejuice (fourth iteration RTL) identified
about a dozen different power reduction opportunities, for ~28%
potential power savings.
8. For each power reduction suggestion, it computes the 'differential gain',
i.e. the potential power saving for that suggestion. This helps the user
determine whether the change gives adequate ROI. To review
these opportunities, double-click the message and press the <I> key on
your keyboard. This will bring up the incremental schematic sub-window,
which helps in debugging the reduction suggestion. The user selects
which changes to make.
9. On the incremental schematic, you can see the register that was not gated.
Gating this register will reduce the power number, as the register is
not highly active based on the simulation vector details.
Based on the above we got a total 28% further power reduction. But I only
needed to do half of the suggested changes to hit our 2 mW goal.
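The arithmetic behind those closing numbers can be checked roughly as follows. The 2.30 mW starting point and the ~28% figure come from the text above. Treating "half of the suggested changes" as half of the potential saving is an assumption, since the post does not give the per-suggestion split.

```python
# Rough arithmetic behind the final power numbers.
# Assumption: half of the suggested changes yields half of the ~28% saving.

BASE_POWER_MW = 2.30      # fourth iteration, from the results table
POTENTIAL_SAVING = 0.28   # ~28% reported by SpyGlass-Power
GOAL_MW = 2.0             # design goal: power < 2 mW


def power_after(base_mw, saving_fraction):
    """Power remaining after removing the given fraction of the base power."""
    return base_mw * (1.0 - saving_fraction)


def required_saving(base_mw, goal_mw):
    """Fraction of the base power that must be removed to reach the goal."""
    return 1.0 - goal_mw / base_mw


if __name__ == "__main__":
    print("all suggestions :", round(power_after(BASE_POWER_MW, POTENTIAL_SAVING), 3), "mW")
    print("half of them    :", round(power_after(BASE_POWER_MW, POTENTIAL_SAVING / 2), 3), "mW")
    print("saving required :", round(100 * required_saving(BASE_POWER_MW, GOAL_MW), 1), "%")
```

Independently of the half-saving assumption, `required_saving` shows that reaching 2 mW from 2.30 mW needs roughly a 13% reduction, a little under half of the 28% on offer.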
- Dale Pollek
Atrenta, Inc. San Jose, CA