
( ESNUG 501 Item 4 ) -------------------------------------------- [04/04/12]

Subject: Four more chip designers direct evals of Atrenta Spyglass Power

> We would like to have some user perspective on RTL-level power analysis
> and optimization.  Specifically Apache PowerArtist/PowerArtistXP and
> Atrenta Spyglass Power.  We are seriously considering using one in our
> next design.
>
>    - [ Puss in Boots ]
>      http://www.deepchip.com/items/0495-07.html


From: [ Robot Chicken ]

Hello John,

We also use Spyglass Lint and CDC.  Our designers are already familiar with
the common GUI and design setup, so it was easy to add Spyglass Power for
early power estimation.  (In fact, we have a regression mechanism that runs
all these analyses, and designers review all the results with every
revision of RTL/simulation.)

SpyGlass Power can merge multiple simulation files for different operating
modes; we can feed them in and weight them to get power numbers based on
the actual functionality of our chip.

For example, if the design is in "idle" mode 30% of the time, "transmit"
mode 30% and "receive" mode 40%, we can apply these weights to the
corresponding simulation files and get a single power number.
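The weighting above amounts to a time-weighted average of per-mode power.
A minimal Python sketch, assuming one average-power number per mode has
already been estimated from that mode's simulation file; the per-mode power
values are made up for illustration and are not from the letter:

```python
# Hypothetical per-mode average power (mW), one estimate per simulation dump.
mode_power_mw = {"idle": 12.0, "transmit": 85.0, "receive": 60.0}

# Fraction of time spent in each mode (from the example above).
mode_weight = {"idle": 0.30, "transmit": 0.30, "receive": 0.40}

def weighted_power(power, weight):
    """Return the time-weighted average power across operating modes."""
    if abs(sum(weight.values()) - 1.0) > 1e-9:
        raise ValueError("mode weights must sum to 1")
    return sum(power[mode] * weight[mode] for mode in weight)

print(f"{weighted_power(mode_power_mw, mode_weight):.2f} mW")  # 53.10 mW
```

The check on the weights catches a mode mix that does not cover 100% of the
time, which would silently scale the final number.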

One key advantage of SpyGlass is that it does its own real synthesis and
maps to the technology cells.  As a result, its power behavior is much
closer to the final output from Design Compiler.

SpyGlass generates a modified RTL with optimized power.  This new RTL has
the same look and feel as the original RTL, with comments inserted where
changes have been made.  The new RTL can be formally verified with the
sequential equivalence checking (SEC) engine built into SpyGlass to make
sure that the functionality of the design has not changed.

In a preliminary evaluation on one of our blocks we saw a 16% power reduction.

We have used Spyglass Power on two complete designs.

  Design 1: 7 M placeable instances, 21 M gates, 2.7 M registers
            14 clock domains from 62.5 to 900 MHz
            65 nm, 4 GB FSDB (100 usec sim time)

            Overall RTL power estimated within 15% of final silicon.

  Design 2: 24 M gates, 1.5 M registers
            4 clock domains from 62.5 to 900 MHz
            45 nm, 10.4 GB VCD (7 usec sim time)

            Overall RTL power estimated within 7% of PrimeTime PX.

The SpyGlass Power estimation reports are very detailed.  They helped us
identify power-centric blocks and clock-gating efficiency/power for each
clock domain, and their memory access rate/activity graphs let us make sure
that our memories are operating as expected.

If you do plan to publish the comments I would request you keep me anon.

    - [ Robot Chicken ]

         ----    ----    ----    ----    ----    ----    ----

From: [ The Silver Surfer ]

Hi, John,

I hereby request that you keep me anonymous.

For two "regular" designs (90 nm and 65 nm) of 36% combinatorial logic,
33% sequential logic, and 31% memory, the power numbers Spyglass calculated
at the RTL stage matched the power numbers calculated from the Magma PNR
gate-level netlist.

The power consumption was within 10% of the actual silicon power numbers.

GOTCHA #1:

For a heavily combinatorial design of 68% combinatorial logic, 3%
sequential logic, and 29% memory, the Spyglass power numbers deviated
considerably at RTL level.  Going from Design Compiler to Magma PNR, our
netlist size increased by ~45%.  In this design, the combinatorial area
grew substantially after CTS and timing fixes compared with the pre-CTS
netlist, and the layout needed a comparatively large clock buffer tree
to accommodate the placement of so much combinatorial logic.

For such designs Spyglass Power was not able to accurately calculate
power at the early RTL stage.  Instead we needed a PNR netlist, a
gate-level simulation dump and the actual parasitic file (SPEF).

Power Reduction:

At the early RTL phase, Spyglass Power has some tricks.  These include
finding explicit enables, finding opportunities for missing enables by
forward and backward traversal of the logic, merging enables, etc.
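The letter does not describe how Spyglass finds enables internally.  As a
rough illustration of what an "explicit enable" is, here is a Python sketch
over a hypothetical toy netlist format (not a Spyglass data structure): a
register whose D input is a hold mux around its own Q output already has an
enable that could drive a clock gate instead of the feedback mux.  Forward
and backward traversal for missing enables is not shown.

```python
from typing import Optional, Union

# Hypothetical toy format: a register's D input is either a signal name or
# a 2:1 mux written as ("mux", select, value_when_1, value_when_0).
Expr = Union[str, tuple]

def explicit_enable(reg_q: str, d_expr: Expr) -> Optional[str]:
    """Return the enable condition if D is a hold mux around the register's Q."""
    if not (isinstance(d_expr, tuple) and d_expr[0] == "mux"):
        return None                  # no mux in front of the flop
    _, sel, when_1, when_0 = d_expr
    if when_0 == reg_q and when_1 != reg_q:
        return sel                   # holds its value when sel == 0
    if when_1 == reg_q and when_0 != reg_q:
        return f"!{sel}"             # holds when sel == 1: inverted enable
    return None

registers = {
    "cfg_q": ("mux", "cfg_we", "cfg_wdata", "cfg_q"),  # write-enabled register
    "cnt_q": "cnt_next",                               # updates every cycle
    "st_q":  ("mux", "stall", "st_q", "st_next"),      # holds during a stall
}
for q, d in registers.items():
    print(q, "->", explicit_enable(q, d))
```

For these three registers the sketch reports `cfg_we`, `None` and `!stall`.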

Each "potential" power-saving opportunity in the design is reported
separately, along with the power savings attributed to that opportunity.
You can choose any or all of them.

The tool generates a clock-gating script to guide Power Compiler: a
Spyglass-created Power Compiler template that guides synthesis with
clock gating at the block level.  Power Compiler runtime took a small
penalty here, but it was worth it to see dynamic power drop by a
considerable margin.

For the two "regular" designs explained above, we could see overall
switching power improved by ~15-20%.
 
GOTCHA #2:

Sometimes Spyglass suggests adding clock-gating "enables" without
considering whether they could increase power.  This happens for multiple
pipeline stages of single-bit flops.  Although the tool suggests these RTL
changes, when the same run is done with RTL modification enabled (the
AutoFix feature), it generates a Power Compiler script that says not to
clock-gate these elements.  In such scenarios the designer's insight into
the functionality is required, rather than going along with the tool's
suggestions.  When tracing all "enables" backward, the tool only traces a
limited number of logic levels below the hierarchy, so incremental runs
may be required to find all of the missing backward enables.
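GOTCHA #2 comes down to the fixed overhead of the clock-gating cell.  A
first-order Python sketch, assuming a simple model in which the gating cell
burns a fixed power and each flop's clock-pin power scales with how often it
is clocked; it ignores the enable logic and data-path effects, and all
numbers are hypothetical:

```python
def gating_saves_power(n_bits: int, enable_duty: float,
                       flop_clk_pin_uw: float, icg_uw: float) -> bool:
    """Rough check: does one clock gate reduce clock power for n_bits flops?

    n_bits: number of flops sharing the enable
    enable_duty: fraction of cycles in which the enable is active
    flop_clk_pin_uw: clock-pin power of one flop clocked every cycle
    icg_uw: power of the clock-gating cell itself (always clocked)
    """
    ungated = n_bits * flop_clk_pin_uw
    gated = icg_uw + n_bits * flop_clk_pin_uw * enable_duty
    return gated < ungated

# A single-bit pipeline flop enabled 70% of the time: gating costs power.
print(gating_saves_power(1, 0.7, flop_clk_pin_uw=1.0, icg_uw=2.0))   # False
# A 32-bit register written 10% of the time: gating pays off.
print(gating_saves_power(32, 0.1, flop_clk_pin_uw=1.0, icg_uw=2.0))  # True
```

Under this model a gate only pays off when `icg_uw` is smaller than
`n_bits * flop_clk_pin_uw * (1 - enable_duty)`, which a single busy flop
rarely satisfies.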

Auto RTL fixing:

Spyglass supports automatic RTL fixing for power reduction.  It has an
embedded SEC (sequential LEC) tool that checks the original RTL against
the modified RTL using sequential formal verification.  SEC lets us bypass
the functional verification stage for the modified RTL.  Our RTL
quality-related requests were addressed in a timely manner.  For one
design, overall area increased by 17% relative to clock gating without
RTL AutoFix; for the other block it stayed within the 5% limit.  Overall
we saw a ~10-15% improvement in switching power.

GOTCHA #3: Peak Power Analysis

The Spyglass tool lacks the capability to calculate event-accurate peak
power and the decap requirements for IR-drop analysis.

Crossing of Clock Domains:

We also use Spyglass CDC for clock domain crossing analysis.  Using the
CDC analysis engine, the tool can generate a clock-gating script that keeps
DC synthesis from inserting clock gating where a clock domain is crossed.
There may be no direct functional implication when the signal crossing the
clock domain is static, but for designs sensitive to peak power this is
useful because it avoids possible glitches on the clock tree.
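The letter does not show the generated script, so the following is only a
conceptual Python sketch with a hypothetical register record: it splits
registers into clock-gating candidates that stay within one domain and
registers fed from another domain, which such a script would exclude from
gating.  It does not emit real Power Compiler commands.

```python
from dataclasses import dataclass

@dataclass
class Reg:
    """Hypothetical register record: capture clock and the clock of its driver."""
    name: str
    clock: str         # clock domain that captures this register
    source_clock: str  # clock domain of the logic driving its input

def split_gating_candidates(regs):
    """Keep same-domain registers as gating candidates; exclude registers
    that capture data crossing from another clock domain."""
    gate, exclude = [], []
    for r in regs:
        if r.source_clock != r.clock:
            exclude.append(r.name)
        else:
            gate.append(r.name)
    return gate, exclude

regs = [
    Reg("ctrl_q", "clk_a", "clk_a"),
    Reg("sync_ff1", "clk_b", "clk_a"),  # first flop of a synchronizer
    Reg("data_q", "clk_b", "clk_b"),
]
gate, exclude = split_gating_candidates(regs)
print("gate:", gate)        # gate: ['ctrl_q', 'data_q']
print("exclude:", exclude)  # exclude: ['sync_ff1']
```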

    - [ The Silver Surfer ]

         ----    ----    ----    ----    ----    ----    ----

From: Denis Dutoit <denis.dutoit=user domain=cea dot fr>

Hi, John,

We have been using Spyglass Power for the past few months.  We've seen
significant power reduction at the RTL stage compared to optimization
done at the synthesis and layout stages.

We have used Spyglass for two major stages in the design flow, namely:

    1. Design importation and RTL power estimation.

          a. Existing clock gate analysis
          b. Audit design/simulation/technology analysis
          c. Early optimization
          d. Activity analysis
          e. Average and cycle-based power estimation at RTL

    2. Design optimizations at RTL.

          a. Power optimization for registers and memories
          b. Fixing selections at RTL
          c. Verification of modified RTL
          d. Power estimation on new design


1. Design importation and RTL power estimation:

Before starting power analysis, we have to import the design's VHDL (or
Verilog) plus the fab technology library files.

The next step is "Existing clock gate analysis", which reports the power
savings of the existing clock gates and the power savings that would be
achieved if such clock gates were removed from the design.  A simulation
file is not a prerequisite for this task, but all subsequent steps require
simulation files to access relevant activity information.

Next, "Audit design/simulation/technology analysis" checks the design,
simulation and technology library for consistency.  This step lets you
review the inputs given to the tool before running power estimation.
Examples of inputs are clock information, a wireload model, library
cells, etc.

Next, "Early optimization" uses all of the imported information to find
optimizations that can be added to the design by modifying the RTL files.

Next, "Activity analysis" computes the average activity of each
hierarchical unit of the design.  This step identifies the time periods
when the design is highly active, which could indicate power bugs.
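As a rough illustration of what an activity analysis computes, here is a
Python sketch that assumes per-cycle toggle counts for one hierarchical unit
have already been extracted from a VCD or FSDB dump.  The window size and
the "2x the mean" threshold are arbitrary choices for this sketch, not
Spyglass settings, and the toggle data is hypothetical:

```python
def average_toggle_rate(toggles_per_cycle, n_nets):
    """Average toggles per net per cycle for one hierarchical unit."""
    return sum(toggles_per_cycle) / (n_nets * len(toggles_per_cycle))

def hot_windows(toggles_per_cycle, window, factor=2.0):
    """Start cycles of non-overlapping windows whose mean activity exceeds
    `factor` times the unit's overall mean activity."""
    overall = sum(toggles_per_cycle) / len(toggles_per_cycle)
    hot = []
    for start in range(0, len(toggles_per_cycle) - window + 1, window):
        chunk = toggles_per_cycle[start:start + window]
        if sum(chunk) / window > factor * overall:
            hot.append(start)
    return hot

# Hypothetical unit with 100 nets: mostly quiet, with one burst of activity.
toggles = [2, 2, 2, 2, 2, 2, 2, 2, 40, 38, 42, 40, 2, 2, 2, 2]
print(average_toggle_rate(toggles, n_nets=100))  # 0.115
print(hot_windows(toggles, window=4))            # [8]
```

A normally quiet block that suddenly bursts like this is the kind of window
worth inspecting for a power bug.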

Finally, "Average and cycle-based power estimation" estimates the power of
each block in the design.  If more detail on power consumption is needed,
a second estimation can be run that analyzes power consumption at every
cycle.  This extra detail comes at the cost of longer run time than the
"average" estimation, so choose the best time window for the cycle-based
power estimation.
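One way to pick that time window is to scan a per-cycle power trace for the
highest-power stretch and compare it with the overall average.  A Python
sketch with a hypothetical trace; note this is a windowed average, not the
event-accurate peak power mentioned in GOTCHA #3:

```python
def max_power_window(power_per_cycle, window):
    """Return (start_cycle, mean_power) of the highest-power window,
    using a running sum so the scan is O(n)."""
    if not 0 < window <= len(power_per_cycle):
        raise ValueError("window must be between 1 and the trace length")
    running = sum(power_per_cycle[:window])
    best_start, best_sum = 0, running
    for start in range(1, len(power_per_cycle) - window + 1):
        running += power_per_cycle[start + window - 1] - power_per_cycle[start - 1]
        if running > best_sum:
            best_start, best_sum = start, running
    return best_start, best_sum / window

trace_mw = [50, 52, 49, 51, 90, 95, 93, 50, 48, 51]  # hypothetical per-cycle power
avg = sum(trace_mw) / len(trace_mw)
start, peak = max_power_window(trace_mw, window=3)
print(f"average {avg:.1f} mW, worst 3-cycle window at cycle {start}: {peak:.1f} mW")
```

Here the 62.9 mW average hides a 3-cycle window starting at cycle 4 that
averages about 92.7 mW.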


2. Design optimization at RTL:

The second part of the Spyglass Power design flow is power optimizations.
It aims to analyze design files, simulation files and technology library
information in order to provide power optimizations and possible RTL changes
to the design that will save power.  Optimizations can be focused on
registers or memories.  (This last kind of optimization applies specific
sequential and formal techniques to reduce register and memory power and
consequently provide modifications to RTL files.)  Gating more registers,
or gating a memory's chip-select signal, are examples of these techniques.

Once the tool has proposed optimizations, the "Fixing selection" step lets
you choose among them.  The RTL files are then automatically fixed to add
power-saving logic that implements the selected proposals.

The "Verification of modified RTL" step uses sequential equivalence
checking to confirm that the sequential behavior of the modified design
is the same as the original.  If this verification concludes that both
designs are equivalent, the improved RTL is the result.
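A formal SEC engine proves equivalence for all input sequences.  The Python
sketch below is a much weaker stand-in, shown only to make the idea
concrete: it co-simulates hypothetical cycle models of an original
hold-mux register and its clock-gated replacement on random stimulus for a
bounded number of cycles.  It can find a counterexample but cannot prove
equivalence.

```python
import random

def original(en_seq, d_seq):
    """Register with a feedback hold mux, clocked every cycle."""
    q, out = 0, []
    for en, d in zip(en_seq, d_seq):
        q = d if en else q
        out.append(q)
    return out

def gated(en_seq, d_seq):
    """Same register behind a clock gate: only clocked when en is 1."""
    q, out = 0, []
    for en, d in zip(en_seq, d_seq):
        if en:              # a gated clock edge occurs this cycle
            q = d
        out.append(q)
    return out

def gated_late(en_seq, d_seq):
    """Buggy variant: the clock gate uses en delayed by one cycle."""
    q, prev_en, out = 0, 0, []
    for en, d in zip(en_seq, d_seq):
        if prev_en:
            q = d
        prev_en = en
        out.append(q)
    return out

def first_mismatch(model_a, model_b, cycles=1000, trials=100, seed=1):
    """Bounded random co-simulation; returns None if no mismatch is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        en = [rng.randint(0, 1) for _ in range(cycles)]
        d = [rng.randint(0, 255) for _ in range(cycles)]
        for cycle, (a, b) in enumerate(zip(model_a(en, d), model_b(en, d))):
            if a != b:
                return cycle
    return None

print(first_mismatch(original, gated))       # None
print(first_mismatch(original, gated_late))  # a cycle index: mismatch found
```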

The last step conducts a final power estimation to determine the
optimizations' impact on power consumption.


Spyglass Power results on two IP Blocks:

Here is our data from running Spyglass Power on two designs: a DMDI
(De-mapping De-Interleaving) core and a BCJR-algorithm turbo decoder.

 - DMDI core.  The DMDI core receives and decodes 3GPP-LTE
   protocol data flow.  After clock gating was done, power
   estimation of the modified design was performed.  The total
   power of the new design was 60.1 mW.  Power reduction due to
   manual clock gating was 7.3 mW, 9% of total power.

   After Spyglass RTL power optimization was done, the IP's total
   power was 58.66 mW.

   Together, the two power reductions gave a net power saving of
   8.74 mW, or 13% of total power.

 - BCJR turbo decoder.  Register power reductions together with
   memory power reductions were carried out automatically using
   Spyglass Power.

   RTL power estimation revealed that memory accounted for a
   considerable share of this IP's power: 67% of the total.

   The power saving from memory gating alone was 18.3% of total.
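The figures above can be cross-checked with simple arithmetic.  A short
Python sketch, assuming the 7.3 mW manual-gating saving is measured from
the pre-gating DMDI baseline (so baseline = 60.1 + 7.3 mW) and that both
BCJR percentages refer to the same pre-optimization total.  Under these
assumptions it reproduces the 8.74 mW / ~13% net saving and shows memory
gating removed roughly 27% of the BCJR memory power; the 7.3 mW step alone
works out to about 10.8% of that baseline, so the 9% quoted above may be
relative to a different reference.

```python
# DMDI figures quoted in the letter (mW).
after_manual_cg_mw = 60.1    # total after manual clock gating
manual_cg_saving_mw = 7.3    # saving from manual clock gating
after_spyglass_mw = 58.66    # total after Spyglass RTL optimization

# Assumption: the baseline is the DMDI power before any clock gating.
baseline_mw = after_manual_cg_mw + manual_cg_saving_mw       # 67.4 mW
spyglass_saving_mw = after_manual_cg_mw - after_spyglass_mw  # 1.44 mW
net_saving_mw = baseline_mw - after_spyglass_mw              # 8.74 mW
net_saving_pct = 100 * net_saving_mw / baseline_mw           # ~13%

# BCJR: memory is 67% of total power; memory gating saved 18.3% of total.
# Assumption: both percentages refer to the same pre-optimization total.
memory_share_pct = 67.0
memory_gating_saving_pct = 18.3
memory_power_removed = memory_gating_saving_pct / memory_share_pct  # ~0.27

print(f"DMDI: Spyglass step {spyglass_saving_mw:.2f} mW, "
      f"net {net_saving_mw:.2f} mW ({net_saving_pct:.1f}% of baseline)")
print(f"BCJR: memory gating removed ~{100 * memory_power_removed:.0f}% "
      f"of memory power")
```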


IMPROVEMENTS NEEDED:

Even though Spyglass Power can find and implement new enables for
additional clock gating, it has some difficulty automatically generating
RTL for all power reduction opportunities.

The new power reduction opportunities are also not timing-aware and
could impact critical timing paths.

The readability of the generated RTL is not always good due to complex
enable equations.

The sequential equivalence engine has some limitations for partially
proven cases, which is an inherent limitation of formal engines.

My co-worker, Ahmed Jerraya, contributed significantly to this review.

    - Denis Dutoit
      CEA-Leti                                   Grenoble, France

Copyright 1991-2024 John Cooley. All Rights Reserved.