Editor's Note: Like most American-trained engineers, I have this horrible
  guilt associated with my education.  For many students, getting a BSEE
  is a study in survival.  At my school, for every 100 students going for a
  BSEE, only 33 got any sort of engineering degree after 4 or 5 years of
  study.  Who had time for "fluffy" classes like history or psychology when
  you had a killer exam in E&M field theory coming up?  "Enjoying" a
  philosophy class meant "enjoying" not being an engineer later on.

  So I must now *shamefully* confess years later that when I took the
  *required* Literature classes in college, I relied almost exclusively on
  those condensed Cliff Notes to quickly tell me what the stories were about
  rather than "waste" all that time reading 500 pages of some endless
  Russian novel.  I felt guilty, but it was either that or flunk Solid State
  Physics.  And now, with college students flooding Boston every September,
  it's every September I relive this guilt.  It's a pattern for me now.  I
  go buy one of the classics and tell myself I'm going to read it.  This time
  it was a collection of Melville's stories.  I start reading it at home.

    "The late John Jacob Astor, a personage little given to poetic
     enthusiasm, had no hesitation in pronouncing my first grand point
     to be prudence; my next method.  I do not speak it in vanity, but
     simply record the fact, that I was not unemployed in my profession
     by the late John Jacob Astor; a name which, I admit, I love to
     repeat, for it hath a rounded and orbicular sound to it, and rings
     like unto bullion.  I will freely add, that I was not insensible to
     the late John Jacob Astor's good opinion."

  I ask myself: "What the HELL did I just read?  This punctuation is
  murder!  Will I be a better person if I decipher it?"  I read some more.

    "As during the telling of the story, Captain Delano had once or twice
     started at the occasional cymballing of the hatchet-polishers,
     wondering why such an interruption should be allowed, especially in
     that part of the ship, and in the ears of an invalid; and, moreover,
     as the hatchets had anything but an attractive look, and the handlers
     of them still less so, it was, therefore, to tell the truth, not
     without some lurking reluctance, or even shrinking, it may be, that
     Captain Delano, with apparent complaisance, acquiesced in his host's
     invitation."

  That was just one sentence and I have NO CLUE what went on in it!  WHO DID
  WHAT TO WHOM AND WHY???  AAAARGH!  ....  And then I quietly put the book
  "away" (until next September when I start feeling guilty again.)

                                           - John Cooley
                                             the ESNUG guy

( ESNUG 251 Item 1 ) -------------------------------------------- [9/96]

Subject: ( ESNUG 249 #7 250 #2) What's The Best Way To Synth Multipliers?

> DO: constrain multipliers *accurately* (don't over or under constrain)
> and Design Compiler will do a good job meeting that constraint.  DON'T:
> flatten or remove hierarchy in a design with a multiplier.  Also, you
> will need a DesignWare licence to get the faster multipliers.


From: kurt@wsfdb.com (Kurt Baty)

Hi, John,

Here is a set of compiles of the DW02_mult, comparing the gate-count and
speed of the Carry-Save-Addition (CSA) multipliers versus the Wallace
architecture.  The two ASIC vendors were both 0.6 micron CMOS processes.
Each was compiled against both ASIC libs with worst-case industrial
conditions and with set_input_drive, set_input and set_output loads and a
wire load table.  (These tables represent over 100 hours of SPARC 10/51
compute time.)

The peak differences between the two multiplier architectures happen when
they're multiplying two same sized (bit-wise) vectors.  This all goes away
when you have greatly different bit widths.  For example, there won't be
much difference between a Wallace tree and a CSA implementation if you
were multiplying 32 bits by 5 bits.  Therefore, all the data is based on
A_width and B_width being equal.

ASIC Vendor 1

  A_width    CSA Multipliers        Wallace Trees
  B_width    area    speed          area   speed  faster %
    2 bits     64     4.68           100    7.41   -58%
    3         187     8.44           262    9.31   -10%
    4         271    11.68           392   11.32     3%
    5         433    12.93           550   12.54     3%
    6         568    15.01           720   15.18    -1%
    7         724    17.05           824   17.41    -2%
    8         904    18.96           984   18.46     3%
    9        1080    23.05          1290   18.25    21%
   10        1430    22.18          1619   19.55    12%
   11        1776    24.1           1898   19.76    18%
   12        1831    27.03          2178   22.11    18%
   13        2696    25.39          2439   21.23    16%
   14        2951    27.36          2736   24.68    10%
   15        3116    29.65          3185   24.08    19%
   16        3493    31.55          3658   25.51    19%
   17        3688    44.72          4077   27.41    39%
   18        4115    34.73          4541   25.84    26%
   19        4531    36.87          5055   26.27    29%
   20        4944    39.84          5589   26.08    35%
   21        5491    40.46          5804   29.02    28%
   22        5970    40.79          6382   29.99    26%
   23        6541    42.4           7135   29.31    31%
   24        6957    45.41          7389   34.69    24%

        [ No data for 25 to 27 bit widths. ]

   28        9493    50.97         10145   29.99    41%

        [ No data for 29 to 31 bit widths. ]

   32       12328    56.11         12791   36.35    35%


ASIC Vendor 2

  A_width    CSA Multipliers        Wallace Trees
  B_width    area    speed          area   speed  faster %
    2 bits     73     4.86           114    6.92   -42%
    3         217     9.03           187    9.77    -8%
    4         293    12.43           338   12.72    -2%
    5         371    14.12           475   13.19     7%
    6         511    17.41           598   16.28     6%
    7         722    18.66           858   16.83    10%
    8         890    21.08          1137   17.71    16%
    9         995    23.17          1282   17.81    23%
   10        1409    25.75          1522   20.79    19%
   11        1411    29.27          1986   19.65    33%
   12        1818    29.57          2072   21.94    26%
   13        2009    32.29          2452   21.65    33%
   14        2292    34.19          2833   22.04    36%
   15        2550    37.08          3072   22.46    39%
   16        3168    37.63          3604   25.09    33%
   17        3150    41.29          3776   25.98    37%
   18        3807    41.36          4409   24.94    40%
   19        4054    44.31          4622   26.56    40%
   20        4475    46.92          5051   27.67    41%
   21        5073    48.02          5443   27.9     42%
   22        5135    51.74          5806   29.02    44%
   23        5895    52.22          6575   27.97    46%
   24        6409    54.57          7177   29.85    45%

        [ No data for 25 to 27 bit widths. ]

   28        8676    61.91          9177   30.44    51%

        [ No data for 29 to 31 bit widths. ]

   32       10871    70.69         12124   30.43    57%


These tables show that, starting at about eight bits, the Wallace tree
architecture is significantly faster, at a cost of only up to about a
ten percent increase in gate count.  (What's not shown: I know that
A_width not being equal to B_width would slightly diminish the
advantages of the Wallace architecture.)

The reason why you see a variation between these two ASIC libraries is the
relative difference in the speed of doing the majority versus doing the
inputs to carry out on their adders.  As that ratio tightens there is less
speed gain.
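
The "faster %" column in the tables above appears to be the Wallace improvement relative to the CSA figure, i.e. (CSA - Wallace) / CSA, with the "speed" columns read as delays (a smaller number is faster).  That reading is inferred from the numbers and is not stated in the post.  A minimal Python sketch, with hypothetical helper names, that reproduces a few rows under that assumption and also reports the area cost:

```python
# Assumption: the "speed" columns are delays (smaller is faster) and
# "faster %" = (CSA delay - Wallace delay) / CSA delay.  Helper names are
# hypothetical; the sample rows are copied from the tables above.

def faster_pct(csa_delay, wallace_delay):
    """Percent by which the Wallace tree beats (or loses to) the CSA multiplier."""
    return 100.0 * (csa_delay - wallace_delay) / csa_delay

def area_increase_pct(csa_area, wallace_area):
    """Percent extra area of the Wallace tree over the CSA multiplier."""
    return 100.0 * (wallace_area - csa_area) / csa_area

# (vendor, width, CSA area, CSA delay, Wallace area, Wallace delay)
SAMPLE_ROWS = [
    (1, 2, 64, 4.68, 100, 7.41),
    (1, 16, 3493, 31.55, 3658, 25.51),
    (1, 32, 12328, 56.11, 12791, 36.35),
    (2, 16, 3168, 37.63, 3604, 25.09),
    (2, 32, 10871, 70.69, 12124, 30.43),
]

def summarize(rows):
    """Return (vendor, width, rounded faster %, rounded area increase %) per row."""
    return [
        (v, w, round(faster_pct(cd, wd)), round(area_increase_pct(ca, wa)))
        for v, w, ca, cd, wa, wd in rows
    ]

if __name__ == "__main__":
    for v, w, f, a in summarize(SAMPLE_ROWS):
        print(f"vendor {v}, {w:2d} bits: {f:4d}% faster, {a:3d}% more area")
```

Running the same two helpers over every row is a quick way to see where the speed gain starts to outweigh the extra gates for a given library.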

  - Kurt Baty
    WSFDB Consulting


( ESNUG 251 Item 2 ) -------------------------------------------- [9/96]

Subject: (ESNUG 249 #5 250 #7)  I've Got CLI or SWIFT or VSS Memory Leaks!

> Yep!  I have seen memory leaks as well, but they have been related to 
> problems in OpenWindows.  I don't ever recall a memory leak problem in VSS,
> but if you have WAVES running, that could really suck the memory up.
> Check with your sysadmin to see if you have all of the OpenWindows patches
> that you need.  ...  We experienced the same memory problem in June, when
> we tried to use the release V3.4a of VSS.  A STAR was issued, but up to
> now this has not been solved yet.


From: ryan@fsd.com (Ken Ryan)

John,

This sounds like *exactly* what's happening to me.  Indeed, the problem shows
up in long runs that use SWIFT models, but does not in another version that
uses some old SourceModels instead.   I tried all the OpenWindows patches,
but it didn't help.  Sigh...  I guess I have to wait for v3.5 (or v3.6 or
v3.7...)

Thanks for confirming I'm not crazy!

  - Ken Ryan
    Orbital Sciences Corp.


( ESNUG 251 Item 3 ) -------------------------------------------- [9/96]

Subject: ( ESNUG 250 #10 ) Electrical Problems With "Translate" Command

> I was wondering if you have any experience with the translate command.  I
> have tried to use it on a gate-level design to convert from one standard
> cell library to another and I am running into a slight snag.  From small
> test cases I have run, the logical function of the final design appears to
> match the original design.  However, the electrical characteristics do not
> match.  The new design uses many minimal output drive gates in place of the
> higher output drive gates of the original design.  I have tried to use the
> derive_timing_constraints command before using translate, but the drive
> substitution still occurs.  After the translate, I have tried to run an
> incremental_mapping on the design, but not all of the minimal drive cells
> got upgraded.


From: Kusuma Arkalgud <kusuma@BayNetworks.COM>

John,

While converting a netlist from one ASIC vendor library to another, the
"translate" command only does a logical one-to-one mapping of the cells
from the source library to the target library.  "Translate" does not fix the
design rules for the converted netlist.  In other words, "translate" does
not take into consideration the driving capabilities of the target
library cells.

On the other hand, a "translate" command followed by a "compile" command
fixes all the design rules and also optimizes your design.  If you don't
want to optimize your design but just want to fix all the design rules you
can use the command "compile -only_design_rule" after the "translate"
command.  This command will pick the cells with appropriate drive strengths.

  - Kusuma Arkalgud
    Bay Networks

      ----    ----    ----    ----    ----    ----    ----    ----

From: ryo.inoue@analog.com (Ryo Inoue)

John,

This user may have tried this already to eliminate the use of minimum size
gates, but just in case: use "set_max_transition" on the whole design.
If that does not help, use brute force w/ "dont_use [library_element]".

  - Ryo Inoue
    Analog Devices


( ESNUG 251 Item 4 ) -------------------------------------------- [9/96]

Subject: (ESNUG 250 #6) *Always* Want "Selecting Critical Implementations"

> If I synthesize a 20 bit adder I get around 30ns performance, but if I
> use set_max_delay and compile, the timing is reduced to just over 7ns.
> This option enabled 'Beginning Resource Allocation' to use 'Selecting
> critical implementations', and so a Carry-Look-Ahead adder was picked
> from DW01, instead of a ripple adder.
>
> Why didn't I get the fastest adder from DW01 without set_max_delay?
> And what if the adder is buried around other logic, how do I use
> set_max_delay so that I don't get a ripple adder but Carry-Look_ahead?
> Or is there some secret Synopsys switch known only to the High Priests
> of Synopsys that will always enable 'Selecting critical implementation'?


From: celiac@teleport.com (Celia Clause)

John,

This Verilog example shows how to force a Carry Look Ahead incrementer, but
you can do the same thing for an adder:

   always @ (posedge clk or posedge reset)
   begin : b1
     /* synopsys resource r0:
        map_to_module = "DW01_inc",
        implementation = "cla",
        ops = "inc1";
     */
     if (reset) begin
       count <= 0;
     end
     else if (enable) begin
       count <= count + 1;   // synopsys label inc1
     end
   end

Synopsys uses DesignWare to implement counters, adders, comparators, etc.
You can control the type of DesignWare function used by inserting compiler
directives into your code.  This example forced a DW01_inc block to be
implemented as a Carry Look Ahead incrementer.

  - Celia Clause
    RadiSys


( ESNUG 251 Item 5 ) -------------------------------------------- [9/96]

Subject: (ESNUG 249 #6 250 #5) FSM Treatment Doesn't Seem Coherent

> To check Design Compiler, using an identical FSM coding, I swapped
> the columns of my state coding.  I was sure to get an identical result,
> where the synthesized flipflops (SIG_st_sm_next_reg[0][1][2][3]) just
> changed their order.
> 
>   attribute ENUM_ENCODING of sm_next_state_type : type is 
>   -- order ABCD
>   --   "0011 0010 0001 1010 0000 0100 0101 1101 1100 1011 1111 1001";
>   -- order ADBC
>        "0101 0001 0100 1001 0000 0010 0110 1110 1010 1101 1111 1100";
> 
> I am very unhappy to see that the IDENTICAL synthesis script on an
> IDENTICALLY coded state machine produces DIFFERENT results!


From: Victor Preis <Victor.Preis@zfe.siemens.de>

Hi John,

One possible explanation for the different results could be the handling of
unspecified states.  The example used 12 states.  Coding these states with 4
bits gives 16 states.  There are different ways of handling the functionality
of these states depending on the modeling style.

Using flatten, Synopsys can optimize the don't-care description for these
states.  To generate equivalent results this user must specify equivalent
functionality for these unspecified states.  Otherwise the optimization by
Design Compiler is not predictable.
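
To make the point about unspecified states concrete, here is a minimal Python sketch (helper names are hypothetical) that lists the 4-bit codes left unused by each of the two encodings quoted above, and checks that the second encoding really is the first with its columns reordered from ABCD to ADBC.  It only counts codes; it says nothing about what Design Compiler does with them.

```python
# Hypothetical helper names; the two encodings are copied from the quoted post.
ORDER_ABCD = "0011 0010 0001 1010 0000 0100 0101 1101 1100 1011 1111 1001".split()
ORDER_ADBC = "0101 0001 0100 1001 0000 0010 0110 1110 1010 1101 1111 1100".split()

def unused_codes(encoding, width=4):
    """Return the bit patterns of the given width that no state uses."""
    all_codes = {format(i, f"0{width}b") for i in range(2 ** width)}
    return sorted(all_codes - set(encoding))

def permute_abcd_to_adbc(code):
    """Reorder the bit columns A,B,C,D of one code into A,D,B,C."""
    a, b, c, d = code
    return a + d + b + c

if __name__ == "__main__":
    print("unused in ABCD order:", unused_codes(ORDER_ABCD))
    print("unused in ADBC order:", unused_codes(ORDER_ADBC))
    print("pure column swap:",
          [permute_abcd_to_adbc(c) for c in ORDER_ABCD] == ORDER_ADBC)
```

Each encoding leaves four codes unassigned, and the unassigned sets map onto each other under the same column swap.  Whatever next-state and output behavior is described (or left as don't-care) for those four codes has to match in both versions before identical results can be expected.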

  - Victor Preis
    Siemens R&D


( ESNUG 251 Item 6 ) -------------------------------------------- [9/96]

Subject: (ESNUG 249 #3) Design Compiler Puts In Regs w/ D Inputs Tied Low???

>In one of my designs, I have a 32-bit registered output, the top 16 bits of
>which happen to be zero, e.g:
>
>     IF clk'EVENT AND clk = '1' THEN
>        IF cond THEN
>           output <= "0000000000000000" & val_a;
>        ELSE
>           output <= "0000000000000000" & val_b;
>        END IF;
>     END IF;
>
>Synopsys seems to insist on synthesising 16 registers with inputs tied to
>ground for the top 16 bits.  Is there any way to get Synopsys to blow the
>registers away i.e. have 'output(31 DOWNTO 16)' directly tied to ground? 

From: peer@iis.fhg.de (Dieter Peer)

John,

I like hardware description languages, and am happy to see Synopsys behave
like this.  Your statements are placed inside the if...endif.  So the
16 zeros can *only* appear after the first (clk'EVENT AND clk = '1'), as
you described it.  This is the correct result of synthesis and fortunately
remains so after incremental compiles.

On the other hand: What output of your 32-bit-bus would you like to see
*before* the very first clock event?

  - Dieter Peer
    Fraunhofer-Gesellschaft

      ----    ----    ----    ----    ----    ----    ----    ----

From: Martin Radetzki <radetzki@offis.uni-oldenburg.de>

Dear John,

Registers seem to be holy to logic synthesis - once inferred, they last
forever. I've tried, for example, retiming without success.  It should be
possible to write a dc_shell script to detect & remove registers tied to
GND/VDD.

  - Martin Radetzki
    OFFIS Research Institute

      ----    ----    ----    ----    ----    ----    ----    ----

From: Oren Rubinstein <oren@waterloo.hp.com>

Hello again, John.

I agree DC should get rid of the constant flops, but it doesn't.  There are
two cases when it can do it:

   1. Logic minimization of *combinational* gates.
   2. Eliminating gates whose outputs are unconnected.

Your example doesn't fall into the first category, because the MUX is before
the flops.  To achieve what you want, you need to rearrange your code, so
the selector works only on the lower bits.  (I assume you didn't do this
because you wanted a more general case; if so, you can have a second
selector after the flops to make some bits "read-only".)

In other words, you want to have a row of flops, followed by a row of
2->1 MUXes which select between the flop and a constant (for each bit).
If the controls are also constant, DC eliminates the MUXes and the flops
that were not selected.

  - Oren Rubinstein
    Hewlett-Packard (Canada) Ltd.

      ----    ----    ----    ----    ----    ----    ----    ----

From: chang@elvis.ds.boeing.com (Kou-Chuan Chang)

I think that is due to the VHDL code.  The signal output gets assigned inside
the "clock edge check" and it is 32-bit wide.  Also, the signal may be
assigned outside of the "clock edge test" for the asynchronous reset.

   if (RSTn = '0') then
      output <= (output'range => '0');
   elsif (CLK'event and CLK = '1') then
      if COND then
         output <= "0000000000000000" & val_a;
      else
         output <= "0000000000000000" & val_b;
      end if;
   end if;

To get the upper 16 bits without flip-flops, you can try

   if (CLK'event and CLK = '1') then
      if COND then
         output16 <= val_a;
      else
         output16 <= val_b;
      end if;
   end if;

In another concurrent statement

   output <= "0000000000000000" & output16;

Hope this helps.

  - Kou-Chuan Chang
    Boeing

      ----    ----    ----    ----    ----    ----    ----    ----

From: Andy Chomyn <Andy.Chomyn@proteon.com>

John, It's a question of style (as always).  Try this outside your process
statement:

      output(31 downto 16) <= "0000000000000000" ;

Then your process becomes:

     IF clk'EVENT AND clk = '1' THEN
        IF cond THEN
           output(15 downto 0) <= val_a;
        ELSE
           output(15 downto 0) <= val_b;
        END IF;
     END IF;

To be more fancy (or neat) you could alias output(31 downto 16) as 
output_upper_word  and output(15 downto 0) as output_lower_word.   

  - Andy Chomyn
    Proteon


( ESNUG 251 Item 7 ) -------------------------------------------- [9/96]

Subject: ( ESNUG 250 #3 ) Need Control Of Synopsys SDF File Generation

> The problem is that Synopsys writes out an SDF file where all three timing
> parameters (min/typ/max) are identical because it calculates the delays
> based on the last "set_operating_conditions" command.  ... We want to write
> one SDF file where "min" delays are calculated w/ "best_case_commercial"
> operating conditions, "typ" delays are made from "typical_case" operating
> conditions, and "max" delays are calculated with "worst_case_conditions".


From: "David C. Black" <dblack@ink.apple.com>

John,

This user should be careful about using min/max values for ASIC simulations.
It may not be valid to run a min/max simulation depending on the type of
min/max data you use.

There are two types of min/max.  First, there is overall product min/max as
considered over temperature, voltage, and process.  Second, there is single
device min/max within a single part for a specified operating condition.  A
min/max spread within a single chip will be very small, whereas it is
quite large for a chip-to-chip comparison.

Simulating one gate running fast at 5.5v and 110 degF on one corner of the
die with another running slow at 4.5v and -10 degF can be quite erroneous.

This is probably the reason Synopsys takes this approach.

On the other hand, it is nice to be able to quickly run an entire
simulation at min, and then run another simulation at max.  Using two files
is perhaps a hassle, but maybe safer in Synopsys' view.

  - David C. Black
    Apple Computer 


( ESNUG 251 Item 8 ) -------------------------------------------- [9/96]

From: Greg Gravunder <gregg@jdc.csg.mot.com>
Subject: Test Compiler Seems "Moody" Concerning Test_Mode and Reset_n

John,

I have a question about disabling asynchronous resets during test mode.  Test
Compiler requires all asynchronous resets to be disabled during test mode.
Should this include our primary input "reset_n" ?  I have included this
gating and I am still having problems with Test Compiler.  The problems
occur on only a small number of the flip-flops and seem to depend on how
Test Compiler generates the logic for this gating.  I have always thought
that strobing reset should reset all of the flip flops in the design even if
test_mode (another primary input) is a '1'.

Assume active low asynchronous reset for the flip flop and when signal1 = '1'
then we want to reset the flop.  There is no problem when:

           (reset_n and (test_mode or not signal1))

There is a problem when:

      ((test_mode nand reset_n) nand (reset_n nand signal1))

Even though Test Compiler is inferring an asynchronous reset port on reset_n,
it is not able to realize that reset_n will be a '1' when shifting in and
out data.  I have been told by a Synopsys FAE that test_mode= '1' should not
allow reset_n to reset the flip flop although this has not been sitting well
with me.  Any advice on ESNUG would be greatly appreciated.
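
The two gating expressions above can be compared by brute force over all eight input combinations.  The Python sketch below (function names are hypothetical) does only that; it does not model Test Compiler.  As typed in the post, the second expression reduces to reset_n and (test_mode or signal1), so it differs from the first whenever test_mode = '0' and reset_n = '1'.  That may simply be a dropped inversion on signal1 in the email rather than what the tool produced.  For the question being asked, the relevant property is that with test_mode = '1' both expressions collapse to reset_n.

```python
from itertools import product

def nand(a, b):
    """Two-input NAND on Python booleans."""
    return not (a and b)

def gating_ok(reset_n, test_mode, signal1):
    """(reset_n and (test_mode or not signal1)), the form that works."""
    return reset_n and (test_mode or not signal1)

def gating_problem(reset_n, test_mode, signal1):
    """((test_mode nand reset_n) nand (reset_n nand signal1)), as typed in the post."""
    return nand(nand(test_mode, reset_n), nand(reset_n, signal1))

def mismatches():
    """Input triples (reset_n, test_mode, signal1) where the two forms disagree."""
    return [
        (r, t, s)
        for r, t, s in product([False, True], repeat=3)
        if gating_ok(r, t, s) != gating_problem(r, t, s)
    ]

def follows_reset_in_test_mode(fn):
    """True if fn equals reset_n for every input with test_mode = 1."""
    return all(fn(r, True, s) == r for r, s in product([False, True], repeat=2))

if __name__ == "__main__":
    print("disagree at (reset_n, test_mode, signal1):", mismatches())
    print("first form follows reset_n in test mode:",
          follows_reset_in_test_mode(gating_ok))
    print("second form follows reset_n in test mode:",
          follows_reset_in_test_mode(gating_problem))
```

In both forms reset_n = '0' drives the flop's reset low regardless of test_mode, which matches the expectation that strobing reset_n should reset every flop even in test mode.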

  - Greg Gravunder
    Motorola


( ESNUG 251 Networking Section ) -------------------------------- [9/96]

Austin, TX -- Crystal Semiconductor seeks engineers w/ VHDL, Synopsys,
& Cadence for mixed signal design.  NO agencies!  "mmh@crystal.cirrus.com"

Boulder, CO -- Brooktree seeks Verilog/Synopsys ASIC designers to design
telecom products.  PLEASE, no headhunters/agencies.  "chrisk@brooktree.com"


Copyright 1991-2024 John Cooley.  All Rights Reserved.

   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)