   Editor's Note: Going through some e-mail archives, I stumbled across
   a very angry response from an EDA user to one of my columns.  Since
   turnabout is fair play, I felt it was my duty to republish the heart of
   that objectionable column along with this angry EDA user's response.

   Since I'm often the one publishing letters critical of EDA vendors,
   it's only fair that I also publish the letters critical of me.  Sorry
   about the delay in publishing it.  (See Item 3.)

                                              - John Cooley
                                                the ESNUG guy

( ESNUG 274 Item 1 ) --------------------------------------------- [12/97]

From: [ Keeping My Synopsys Sales Person In The Dark ]
Subject: How To Use Latches Without Buying Pricey Copies Of DC-Expert

John,

I am currently working with XXXXXXXXXX.  He suggested that I post this
suggestion to ESNUG, but anonymously, to stay on good terms with our
Synopsys sales person (they probably don't look favorably on tips that
allow users to avoid having to buy licenses).

In some cases, a designer is forced to use latches (in our case, to
interface to our vendor's RAM macros).  Synopsys, as you know, requires
the use of DC-Expert to analyze the timing of latches.  It's not worth
it to us to buy N DC-Expert licenses just to handle this one simple case,
especially since the timing of the latch in our case is like that of the
master latch of a flip-flop, and the timing of its D input can be
approximated by the setup time of a D flip-flop.

After instantiating the latch in your RTL code, you can embed the
following dc_shell commands:

//  dc_shell script commands to disable timing on latches so that 
//     DC-Expert won't get invoked
//  synopsys dc_script_begin
//  set_disable_timing find (cell , LATCH_INSTANCE_NAME*)
//  create_clock find(port, clk) -period CLK_PERIOD
//  set_output_delay 0.5 -clock clk find (pin, LATCH_INSTANCE_NAME*/LATCH_CELL_DATA_INPUT_PORT)
//  synopsys dc_script_end

The set_output_delay command checks that you'll meet the setup time
of the latch (roughly speaking ... of course you can fine tune this
value to more closely match that of the real thing if need be).

The create_clock command is needed here so that the set_output_delay
command has a clock to reference.  You can override the clock period
later in a constraint script.
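As a rough illustration of the check the set_output_delay line sets up, here is a minimal Python sketch of the setup arithmetic.  It assumes an ideal clock that launches and captures on its rising edge (no clock latency, uncertainty, or multicycle exceptions); the function name and the numbers are hypothetical, not dc_shell output.

```python
def setup_slack(arrival_ns, clk_period_ns, output_delay_ns):
    """Setup slack at a pin constrained with set_output_delay.

    Assumes an ideal clock: data launched on one rising edge must arrive
    output_delay_ns before the next rising edge, one period later.
    """
    required_ns = clk_period_ns - output_delay_ns  # latest allowed arrival
    return required_ns - arrival_ns                # negative means a violation


# Hypothetical 10 ns clock with the 0.5 ns output delay used above.
print(round(setup_slack(8.7, 10.0, 0.5), 3))  # 0.8: meets the approximated latch setup
print(round(setup_slack(9.8, 10.0, 0.5), 3))  # -0.3: fails it
```

The 0.5 ns output delay plays the role of the latch's setup time, which is why tuning that one value is enough to tighten or loosen the approximation.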

  - [ Keeping My Synopsys Sales Person In The Dark ]

( ESNUG 274 Item 2 ) --------------------------------------------- [12/97]

Subject: ( ESNUG 272 #9 273 #3 ) Floating DW Inputs, & Module Compiler

> I am using Design Compiler for my synthesis in VHDL.  I have a matrix
> multiplication in my design.  It is a 3 by 3 matrix.  My coefficients as
> well as the inputs are signed.  For one sum of partial products I use:
> 
>       a <= signed(c1) * signed(in1) +
>            signed(c2) * signed(in2) +
>            signed(c3) * signed(in3);
> 
> Q1) DC does not recognize "signed" and thus when it uses DesignWare for
>     the multipliers, it doesn't connect the "TC" (two's complement) pin high
>     -- it just lets it float.  (Floating inputs??!!!)  The way I got around
>     it is to instantiate the DW02_mult and "force" the TC bit high.  Is
>     there any other, more automatic way to solve this problem beyond hand
>     instantiating & baby-sitting DW parts?
> 
> Q2) When I use the above equation, I break it into partial products, i.e.
> 

From: "Peter A. Ruetz" <PeterRuetz@california.com>

John,

This does not answer the question directly, but have you tried using Module
Compiler?  It excels at arithmetic computations like FIR filters and matrix
multipliers and handles signed/unsigned data correctly and easily.

Something like the inner product above is coded as:

    wire signed [ ] in1,in2,in3;
    ...
    A=c1*in1+c2*in2+c3*in3;

MC does not use DesignWare, and hence there should be no trouble with TC
control signals.  It builds the multipliers optimized for the types of
inputs provided (any combination of signed and unsigned) rather than trying
to optimize a programmable multiplier with a TC input.    Hence, when you
change the formats in the wire declaration, the multipliers change
automatically.
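To see why a floating TC input matters, here is a small Python behavioral sketch (not the DesignWare simulation model) of a multiplier with a two's complement control.  The same bit patterns give very different inner products depending on whether they are treated as signed or unsigned.  The function names and 4-bit operand values are made up for illustration.

```python
def to_signed(value, width):
    """Reinterpret a width-bit pattern as a two's complement integer."""
    sign_bit = 1 << (width - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def mult(a, b, width, tc):
    """Behavioral a*b: tc=1 treats both operands as two's complement, tc=0 as unsigned."""
    if tc:
        return to_signed(a, width) * to_signed(b, width)
    return a * b


def inner_product(coeffs, inputs, width, tc):
    """c1*in1 + c2*in2 + c3*in3 for one row of the 3x3 matrix."""
    return sum(mult(c, x, width, tc) for c, x in zip(coeffs, inputs))


coeffs = [0b1111, 0b0010, 0b1110]  # -1,  2, -2 as 4-bit signed values
inputs = [0b0011, 0b1101, 0b0001]  #  3, -3,  1 as 4-bit signed values
print(inner_product(coeffs, inputs, 4, tc=1))  # -11: the intended signed result
print(inner_product(coeffs, inputs, 4, tc=0))  # 85: what you get if TC is effectively 0
```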

I have used MC quite a bit and have had success with circuits like this.
Note: you can't enter VHDL, though; MC has its own language.  Although it
is a "non-standard" language, you can quite efficiently describe most
datapath-type problems (like the one above).

Just an idea,

  - Peter Ruetz
    Ammonite Design Systems

( ESNUG 274 Item 3 ) --------------------------------------------- [12/97]

Subject: Late Angry Response To "I Have Found The Enemy" Column

jcooley@world.std.com (John Cooley) wrote:
>
> ... Nick Summerville of Ford Microelectronics, wrote: "Personally, I hope
> someone succeeds in jostling Synopsys.  Their algorithms are starting
> to show their age.  The compile sequence appears to get trapped in local
> minima a lot, and many of their latest enhancements are simply bolt-ons.
> Perhaps this is why they're being challenged?  Wounded animal, perhaps?"
>
>  ... at a Boston training class for all the new bells and whistles inside
> the latest 1997.01 rev of Synopsys, I suddenly realized how wrong Nick
> Summerville was.  As a fellow grizzled old veteran Synopsys user, over the
> years, I must have had to sit through this type of training for at least
> 20 revs of Synopsys.  It hit me when the instructor put up a slide showing
> the roughly 10 to 25 percent improvement (timing wise) rev 1997.01 had
> over rev 3.5a with the strongly worded caveat of "WARNING: you MUST update
> your synthesis scripts to get these results!"  When the instructor polled
> the audience of 200 engineers about how often they updated their scripts,
> the vast majority indicated they were still four to seven revs behind the
> current rev!
>
> Kung Fu flashback again: Joe's outlining how customers "punish" EDA
> companies with successful products.  "Once you have a successful product,
> EDA users won't allow the company that provides that product to reap any
> financial benefit from that product beyond initial sales," Joe effectively
> said, "But, if you took the exact same product that's been incrementally
> improved and sell it from a start-up, customers are happy to pony up mondo
> dollars for it."
>
> Taking Joe's insight further, I realized we users also don't think twice
> about ramping up to use a new tool but will make all sorts of noise if we
> have to relearn how to use an old tool.  Four to seven revs behind!  Damn!
> I hate it when the EDA vendor bigwigs are right!  Damn!  Pogo was right,
> too!  "I have found the enemy, and it is us."
>
>   - John Cooley
>     EE Times Columnist

From: [ You've Hit My _HOT_ Button ]

Alright John, you've hit my _HOT_ button.

In printing this you'd better keep me anonymous as Synopsys is barely on
speaking terms with me as it is.  (Perhaps something to do with 38 SOLVIT
calls resulting in 23 STARs in the last 6 months...)

Anyway, Synopsys is taking a slightly different approach to _FORCING_
users to pay more money for things they are _ALREADY_ paying support for.

The first that I became aware of was the 'one-pass' compile for test.
The -scan option to the compile command basically substituted scan
versions of sequential elements for the non-scan versions during the
initial mapping stage.  Thus, a user did not have to compile (with all the
run time that takes to meet constraints), then run insert_scan, and then
recompile to meet constraints again.  Although "compile -scan" did
_NOT_ hook up the scan chain; did _NOT_ perform any test checking; did
_NOT_ produce any test patterns; it _DID_ require a "test_compiler"
license.  Now, our sales person made a policy of selling only 1
"test_compiler" license for every 3 or 4 design_compiler licenses.  So he
comes to us and offers to sell us 'special' test_compiler licenses to
balance out the quantities for _ONLY_ $25,000 per license.  Gee, thanks!
We still use compile, insert_scan instead of "compile -scan" because of
the licensing issue.

In 1996 we received 4 software releases.  In 1997 there are going to be
only 2 software releases:  1997.01 and 1997.08.  Are we paying 1/2 the
maintenance?  No, we are just getting 1/2 of the updates.  [Then again,
maybe this isn't all bad.  They'll have time to test the software instead
of having the users do it.]

Now they are making a _BIG_ deal out of the new "PrimeTime" product.
They admit that the timing analyzer in Design Compiler just isn't good
enough for 0.25 micron and beyond.  So they are working to fix it, right?
WRONG, silicon breath, they are providing a _NEW_ tool for new licensing
fees and leaving the old timing analyzer in DC.  Now you run your synthesis,
and check your timing using PrimeTime; then go back to synthesis to fix the
problems discovered by PrimeTime; then ...  Now we have another synthesis
iteration loop because Synopsys won't embed PrimeTime in DC.  (If they did
that, they couldn't charge a separate license fee could they?)  ((Maybe
they could...  a "compile -time" semantic?  Without the -time switch the
old timing analyzer would be used.  The -time switch would get the new,
additional cost license for accurate results.))

A walled-city or a wounded animal?  I believe Synopsys is a rabid beast
and I cannot wait until they have real competition.

Other _HOT_ buttons:  Synopsys claims to be a leader; why are they still
not supporting the current VHDL standard in simulation or synthesis?

You mentioned Synopsys spent 25% of their income on R&D.  Why is this so
_LOW_?  What else do they have to spend the income on?  They have almost
0% manufacturing cost.  What's the ratio of a $5 CD-ROM to a $250K software
order?  I'd expect a 40% to 50% R&D engineering investment in this type
of technology product.  If 1/2 of their employees are not in the engineering
organization then they are not trying to stay a technology leader.  They
should have 50% engineering; 25% customer support; 15% sales/marketing;
10% corporate overhead.

OK, enough of 1 geek's opinion.  Keep stirring the pot, John.  Eventually,
something interesting will be raked up from the bottom.

  -  [ You've Hit My _HOT_ Button ]

( ESNUG 274 Item 4 ) --------------------------------------------- [12/97]

From: mark_indovina@pts.mot.com (Mark Indovina)
Subject: Script To Call Synopsys On-Line Help W/O Tying Up A License

Hi John,

I don't know if this was discussed before, but a while back I got tired of
typing the explicit path to the Synopsys On-Line help (and chewing up a
Synopsys license just to start the help viewer).  Anyway, I thought this
script might come in handy to others; it's not much but it works.

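  #!/bin/csh -f
  # Help viewer (iview), relative to the Synopsys install root
  set ileaf = iview1/bin/iview
  if (${?SYNOPSYS} == 0) then
    # $SYNOPSYS is not set: assume this script lives in
    # <root>/<arch>/syn/bin and strip four path components from $0
    set root = $0
    set root = ${root:h}
    set root = ${root:h}
    set root = ${root:h}
    set root = ${root:h}
  else
    set root = ${SYNOPSYS}
  endif
  set viewer = ${root}/${ileaf}
  # Start the viewer directly rather than through a licensed Synopsys tool
  ${viewer}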

Ideally, install it in the user's ${SYNOPSYS}/${ARCH}/syn/bin directory:
when $SYNOPSYS is not set, the four ${root:h} steps assume that location.
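The four ${root:h} steps just walk up from the script's own location.  A minimal Python sketch of the same lookup, assuming the script sits in <root>/<arch>/syn/bin as suggested; the script name and directory paths below are hypothetical examples.

```python
import posixpath

ILEAF = "iview1/bin/iview"  # viewer path relative to the install root, as in the csh script


def viewer_path(script_path, synopsys=None):
    """Mirror the csh logic: use $SYNOPSYS if set, else strip 4 components from $0."""
    if synopsys is None:
        root = script_path
        for _ in range(4):          # same as applying ${root:h} four times
            root = posixpath.dirname(root)
    else:
        root = synopsys
    return root + "/" + ILEAF       # csh simply concatenates ${root}/${ileaf}


print(viewer_path("/tools/synopsys/sparcOS5/syn/bin/sold_help"))
# /tools/synopsys/iview1/bin/iview
print(viewer_path("/ignored/path", synopsys="/opt/synopsys"))
# /opt/synopsys/iview1/bin/iview
```

If the script is installed anywhere else without $SYNOPSYS set, the four-level walk lands in the wrong directory, which is why the install location matters.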

 - Mark A. Indovina
   Motorola

( ESNUG 274 Item 5 ) --------------------------------------------- [12/97]

Subject: ( ESNUG 261 #7 ) Creating A Single Cycle Write With An Asynch Memory

> We see a problem with doing asynchronous memory accesses in a single clock
> cycle.  The question is: is it safe to use both clock edges to generate
> the write enable to the memory (i.e., gate it with the clock)?
>  
>      0      1       2          3
>    --       ---------          ----------           
>      \_____/         \________/          \_______      Clock
> 
>    ------------------           ----------
>                      \_________/                        Wr_Enbl
> 
>    -------  ---------------------  ----------
>    _______X _____VALID___________X __________          ADDR/Data
> 
> 
> The problem is at edge 3 where hold time on addr/data will be entirely
> dependent on buffers/routing delays.  Another problem is that when we use
> both clock-edges, there's a restriction on the duty cycle of the clock.
> These problems could be dealt with by trying to meet these by adding delay
> lines/buffers - but this is not a reliable solution.
> 
> One solution we can think of is to double the memory width so that data
> from 2 clocks can be written at a time, allowing synchronous write enable.
> 
> Any ideas, (other than increasing the memory width), are greatly
> appreciated.
> 
>   - N. Chandra
>     Synopsys

From: Steve Masteller <masteller@crmail.indy.tce.com>

Hello, John,

Try using the other side of the clock for your write pulse for single cycle
asynchronous writes as shown below.

        0      1       2          3
      --       ---------          ----------
        \_____/         \________/          \_______      Clock

      ---------           --------------------------
               \_________/                                Wr_Enbl

      -------  ---------------------  --------------
      _______X _____VALID___________X ______________      ADDR/Data

The write enable can be generated glitch-free with an active-low latch on
the enable followed by a NAND gate to generate the Wr_Enbl.  Hold time is
generally not a problem since almost half a clock cycle should be available.
An ideal clock can be used to calculate setup times since any delays in the
gated clock will help rather than hinder setup time.  Of course, the duty
cycle restriction remains.  Finally, if you wish to avoid gated clocks, I
believe your only options are to double the memory width, as you mentioned,
or double the clock frequency.
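The glitch-free gating described above can be sketched as a step-by-step Python simulation.  This is one reading of the circuit, assuming a latch that is transparent while Clock is low and holds while Clock is high, feeding a two-input NAND with Clock.  Time units, the clock shape, and the enable waveform are hypothetical.  The enable is deliberately dropped in the middle of a high phase to show that the latch keeps Wr_Enbl from being cut short, while gating the raw enable directly would not.

```python
PERIOD = 20  # hypothetical clock period in arbitrary time steps


def clock(t):
    """Low for the first half of each period, high for the second half."""
    return 1 if t % PERIOD >= PERIOD // 2 else 0


def enable(t):
    """Hypothetical write request: rises while Clock is low (t=2) and
    drops in the middle of the following high phase (t=15)."""
    return 1 if 2 <= t < 15 else 0


def simulate(steps):
    latched = 0
    gated, ungated = [], []
    for t in range(steps):
        if clock(t) == 0:           # active-low latch is transparent while Clock is low
            latched = enable(t)
        # while Clock is high the latch simply holds its value
        gated.append(1 - (clock(t) & latched))      # NAND(Clock, latched enable)
        ungated.append(1 - (clock(t) & enable(t)))  # NAND(Clock, raw enable), for comparison
    return gated, ungated


gated, ungated = simulate(40)
print([t for t, v in enumerate(gated) if v == 0])    # t = 10..19: the full high phase
print([t for t, v in enumerate(ungated) if v == 0])  # t = 10..14: a truncated pulse
```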

   - Steve Masteller
     Thomson Multi-Media

         ----    ----    ----    ----    ----    ----   ----

From: Michael Ericson <ericson@nexen.com>

John,

Using the falling edge of the clock to generate the asserting edge of the
write pulse may cause pulsewidth violations depending on the speed of your
clock and memories.  You may suffer pulsewidth shrinkage due to variations
of the duty cycle and differences between the rise and fall times of your
output buffers.

We just finished several chips with single-cycle memory accesses; we had
two solutions depending on what was available on the chip.  These memory
interfaces proved to be the most challenging areas to verify timing in
the designs, but I guess this would be expected considering the nature
of the problem.

Solution #1:

Create a delayed version of the clock and gate it with the chip clock
to build the pulse:

                               1
   --       ---------          ----------           
     \_____/         \________/          \_______      Clock

               2
   ------       ---------          ----------           
         \_____/         \________/          \___      Delayed_Clock

              ------------------
   __________/                  \________________      Write_Enable

                 3              4
   --------------               -----------------
                 \_____________/                       Wr_Enbl_Pulse

The logic for the pulse is:

	~ ((~ Clock | Delayed_Clock) & Write_Enable)

Edge 2 of Delayed_Clock creates the asserting edge (3) of the pulse; edge 1
of Clock creates the deasserting edge (4) of the pulse.  The critical path
is from Clock to Wr_Enbl_Pulse; the shorter you make this path, the better
your address and data hold margins will be.
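The pulse equation can be checked with a small Python sketch that samples the three inputs at discrete time steps.  The period, the delay, and the Write_Enable timing are hypothetical; Write_Enable is placed so it changes after the rising edge of Clock and before the rising edge of Delayed_Clock, which is the ordering needed to avoid glitches.

```python
PERIOD = 20  # hypothetical clock period in arbitrary time steps
DELAY = 4    # hypothetical delay from Clock to Delayed_Clock


def clock(t):
    """High for the first half of each period; rising edges at t = 0, 20, 40, ..."""
    return 1 if t % PERIOD < PERIOD // 2 else 0


def delayed_clock(t):
    """Clock shifted later by DELAY steps."""
    return clock(t - DELAY)


def write_enable(t):
    """Hypothetical registered enable for one write cycle: changes 2 steps after
    a rising Clock edge, i.e. after Clock rises but before Delayed_Clock rises."""
    return 1 if 2 <= t < PERIOD + 2 else 0


def wr_enbl_pulse(t):
    """Active-low write pulse: ~((~Clock | Delayed_Clock) & Write_Enable)."""
    not_clock = 1 - clock(t)
    return 1 - ((not_clock | delayed_clock(t)) & write_enable(t))


low = [t for t in range(2 * PERIOD) if wr_enbl_pulse(t) == 0]
print(low[0], low[-1])  # 4 19: asserted at the Delayed_Clock rise, released at the next Clock rise
```

Changing DELAY moves only the asserting edge, which matches the trade-off between address setup and pulsewidth described next.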

The advantages of this method are easy implementation and the use of only
the rising edge of the clock to create both edges of the pulse.  The
disadvantage is that the delay on the clock scales with process,
temperature, and voltage; because of this, you can simultaneously face
address setup problems where the delay is short and pulsewidth problems
where the delay is long.  If your clock is 50MHz or slower, your process
is 0.6 micron or smaller, and your SRAM is 12ns or faster, you should be
able to avoid this situation.

When doing static timing verification, you must ensure that Write_Enable
is always faster than the rising edge of Delayed_Clock and slower than
the rising edge of Clock to avoid glitches.  Also, realize that similar
logic will be required for the data output enable pulses and that the
asserting edges of these pulses may come fairly close to the beginning
of the clock cycle in the min case.  Because of this, you may want to make
sure that a dead cycle is inserted when going from a read cycle to a write
cycle to avoid contention on the data bus (write->read is easier to
accomplish since it takes longer to turn on the SRAM data bus than to turn
off the ASIC data bus).  We avoided this dead cycle for one of our chips
because the interface was shared among three engines and we needed the
bandwidth, but the timing verification was time-consuming.

If you're using VTI, they created a programmable delay cell for us and are
familiar with this implementation in their Boston-area technical center.
The programmable delay cell made it convenient for us to tweak the timing
late in the backend process with minimal schedule impact.  Also, the
b1->zn path through their fn03d2 macrocell may be used for very fast
timing between Clock and the input of the output pad for Wr_Enbl_Pulse
(the fn03d2 cell needs to be placed next to the pad, though).

Solution #2:

If you are using a PLL, you can create a 2x version of the clock and
use its falling edge to create a delayed version of the clock (looks
much like the one above).  To create a 2x version of the clock, send
the output clock of the PLL through a divide-by-two circuit and use
the divided clock as the feedback clock.  The PLL will speed up its
output clock until the feedback clock matches the frequency of the
reference clock; the result will be a clock on the output of the
PLL with a frequency 2 times that of the reference clock.
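A toy Python model of the divide-by-two feedback idea follows.  It is a simple frequency-lock iteration, not a real phase detector or loop filter, and the 50 MHz reference, starting frequency, and loop gain are all hypothetical.  It also shows why the 2x clock gives a well-controlled delay: assuming 50% duty cycle and aligned rising edges, the first falling edge of the 2x clock lands a quarter of a reference period after the reference rising edge, a fixed fraction of the period rather than a gate delay.

```python
def lock_frequency(f_ref, divide_by=2, gain=0.5, steps=200):
    """Toy frequency-lock model: nudge the output frequency until
    f_out / divide_by matches f_ref (not a real PLL phase detector)."""
    f_out = f_ref  # hypothetical starting point: running at the reference rate
    for _ in range(steps):
        f_feedback = f_out / divide_by        # divide-by-N circuit in the feedback path
        f_out += gain * (f_ref - f_feedback)  # speed up while feedback is too slow
    return f_out


f_ref = 50e6                  # hypothetical 50 MHz reference clock
f_2x = lock_frequency(f_ref)
period_1x = 1 / f_ref
fall_2x = 1 / (2 * f_2x)      # first falling edge of the 2x clock, both clocks rising at t=0
print(round(f_2x / 1e6, 3), "MHz")                     # 100.0 MHz
print(round(fall_2x / period_1x, 3), "of a period")    # 0.25 of a period
```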

The advantage of this method is tighter control over the skew of the
delayed clock through variations in process, temperature, and voltage.
The disadvantage is the complexity of the clock circuit and resolving
the skew between the two clock domains.  If you don't need the 2x clock
for anything else in your design, it may not be worth the effort; in
any case, I would try solution #1 first.

The good news is that both solutions are working in our labs right now.

  - Mike Ericson
    Nexion, Inc.

( ESNUG 274 Item 6 ) --------------------------------------------- [12/97]

Subject: (ESNUG 271 #4 273 #4)  How BC Handles Designs With Handshaking

>> Maybe it's been a while since I've looked at BC, or maybe BC has improved,
>> but here's the scenario that I remember which BC could not tackle and why
>> I say that BC fails in the basics of hierarchical design.  I code up a BC
>> block that may / may not generate a data item on every cycle, so I have a
>> signal called "valid" that accompanies the data output.  Since my blocks
>> are symmetrical, my next block can take input on every cycle, but once
>> in a while it can't.  So, my next block has a handshaking signal going
>> back to the first block called "got_it".

From: [ A Synopsys BC CAE ]
> There are two basic BC coding approaches to this scenario.
> 
> First, "Producing" block produces a data_valid output along with the
> output_data itself.  (This is very easy to do within BC.)  ...

From: Oren Rubinstein <oren@gigapixel.com>

The code for the "producing" block is not checking the "i_am_ready" signal,
so it doesn't do what is required.

From: [ A Synopsys BC CAE ]
> Second, the "receiving" block needs to let the "producing" block know
> whether it received the data.  (This is, again, very easy to do in BC.)

From: Oren Rubinstein <oren@gigapixel.com>

The code for the "receiving" block doesn't seem to generate the "i_am_ready"
signal, unless it's the inversion of "stall_producing_block".

More to the point, in order to do double data pacing a la PCI, you
would need what is called "fast handshaking" in BC terminology, except:
1. You can't do fast handshaking for both input and output at the same time.
2. Fast handshaking requires an additional clock after the "fast" one.
Therefore, the best you can do with BC is one data transfer every other clock.

Just to clarify, here is a timing diagram of what PCI needs:
             _   _   _   _   _   _
   CLOCK   _| |_| |_| |_| |_| |_| |_
           __         _______     __
   IRDY#     \_______/       \___/
           __         ___         __
   TRDY#     \_______/   \_______/
                ^   ^           ^
                |   |           |
                  Data transfers

BC won't be able to do the second data transfer in the diagram, because it
needs at least one clock after the first data.
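Reading the diagram edge by edge: a data transfer happens on each rising edge of CLOCK where IRDY# and TRDY# are both asserted (low).  A minimal Python sketch of that rule, with the signal values transcribed by hand from the diagram (the edge numbering is ours), shows the two back-to-back transfers that a one-transfer-every-other-clock handshake cannot produce.

```python
# Sampled at each rising CLOCK edge in the diagram (1 = deasserted, 0 = asserted).
IRDY_N = [1, 0, 0, 1, 1, 0]
TRDY_N = [1, 0, 0, 1, 0, 0]


def transfer_edges(irdy_n, trdy_n):
    """A data phase completes on a rising edge where IRDY# and TRDY# are both low."""
    return [edge for edge, (i_n, t_n) in enumerate(zip(irdy_n, trdy_n), start=1)
            if i_n == 0 and t_n == 0]


def min_gap(edges):
    """Smallest number of clocks between consecutive transfers."""
    return min(b - a for a, b in zip(edges, edges[1:]))


edges = transfer_edges(IRDY_N, TRDY_N)
print(edges)           # [2, 3, 6]
print(min_gap(edges))  # 1: back-to-back transfers on consecutive clocks
```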

From: [ A Synopsys BC CAE ]
> We (the BC team) have been toying with the idea of allowing users to define
> a pragma/compiler directive that causes BC not to register particular
> outputs.  We have concerns that if users are not careful they can cause
> their designs to fail.  (For example if I remove the register for my
> data_valid signal (above) and not my output_data signal, then my
> handshaking protocol will no longer work.)
> 
> What do users think?  Is this something we should do, and if so why?

Yes, because the tool should not restrict the designers.
You have to assume people know what they're doing.  (If they don't, there
are so many ways to do bad designs that one more won't matter.)

  - Oren Rubinstein
    GigaPixel Corp.

( ESNUG 274 Item 7 ) --------------------------------------------- [12/97]

Subject: ( ESNUG 273 #9 ) Anyone Create An Xemacs For dc_shell Scripting?

> As a new (but intensive) writer of dc-shell scripts, I'm wondering if
> anybody has ever found (or created) an Xemacs dc-shell editing mode (with
> highlighting, auto-indent, ...)
>
>  - Diego COSTE
>    Hewlett-Packard  Grenoble, France

From: "Hartley Horwitz" <harts@nortel.ca>

Dear John,

In response to Diego's request for an emacs/Xemacs mode for dc_shell
scripts: I have an (X)emacs mode that seems to work quite well for the type
of scripts that I write.  It does perform highlighting, indentation, comment
aligning, etc. About 5 other users from other companies and some of my
colleagues are currently using it.

I have not updated the mode for the 97.01 or 97.08 releases; this will be
done next year.  If anyone is interested in this mode, send me an email,
and it's yours, bugs and all!

  Hartley Horwitz
  Nortel,   Ottawa, Canada

( ESNUG 274 Item 8 ) --------------------------------------------- [12/97]

From: erez@taux01.nsc.com (Erez Naory)
Subject: What's Better?  "Rich" Or "Sparse" Synopsys Libraries?

Hi, John.

I have been involved in an effort to generate Synopsys libraries.

The target is to be able to compile a design to the fastest speed.  One of
the questions that arises is how "rich" the library should be.  Some
libraries out there have over 1000 cells, while others offer a much smaller
selection (~300).  I know that, theoretically, more is better, but I have
seen cases where it does not apply.  For example, we had a complex gate
that had (by mistake) only one size.  Removing the cell made the design
synthesize to a faster speed.  It seems that Design Compiler picked this
cell, assuming it could bump up the size later, and when it found out it
could not do that, it could not recover.

Does anyone have any experience with this?  Is there a DC runtime penalty
for using rich libraries vs. sparse libraries?

  - Erez Naory
    National Semiconductor   Hertzlia, Israel

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.


   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)