Editor's Note: Like most American-trained engineers, I have this horrible
guilt associated with my education. For many students, getting a BSEE
is a study in survival. At my school, for every 100 students going for a
BSEE, only 33 got any sort of engineering degree after 4 or 5 years of
study. Who had time for "fluffy" classes like history or psychology when
you had a killer exam in E&M field theory coming up? "Enjoying" a
philosophy class meant "enjoying" not being an engineer later on.
So I must now *shamefully* confess years later that when I took the
*required* Literature classes in college, I relied almost exclusively on
those condensed Cliffs Notes to quickly tell me what the stories were about
rather than "waste" all that time reading 500 pages of some endless
Russian novel. I felt guilty, but it was either that or flunk Solid State
Physics. And now, every September, as college students flood Boston, I
relive this guilt. It's a pattern for me now. I
go buy one of the classics and tell myself I'm going to read it. This time
it was a collection of Melville's stories. I start reading it at home.
"The late John Jacob Astor, a personage little given to poetic
enthusiasm, had no hesitation in pronouncing my first grand point
to be prudence; my next, method. I do not speak it in vanity, but
simply record the fact, that I was not unemployed in my profession
by the late John Jacob Astor; a name which, I admit, I love to
repeat, for it hath a rounded and orbicular sound to it, and rings
like unto bullion. I will freely add, that I was not insensible to
the late John Jacob Astor's good opinion."
I ask myself: "What the HELL did I just read? This punctuation is
murder! Will I be a better person if I decipher it?" I read some more.
"As during the telling of the story, Captain Delano had once or twice
started at the occasional cymballing of the hatchet-polishers,
wondering why such an interruption should be allowed, especially in
that part of the ship, and in the ears of an invalid; and, moreover,
as the hatchets had anything but an attractive look, and the handlers
of them still less so, it was, therefore, to tell the truth, not
without some lurking reluctance, or even shrinking, it may be, that
Captain Delano, with apparent complaisance, acquiesced in his host's
invitation."
That was just one sentence and I have NO CLUE what went on in it! WHO DID
WHAT TO WHOM AND WHY??? AAAARGH! .... And then I quietly put the book
"away" (until next September when I start feeling guilty again.)
- John Cooley
the ESNUG guy
( ESNUG 251 Item 1 ) -------------------------------------------- [9/96]
Subject: ( ESNUG 249 #7 250 #2) What's The Best Way To Synth Multipliers?
> DO: constrain multipliers *accurately* (don't over or under constrain)
> and Design Compiler will do a good job meeting that constraint. DON'T:
> flatten or remove hierarchy in a design with a multiplier. Also, you
> will need a DesignWare licence to get the faster multipliers.
From: kurt@wsfdb.com (Kurt Baty)
Hi, John,
Here is a set of compiles of the DW02_mult, comparing the gate-count and
speed of the Carry-Save-Addition (CSA) multipliers versus the Wallace
architecture. Both ASIC vendors use 0.6 micron CMOS processes. Each
multiplier was compiled against both ASIC libs under worst-case industrial
conditions, with set_input_drive, input and output loads, and a wire load
table. (These tables represent over 100 hours of SPARC 10/51 compute time.)
The differences between the two multiplier architectures peak when they
multiply two vectors of the same bit width. These differences go away when
the two bit widths are very different; for example, there won't be much
difference between a Wallace tree and a CSA implementation if you are
multiplying 32 bits by 5 bits. Therefore, all the data is based on A_width
and B_width being equal.
ASIC Vendor 1
A_width    CSA Multipliers       Wallace Trees
B_width     area     speed      area     speed   faster %
2 bits        64      4.68       100      7.41       -58%
3            187      8.44       262      9.31       -10%
4            271     11.68       392     11.32         3%
5            433     12.93       550     12.54         3%
6            568     15.01       720     15.18        -1%
7            724     17.05       824     17.41        -2%
8            904     18.96       984     18.46         3%
9           1080     23.05      1290     18.25        21%
10          1430     22.18      1619     19.55        12%
11          1776      24.1      1898     19.76        18%
12          1831     27.03      2178     22.11        18%
13          2696     25.39      2439     21.23        16%
14          2951     27.36      2736     24.68        10%
15          3116     29.65      3185     24.08        19%
16          3493     31.55      3658     25.51        19%
17          3688     44.72      4077     27.41        39%
18          4115     34.73      4541     25.84        26%
19          4531     36.87      5055     26.27        29%
20          4944     39.84      5589     26.08        35%
21          5491     40.46      5804     29.02        28%
22          5970     40.79      6382     29.99        26%
23          6541      42.4      7135     29.31        31%
24          6957     45.41      7389     34.69        24%
[ No data for 25 to 27 bit widths. ]
28          9493     50.97     10145     29.99        41%
[ No data for 29 to 31 bit widths. ]
32         12328     56.11     12791     36.35        35%
ASIC Vendor 2
A_width    CSA Multipliers       Wallace Trees
B_width     area     speed      area     speed   faster %
2 bits        73      4.86       114      6.92       -42%
3            217      9.03       187      9.77        -8%
4            293     12.43       338     12.72        -2%
5            371     14.12       475     13.19         7%
6            511     17.41       598     16.28         6%
7            722     18.66       858     16.83        10%
8            890     21.08      1137     17.71        16%
9            995     23.17      1282     17.81        23%
10          1409     25.75      1522     20.79        19%
11          1411     29.27      1986     19.65        33%
12          1818     29.57      2072     21.94        26%
13          2009     32.29      2452     21.65        33%
14          2292     34.19      2833     22.04        36%
15          2550     37.08      3072     22.46        39%
16          3168     37.63      3604     25.09        33%
17          3150     41.29      3776     25.98        37%
18          3807     41.36      4409     24.94        40%
19          4054     44.31      4622     26.56        40%
20          4475     46.92      5051     27.67        41%
21          5073     48.02      5443      27.9        42%
22          5135     51.74      5806     29.02        44%
23          5895     52.22      6575     27.97        46%
24          6409     54.57      7177     29.85        45%
[ No data for 25 to 27 bit widths. ]
28          8676     61.91      9177     30.44        51%
[ No data for 29 to 31 bit widths. ]
32         10871     70.69     12124     30.43        57%
These tables show that, starting at about eight bits, the Wallace tree
architecture has a significant speed advantage and only up to about a ten
percent increase in gate count. (What's not shown is that, when A_width is
not equal to B_width, the advantages of the Wallace architecture are
slightly diminished.)
The variation between these two ASIC libraries comes from the relative
speed, in each library's adders, of the majority function versus the
inputs-to-carry-out path. As that ratio tightens, there is less speed gain.
- Kurt Baty
WSFDB Consulting
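The "faster %" column in Kurt's tables can be reproduced from the two speed columns if "speed" is read as a path delay (smaller is faster): faster % = 100 * (CSA speed - Wallace speed) / CSA speed. The short Python sketch below is only an illustration, not part of Kurt's compile runs; it recomputes that column and the Wallace tree's area overhead for three rows copied from the ASIC Vendor 1 table. The function and variable names are made up for this example.

```python
# Recompute the "faster %" column and the Wallace-tree area overhead from the
# multiplier tables. Assumption: "speed" is a critical-path delay (smaller is
# faster), which is the reading that reproduces the published column.


def faster_pct(csa_delay: float, wallace_delay: float) -> float:
    """Percent by which the Wallace tree is faster than the CSA multiplier."""
    return 100.0 * (csa_delay - wallace_delay) / csa_delay


def area_overhead_pct(csa_area: float, wallace_area: float) -> float:
    """Percent of extra area the Wallace tree uses relative to the CSA multiplier."""
    return 100.0 * (wallace_area - csa_area) / csa_area


# (width, CSA area, CSA speed, Wallace area, Wallace speed), copied from
# three rows of the ASIC Vendor 1 table.
VENDOR1_ROWS = [
    (2, 64, 4.68, 100, 7.41),
    (9, 1080, 23.05, 1290, 18.25),
    (32, 12328, 56.11, 12791, 36.35),
]

if __name__ == "__main__":
    for width, csa_area, csa_delay, w_area, w_delay in VENDOR1_ROWS:
        print(f"{width:2d} bits: faster {faster_pct(csa_delay, w_delay):6.1f}%, "
              f"area overhead {area_overhead_pct(csa_area, w_area):5.1f}%")
```

The printed speed figures round to the table's -58%, 21% and 35%; the same two functions can be applied to any other row of either vendor's table.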
( ESNUG 251 Item 2 ) -------------------------------------------- [9/96]
Subject: (ESNUG 249 #5 250 #7) I've Got CLI or SWIFT or VSS Memory Leaks!
> Yep! I have seen memory leaks as well, but they have been related to
> problems in OpenWindows. I don't ever recall a memory leak problem in VSS,
> but if you have WAVES running, that could really suck the memory up.
> Check with your sysadmin to see if you have all of the OpenWindows patches
> that you need. ... We experienced the same memory problem in June, when
> we tried to use release V3.4a of VSS. A STAR was issued, but so far it
> has not been solved.
From: ryan@fsd.com (Ken Ryan)
John,
This sounds like *exactly* what's happening to me. Indeed, the problem shows
up in long runs that use SWIFT models, but does not in another version that
uses some old SourceModels instead. I tried all the OpenWindows patches,
but it didn't help. Sigh... I guess I have to wait for v3.5 (or v3.6 or
v3.7...)
Thanks for confirming I'm not crazy!
- Ken Ryan
Orbital Sciences Corp.
( ESNUG 251 Item 3 ) -------------------------------------------- [9/96]
Subject: ( ESNUG 250 #10 ) Electrical Problems With "Translate" Command
> I was wondering if you have any experience with the translate command. I
> have tried to use it on a gate-level design to convert from one standard
> cell library to another and I am running into a slight snag. From small
> test cases I have run, the logical function of the final design appears to
> match the original design. However, the electrical characteristics do not
> match. The new design uses many minimal output drive gates in place of the
> higher output drive gates of the original design. I have tried to use the
> derive_timing_constraints command before using translate, but the drive
> substitution still occurs. After the translate, I have tried to run an
> incremental_mapping on the design, but not all of the minimal drive cells
> got upgraded.
From: Kusuma Arkalgud <kusuma@BayNetworks.COM>
John,
While converting a netlist from one ASIC vendor library to another, the
"translate" command only does a logical one-to-one mapping of the cells
from the source library to the target library. "Translate" does not fix the
design rules for the converted netlist. In other words, "translate" does
not take into consideration the drive strengths of the target library
cells.
On the other hand, a "translate" command followed by a "compile" command
fixes all the design rules and also optimizes your design. If you don't
want to optimize your design but just want to fix all the design rules you
can use the command "compile -only_design_rule" after the "translate"
command. This command will pick the cells with appropriate drive strengths.
- Kusuma Arkalgud
Bay Networks
---- ---- ---- ---- ---- ---- ---- ----
From: ryo.inoue@analog.com (Ryo Inoue)
John,
This user may have already tried this to eliminate the minimum size gates,
but just in case: use "set_max_transition" on the whole design. If that
does not help, resort to brute force with "dont_use [library_element]".
- Ryo Inoue
Analog Devices
( ESNUG 251 Item 4 ) -------------------------------------------- [9/96]
Subject: (ESNUG 250 #6) *Always* Want "Selecting Critical Implementations"
> If I synthesize a 20 bit adder I get around 30ns performance, but if I
> use set_max_delay and compile, the timing is reduced to just over 7ns.
> This option enabled 'Beginning Resource Allocation' to use 'Selecting
> critical implementations', and so a Carry-Look-Ahead adder was picked
> from DW01, instead of a ripple adder.
>
> Why didn't I get the fastest adder from DW01 without set_max_delay?
> And what if the adder is buried around other logic, how do I use
> set_max_delay so that I don't get a ripple adder but a Carry-Look-Ahead?
> Or is there some secret Synopsys switch known only to the High Priests
> of Synopsys that will always enable 'Selecting critical implementation'?
From: celiac@teleport.com (Celia Clause)
John,
This Verilog example shows how to force a Carry Look Ahead incrementer, but
you can do the same thing for an adder:
always @ (posedge clk or posedge reset)
begin : b1
  /* synopsys resource r0:
       map_to_module = "DW01_inc",
       implementation = "cla",
       ops = "inc1";
  */
  if (reset) begin
    count <= 0;
  end
  else if (enable) begin
    count <= count + 1;  // synopsys label inc1
  end
end
Synopsys uses DesignWare to implement counters, adders, comparators, etc.
You can control which DesignWare implementation is used by inserting
compiler directives into your code. This example forces a DW01_inc block
to be implemented as a Carry Look Ahead incrementer.
- Celia Clause
RadiSys
( ESNUG 251 Item 5 ) -------------------------------------------- [9/96]
Subject: (ESNUG 249 #6 250 #5) FSM Treatment Doesn't Seem Coherent
> To check Design Compiler, using an identical FSM coding, I swapped
> the columns of my state coding. I was sure to get an identical result,
> where the synthesized flipflops (SIG_st_sm_next_reg[0][1][2][3]) just
> changed their order.
>
> attribute ENUM_ENCODING of sm_next_state_type : type is
> -- order ABCD
> -- "0011 0010 0001 1010 0000 0100 0101 1101 1100 1011 1111 1001";
> -- order ADBC
> "0101 0001 0100 1001 0000 0010 0110 1110 1010 1101 1111 1100";
>
> I am very unhappy to see that the IDENTICAL synthesis script on an
> IDENTICALLY coded state machine produces DIFFERENT results!
From: Victor Preis <Victor.Preis@zfe.siemens.de>
Hi John,
One possible explanation for the different results could be the handling of
unspecified states. The example used 12 states. Coding these states with 4
bits gives 16 possible codes, 4 of them unused. There are different ways of
handling the functionality of these unused states, depending on the
modeling style.
Using flatten, Synopsys can optimize the don't-care description for these
states. To generate equivalent results, this user must specify equivalent
functionality for these unspecified states. Otherwise, the optimization by
Design Compiler is not predictable.
- Viktor Preis
Siemens R&D
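One way to follow Victor's point is to confirm that the two ENUM_ENCODING strings in the question really are the same state assignment with the bit columns reordered (ABCD to ADBC), and to list the codes that no state uses. The Python sketch below does both. It only manipulates the strings quoted in the question and says nothing about what Design Compiler actually does with the unused codes; the helper names are invented for the example.

```python
from __future__ import annotations

# The two ENUM_ENCODING strings quoted in the question (12 states, 4 bits).
ABCD = "0011 0010 0001 1010 0000 0100 0101 1101 1100 1011 1111 1001".split()
ADBC = "0101 0001 0100 1001 0000 0010 0110 1110 1010 1101 1111 1100".split()


def abcd_to_adbc(code: str) -> str:
    """Reorder one code from bit-column order ABCD to ADBC."""
    a, b, c, d = code
    return a + d + b + c


def unused_codes(codes: list[str], width: int = 4) -> list[str]:
    """Return every width-bit pattern that no state is assigned, in ascending order."""
    used = set(codes)
    all_codes = (format(i, f"0{width}b") for i in range(2 ** width))
    return [code for code in all_codes if code not in used]


if __name__ == "__main__":
    print("same assignment, columns reordered:",
          [abcd_to_adbc(c) for c in ABCD] == ADBC)
    print("unused codes, ABCD order:", unused_codes(ABCD))
    print("unused codes, ADBC order:", unused_codes(ADBC))
```

Every ADBC code is the matching ABCD code with its columns reordered, and both encodings leave four of the 16 codes unassigned (0110, 0111, 1000 and 1110 in ABCD order). Those four are the unspecified states whose don't-care handling Victor suggests can make the two compiles differ.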
( ESNUG 251 Item 6 ) -------------------------------------------- [9/96]
Subject: (ESNUG 249 #3) Design Compiler Puts In Regs w/ D Inputs Tied Low???
>In one of my designs, I have a 32-bit registered output, the top 16 bits of
>which happen to be zero, e.g:
>
> IF clk'EVENT AND clk = '1' THEN
> IF cond THEN
> output <= "0000000000000000" & val_a;
> ELSE
> output <= "0000000000000000" & val_b;
> END IF;
> END IF;
>
>Synopsys seems to insist on synthesising 16 registers with inputs tied to
>ground for the top 16 bits. Is there any way to get Synopsys to blow the
>registers away i.e. have 'output(31 DOWNTO 16)' directly tied to ground?
From: peer@iis.fhg.de (Dieter Peer)
John,
I like hardware description languages, and am happy to see Synopsys behave
like this. Your statements are placed inside the if...end if, so the
16 zeros can *only* appear after the first (clk'EVENT AND clk = '1'), as
you described it. This is the correct result of synthesis and fortunately
remains so after incremental compiles.
On the other hand: What output of your 32-bit-bus would you like to see
*before* the very first clock event?
- Dieter Peer
Fraunhofer-Gesellschaft
---- ---- ---- ---- ---- ---- ---- ----
From: Martin Radetzki <radetzki@offis.uni-oldenburg.de>
Dear John,
Registers seem to be holy to logic synthesis - once inferred, they last
forever. I've tried, for example, retiming without success. It should be
possible to write a dc_shell script to detect & remove registers tied to
GND/VDD.
- Martin Radetzki
OFFIS Research Institute
---- ---- ---- ---- ---- ---- ---- ----
From: Oren Rubinstein <oren@waterloo.hp.com>
Hello again, John.
I agree DC should get rid of the constant flops, but it doesn't. There are
only two cases in which it removes logic:
1. Logic minimization of *combinational* gates.
2. Eliminating gates whose outputs are unconnected.
Your example doesn't fall into the first category, because the MUX is before
the flops. To achieve what you want, you need to rearrange your code so
the selector works only on the lower bits. (I assume you didn't do this
because you wanted a more general case; if so, you can have a second
selector after the flops to make some bits "read-only".)
In other words, you want to have a row of flops, followed by a row of
2->1 MUXes which select between the flop and a constant (for each bit).
If the controls are also constant, DC eliminates the MUXes and the flops
that were not selected.
- Oren Rubinstein
Hewlett-Packard (Canada) Ltd.
---- ---- ---- ---- ---- ---- ---- ----
From: chang@elvis.ds.boeing.com (Kou-Chuan Chang)
I think that is due to the VHDL code. The signal output gets assigned
inside the "clock edge check" and it is 32 bits wide. Also, the signal may
be assigned outside of the "clock edge test" for the asynchronous reset:
if (RSTn = '0') then
output <= (output'range => '0');
elsif (CLK'event and CLK = '1') then
if COND then
output <= "0000000000000000" & val_a;
else
output <= "0000000000000000" & val_b;
end if;
end if;
To get the upper 16 bits without flip-flops, you can try
if (CLK'event and CLK = '1') then
if COND then
output16 <= val_a;
else
output16 <= val_b;
end if;
end if;
In a separate concurrent statement:
output <= "0000000000000000" & output16;
Hope this helps.
- Kou-Chuan Chang
Boeing
---- ---- ---- ---- ---- ---- ---- ----
From: Andy Chomyn <Andy.Chomyn@proteon.com>
John, It's a question of style (as always). Try this outside your process
statement:
output(31 downto 16) <= "0000000000000000" ;
Then your process becomes:
IF clk'EVENT AND clk = '1' THEN
IF cond THEN
output(15 downto 0) <= val_a;
ELSE
output(15 downto 0) <= val_b;
END IF;
END IF;
To be more fancy (or neat) you could alias output(31 downto 16) as
output_upper_word and output(15 downto 0) as output_lower_word.
- Andy Chomyn
Proteon
( ESNUG 251 Item 7 ) -------------------------------------------- [9/96]
Subject: ( ESNUG 250 #3 ) Need Control Of Synopsys SDF File Generation
> The problem is that Synopsys writes out an SDF file where all three timing
> parameters (min/typ/max) are identical because it calculates the delays
> based on the last "set_operating_conditions" command. ... We want to write
> one SDF file where "min" delays are calculated w/ "best_case_commercial"
> operating conditions, "typ" delays are made from "typical_case" operating
> conditions, and "max" delays are calculated with "worst_case_conditions".
From: "David C. Black" <dblack@ink.apple.com>
John,
This user should be careful about using min/max values for ASIC simulations.
It may not be valid to run a min/max simulation depending on the type of
min/max data you use.
There are two types of min/max. First, there is overall product min/max as
considered over temperature, voltage, and process. Second, there is single
device min/max within a single part for a specified operating condition. A
min/max spread within a single chip will be very small, whereas it is
quite large for a chip-to-chip comparison.
Simulating one gate running fast at 5.5v and -10 degF on one corner of the
die with another running slow at 4.5v and 110 degF can be quite erroneous.
This is probably the reason Synopsys takes this approach.
On the other hand, it is nice to be able to quickly run an entire
simulation at min, and then run another simulation at max. Using two files
is perhaps a hassle, but maybe safer in Synopsys' view.
- David C. Black
Apple Computer
( ESNUG 251 Item 8 ) -------------------------------------------- [9/96]
From: Greg Gravunder <gregg@jdc.csg.mot.com>
Subject: Test Compiler Seems "Moody" Concerning Test_Mode and Reset_n
John,
I have a question about disabling asynchronous resets during test mode. Test
Compiler requires all asynchronous resets to be disabled during test mode.
Should this include our primary input "reset_n"? I have included this
gating and I am still having problems with Test Compiler. The problems
affect only a small number of the flip-flops and seem to depend on how
Test Compiler generates the logic for this gating. I have always thought
that strobing reset should reset all of the flip-flops in the design even
if test_mode (another primary input) is a '1'.
Assume an active-low asynchronous reset on the flip-flop, and that we want
to reset the flop when signal1 = '1'. There is no problem when:
(reset_n and (test_mode or not signal1))
There is a problem when:
((test_mode nand reset_n) nand (reset_n nand signal1))
Even though Test Compiler infers an asynchronous reset port on reset_n,
it is not able to realize that reset_n will be a '1' when shifting data in
and out. I have been told by a Synopsys FAE that test_mode = '1' should not
allow reset_n to reset the flip-flop, although this has not been sitting
well with me. Any advice on ESNUG would be greatly appreciated.
- Greg Gravunder
Motorola
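Reading Greg's two reset-gating expressions as plain Boolean functions makes them easier to compare. The Python sketch below evaluates them literally as typed in the question (a transcription error in the posting would change the result), with the output driving an active-low reset, so 0 means the flop is being reset. Because (a nand b) nand (c nand d) equals (a and b) or (c and d), the second expression reduces to reset_n and (test_mode or signal1), while the first is reset_n and (test_mode or not signal1). The function names are illustrative only.

```python
# Evaluate the two reset-gating expressions from the question literally, as
# Boolean functions. The output drives an active-low asynchronous reset, so a
# result of False (0) means "flop is being reset".
from itertools import product


def nand(x: bool, y: bool) -> bool:
    return not (x and y)


def gating_a(reset_n: bool, test_mode: bool, signal1: bool) -> bool:
    """(reset_n and (test_mode or not signal1)): the form reported to work."""
    return reset_n and (test_mode or not signal1)


def gating_b(reset_n: bool, test_mode: bool, signal1: bool) -> bool:
    """((test_mode nand reset_n) nand (reset_n nand signal1)): the problem form."""
    return nand(nand(test_mode, reset_n), nand(reset_n, signal1))


if __name__ == "__main__":
    print("reset_n test_mode signal1 | form A  form B")
    for r, t, s in product([False, True], repeat=3):
        print(f"{int(r):7d} {int(t):9d} {int(s):7d} | "
              f"{int(gating_a(r, t, s)):6d} {int(gating_b(r, t, s)):7d}")
```

With test_mode = '1' both forms reduce to reset_n, so in test mode reset_n still drives the flop's reset either way. With test_mode = '0' and reset_n = '1', the first form asserts the reset when signal1 = '1', as described, while the second, as written, asserts it when signal1 = '0'.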
( ESNUG 251 Networking Section ) -------------------------------- [9/96]
Austin, TX -- Crystal Semiconductor seeks engineers w/ VHDL, Synopsys,
& Cadence for mixed signal design. NO agencies! "mmh@crystal.cirrus.com"
Boulder, CO -- Brooktree seeks Verilog/Synopsys ASIC designers to design
telecom products. PLEASE, no headhunters/agencies. "chrisk@brooktree.com"