( ESNUG 251 Item 1 ) -------------------------------------------- [9/12/96]
Subject: ( ESNUG 249 #7 250 #2) What's The Best Way To Synth Multipliers?
> DO: constrain multipliers *accurately* (don't over or under constrain)
> and Design Compiler will do a good job meeting that constraint. DON'T:
> flatten or remove hierarchy in a design with a multiplier. Also, you
> will need a DesignWare licence to get the faster multipliers.
From: kurt@wsfdb.com (Kurt Baty)
Hi, John,

Here is a set of compiles of the DW02_mult, comparing the gate count and
speed of the Carry-Save-Addition (CSA) multipliers versus the Wallace
architecture. Both ASIC vendors' libraries were 0.6 micron CMOS processes.

Each multiplier was compiled against both ASIC libs with worst-case
industrial conditions and with set_input_drive, set_input and set_output
loads, and a wire load table. (These tables represent over 100 hours of
SPARC 10/51 compute time.)

The peak differences between the two multiplier architectures happen when
they're multiplying two same-sized (bit-wise) vectors. The difference mostly
goes away when the bit widths differ greatly. For example, there won't be
much difference between a Wallace tree and a CSA implementation if you
are multiplying 32 bits by 5 bits. Therefore, all the data is based on
A_width and B_width being equal.
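
One way to see why the gap depends on operand width is to count adder
levels in a textbook model. The sketch below is an illustration only, not a
description of how DW02_mult is built. It assumes one partial-product row
per bit of the narrower operand (no Booth recoding), a CSA array that
absorbs one new row per level, and a Wallace tree that compresses every
group of three rows into two per level. The function names are made up for
this example, and the final carry-propagate adder is ignored in both cases.

```python
def array_csa_levels(rows: int) -> int:
    """Carry-save levels when rows are absorbed one at a time.

    The first level combines three rows; each later level adds one more,
    so reducing `rows` rows to two takes rows - 2 levels.
    """
    return max(rows - 2, 0)


def wallace_levels(rows: int) -> int:
    """Levels of 3:2 compression needed to reduce `rows` rows to two."""
    levels = 0
    while rows > 2:
        # Every full group of three rows becomes two; leftovers pass through.
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def reduction_levels(a_width: int, b_width: int) -> tuple[int, int]:
    """Return (array levels, Wallace levels) for an a_width x b_width multiply.

    Assumption: the narrower operand determines the number of rows.
    """
    rows = min(a_width, b_width)
    return array_csa_levels(rows), wallace_levels(rows)


if __name__ == "__main__":
    for a, b in [(8, 8), (16, 16), (32, 32), (32, 5)]:
        array, tree = reduction_levels(a, b)
        print(f"{a:>2} x {b:<2}  array levels: {array:>2}  Wallace levels: {tree}")
```

Under these assumptions a 32 x 32 multiply needs 30 array levels against 8
tree levels, while 32 x 5 needs 3 levels either way, which is consistent
with the difference fading when one operand is narrow.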

                      ASIC Vendor 1

A_width     CSA Multipliers      Wallace Trees
B_width      area    speed       area    speed    faster %
 2 bits        64     4.68        100     7.41      -58%
 3            187     8.44        262     9.31      -10%
 4            271    11.68        392    11.32        3%
 5            433    12.93        550    12.54        3%
 6            568    15.01        720    15.18       -1%
 7            724    17.05        824    17.41       -2%
 8            904    18.96        984    18.46        3%
 9           1080    23.05       1290    18.25       21%
10           1430    22.18       1619    19.55       12%
11           1776    24.1        1898    19.76       18%
12           1831    27.03       2178    22.11       18%
13           2696    25.39       2439    21.23       16%
14           2951    27.36       2736    24.68       10%
15           3116    29.65       3185    24.08       19%
16           3493    31.55       3658    25.51       19%
17           3688    44.72       4077    27.41       39%
18           4115    34.73       4541    25.84       26%
19           4531    36.87       5055    26.27       29%
20           4944    39.84       5589    26.08       35%
21           5491    40.46       5804    29.02       28%
22           5970    40.79       6382    29.99       26%
23           6541    42.4        7135    29.31       31%
24           6957    45.41       7389    34.69       24%
       [ No data for 25 to 27 bit widths. ]
28           9493    50.97      10145    29.99       41%
       [ No data for 29 to 31 bit widths. ]
32          12328    56.11      12791    36.35       35%

                      ASIC Vendor 2

A_width     CSA Multipliers      Wallace Trees
B_width      area    speed       area    speed    faster %
 2 bits        73     4.86        114     6.92      -42%
 3            217     9.03        187     9.77       -8%
 4            293    12.43        338    12.72       -2%
 5            371    14.12        475    13.19        7%
 6            511    17.41        598    16.28        6%
 7            722    18.66        858    16.83       10%
 8            890    21.08       1137    17.71       16%
 9            995    23.17       1282    17.81       23%
10           1409    25.75       1522    20.79       19%
11           1411    29.27       1986    19.65       33%
12           1818    29.57       2072    21.94       26%
13           2009    32.29       2452    21.65       33%
14           2292    34.19       2833    22.04       36%
15           2550    37.08       3072    22.46       39%
16           3168    37.63       3604    25.09       33%
17           3150    41.29       3776    25.98       37%
18           3807    41.36       4409    24.94       40%
19           4054    44.31       4622    26.56       40%
20           4475    46.92       5051    27.67       41%
21           5073    48.02       5443    27.9        42%
22           5135    51.74       5806    29.02       44%
23           5895    52.22       6575    27.97       46%
24           6409    54.57       7177    29.85       45%
       [ No data for 25 to 27 bit widths. ]
28           8676    61.91       9177    30.44       51%
       [ No data for 29 to 31 bit widths. ]
32          10871    70.69      12124    30.43       57%

These tables show that, starting at about eight bits, the Wallace tree
architecture is significantly faster at a cost of only up to about a ten
percent increase in gate count. (Not shown here: when A_width is not equal
to B_width, the advantages of the Wallace architecture diminish slightly.)
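
For anyone who wants to recompute the comparison from the raw columns, the
sketch below shows one way to do it. It assumes the "speed" columns are
path delays (smaller is faster; the post gives no unit) and that "faster %"
is the delay saved by the Wallace tree relative to the CSA delay, rounded
to a whole percent. That formula is inferred from the data, not stated in
the post, and the function names are made up for this example. The area
helper expresses the Wallace gate count as a percent increase over CSA.

```python
def faster_percent(csa_delay: float, wallace_delay: float) -> int:
    """Percent of the CSA delay saved by the Wallace tree (negative = slower)."""
    return round(100.0 * (csa_delay - wallace_delay) / csa_delay)


def area_increase_percent(csa_area: int, wallace_area: int) -> int:
    """Percent extra gates used by the Wallace tree relative to CSA."""
    return round(100.0 * (wallace_area - csa_area) / csa_area)


if __name__ == "__main__":
    # Rows copied from the tables: (vendor, width, CSA area, CSA speed,
    # Wallace area, Wallace speed).
    rows = [
        ("Vendor 1", 2, 64, 4.68, 100, 7.41),
        ("Vendor 1", 32, 12328, 56.11, 12791, 36.35),
        ("Vendor 2", 8, 890, 21.08, 1137, 17.71),
        ("Vendor 2", 32, 10871, 70.69, 12124, 30.43),
    ]
    for vendor, width, csa_area, csa_speed, wal_area, wal_speed in rows:
        print(
            f"{vendor} {width:>2} bits: "
            f"faster {faster_percent(csa_speed, wal_speed)}%, "
            f"area +{area_increase_percent(csa_area, wal_area)}%"
        )
```

For the four rows used here, faster_percent gives -58, 35, 16 and 57, which
matches the printed "faster %" column.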

The reason you see a variation between these two ASIC libraries is the
relative difference in the speed of doing the majority versus doing the
inputs to carry out on their adders. As that ratio tightens, there is less
speed gain.
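
The post does not spell out which adder paths are being compared. As a
reminder of the logic involved, and assuming "the majority" refers to the
carry-out of a one-bit full adder (which is the majority function of its
three inputs), here is a minimal functional model. It says nothing about
gate delays in either vendor's library, and the function name is made up
for this example.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ c                       # odd parity of the inputs
    carry_out = (a & b) | (a & c) | (b & c)   # majority of the inputs
    return sum_bit, carry_out


if __name__ == "__main__":
    # Exhaustive check: the two outputs encode the count of ones in binary.
    for a, b, c in product((0, 1), repeat=3):
        sum_bit, carry_out = full_adder(a, b, c)
        assert a + b + c == sum_bit + 2 * carry_out
        print(a, b, c, "->", "sum", sum_bit, "carry", carry_out)
```

Both multiplier styles are built from cells like this one, so how a library
times the sum path against the carry path affects how much a tree
arrangement can gain.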
- Kurt Baty
WSFDB Consulting