

( ESNUG 360 Item 15 ) -------------------------------------------- [11/02/00]

Subject: ( ESNUG 359 #6 )  Design A 64-bit+ Multiplier-Accumulator (MAC)

> I would love to hear from your readers how they'd design a large
> Multiplier-Accumulator (MAC) for over 64-bit operands.  I'm considering
> Module Compiler.  Our implementation technology is not decided yet, but
> I'm guessing 0.18um or smaller.  We're targeting in excess of 133MHz.
>
> In terms of speed/power, pointers/numbers would be greatly appreciated,
> as also techniques to verify this type of circuit.  (Obviously, we're not
> circuit designers, and would probably do a very poor job at a custom
> multiplier.)  Ideas?  Pointers?
>
>     - Neel Das


From: Gil Herbeck <gilherbeck@home.com>

John,

There are a number of factors that can have a big influence on this design.
How big are the operands and the accumulator?  Do you need saturation logic?
If so, are there multiple (programmable) saturation points?  Do you have
both integer and fixed point, or just one data type?  Can you have latency
from the inputs to the accumulator?  Is your process / cell library
optimized for area and power?  It's hard to say much without more specific
info.

If performance becomes a problem, MC can provide a big advantage for
non-interleaved accumulators.  You may be able to leave the accumulator
itself in carrysave format and push the carry propagation to after the
accumulation register.

    - Gil Herbeck
      Radix20                                    Livermore, CA

         ----    ----    ----    ----    ----    ----   ----

From: [ A Synopsys Module Compiler CAE ]

Hi John,

It is fairly straightforward to implement a simple MAC in Module Compiler
(MC).  You get full operator merging (a single carry-save reduction/Wallace
tree with just one carry-propagate adder for the entire multiply and add
operation as well as any other addends), and a choice of different
multiplier (Booth/non-Booth) and final adder (fast-carry-lookahead,
carry-lookahead, carry-select, the Synopsys proprietary
carry-lookahead-select, and ripple) micro-architectures to trade off
area/timing.

You can also parameterize these options along with the input operand widths
and different implementations of the MAC to perform fast architectural
exploration.  This is shown in the first architecture (arch==0) of the 
following piece of Module Compiler Language (MCL) code.

   module MAC (Z,X,Y,R,w,ovf,mult,fa,arch);
   integer w    = 64;                      // Input width
   integer ovf  = 2;                       // Overflow accum. bits
   integer accw = 2*w + ovf;               // Accumulator width
   integer arch = 0;                       // MAC architecture
   string  mult = "booth";                 // Multiplier type
   string  fa   = "cla";                   // Final adder type

   directive(multtype=mult,fatype=fa,pipeline="off");

   input signed [1] R;                     // Accumulator reset
   input  [w] X,Y;
   output [accw] Z;

   if (arch==0){
       wire   [accw] ACCin = X*Y + (Z&R);
       Z = sreg(ACCin);
   }

   // arch==1 is also implemented by the MC built-in function maccs().
   // arch==1 can be modified slightly to pipeline the multiplier
   // and the final adder to further speed up the MAC.

   if (arch==1){
      wire   [accw] ACC0,ACC1,ACCin0,ACCin1;
      directive local (carrysave="convert");
      wire   [accw] ACCin = X*Y + (ACC0&R) + (ACC1&R);
      csconvert(ACCin0,ACCin1,ACCin);
      ACC0 = sreg(ACCin0);
      ACC1 = sreg(ACCin1);
      Z = ACC0+ACC1;
   }

   endmodule

The critical path runs from the inputs through the merged multiplier and
the carry-propagate adder in the accumulator.  You can individually access
the "carry" and "sum" terms of the accumulator output to push the final
carry-propagate adder out of the sequential feedback loop.  This speeds up
the design, and is done by setting the carrysave attribute in MC to
"convert" and using the csconvert() function, as shown in the second
architecture (arch==1).
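
To make the carry-save idea concrete, here is a small behavioral sketch in
Python (not MCL, and not what MC actually generates).  The class and
function names are made up for illustration.  It assumes unsigned X and Y,
and it assumes R acts as an active-low clear (R=0 drops the old
accumulator value), which is how I read the "Z&R" term.  The product is
modeled as one number, whereas in hardware it would also stay in
carry-save form inside the merged tree.

```python
def carry_save_add(a, b, c, width):
    """3:2 compression: returns (s, cy) with s + cy == a + b + c mod 2**width.

    No carry ripples across bit positions here, which is why this step is
    cheap enough to sit inside the feedback loop.
    """
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask                             # per-bit sum
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask   # per-bit carry, shifted
    return s, cy


class CarrySaveMac:
    """Hypothetical behavioral model of the arch==1 structure."""

    def __init__(self, w=64, ovf=2):
        self.accw = 2 * w + ovf          # same width formula as the MCL code
        self.mask = (1 << self.accw) - 1
        self.acc0 = 0                    # models register ACC0
        self.acc1 = 0                    # models register ACC1

    def step(self, x, y, r):
        """One clock edge. ASSUMPTION: r == 0 clears the accumulator."""
        keep0 = self.acc0 if r else 0
        keep1 = self.acc1 if r else 0
        prod = (x * y) & self.mask
        self.acc0, self.acc1 = carry_save_add(prod, keep0, keep1, self.accw)

    def z(self):
        """The single carry-propagate add, done after the registers."""
        return (self.acc0 + self.acc1) & self.mask


if __name__ == "__main__":
    mac = CarrySaveMac(w=8, ovf=2)
    for x, y in [(255, 255), (17, 3), (200, 100)]:
        mac.step(x, y, 1)
        print(x, y, mac.z())
```

The point of the sketch is only that the loop (step) never performs a full
carry-propagate addition; that addition happens once, in z(), outside the
feedback path.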

   o The second architecture can be easily modified to isolate the
     multiplier, so that it can be pipelined and retimed by MC along
     with the final adder.  This will further speed up the design
     without changing the basic functionality of the MAC.

   o After synthesis, MC will write out a bit- and cycle-exact RTL
     simulation model in either Verilog or VHDL.  This can be used for
     running your fast functional simulations to verify your design.
     Of course, you'll use the gate-level netlist for full simulation.
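
On the verification question, one common approach is to check the
simulation model against an independent reference.  The following Python
sketch is only an illustration of that idea, not part of MC: the function
names are hypothetical, X and Y are assumed unsigned, R is assumed to be
an active-low clear, and how you feed the vectors to your simulator and
read back Z is left open.

```python
import random


def mac_reference(stimulus, w=64, ovf=2):
    """Expected registered output Z after each clock for (x, y, r) vectors.

    Mirrors Z = sreg(X*Y + (Z&R)) with an accumulator of 2*w + ovf bits.
    ASSUMPTION: r == 0 clears the old accumulator value.
    """
    accw = 2 * w + ovf
    mask = (1 << accw) - 1
    z = 0
    expected = []
    for x, y, r in stimulus:
        z = (x * y + (z if r else 0)) & mask   # wraps on overflow
        expected.append(z)
    return expected


def make_stimulus(n, w=64, seed=0):
    """Corner-case operands first, then n seeded random vectors."""
    rng = random.Random(seed)
    top = (1 << w) - 1
    corners = [0, 1, top]
    vectors = [(x, y, 1) for x in corners for y in corners]
    for _ in range(n):
        # Clear roughly one cycle in four so resets get exercised too.
        r = rng.choice([0, 1, 1, 1])
        vectors.append((rng.randint(0, top), rng.randint(0, top), r))
    return vectors


if __name__ == "__main__":
    vecs = make_stimulus(5, w=8)
    for (x, y, r), z in zip(vecs, mac_reference(vecs, w=8)):
        print(f"x={x:3d} y={y:3d} r={r} -> z={z}")
```

Since both arch==0 and arch==1 are meant to compute the same function,
the same expected values should apply to either one, allowing for any
latency difference once you add pipeline stages.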

To give you a flavor of what the results look like, I used the Synopsys
DesignWare Silicon Library (std. cell) developed for TSMC's 0.18G process
to run a couple of tests.  This is for a 64-bit operand MAC without any
pipelining.  Of course, your results will vary depending on the technology
library you use.

    Arch-0: # of instances= 5910; delay= 7.58ns (~132 MHz)
    Arch-1: # of instances= 6275; delay= 5.33ns (~188 MHz)

The above delay numbers can be reduced significantly by pipelining the
multiplier and the final propagate adder, until you hit the limit of
the loop delay, which then becomes the critical path.  Here are the results
for a pipelined and retimed MAC with 2 pipe stages in the multiplier and
one in the final carry propagate adder (for a total of 3 in the design):

    Modified Arch-1: # of instances=8999; delay=3.37ns (~300 MHz)
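
For anyone who wants to redo the arithmetic on their own library, here is
a small Python sketch that converts the delays quoted above into clock
rates and compares each variant against Arch-0.  The helper names are
hypothetical, and using instance count as a stand-in for area is my
assumption (cells differ in size).

```python
def freq_mhz(delay_ns):
    """Maximum clock rate in MHz for a critical-path delay in ns."""
    return 1000.0 / delay_ns


# (instance count, delay in ns), copied from the results quoted above.
RESULTS = {
    "Arch-0": (5910, 7.58),
    "Arch-1": (6275, 5.33),
    "Modified Arch-1": (8999, 3.37),
}


def compare(results, baseline="Arch-0"):
    """Clock rate, speedup, and instance ratio relative to the baseline."""
    base_inst, base_delay = results[baseline]
    rows = {}
    for name, (inst, delay) in results.items():
        rows[name] = {
            "mhz": freq_mhz(delay),
            "speedup": base_delay / delay,
            "inst_ratio": inst / base_inst,   # rough area proxy only
        }
    return rows


if __name__ == "__main__":
    for name, row in compare(RESULTS).items():
        print(f"{name:16s} {row['mhz']:6.1f} MHz  "
              f"speedup {row['speedup']:.2f}x  "
              f"instances {row['inst_ratio']:.2f}x")
```

With these numbers, moving the adder out of the loop costs roughly 6% more
instances for about a 1.4x speedup, and the pipelined version costs roughly
52% more instances for about a 2.25x speedup.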

You may get more aggressive delays with smaller process tech. libraries,
but without changing the functionality of the MAC, you'll always be bound
by the feedback loop delay.

Hope this helps.

    - [ A Synopsys Module Compiler CAE ]

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.


   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)