( ESNUG 482 Item 4 ) -------------------------------------------- [06/30/09]
Subject: ( ESNUG 480 #5 ) A second customer looks at Cadence C-to-Silicon
> CtoS produced good results, both in speed and area, and had the fastest
> turnaround in the competition. Results reported by DC (90 nm target):
>
> Area results excluding SRAM (registers/comb logic):
>
>                C-to-Silicon      Others
>     54 MHz     13414/68040        9981/86479 - 23435/93605
>    108 MHz     24809/68523       19357/87021 - 48394/101775
>
> Shortest achievable pipeline latency (clock cycles):
>
>                C-to-Silicon      Others
>     54 MHz         10             8 - 13
>    108 MHz         15            12 - 19
>
> Timing results (slack nsec):
>
>                C-to-Silicon      Others
>     54 MHz        0.00           0.52 - 1.14
>    108 MHz        0.00           0.00 - 0.10
>
> The C synthesis (from C to RTL) runtimes ranged from 5 min (54 MHz, ASIC)
> up to around 1.5 hours (120 MHz, Xilinx), depending on the tightness of
> clock constraint. CtoSilicon was generally on the faster end.
>
> - Gernot Koch
> Micronas GmbH Freiburg, Germany
From: J.C. Yeh <jcyeh=user domain=itri.org.tw>
Hi John,
There's been a lot of talk about C synthesis tools on DeepChip lately. It
sounds like many of the tools described are simply workarounds for not
supporting SystemC.
We like SystemC for high level synthesis since it includes everything in C++
and is the industry-standard way to describe hierarchy, concurrency, fixed-
point arithmetic, and bus protocols -- more importantly, for designs with
any significant control logic, the C-only-based tools simply do not work.
We chose Cadence's C-to-Silicon tool because it supports SystemC.
Our requirement for moving to C synthesis is that it must at least match the
area and timing of hand-written RTL synthesis. Our design started as an
abstract SystemC model of a DMAC with an AHB TLM slave interface. It was
originally written for performance and used tlm_fifo (OSCI TLM 1.0) to
connect the DMAC kernel with bus transactors. Again, it's attractive to
stick with industry standards.
SystemC vs. Verilog RTL:
The first step was to make the SystemC model synthesizable. As this was
our first project using CtoS, there was a ramp-up to learn the coding style.
All dynamic memory allocation, such as 'new', had to be rewritten so that
the objects are instantiated directly in the SystemC code. Also, CtoS does
not support sc_event, so we had to change all SC_THREADs to SC_CTHREADs and
add clock and reset signals. We also made some coding style changes to
improve the design area, such as changing the structure of the register
configuration.
Tweaking our SystemC source for all this took about 2 weeks.
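The two coding-style changes described above (no 'new', and clocked processes with an explicit reset) can be illustrated with a minimal sketch. It is plain C++ rather than SystemC, so it does not use SC_CTHREAD or any CtoS-specific construct; the names ChannelRegs, DmacKernel, program, clock_edge and remaining are hypothetical and do not come from the actual DMAC source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-channel configuration registers (illustrative names only).
struct ChannelRegs {
    std::uint32_t src_addr = 0;
    std::uint32_t dst_addr = 0;
    std::uint32_t length = 0;  // words still to transfer
};

// Style that had to go: storage sized and allocated at run time, e.g.
//   ChannelRegs* regs = new ChannelRegs[num_channels];

// Replacement style: the channel count is a compile-time constant and the
// registers are ordinary members, so every object exists up front.
template <std::size_t NumChannels>
class DmacKernel {
public:
    // Writes one channel's configuration; returns false if ch is out of range.
    bool program(std::size_t ch, std::uint32_t src, std::uint32_t dst,
                 std::uint32_t len) {
        if (ch >= NumChannels) return false;
        regs_[ch].src_addr = src;
        regs_[ch].dst_addr = dst;
        regs_[ch].length = len;
        return true;
    }

    // Models one rising clock edge of a clocked process: a synchronous reset
    // clears all channels, otherwise each active channel moves one word.
    void clock_edge(bool reset) {
        for (ChannelRegs& r : regs_) {
            if (reset) {
                r = ChannelRegs{};
            } else if (r.length > 0) {
                r.src_addr += 4;
                r.dst_addr += 4;
                --r.length;
            }
        }
    }

    std::uint32_t remaining(std::size_t ch) const {
        return ch < NumChannels ? regs_[ch].length : 0;
    }

private:
    std::array<ChannelRegs, NumChannels> regs_{};
};

// Usage: program a 3-word transfer, clock it once, then apply reset.
inline void dmac_kernel_demo() {
    DmacKernel<2> dmac;
    assert(dmac.program(0, 0x1000, 0x2000, 3));
    dmac.clock_edge(false);
    assert(dmac.remaining(0) == 2);
    dmac.clock_edge(true);
    assert(dmac.remaining(0) == 0);
}
```

Fixing the channel count at compile time is the assumption that makes the 'new' unnecessary; a design whose sizes are only known at run time would need a different restructuring.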
Overall, these changes required some effort, but the resulting source was
still much smaller than the traditional RTL design:
     Our DMAC in      # of lines of source code
     -----------      -------------------------
     Verilog RTL               28,750
     SystemC                    6,718
This roughly 4x compaction was what we had hoped for with such
control-intensive code; datapath code would have compacted even better.
Using CtoS:
Using CtoS consists of working in a GUI to view a control and dataflow graph
of your design. In the GUI, you decide whether to flatten arrays or to map
to memories of various sizes and port counts. CtoS points you to area vs.
timing decisions on all loops and function calls, and gives you opportunities
to add states or latency. It's simple to try different micro-architectures and
to have CtoS generate Verilog RTL (or faster simulation models) for each.
CtoS also uses short tcl scripts for batch mode.
Our design had a critical timing path that was the biggest headache for us;
yet CtoS reported that there was negative slack. (Huh?) It took some time
to determine the reason and how to improve it. For this, the Cadence AE was
helpful in showing potential coding changes or options to manually bind
operations to specific resources (for example to un-share an adder on the
critical path to remove the mux delay).
Verification was straightforward for us because the CtoS flow fit with our
existing virtual platform for SystemC TLM and Specman for the original
hand-edited RTL.
   1.) First, we checked the SystemC model of just the DMAC kernel
       by using an AHB cycle-accurate bus transactor connected to
       the Bus API of the virtual platform. Since the TLM interfaces
       did not have any signals, they ran very fast.

   2.) Next, we swapped the signal-level transactor into the same
       environment. These slower simulations were used primarily to
       test the bus protocol interface.

   3.) After synthesis of the SystemC, CtoS generates standard Verilog
       RTL and behavioral simulation models, which we tested in our
       existing Specman verification environment.
This approach gave us a seamless verification flow from SystemC to RTL,
and let us finish the total verification in 3 weeks.
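The transactor swap in steps 1 and 2 can be sketched as two implementations of one bus interface, with the same test body running against both. This is a plain C++ illustration under stated assumptions, not the OSCI TLM or AHB API and not the actual virtual platform: BusApi, TlmTransactor, SignalLevelTransactor, steps and copy_test are all hypothetical names, and the "two steps per transfer" cost of the signal-level model is an arbitrary stand-in for pin-level activity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical bus API seen by the test; not the OSCI TLM or AHB interface.
class BusApi {
public:
    virtual ~BusApi() = default;
    virtual void write(std::uint32_t addr, std::uint32_t data) = 0;
    virtual std::uint32_t read(std::uint32_t addr) = 0;
    virtual unsigned long steps() const = 0;  // simulation steps consumed
};

// Transaction level: one function call per transfer, no signals modeled.
class TlmTransactor : public BusApi {
public:
    void write(std::uint32_t addr, std::uint32_t data) override {
        ++steps_;
        mem_[addr] = data;
    }
    std::uint32_t read(std::uint32_t addr) override {
        ++steps_;
        return mem_[addr];
    }
    unsigned long steps() const override { return steps_; }

private:
    std::map<std::uint32_t, std::uint32_t> mem_;
    unsigned long steps_ = 0;
};

// Signal level: each transfer is split into an address phase and a data
// phase, each costing one modeled step.
class SignalLevelTransactor : public BusApi {
public:
    void write(std::uint32_t addr, std::uint32_t data) override {
        address_phase(addr);
        ++steps_;  // data phase
        mem_[latched_addr_] = data;
    }
    std::uint32_t read(std::uint32_t addr) override {
        address_phase(addr);
        ++steps_;  // data phase
        return mem_[latched_addr_];
    }
    unsigned long steps() const override { return steps_; }

private:
    void address_phase(std::uint32_t addr) {
        latched_addr_ = addr;
        ++steps_;
    }
    std::map<std::uint32_t, std::uint32_t> mem_;
    std::uint32_t latched_addr_ = 0;
    unsigned long steps_ = 0;
};

// The same test body runs unchanged against either transactor:
// fill a source buffer, copy it word by word, then verify the copy.
inline bool copy_test(BusApi& bus) {
    const std::uint32_t src = 0x1000, dst = 0x2000;
    for (std::uint32_t i = 0; i < 4; ++i) bus.write(src + 4 * i, 3 * i + 1);
    for (std::uint32_t i = 0; i < 4; ++i)
        bus.write(dst + 4 * i, bus.read(src + 4 * i));
    for (std::uint32_t i = 0; i < 4; ++i)
        if (bus.read(dst + 4 * i) != 3 * i + 1) return false;
    return true;
}

// Usage: run the identical test at both abstraction levels.
inline void transactor_swap_demo() {
    TlmTransactor tlm;
    SignalLevelTransactor pins;
    const bool tlm_ok = copy_test(tlm);
    const bool pins_ok = copy_test(pins);
    assert(tlm_ok && pins_ok);
    assert(pins.steps() == 2 * tlm.steps());
}
```

Keeping the test written against the interface rather than a concrete transactor is what lets the fast model be used for functional checks and the slower one for protocol checks without touching the test itself.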
Results:
Our evaluation criteria were area, timing, power and design cost, compared
with traditional RTL design in a 90 nm process technology.
                         Area (sq mm)    Timing (ns)    Power (mW)
     hand written RTL      674,120          6.25           4.92
     CtoS                  589,279          6.25           3.78
Including verification time, in 5 weeks total, CtoS met timing and produced
a design 16% smaller and with 23% less power, compared to our hand RTL.
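As a cross-check, the quoted percentages can be recomputed from the tables as printed. The sketch below is only arithmetic on those published numbers; the function names are hypothetical. With these inputs the power saving comes out at about 23.2%, matching the text, while the area saving comes out at about 12.6% rather than 16%, so the 16% figure may rest on numbers not shown in the table.

```cpp
#include <cassert>
#include <cstdio>

// Percentage by which `result` is lower than `baseline`.
inline double percent_reduction(double baseline, double result) {
    return 100.0 * (baseline - result) / baseline;
}

// Clock frequency in MHz for a clock period given in ns.
inline double period_ns_to_mhz(double period_ns) {
    return 1000.0 / period_ns;
}

// Recomputes the headline figures from the values printed in the tables.
inline void print_results_check() {
    const double area = percent_reduction(674120.0, 589279.0);  // area column
    const double power = percent_reduction(4.92, 3.78);         // power column
    const double mhz = period_ns_to_mhz(6.25);                  // timing column
    const double loc_ratio = 28750.0 / 6718.0;  // Verilog RTL vs SystemC lines

    std::printf("area reduction:  %.1f%%\n", area);    // prints 12.6%
    std::printf("power reduction: %.1f%%\n", power);   // prints 23.2%
    std::printf("clock frequency: %.1f MHz\n", mhz);   // prints 160.0 MHz
    std::printf("source compaction: %.1fx\n", loc_ratio);  // prints 4.3x

    assert(area > 12.5 && area < 12.7);
    assert(power > 23.1 && power < 23.3);
}
```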
Although this was one test case, we believe C synthesis will become more
widely used when it is commonplace that it at least matches the area and
timing of handwritten RTL.
We plan to use CtoS on our next design project.
- J.C. Yeh
Industrial Technology Research Institute Hsinchu, Taiwan