( ESNUG 554 Item 1 ) -------------------------------------------- [12/10/15]
Subject: ANSS "Jolly" on why Big Data is a bad fit for EDA and chip design
TOLD YA SO! (part I): 6 days before it actually happened, DeepChip broke
the news that Ansys was acquiring John "Jolly" Lee's 6 engineer Big Data
start-up Gear Design Solutions. Booya! Jolly is known for Magma's Mojave
Quartz DRC/LVS, which gave MENT Calibre some trouble in ESNUG 445 #14.
Sources say Ansys paid $30 million for Gear -- which sounds hefty until
you realize Zacks Investment Research had downgraded ANSS from "hold" to
"sell" on May 10th -- along with most other Wall St analysts badmouthing
ANSS shares. Long story short, ANSS mechanical CAD sales aren't growing,
but for a few years Apache EDA sales were growing -- so Ansys had to bring
in John Lee's "Big Data techniques for IR-drop/EM!" as a last hope.
Rumor is some recent "ex-" Apache R&D have "unresigned" now that Jolly
is on the Ansys masthead as a VP & GM.
- from http://www.deepchip.com/items/0551-07.html (08/04/15)
From: [ John "Jolly" Lee of Apache Ansys ]
Hi, John,
While I have no comments about your Ansys business speculations, I want to
let your readers know how Ansys Gear "Big Data" is going to change EDA.
At Ansys we are tracking 9 customers who are starting 7nm designs. Our
focus is on power-integrity, thermal-integrity and signal-integrity. We
estimate chip complexity at 7nm will reach 4 billion instances, with the
number of physical geometries reaching 40 billion, and parasitics reaching
400 billion.
Put in these terms, chip design might appear to be a Big Data problem.
In many industries outside of chip design, Big Data systems are used to
process Petabytes of data, in seconds, using low-cost commodity Linux
hardware. Google search and Google maps are great examples.
But can Big Data actually be used for chip design? Why can you view the
world's maps on a phone instantly -- yet it takes hours to load a full
chip database into an EDA tool?
EXACTLY WHAT IS "BIG DATA"?
For starters, I recommend Dean Drako's excellent write-up on Big Data for
chip design in ESNUG 541 #2. To summarize, Big Data is:
- Data that is too large (and messy) to easily represent using
traditional databases, or computers.
- A set of techniques to analyze such big messy data. These
techniques were originally developed and published by Google,
and since popularized in open-source projects like Hadoop.
- The results typically deliver predictive analytics, machine
  learning, or some other large-scale computation.
Big Data is used to:
- Drive search engines, such as Google Search.
- Drive recommendation engines, such as Amazon and Netflix
("you might like this movie").
- Drive real-time analytics, like Twitter's "what's trending".
- Significantly reduce storage costs -- e.g. MapR's NFS-compliant
  "Big Data" storage system (versus Netapp).
Big Data systems rest on one key new concept: keep all available data,
because you never know what questions you'll later ask.
HOW BIG IS BIG?
In practical terms, any data set that exceeds the memory of a reasonable
machine can be considered "Big Data". Today, the sweet spot for memory on
a server machine is around 512 GB. Hence, we define "Big Data" today as
any data set that exceeds 512 GB.
Traditional databases are structured (SQL == structured query language).
Such systems, like Oracle, require a large memory machine with a fast
enterprise-class disk system (e.g. Netapp). The performance of such a
system depends on how much memory you put on the machine ($$$) and how
fast (more $$$) your central storage appliance is.
EDA software like Synopsys IC Compiler, Cadence Virtuoso, Mentor Calibre
all use the exact same traditional (monolithic) database and data-model
systems that Oracle does. It's no surprise that SNPS Milkyway and CDNS
OpenAccess work this way because they were developed before Big Data
systems came out.
                    Classic EDA of past 30 years     Big Data
                    ----------------------------     --------

  Data              Structured databases.  Data      Unstructured, sharded,
                    is monolithic.                   and distributed.

  Compute           Runs great with more memory      Runs great on many low-end
                    and more CPU's on the same       Linux boxes (e.g. 16 GB).
                    machine.

  Distributed       Ad hoc.  Each EDA application    Systematic.  Built-in
  processing        has a different approach.        formal methods
                                                     (e.g. MapReduce).

  Programming       Inflexible; done with C/C++      Flexible.  High-level
                    -- hard to develop; slow         abstractions in Java
                    to fix.                          or Python.

  Speed             Great for small 5 Gig to         Great for any size data,
                    10 Gig blocks; sucks for         even Petabytes.
                    chips.

  Silos             Only works with the specific     Great at searching across
                    structured EDA data it           varied, unstructured sets
                    creates.                         of data.
All Big Data systems share these common traits:
- Data is broken into many small pieces called "shards".
- Shards are stored and distributed across many smaller cheap disks.
- These cheap disks exist on cheap Linux machines.
Cheap == low memory, consumer-grade disks and CPU's.
- Shards can be stored redundantly across multiple disks, to
build resiliency. (Cheap disks and cheap computers have
higher failure rates).
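The sharding traits above can be sketched in a few lines of Python. This is
a hypothetical illustration of round-robin shard placement with redundancy --
not Ansys or Hadoop code; the machine names and the replication policy are
made up for the example.

```python
def assign_shards(num_shards, machines, replicas=2):
    """Map each shard id to the machines holding a copy of it.

    Round-robin placement: the copies of a shard land on consecutive
    machines, so losing any single machine never loses a shard.
    """
    placement = {}
    n = len(machines)
    for shard in range(num_shards):
        placement[shard] = [machines[(shard + r) % n] for r in range(replicas)]
    return placement

# Six shards spread redundantly over three cheap Linux boxes.
placement = assign_shards(6, ["linux01", "linux02", "linux03"])
print(placement[0])  # ['linux01', 'linux02']
```

Losing "linux01" here still leaves a copy of shards 0, 3 (and every other
shard it held) on a neighboring machine -- the resiliency the bullet list
describes.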
Big Data software (like Hadoop) uses simple, powerful techniques to make
the data and the compute massively parallel:
- MapReduce (http://en.wikipedia.org/wiki/MapReduce) is used to
  take a serial algorithm and make it massively parallel.
- In-memory caching of data, e.g. Spark (http://spark.apache.org),
  makes iterative algorithms fast.
- Machine learning packages like MLlib (http://spark.apache.org/mllib)
  run natively on these architectures.
A simple example: Consider a file with the name of every person on planet
Earth. What if you wanted to count the number of people named Aart in
the world? A serial, old-school EDA algorithm would be:

    count = 0
    for name in names:
        if name.first == "Aart":
            count = count + 1
A traditional old school EDA data system would process the file serially
(read from the beginning, and scan through all 6 billion Earth names).
In a Big Data system, the ALL HUMANS ON EARTH file would be sharded into
many smaller pieces -- for example, 100,000 shards with 60,000 names in
each. If 100 computers were used, each computer would store 1,000 shards
on its cheap local disk. And because each shard is small (under 1 MB
each), the memory required on each machine is very low.
MapReduce would then take the serial algorithm, and apply it in parallel
to each shard. The count from each shard would then be added up for the
final count.
This Big Data system provides a 100x improvement, using commodity hardware.
Moreover, as the data size grows, it's very easy just to add more cheap
computers -- with the runtime remaining constant.
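The Aart-counting example maps directly onto a map step and a reduce step.
Here is a minimal sketch using Python's built-in map() and
functools.reduce() -- in-memory lists stand in for shards on remote disks,
so this shows the structure of the computation, not an actual distributed
run (the names are illustrative).

```python
from functools import reduce

# Hypothetical shards: in a real cluster, each list would live on a
# different machine's cheap local disk.
shards = [
    ["Aart", "Wally", "Joe"],
    ["Lip-Bu", "Aart"],
    ["Anirudh", "Ajoy"],
]

# Map step: count "Aart" within each shard independently -- this is the
# part that runs in parallel, one task per shard.
def count_aart(shard):
    return sum(1 for first_name in shard if first_name == "Aart")

partial_counts = map(count_aart, shards)

# Reduce step: add up the per-shard counts into the final answer.
total = reduce(lambda a, b: a + b, partial_counts, 0)
print(total)  # 2
```

In Hadoop or Spark the framework ships count_aart to whichever machines
hold the shards and handles the shuffle back to the reducer; the
programmer only writes the two small functions.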
It's worth noting that many of these ideas existed "pre-Google". However,
the techniques were always applied ad-hoc, in proprietary systems, and were
often hard-coded in C or C++. In contrast, Hadoop and MapReduce were
written flexibly to take non-compiled, interpreted queries, and parallelize
them automatically.
THE BIG TRADE-OFF
The generality that Big Data systems provide stands in contrast to the
highly detailed micro-efficiency that EDA vendors obsess over. By writing
the most efficient C or C++ code possible, EDA programs strive to be
optimal -- the most CPU-efficient and the most memory-efficient.
The EDA approach makes sense when HW resources are precious -- which
certainly was the case 20 years ago. A Sun Sparc workstation or an IBM
PowerPC server was indeed precious and expensive in its day.
Google, in contrast, based their software on the belief that compute and
storage are infinitely cheap. Indeed they are right -- and the performance,
capacity, cheapness, and flexibility of their Big Data systems prove it.
BIG DATA IS NOT AN EASY FIT FOR CHIP DESIGN
Given that Big Data systems like Hadoop are changing whole industries, why
is it we've NOT seen a new wave of Hadoop-based EDA tools?
Here's what traditional Big Data systems look like:
- They reside on dedicated hardware.
- They are good at processing text files, search algorithms, and
  key-value pairs.
- They are weak at computation.

And for chip design, here's where Big Data falls short:
- They do not run on standard LSF Linux clusters.
- They do not know about chip design constructs and formats --
  things like transistors, wires, instances, voltages, timing,
  currents, power, etc.
- They cannot compute timing or power. They can't simulate
  functionality. They can't synthesize, etc.
So why bother with Big Data in chip design tools?
ELASTIC COMPUTE
At ANSYS we've determined that a purpose-built Big Data platform is needed
in chip design. This has been a multi-year effort -- we started in 2012
at Gear Design Solutions -- and it's now an integral part of the future of
our ANSYS product platform.
Our purpose-built platform handles some of the key problems with traditional
Big Data systems:
- It understands chip-design constructs and EDA formats. This means
  you can do a Google-style search on chip design data. "Show me all
  high-power instances near any sensitive cell on a timing-critical
  path" -- and have it run instantly, on chip-scale data.
- It scales efficiently across standard LSF Linux clusters. No root
access needed; and the platform spawns itself easily, and co-exists
nicely with existing EDA tools.
- It solves the compute-problem that traditional Big Data tools have.
To solve the compute-problem, we've augmented the traditional MapReduce
algorithm to provide elastic-compute services.
Elasticity means that data can exist in any form. For example, the layout
of a chip can exist as a monolithic layout database. Or it can exist as
1000 different buckets (shards), across a 1000 different Linux machines.
Or any combination in-between.
Elastic-compute means the compute is done wherever the data is. Compute,
in EDA terms, can be boiled down to a few types; geometric compute (DRC and
RC extraction), graph compute (logic simulation and STA) and matrix compute
(SPICE and IR-drop). So an example of elastic-compute is a geometric engine
like Calibre that processes DRC data equally well, whether the data exists
on a single machine, or across 1000's of machines.
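As a toy illustration of that idea (hypothetical code -- not Calibre or any
Ansys API): the same spacing check runs unchanged whether it sees one
monolithic rectangle list or one shard at a time, with the per-shard counts
summed. A real system would also need halo regions to catch violations
that straddle shard boundaries; this sketch sidesteps that by splitting
where no violating pair crosses a boundary.

```python
def spacing_violations(rects, min_space=2):
    """Count adjacent rectangle pairs closer than min_space in x.

    Rectangles are (x1, y1, x2, y2) tuples, checked left to right.
    """
    rects = sorted(rects)  # sort by left edge
    count = 0
    for a, b in zip(rects, rects[1:]):
        gap = b[0] - a[2]  # left edge of b minus right edge of a
        if 0 <= gap < min_space:
            count += 1
    return count

monolithic = [(0, 0, 4, 2), (5, 0, 9, 2), (12, 0, 16, 2)]
shards = [[(0, 0, 4, 2), (5, 0, 9, 2)],   # shard cut chosen so no
          [(12, 0, 16, 2)]]               # violating pair is split

print(spacing_violations(monolithic))                # 1
print(sum(spacing_violations(s) for s in shards))    # 1
```

The check function never knows (or cares) how many machines hold the data
-- which is exactly the elasticity property described above.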
Why make a purpose-built platform with elastic-compute services?
- Instant access to all chip design data (netlist, layout, power,
  timing, reliability) that lets designers gain unique insights
  into their designs. This Big Data visibility lets in-design
  optimizations be the brains driving their existing EDA tools.
- Unlimited capacity. Since data is sharded and distributed, you
  will no longer be constrained by the cost of a 1 TB machine, or
  the performance of a central Netapp.
- Elastic-compute. 100x? 1000x? The computation scale will far
  exceed what old ad-hoc EDA methods have been able to accomplish.
Our focus at Ansys is to provide physics-based simulation of all aspects of
chip design. This includes electrical simulation, thermal simulation,
mechanical simulation, fluid-dynamics simulation. Multi-physics-based
simulation across die/package/system is a requirement for all product
design (smart phones, networking equipment, automotive systems, etc.).
It's our belief that elastic-compute + Big Data will be the best way to
provide such die/package/system solutions.
I look forward to the opportunity to share more detail in 2016, and I
expect that you'll start seeing customer reports then, too.
- John "Jolly" Lee
Ansys-Apache, Inc. San Jose, CA
---- ---- ---- ---- ---- ---- ----
Related Articles
Drako launches new EDA tool type with Big Data tapeout predictor
Dean Drako warns "Big Data" analytics coming for chip design DM
Customer benchmark of Ansys/Apache RedHawk vs. Cadence Voltus