( ESNUG 399 Item 9 ) --------------------------------------------- [08/08/02]
From: Tomoo Taguchi <ttaguchi@amcc.com>
Subject: ( ESNUG 312 #1 ) Makefile/LSF Dependencies Are Driving Me Nuts!
Hi, John,
I'm trying to set up a makefile to kick off LSF jobs in parallel, but I'm
having difficulty getting the dependencies to work with LSF. I know this
problem has been solved at a previous employer using an internal LSF-like
tool, and I'm sure that this problem has been solved numerous times with
LSF at other companies. I poked through the DeepChip ESNUG archives and
didn't find what I was looking for.
Basically, if A instantiates B & C, I want a makefile that kicks off the B
and C LSF compile jobs in parallel, but holds off the A LSF compile job
until B and C are complete. I'm using plain vanilla make, and I'm
suspecting that I need nmake or some other make flavor to support the
parallel feature. The other problem is that since bsub returns after the
B and C jobs are queued, the makefile assumes that the dependency is
satisfied and kicks off the A job before B and C dbs are available.
I did come up with a non-bullet-proof, but simple/good-enough-for-me
solution to running parallel builds with make and LSF.
I ran into several problems.
Problem 1. LSF queued up my compile jobs and came back a few seconds
later, so the dependencies had no effect and everything (including jobs
that should have been held off until lower-level blocks were done) was
kicked off in parallel.
Solution 1. bsub (the command that kicks off LSF jobs) has a -K option
that waits for the job to complete. So this solves the problem of
upper-level blocks kicking off before their dependencies are done.
Problem 2. If I use the -K option, then LSF kicks off jobs on different
servers, but the process becomes completely serial, since the bsub waits
for the LSF job to complete.
Solution 2. Out of the many flavors of make (imake, nmake, gnu make,
etc.), I was using old, vanilla make, which doesn't support parallel
execution. I found that nmake and gnu make support the -j option, which
kicks off jobs in parallel while still following the dependencies
specified. I ended up using gnu make because it was already installed
locally and, from what I could tell, you had to pay for nmake.
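
A minimal sketch of how Solutions 1 and 2 fit together, written in Python only so that it can print the makefile text. The block names come from the A/B/C example above; the .db targets, the .scr script names, and the dc_shell command line are placeholders, not the actual setup described in this post.

```python
# Hypothetical block hierarchy from the example above: A instantiates B and C.
DEPS = {"A": ["B", "C"], "B": [], "C": []}


def render_makefile(deps, compile_cmd="dc_shell -f {block}.scr"):
    """Return make rules where each block's .db depends on its children's .db.

    compile_cmd is a placeholder for whatever really compiles one block.
    """
    lines = []
    for block, children in deps.items():
        prereqs = " ".join(f"{child}.db" for child in children)
        lines.append(f"{block}.db: {prereqs}".rstrip())
        # bsub -K does not return until the LSF job has finished, so make
        # only treats the target as built once the compile is really done.
        # Recipe lines in a makefile must start with a tab.
        lines.append("\tbsub -K " + compile_cmd.format(block=block))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_makefile(DEPS))
```

Running gnu make on the printed rules with something like "make -j 4 A.db" would let the B and C jobs run side by side, while the A job waits until both of its prerequisites have returned.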
Problem 3. We have a limited number of Design Compiler licenses, and if
I let make kick off a bunch of jobs in parallel, it has the potential to
gobble up all the licenses and anger anyone wanting to kick off
interactive or other jobs. Granted, this could be solved by forcing all
access to dc_shell to go through LSF, but that's not the situation I have
here. Even if it were, I think other designers would be upset if I ate up
all the licenses on long compile jobs. So, I needed a way to limit the
number of licenses that my make run ate up.
Solution 3. My first attempt was to write a perl script that would run
"lmstat -f Design-Compiler" and parse its output to determine how many
licenses were currently being used. bsub has an -E <command> option that
runs <command> before kicking off an LSF job. If <command> returns 0, LSF
kicks off the job. If it returns a nonzero value such as 1, LSF puts the
job back on the queue. I tell my perl script how many licenses I want to
leave open. If the script figures out that running my job would still
leave at least that many licenses open, it returns 0 and the job kicks
off; otherwise, the job goes back to pending status on the queue.
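
The pre-exec check described above might look roughly like this. It is a Python sketch, not the perl script from this post. It assumes lmstat prints its usual per-feature summary line (the SAMPLE string below is made up for illustration), and the function names are hypothetical.

```python
import re


def parse_lmstat(text, feature="Design-Compiler"):
    """Return (issued, in_use) for one feature from lmstat output.

    Assumes a summary line of the form
    "Users of <feature>:  (Total of N licenses issued;  Total of M licenses in use)".
    """
    pattern = (
        r"Users of " + re.escape(feature)
        + r":\s*\(Total of (\d+) licenses? issued;"
        + r"\s*Total of (\d+) licenses? in use\)"
    )
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"no summary line found for feature {feature!r}")
    return int(match.group(1)), int(match.group(2))


def pre_exec_status(issued, in_use, keep_free):
    """Exit status for bsub -E: 0 lets the job start, 1 sends it back to pending."""
    free_after_this_job = issued - in_use - 1
    return 0 if free_after_this_job >= keep_free else 1


# Made-up sample of the assumed lmstat summary line.
SAMPLE = ("Users of Design-Compiler:  "
          "(Total of 10 licenses issued;  Total of 3 licenses in use)")

if __name__ == "__main__":
    issued, in_use = parse_lmstat(SAMPLE)
    print(pre_exec_status(issued, in_use, keep_free=4))
```

A real pre-exec script would capture the output of "lmstat -f Design-Compiler" instead of using SAMPLE and hand the status to sys.exit().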
The first problem that I ran into with this approach is that, because of
the neat -j parallel execution option in gnu make, if I kicked off a build
where I could build X jobs in parallel before running into my first
dependency, all the jobs would run my perl script simultaneously. Since
no dc_shell jobs had been kicked off yet, they would all see that there
were plenty of licenses available, and all of them would kick off their
dc_shell jobs. What I really needed to do was stagger the kick off of
jobs (by about 10-15 seconds, so that dc_shell has time to grab a license
and lmstat has time to execute, which takes a few seconds), so that as
each job kicks off, it has an accurate picture of how many licenses are
being used. So, I put different-valued sleep commands before each
target's commands, so that each target would kick off at a different
time. This worked great when I initially executed the make command,
but if jobs went back on the queue because too many licenses were
being used, then LSF would determine when it would try to reexecute
the same job. Since I didn't have any control of when LSF would try to
reexecute all the jobs on the queue, most of the time, everything on
the queue eventually ended up kicking off and grabbing too many licenses.
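
The race described above can be illustrated with a little arithmetic. This is a toy model in Python with made-up numbers, not a measurement: in the simultaneous case every check reads the same stale license count, while in the staggered case each check sees the licenses grabbed by the jobs before it.

```python
def simultaneous_start(issued, in_use, keep_free, n_jobs):
    """All n_jobs run the license check before any of them grabs a license."""
    # Every check sees the same in_use value, so they all reach the same answer.
    passes = [issued - in_use - 1 >= keep_free for _ in range(n_jobs)]
    return in_use + sum(passes)


def staggered_start(issued, in_use, keep_free, n_jobs):
    """Each job checks only after the previous job has grabbed its license."""
    for _ in range(n_jobs):
        if issued - in_use - 1 >= keep_free:
            in_use += 1  # this job starts and takes one license
    return in_use


if __name__ == "__main__":
    # Made-up numbers: 10 licenses installed, 2 in use, keep 4 free, 8 jobs.
    print(simultaneous_start(10, 2, 4, 8))
    print(staggered_start(10, 2, 4, 8))
```

With these numbers the simultaneous case ends with all 10 licenses in use, while the staggered case stops at 6 and leaves the 4 requested licenses free. Jobs that LSF retries on its own schedule fall back into the simultaneous case.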
So, the eventual solution that I came up with was to build a perl wrapper
around my make command that took advantage of the -j <parallel_jobs_num>
feature of gnu make. If -j is given without a value, make allows an
unlimited number of parallel jobs. But if a number is given to the -j
option, then make will only kick off up to that number of parallel jobs.
So the perl wrapper runs lmstat, figures out how many licenses I can grab
given the number of installed licenses, the number of licenses in use, and
the number of licenses I want to keep free, then kicks off the make job
with this number as the argument to the -j option.
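
The wrapper's arithmetic can be sketched as follows. This is Python rather than the perl wrapper from this post. The function names and the A.db target are hypothetical, and the license counts are passed in directly here, where the real wrapper gets them from lmstat.

```python
def parallel_jobs(installed, in_use, keep_free):
    """Licenses the make run may take: installed - in_use - keep_free, never below 0."""
    return max(0, installed - in_use - keep_free)


def make_command(installed, in_use, keep_free, target="A.db"):
    """Return the make command line as a list, or None if no license can be spared."""
    jobs = parallel_jobs(installed, in_use, keep_free)
    if jobs == 0:
        # Never fall back to a bare -j, which would mean unlimited jobs.
        return None
    return ["make", "-j", str(jobs), target]


if __name__ == "__main__":
    # Made-up numbers: 10 installed, 3 in use, keep 4 free.
    print(make_command(10, 3, 4))
```

The resulting list could be handed to subprocess.run(). Since the count is taken once at start-up, it caps how many licenses the make run can hold but does not react to other users grabbing licenses later.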
I grant that this method controls licenses only indirectly, by controlling
make, but I figure that unless license usage can be completely regulated
(where the license server or something with the same level of control
grants a license only if a specified number stays open), any scheme would
have some shortcoming. By specifying the maximum number of parallel runs,
I set a maximum number of licenses I would ever grab at any given time.
I have to admit that I'm a newbie to writing makefiles or running LSF, so
I'm sure that those with more experience in both would come up with more
elegant and bullet-proof schemes to optimize their compile environment.
Also, I haven't used it extensively, so there might be some scenario that
my method screws up, but for now it seems to be working.
- Tomoo Taguchi
AMCC San Diego, CA
P.S. Ron Ranauro's white paper in ESNUG 312 #1 was helpful. I tried to
email him, but it bounced. Do you have a more recent email address?
============================================================================
Trying to figure out a Synopsys bug? Want to hear how 14,063 other users
dealt with it? Then join the E-Mail Synopsys Users Group (ESNUG)!
!!! "It's not a BUG, jcooley@TheWorld.com
/o o\ / it's a FEATURE!" (508) 429-4357
( > )
\ - / - John Cooley, EDA & ASIC Design Consultant in Synopsys,
_] [_ Verilog, VHDL and numerous Design Methodologies.
Holliston Poor Farm, P.O. Box 6222, Holliston, MA 01746-6222
Legal Disclaimer: "As always, anything said here is only opinion."
The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com