5.6 Command-Line Limits
When working with large projects, you occasionally bump up against
limits on the length of the commands that make tries to execute.
Command-line limits vary
widely with the operating system. Red Hat 9 GNU/Linux appears to have
a limit of about 128K characters, while Windows XP has a limit of
32K. The error message generated also varies. On Windows using the
Cygwin port, the message is:
C:\usr\cygwin\bin\bash: /usr/bin/ls: Invalid argument
when ls is given too long an argument list. On Red
Hat 9 the message is:
/bin/ls: argument list too long
Although 32K sounds like a lot of data for a command line, when your
project contains 3,000 files in 100 subdirectories and you want to
manipulate them all, this limit can be constraining.
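To find out what the limit is on a particular POSIX-style system, you can ask the system directly instead of relying on published figures. The following is a minimal sketch; it assumes the getconf utility is available, and the value it reports covers the combined size of the arguments and the environment, so the usable space for a command line is somewhat smaller.

```shell
# Ask the system for its limit on argument-plus-environment size, in bytes.
limit=$(getconf ARG_MAX)
echo "ARG_MAX is $limit bytes"
```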
There are two basic ways to get yourself into this mess: expand some
basic value using shell tools, or use make itself
to set a variable to a very long value. For example, suppose we want
to compile all our source files in a single command line:
compile_all:
	$(JAVAC) $(wildcard $(addsuffix /*.java,$(source_dirs)))
The make variable source_dirs
may contain only a couple hundred words, but after appending the
wildcard for Java files and expanding it using
wildcard, this list can easily exceed the
command-line limit of the system. By the way, make
has no built-in limits to constrain us. So long as there is virtual
memory available, make will allow any amount of
data you care to create.
When you find yourself in this situation, it can feel like the old
Adventure game: "You are in a maze of twisty little passages, all
alike." For instance, you might try to
solve the above using xargs, since
xargs will manage long command lines by parceling
out arguments up to the system-specific length:
compile_all:
	echo $(wildcard $(addsuffix /*.java,$(source_dirs))) | \
	xargs $(JAVAC)
Unfortunately, we've just moved the command-line limit problem from the
javac command line to the echo command line: make still has to pass the
fully expanded file list to the shell it launches, so the same limit
applies. For the same reason, we cannot use echo or printf to write the
data to a file (assuming the compiler can read the file list from a
file).
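The batching behavior of xargs is easy to observe in isolation. In this sketch the file names are hypothetical, and the -n 2 option forces a tiny batch size so the splitting is visible; without it, xargs chooses the batch size from the system limit. The word javac is only echoed here, not run.

```shell
# Feed five hypothetical file names to xargs, at most two per invocation.
# "echo javac" stands in for the real compiler so each batch is printed.
batches=$(printf '%s\n' A.java B.java C.java D.java E.java | xargs -n 2 echo javac)
echo "$batches"
```

Each output line corresponds to one command that xargs would have executed, so five names with a batch size of two produce three invocations.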
No, the way to handle this situation is to avoid creating the file
list all at once in the first place. Instead, use the shell to glob
one directory at a time:
compile_all:
	for d in $(source_dirs); \
	do \
	  $(JAVAC) $$d/*.java; \
	done
We could also pipe the file list to xargs to
perform the task with fewer executions:
compile_all:
	for d in $(source_dirs); \
	do \
	  echo $$d/*.java; \
	done | \
	xargs $(JAVAC)
Sadly, neither of these command scripts handles errors during
compilation properly. A better approach would be to save the full
file list and feed it to the compiler, if the compiler supports
reading its arguments from a file. Java compilers support this
feature:
compile_all: $(FILE_LIST)
	$(JAVAC) @$<
.INTERMEDIATE: $(FILE_LIST)
$(FILE_LIST):
	for d in $(source_dirs); \
	do \
	  echo $$d/*.java; \
	done > $@
Notice the subtle error in the for loop. If any of the directories does
not contain a Java file, the unexpanded pattern (the directory name
followed by /*.java) will be included in the file list and the Java
compiler will generate a "File not found" error. We can make bash
collapse empty globbing patterns by setting the nullglob option. Because
shopt is a bash builtin, this assumes the makefile's SHELL is bash:
compile_all: $(FILE_LIST)
	$(JAVAC) @$<
.INTERMEDIATE: $(FILE_LIST)
$(FILE_LIST):
	shopt -s nullglob; \
	for d in $(source_dirs); \
	do \
	  echo $$d/*.java; \
	done > $@
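If the recipe has to run under a plain POSIX shell, where nullglob is not available, a different technique can give a similar result: test each expanded word for existence and skip the ones that are still unmatched patterns. This is a swapped-in alternative, not the approach used in the rule above, and the directory and function names are hypothetical, created only for the demonstration.

```shell
# Portable sketch: skip unmatched glob patterns without bash's nullglob.
# The directory tree below is hypothetical and exists only for this demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/a" "$tmp/src/empty"
touch "$tmp/src/a/Foo.java" "$tmp/src/a/Bar.java"

list_java() {
    # Arguments: the directories to search.
    for d in "$@"; do
        for f in "$d"/*.java; do
            # An unmatched pattern is left as a literal string; skip it.
            [ -e "$f" ] || continue
            printf '%s\n' "$f"
        done
    done
}

file_list=$(cd "$tmp" && list_java src/a src/empty)
echo "$file_list"
rm -rf "$tmp"
```

The empty directory contributes nothing to the list, so the compiler never sees a literal *.java argument.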
Many projects have to make lists of files. Here is a macro containing
a bash script producing file lists. The first
argument is the root directory to change to. All the files in the
list will be relative to this root directory. The second argument is
a list of directories to search for matching files. The third and
fourth arguments are optional and represent file suffixes.
# $(call collect-names, root-dir, dir-list, suffix1-opt, suffix2-opt)
define collect-names
  echo Making $@ from directory list...
  cd $1; \
  shopt -s nullglob; \
  for f in $(foreach file,$2,'$(file)'); do \
    files=( $$f$(if $3,/*.{$3$(if $4,$(comma)$4)}) ); \
    if (( $${#files[@]} > 0 )); \
    then \
      printf '"%s"\n' "$${files[@]}"; \
    else :; fi; \
  done
endef
Here is a pattern rule for creating a list of image files:
%.images:
	@$(call collect-names,$(SOURCE_DIR),$^,gif,jpeg) > $@
The macro execution is hidden because the script is long and there is
seldom a reason to cut and paste this code. The directory list is
provided in the prerequisites. After changing to the root directory,
the script enables null globbing. The rest is a
for loop to process each directory we want to
search. The file search expression is a list of words passed in
parameter $2. The script protects words in the
file list with single quotes because they may contain shell-special
characters. In particular, filenames in languages like Java can
contain dollar signs:
for f in $(foreach file,$2,'$(file)'); do
We search a directory by filling the files array
with the result of globbing.
If the files array contains any elements, we use
printf to write each word followed by a newline.
Using the array allows the macro to properly handle paths with
embedded spaces. This is also the reason printf
surrounds the filename with double quotes.
The file list is produced with the line:
files=( $$f$(if $3,/*.{$3$(if $4,$(comma)$4)}) );
The $$f is the directory or file argument to the
macro. The following expression is a make
if testing whether the third argument is
nonempty. This is how you can implement optional arguments. If the
third argument is empty, it is assumed the fourth is as well. In this
case, the file passed by the user should be included in the file list
as is. This allows the macro to build lists of arbitrary files for
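which wildcard patterns are inappropriate. If the third argument is
provided, the if appends /*.{$3} to the root file. If the fourth
argument is provided, it appends ,$4 after the $3. (Be aware that bash
performs brace expansion only when the braces contain a comma, so a
pattern built from the third argument alone, such as /*.{gif}, keeps its
braces literally and will not match ordinary .gif files.) Notice the
subterfuge we must use to insert a comma into the wildcard pattern. By
placing a comma in a make variable we can sneak it past the parser;
otherwise, the comma would be interpreted as separating the then part
from the else part of the if.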
The definition of comma is straightforward:
comma := ,
All the preceding for loops also suffer from the
command-line length limit, since they use wildcard expansion. The
difference is that the wildcard is expanded with the contents of a
single directory, which is far less likely to exceed the limits.
What do we do if a make variable contains our long
file list? Well, then we are in real trouble. There are only two ways
I've found to pass a very long
make variable to a subshell. The first approach is
to pass only a subset of the variable contents to any one subshell
invocation by filtering the contents.
compile_all:
	$(JAVAC) $(wordlist 1, 499, $(all-source-files))
	$(JAVAC) $(wordlist 500, 999, $(all-source-files))
	$(JAVAC) $(wordlist 1000, 1499, $(all-source-files))
The filter function can be used as well, but
that can be more uncertain since the number of files selected will
depend on the distribution within the pattern space chosen. Here we
choose a pattern based on the alphabet:
compile_all:
	$(JAVAC) $(filter a%, $(all-source-files))
	$(JAVAC) $(filter b%, $(all-source-files))
Other patterns might use special characteristics of the filenames
themselves.
Notice that it is difficult to automate this further. We could try to
wrap the alphabet approach in a foreach loop:
compile_all:
	$(foreach l,a b c d e ..., \
	  $(if $(filter $l%, $(all-source-files)), \
	    $(JAVAC) $(filter $l%, $(all-source-files));))
but this doesn't work. make
expands this into a single line of text, thus compounding the
line-length problem. We can instead use eval:
compile_all:
	$(foreach l,a b c d e ..., \
	  $(if $(filter $l%, $(all-source-files)), \
	    $(eval \
	      $(shell \
	        $(JAVAC) $(filter $l%, $(all-source-files));))))
This works because the shell function runs each command immediately,
while make is expanding the recipe, and eval, which always expands to
nothing, consumes whatever the command prints. So the foreach loop
expands to nothing. The problem is that error reporting is meaningless
in this context, so compilation errors will not be transmitted to make
correctly.
The wordlist approach is worse. Due to
make's limited numerical
capabilities, there is no way to enclose the
wordlist technique in a loop. In general, there
are very few satisfying ways to deal with immense file lists.
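One further option, not used by any of the techniques above, is available only in newer releases: GNU make 4.0 and later provide a file function that writes text to a file from within make itself, so the long list never appears on a shell command line. The sketch below is illustrative only. It assumes GNU make 4.0 or later is installed as make, and every name in it (files.lst, count, the generated source names) is hypothetical. The makefile is written by a small shell script so the whole example can be run as is, and it uses the "target: ; recipe" form to avoid depending on tab characters.

```shell
# Sketch: write a long make variable to a file with $(file ...), bypassing the shell.
# Assumes GNU make 4.0 or later; all target and file names are hypothetical.
tmp=$(mktemp -d)
cat > "$tmp/Makefile" <<'EOF'
# Simulate a long list: 5 x 5 x 5 = 125 hypothetical source names.
digits := 1 2 3 4 5
all-source-files := $(foreach a,$(digits),$(foreach b,$(digits),$(foreach c,$(digits),src/F$a$b$c.java)))

# $(file >name,text) writes text to name and expands to nothing,
# so the shell only ever sees the no-op command ":".
files.lst: ; @: $(file >$@,$(all-source-files))

# Stand-in for the compile step: count the words that were written.
count: files.lst ; @wc -w < files.lst
EOF
word_count=$(make -s -C "$tmp" count | tr -d ' ')
echo "$word_count"
rm -rf "$tmp"
```

In a real makefile the count rule would be replaced by something like the $(JAVAC) @$< rule shown earlier, with the generated list as its prerequisite.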