[Scons-dev] File-based build tools will never handle Java

Mark A. Flacy mflacy at verizon.net
Thu Sep 6 05:06:12 EDT 2012


On Wednesday 05 September 2012 21:20:47 Greg Ward wrote:

> OK, I finally sat down and did a little thought experiment, and have
> convinced myself that file-based build tools will never handle Java.
> The reason? Dependency cycles.
>
> Short version: in Java, cycles between source files in the same
> package are commonplace and often impossible to avoid. Because
> interface and implementation are in the same file, that leads to DAG
> cycles, which any DAG-of-files-based build tool (make, SCons, tup,
> waf, ...) will reject. (Yes, I know that SCons nodes don't have to be
> files. But writing non-File Nodes is so painful that SCons might as
> well have that restriction.)
>
> Why? Because packages (directories of source files) fill much the same
> role in Java that modules do in Python or C.


In addition, the default visibility of class attributes and methods is the
package level. The closest C++ analog that I can think of would be that all
classes in a package have a "friend" relationship to each other.


> We don't sneer at Python
> code where two functions in the same module call each other, so don't
> mock Java developers who put a cycle in one package. Save your mockery
> for the ones who put cycles *between* packages. ;-)
>
> Concrete example:
>
> hg clone http://hg.gerg.ca/sample-java
>
> and just try to build the files in cycle/ using a file-based build
> tool *without* making fake nodes that represent the whole package, or
> a jar file, or what-have-you.
>
> Disclaimer: I haven't tried this yet. I wrote the code and drew the
> graph, so I just *know* it won't work with make, SCons, or tup. I'm
> still going to try just to prove a point, but it won't work.


Well, javac is designed to be given multiple files at once. Even if you
could give it a file at a time, that's a pretty heavy process to invoke more
often than you have to.

I wrote a Java build tool in Python for my company a while back. It's very
easy to tell in which package all the artifacts of any given Java source
file will end up when you run it through javac: it's given by the
one-and-only-per-file package statement.
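
As a rough illustration of that first step (not the code from our tool), here
is a minimal Python sketch that pulls the package name out of a Java source
text with a regular expression. The names package_of, _PACKAGE_RE and
_COMMENT_RE are made up for this example, and it assumes the package
statement is not preceded by annotations on the same line.

```python
import re

# Comments are blanked out first so a commented-out package line is ignored.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# "package a.b.c;" at the start of a line, allowing whitespace around the dots.
_PACKAGE_RE = re.compile(
    r"^\s*package\s+([A-Za-z_$][\w$]*(?:\s*\.\s*[A-Za-z_$][\w$]*)*)\s*;",
    re.MULTILINE,
)


def package_of(source: str) -> str:
    """Return the declared package, or "" for the default (unnamed) package."""
    stripped = _COMMENT_RE.sub(" ", source)
    match = _PACKAGE_RE.search(stripped)
    if match is None:
        return ""
    # Normalise "a . b" to "a.b".
    return re.sub(r"\s+", "", match.group(1))
```

A regex is good enough here only because the package statement has to come
before any class body; finding the packages a file *uses* needs a real parser.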

It's much less simple to find out which packages a given Java source file
*uses*. Sure, you can use any import statements that you find, but they are
not actually required since you can fully qualify a class in the code.
(Sometimes you *must* do that, in the cases where two packages have a class
with the same name.) In my case, I ran the code through a Java parser
provided by ANTLR that was also written in Python. You can then create a
directed graph at the package level and of course convert that graph into a
DAG of strongly connected components. You can then present the list of files
in each strongly connected component to javac.
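
To make the graph step concrete, here is a small Python sketch, not our
original code. It assumes you already have two hypothetical mappings:
file_package (source path to package name) and package_uses (package name to
the set of packages it uses). It uses Tarjan's algorithm, which happens to
emit each strongly connected component only after everything it depends on,
so the resulting batches are already in a usable compile order.

```python
from collections import defaultdict


def compile_batches(file_package, package_uses):
    """Group source files into javac batches, dependencies first.

    file_package: {source path: package name}
    package_uses: {package name: set of package names it uses}
    """
    index = {}   # discovery order of each package
    low = {}     # lowest discovery index reachable from each package
    stack, on_stack = [], set()
    components = []

    def visit(pkg):
        index[pkg] = low[pkg] = len(index)
        stack.append(pkg)
        on_stack.add(pkg)
        for dep in sorted(package_uses.get(pkg, ())):
            if dep not in index:
                visit(dep)
                low[pkg] = min(low[pkg], low[dep])
            elif dep in on_stack:
                low[pkg] = min(low[pkg], index[dep])
        if low[pkg] == index[pkg]:
            # pkg is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == pkg:
                    break
            components.append(component)

    for pkg in sorted(set(file_package.values())):
        if pkg not in index:
            visit(pkg)

    files_by_package = defaultdict(list)
    for path, pkg in file_package.items():
        files_by_package[pkg].append(path)

    batches = []
    for component in components:
        files = sorted(f for pkg in component for f in files_by_package[pkg])
        if files:  # skip packages we have no sources for, e.g. java.util
            batches.append(files)
    return batches
```

Each returned list is what you would hand to one javac invocation. The
recursion is fine for a sketch; a very deep dependency chain would want an
iterative version.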

We also ended up examining the generated class files to pull out the
signatures of any methods or attributes that were public or protected and
comparing them to the signatures from the previous successful compile. If
those signatures did not change, there was no need to recompile any strongly
connected component that used this one.
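
The comparison itself can be sketched in a few lines of Python. This is an
illustration only: api_digest and needs_dependents_rebuild are hypothetical
names, and how the signature strings are obtained is left open (the output of
"javap -protected" would be one possible source).

```python
import hashlib


def api_digest(signatures):
    """Hash a collection of public/protected member signatures.

    The signatures are de-duplicated and sorted first, so the digest does
    not depend on the order in which members were listed.
    """
    digest = hashlib.sha256()
    for signature in sorted(set(signatures)):
        digest.update(signature.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def needs_dependents_rebuild(previous_digest, signatures):
    """True when the exported API differs from the last successful compile."""
    return previous_digest != api_digest(signatures)
```

Storing one digest per strongly connected component after each successful
compile is enough to decide whether its users can be skipped next time.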

Finally, since you don't really want to spin up javac for each strongly
connected component (we had around 1,000 of them), we kept a process running
that ran the javac code for us: we'd send it the file list via a pipe and get
the compile results back via a pipe too. In fact, we'd run several of those
processes at a time just to speed up the compile.
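
The shape of such a worker loop looks roughly like the Python sketch below.
It only illustrates the request/response protocol: the one-JSON-list-per-line
message format, the serve function and the fake_compile stand-in are all
assumptions for this example, and a real worker would call the compiler where
fake_compile is.

```python
import io
import json


def serve(requests, responses, compile_fn):
    """Read one JSON list of source paths per line; answer with one JSON line."""
    for line in requests:
        line = line.strip()
        if not line:
            continue
        files = json.loads(line)
        ok, output = compile_fn(files)
        responses.write(json.dumps({"files": files, "ok": ok, "output": output}) + "\n")
        # Flush after every reply so the parent is never left waiting on a
        # buffered pipe.
        responses.flush()


def fake_compile(files):
    """Stand-in for the real compile step: fails on an empty file list."""
    return (len(files) > 0, "compiled %d file(s)" % len(files))


# Demo with in-memory streams; a long-lived worker would pass sys.stdin and
# sys.stdout instead.
demo_in = io.StringIO('["a/A.java", "a/B.java"]\n\n["c/D.java"]\n')
demo_out = io.StringIO()
serve(demo_in, demo_out, fake_compile)
demo_replies = [json.loads(reply) for reply in demo_out.getvalue().splitlines()]
```

Running several of these side by side is then just a matter of starting more
worker processes and handing each one the next ready batch.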

We don't do any of this any more.

On a Windows box, it turned out to be faster to simply run javac over the
entire source structure at one time. A lot of that had to do with Windows
pipe flush semantics and the overall cr*ppy performance of NTFS. A Linux box
running the same code could get some really big wins, especially if you had
not changed any public/protected signatures. But most of our design
community ran Windows desktops, so it made no sense to continue to use that
tool. We ended up moving to Ant and will eventually move to Gradle.

As it so happens, we still parse the source prior to compiling it. We have
other tools in place that enforce various package import rules and forbid
package-level import loops. As of Java 1.6, you can get a handle to the
javac parser's AST, which is preferable to using ANTLR (nice as ANTLR is).

