[Scons-dev] Proposal for a more efficient version of the Parallel scheduler

Adam Gross grossag at vmware.com
Tue Jul 9 14:35:21 EDT 2019


In VMware builds, the sheer number of tasks (33000 leaf tasks in the first iteration of ESX builds, for example) means that Parallel.start can take up to 20 minutes simply collecting and preparing tasks over the course of a build. I started a project to look at batch task handling in order to support remote caching (e.g. asking the cache for 1000 nodes at a time instead of 1) and realized that the approach that I want to take actually makes normal builds more efficient as well. In this e-mail, I'd like to explain my proposal so I can get your thoughts on it.

--- Current Parallel.start performance problems ---

Reference: https://github.com/SCons/scons/blob/master/src/engine/SCons/Job.py#L369

In the current implementation, SCons collects just enough tasks to dispatch to the thread pool such that the number of active jobs is equal to the max number of jobs. It then waits for at least one job to be done, gathers all finished jobs, then repeats the process of collecting enough tasks to have jobs==self.maxjobs.

Waiting for at least one job to finish misses an opportunity to keep calling taskmaster.next_task() and task.prepare() while jobs are active. These calls are not cheap for many reasons, including that they initiate scanning of source nodes.
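
To make the problem concrete, here is a minimal single-threaded model of the loop described above. It is only a sketch, not the code in Job.py: run_current_style, the pending and running deques, and the event log are all hypothetical stand-ins for the taskmaster and the thread pool, and a "wait" here simply completes the oldest running job.

```python
from collections import deque


def run_current_style(tasks, maxjobs):
    """Toy model of the current loop; returns the order of events."""
    pending = deque(tasks)  # stands in for taskmaster.next_task()
    running = deque()       # stands in for jobs handed to the thread pool
    log = []
    while True:
        # Collect just enough tasks to reach jobs == maxjobs.
        while len(running) < maxjobs and pending:
            task = pending.popleft()
            log.append(("prepare", task))  # stands in for task.prepare()
            running.append(task)
        if not running:
            break  # nothing running and nothing left to collect
        # Block until a job is done. No collecting or preparing happens here.
        log.append(("wait", running.popleft()))
    return log


events = run_current_style(["a", "b", "c"], maxjobs=2)
```

In this model, task "c" is only prepared after the wait for "a" returns, even though the main thread was idle while "a" and "b" were running.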

--- Proposal ---

A first rough draft is in draft pull request https://github.com/SCons/scons/pull/3404 . In its current form it is an alternative child class of Parallel; it could simply replace Parallel if people feel strongly.

I would like to implement an alternative to the Parallel class that only waits for jobs to complete if there are no tasks left (i.e. taskmaster.next_task() returns None). Its first priority is keeping jobs==self.maxjobs, but once the job slots are full it keeps looking for more tasks instead of blocking. If there are no more tasks left, it waits for a job to complete and then rechecks whether there are any tasks left, just in case other tasks were unblocked by its completion.
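
A minimal single-threaded model of this proposed loop might look like the following. It is a sketch under stated assumptions rather than the code in the pull request: ToyTaskmaster, its completed() method, run_proposed_style, and the ready and running deques are hypothetical names, and a "wait" simply completes the oldest running job.

```python
from collections import deque


class ToyTaskmaster:
    """Hypothetical stand-in for the SCons taskmaster (not the real API)."""

    def __init__(self, deps):
        # deps maps a task name to the set of task names it depends on.
        self.deps = {name: set(needed) for name, needed in deps.items()}
        self.handed_out = set()
        self.done = set()

    def next_task(self):
        # Return a task whose dependencies have all completed, else None.
        for name, needed in self.deps.items():
            if name not in self.handed_out and needed <= self.done:
                self.handed_out.add(name)
                return name
        return None

    def completed(self, name):
        self.done.add(name)


def run_proposed_style(taskmaster, maxjobs):
    """Toy model of the proposed loop; returns the order of events."""
    ready = deque()    # prepared tasks waiting for a free job slot
    running = deque()  # stands in for jobs currently in the thread pool
    log = []
    while True:
        # Keep collecting and preparing tasks, even when the pool is full.
        task = taskmaster.next_task()
        while task is not None:
            log.append(("prepare", task))  # stands in for task.prepare()
            ready.append(task)
            # Top the pool up to maxjobs from the prepared list.
            while ready and len(running) < maxjobs:
                running.append(ready.popleft())
            task = taskmaster.next_task()
        if not running:
            break  # nothing running and nothing left to collect
        # next_task() returned None, so only now wait for one job to finish.
        finished = running.popleft()
        log.append(("wait", finished))
        taskmaster.completed(finished)
        while ready and len(running) < maxjobs:
            running.append(ready.popleft())
        # Loop back and recheck: the finished job may have unblocked tasks.
    return log


tm = ToyTaskmaster({"a": set(), "b": set(), "c": set(), "d": {"a"}})
events = run_proposed_style(tm, maxjobs=2)
```

Here "c" is prepared before any wait happens, and "d" is picked up on the recheck after "a" completes, which is the unblocking case mentioned above.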

One very useful side effect is that this class will be collecting lists of tasks instead of operating on one at a time, so it serves as a useful building block towards remote caching. The current one-at-a-time cache retrieval approach wouldn't work for remote caching due to network latency but this approach can.
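
As a rough illustration of why having a list of tasks helps, the sketch below compares round trips for batched lookups. Everything in it is an assumption for illustration: ToyRemoteCache, lookup_many, in_batches, and split_hits_and_misses are hypothetical names, not an existing SCons or cache API.

```python
from itertools import islice


def in_batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


class ToyRemoteCache:
    """Hypothetical remote cache that counts network round trips."""

    def __init__(self, contents):
        self.contents = set(contents)
        self.round_trips = 0

    def lookup_many(self, keys):
        # One round trip answers for the whole batch of keys.
        self.round_trips += 1
        return {key for key in keys if key in self.contents}


def split_hits_and_misses(cache, keys, batch_size):
    """Ask the cache about keys one batch at a time."""
    hits, misses = [], []
    for batch in in_batches(keys, batch_size):
        found = cache.lookup_many(batch)
        for key in batch:
            (hits if key in found else misses).append(key)
    return hits, misses


cache = ToyRemoteCache({"a", "c"})
hits, misses = split_hits_and_misses(cache, ["a", "b", "c", "d", "e"], batch_size=2)
```

With five keys and a batch size of 2 this model makes three round trips instead of five; with a batch size of 1000, the same five keys would need only one.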

Please let me know what you think either over e-mail or on the aforementioned pull request.

Thanks,
Adam Gross