[Scons-dev] SCons performance investigations

Mon Jul 24 16:45:36 EDT 2017

From a performance point of view, I have found that the best approach is to do timestamp-then-csig checks, since most of the time the concern is that the file did not change but, for various reasons, the timestamp did. So the csig check is really about making sure we don't waste time rebuilding things that system quirks would otherwise have caused us to rebuild. I have never seen a case where the timestamp did not change but the file content did.
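The timestamp-first, content-second check described above can be sketched as a small standalone helper. It is similar in spirit to SCons's MD5-timestamp decider but is not SCons's implementation; `FileRecord` and `needs_rebuild` are hypothetical names, and the record stands in for whatever the previous run stored.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FileRecord:
    """What we remember about a file from the previous run (hypothetical)."""
    mtime_ns: int
    csig: str


def content_sig(path: str) -> str:
    # Reading the file is the expensive part, so only do it when needed.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def needs_rebuild(path: str, old: Optional[FileRecord]) -> Tuple[bool, FileRecord]:
    """Timestamp first; fall back to a content signature only if it moved."""
    mtime_ns = os.stat(path).st_mtime_ns
    if old is not None and mtime_ns == old.mtime_ns:
        # Timestamp unchanged: trust it and skip reading the file entirely.
        return False, old
    new = FileRecord(mtime_ns, content_sig(path))
    if old is not None and new.csig == old.csig:
        # Timestamp changed (touch, checkout, clock quirk) but content did not.
        return False, new
    return True, new
```

The returned record is what would be saved for the next run, so a merely touched file only costs one re-hash.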

Also, where I have seen time go in csig checks is mainly in reading the contents, not in computing the signature (this is total time, as there is a delay from the OS in reading the file data). So for me, there is not a major speed improvement to be gained here.

From a design point of view, I agree this would be a nice improvement that should not be hard to add on top of the default decider logic. Keep in mind the decider logic is the way it is to allow the user full control; it is not user friendly. Small tweaks, such as providing an interface to control how the csig hash is created from a given input, would be nice in certain cases. Keep in mind the hash has to be reproducible between runs; security hashes give us this property by default. A unique reversible hash would be just as useful, and it would not hurt if it were faster. However, I don't think it would help much with speed in the general case because, again, reading data off disk (or from main memory) is the main limiting factor.

Another reason this could be useful is that it would allow filtering the content before it is hashed, for example removing comments from the csig hash, which I have seen as a common request or as a reason to define a more complex decider object. I believe such an interface would be more useful functionally than for performance.
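A minimal sketch of filtering content before hashing, assuming a hypothetical `filtered_csig` hook that takes a user-supplied filter; no such hook exists in SCons as described here. The comment stripper is deliberately naive: it does not understand string literals, so a `//` inside a string would be treated as a comment.

```python
import hashlib
import re
from typing import Callable

# Naive C/C++ comment matcher: /* block */ comments and // line comments.
_C_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def strip_c_comments(text: str) -> str:
    return _C_COMMENT.sub("", text)


def filtered_csig(
    text: str,
    content_filter: Callable[[str], str] = strip_c_comments,
    algorithm: str = "md5",
) -> str:
    """Hash the content *after* a user-supplied filter has normalized it."""
    normalized = content_filter(text)
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()
```

Because the filter runs before hashing, the signature stays reproducible between runs as long as the filter itself is deterministic, so a comment-only edit no longer changes the csig.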

Another tweak that might help is a configurable way to control how SCons reads the file data. At the moment it is hardcoded to a best-guess block size; different sizes may improve read times greatly.
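Reading with a configurable block size and a pluggable hash algorithm (which also touches Andrew's question about faster hashes) might look like the sketch below. `file_csig` and its parameters are hypothetical, and the 64 KiB default is an arbitrary placeholder, not SCons's actual value; which block size is fastest has to be measured on the target system.

```python
import hashlib


def file_csig(path: str, algorithm: str = "md5", blocksize: int = 64 * 1024) -> str:
    """Hash a file in fixed-size blocks.

    Both the algorithm (any name accepted by hashlib.new) and the block size
    are parameters rather than hardcoded, so they could be tuned per project.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(blocksize) until EOF (b"").
        for chunk in iter(lambda: f.read(blocksize), b""):
            h.update(chunk)
    return h.hexdigest()
```

The digest does not depend on the block size, only the number of read calls does, so changing it is safe for reproducibility between runs.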

Jason

From: Scons-dev [mailto:scons-dev-bounces at scons.org] On Behalf Of Andrew Featherstone
Sent: Monday, July 24, 2017 2:38 PM
To: SCons developer list <scons-dev at scons.org>
Subject: Re: [Scons-dev] SCons performance investigations

Could SCons use a faster hashing algorithm if available in preference to the md5 default? I know the user can override the Decider, but it'd be nice if SCons did this by itself.

Andrew

On 24 July 2017 at 18:42, Jason Kenny <dragon512 at live.com> wrote:
I believe we are all clear on why we Clone the environment. I did not realize you were asking whether a new feature would be useful.

I would say that it would be useful to allow read-only environments in certain cases. However, my worry is that it would be used as a way to enforce values that might need to be tweaked. This might lead to more cloning, or to a feature request to prevent cloning, which goes against what I find useful in making components that can be built in a larger project in an easy, pluggable way. I think a more useful feature would be to allow keys to be set as read-only and to warn if the value changes. A build could then enable an error-on-warning option so that a change raises an error instead of a warning, and sections of code that need it could be made exceptions to these rules (in either direction).
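The per-key read-only idea with an error-on-warning switch could be sketched as below. `GuardedEnv`, `protect`, `exempt` and `error_on_change` are all hypothetical names, not SCons API. Only plain item assignment is guarded (`dict.update()` bypasses `__setitem__`), and only the "allow changes in this section" direction of exceptions is shown.

```python
import warnings
from contextlib import contextmanager


class ReadOnlyKeyError(Exception):
    """Raised instead of a warning when error_on_change is enabled."""


class GuardedEnv(dict):
    """Hypothetical dict-like environment with per-key read-only guards."""

    def __init__(self, *args, error_on_change=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.read_only = set()
        self.error_on_change = error_on_change
        self._exempt_depth = 0

    def protect(self, *keys):
        self.read_only.update(keys)

    @contextmanager
    def exempt(self):
        """Let a section of code change protected keys without complaint."""
        self._exempt_depth += 1
        try:
            yield self
        finally:
            self._exempt_depth -= 1

    def __setitem__(self, key, value):
        changing = key in self.read_only and self.get(key) != value
        if changing and not self._exempt_depth:
            msg = f"read-only key {key!r} changed from {self.get(key)!r} to {value!r}"
            if self.error_on_change:
                raise ReadOnlyKeyError(msg)
            warnings.warn(msg, stacklevel=2)
        super().__setitem__(key, value)
```

Warning by default keeps components pluggable, while a strict build can flip `error_on_change` without touching the component code.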

My understanding is that a more aggressive copy-on-write environment would decrease build time and memory usage. The primary issue is dealing with direct updates to list variables like CPPFLAGS, or better yet CPPDEFINES, which would require proxies to allow "native" updates to happen safely in a COW-like way.
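A minimal copy-on-write clone, assuming a hypothetical `CowEnv` class: a clone stores only its own overrides and looks everything else up through its parent chain. `Clone` and `Append` borrow the names of SCons Environment methods for readability, but this is not SCons's Environment class. The comment in `Append` marks exactly the list-mutation case described above.

```python
class CowEnv:
    """Hypothetical copy-on-write environment: a clone stores only its own changes."""

    def __init__(self, parent=None, **values):
        self._parent = parent
        self._local = dict(values)

    def __getitem__(self, key):
        env = self
        while env is not None:  # walk up the clone chain
            if key in env._local:
                return env._local[key]
            env = env._parent
        raise KeyError(key)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __setitem__(self, key, value):
        self._local[key] = value  # writes never touch the parent

    def Clone(self, **overrides):
        return CowEnv(self, **overrides)  # cost ~ number of overrides, not env size

    def Append(self, key, *values):
        # Copy the inherited list into this clone before extending it.
        # Mutating env[key] in place (env[key].append(...)) would still modify
        # the parent's list; that is the case that would need list proxies.
        self._local[key] = list(self.get(key, [])) + list(values)
```

The trade-off is that lookups get slower as clone chains get deeper, so a real design might flatten a chain once it grows past some depth.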

Jason


From: Bill Deegan <bill at baddogconsulting.com>
Sent: Monday, July 24, 2017 10:18 AM
To: SCons developer list <scons-dev at scons.org>
Subject: Re: [Scons-dev] SCons performance investigations

Jason,
A somewhat common use model is to create a configured Environment, and then clone it to pass to a subordinate SConscript.
The only reason the clone is done is to prevent the subordinate SConscript from polluting the environment.
It is for this use that I was asking if a read-only Environment would be useful.
I'm also curious whether, in this case, the SConscript is expected to modify the Environment or not. Is this a functional clone or a protective clone?

-Bill

On Sun, Jul 23, 2017 at 8:41 PM, Jason Kenny <dragon512 at live.com> wrote:
I am not sure what "read-only" mode is. But in Parts I make lots of clones (lots), so making a lot of clones is not the issue. It is done because when you have a larger build you start to break items up into groups/components (i.e. Parts), and you want to make sure a user of one component does not change a value that would affect another component. We could do better on how the data is copied and shared by making Clones more of a copy-on-write setup.

Jason

From: Scons-dev [mailto:scons-dev-bounces at scons.org] On Behalf Of Bill Deegan
Sent: Sunday, July 23, 2017 7:38 PM
To: SCons developer list <scons-dev at scons.org>
Subject: Re: [Scons-dev] SCons performance investigations

Jonathon,
I've seen the clone-before-passing pattern in other builds.
I'm wondering: if you could put your environment in a read-only mode (not allowing changes to be made) before passing it, would that suffice and remove the desire/need to Clone()?
-Bill

On Sun, Jul 23, 2017 at 4:51 PM, Jonathon Reinhart <jonathon.reinhart at gmail.com> wrote:
I just wanted to add some quick anecdotes.  In some of our largest, most complicated builds, we have observed a lot of the same things as you all have.

One time we did some quick profiling and saw that much of the CPU time during a null build was spent in variable substitution.

Additionally, we also have a habit of cloning the environment before passing it to a SConscript. This is for safety - to ensure that a child SConscript can't mess up the environment for its siblings.

Jonathon Reinhart

On Sat, Jul 22, 2017 at 5:23 PM, Bill Deegan <bill at baddogconsulting.com> wrote:
Jason,
Any chance you could add these comments to the wiki page?
https://bitbucket.org/scons/scons/wiki/NeedForSpeed
-Bill

On Sat, Jul 22, 2017 at 10:09 AM, Jason Kenny <dragon512 at live.com> wrote:
Some additional thoughts

Serial DAG traversal:
•       One issue here as well is that the DAG for doing builds is based on nodes. There is a bit of logic to deal with handling side effects and build actions that have multiple outputs. Greg Noel had made a push for something called the TNG taskmaster. I now understand that the main fix he was going for was to have SCons navigate a builder DAG instead of a node DAG. The node DAG is great for getting the main organization, but after that it is generally trivial to build a DAG based on builders at the same time. Traversing it is much faster, requires less "special" logic, and will be easier to parallelize.
o   One big improvement this provides is that we only need to test whether the sources or targets are out of date if the dependent builders are all up to date. If one of them is out of date, we just build. This contrasts with the current logic, where we check each node to see whether its build action has been done, which requires extra scans and work.
o   Given that a builder is out of date, you just mark all of its parents out of date. We only care about the builders in the set that we don't yet know are out of date. Simple tweaks to how we go through the tree can mean we only need to touch a few nodes.
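A sketch of the builder-DAG idea: each build action becomes one vertex regardless of how many outputs it has, and once a builder is known to be out of date, everything above it is marked out of date without further checks. `Builder`, `builder_dependents` and `propagate_out_of_date` are hypothetical names, not the TNG taskmaster design.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Builder:
    """One build action: a single vertex in the builder DAG, however many outputs."""
    name: str
    sources: Tuple[str, ...]
    targets: Tuple[str, ...]


def builder_dependents(builders: Iterable[Builder]) -> Dict[str, Set[str]]:
    """Map each builder name to the builders that consume any of its targets."""
    builders = list(builders)
    producer = {t: b.name for b in builders for t in b.targets}
    dependents: Dict[str, Set[str]] = {b.name: set() for b in builders}
    for b in builders:
        for src in b.sources:
            if src in producer:
                dependents[producer[src]].add(b.name)
    return dependents


def propagate_out_of_date(dependents: Dict[str, Set[str]], initially_stale) -> Set[str]:
    """Once a builder is out of date, every builder above it is too: no checks needed."""
    stale = set(initially_stale)
    queue = deque(stale)
    while queue:
        name = queue.popleft()
        for parent in dependents.get(name, ()):
            if parent not in stale:  # only visit builders not already known stale
                stale.add(parent)
                queue.append(parent)
    return stale
```

A multi-output action such as a yacc run appears once, so there is no per-target "has this action already run?" bookkeeping.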
Start up time:
•       The zero-change (null) build time is going to be the worst case for an up-to-date build, as we have to make sure all items are in a good state. The time to start building on a diff should be a lot faster. SCons spends a lot of time having to read everything again on second passes. We can use our cache much better to store state about what builds what, etc., to avoid even having to read a file. If the file did not change, we already know the node/builder tree it will provide, and we already know the actions. Most of the time, we can start building items as soon as an MD5/timestamp check fails. Globs can store information about what they read and processed, and only need to be re-run when we notice a directory timestamp change. Avoiding processing build files and loading known state is much faster than processing the Python code; my work in Parts has shown this. The trick is knowing when you might have to load a file again to make sure custom logic gets processed correctly.
•       In the case of Parts, it would be great to load files concurrently and in parallel. I think I have a way to do this concurrently, which I have not implemented yet. The main issue is that the node FS object tree is a synchronization point for parallelism.
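Skipping re-evaluation of unchanged build files might be sketched as below, assuming a hypothetical `load_build_file` helper and a cache dictionary that would be persisted between runs. The `depends_on` list is where "the trick" lives: any file the build file's custom logic reads but that is missing from this list is a source of stale results.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable, Tuple


def _inputs_sig(paths: Iterable[str]) -> str:
    # Combine the path and content digest of every input into one signature.
    h = hashlib.md5()
    for path in paths:
        with open(path, "rb") as f:
            h.update(path.encode("utf-8"))
            h.update(hashlib.md5(f.read()).digest())
    return h.hexdigest()


def load_build_file(
    path: str,
    cache: Dict[str, Dict[str, Any]],
    evaluate: Callable[[str], Any],
    depends_on: Iterable[str] = (),
) -> Tuple[Any, bool]:
    """Return (result, re_evaluated). Re-run evaluate() only if an input changed."""
    sig = _inputs_sig([path, *depends_on])
    entry = cache.get(path)
    if entry is not None and entry["sig"] == sig:
        return entry["result"], False  # known state: no Python build code executed
    result = evaluate(path)
    cache[path] = {"sig": sig, "result": result}
    return result, True
```

For this to pay off across runs, `result` (the node/builder description) must be something cheap to serialize and reload.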
CacheDir:
            100% agree.
SConsign generation:
•       I think this is a bigger deal for larger builds. In Parts, as I stored more data, I tried to break the items up into different files. This helps, but in the end a pickle or JSON dump takes time. Loading them also takes time: in builds I have had, loading 700 MB files takes even the best systems a moment. This is a big waste when I only need a little bit of the data. Likewise, storing the data could and should happen as we build items. As noted, we don't have a good way to store a single item without storing the whole file. If the file is large (100 MB to GBs), this can take many seconds, which in the end annoys users. With what I do have working well in Parts, I would say that data storage and retrieval is the big time sink. Addressing this would have the largest impact for me.
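Storing and reading one item at a time, as each target is built, could look like the sketch below. SQLite is just one possible backend, chosen here because `sqlite3` ships with Python; `SignatureDB` and its schema are hypothetical and unrelated to the current sconsign format.

```python
import sqlite3
from typing import Optional, Tuple


class SignatureDB:
    """Hypothetical per-node signature store that can be updated one row at a time."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sigs ("
            " node TEXT PRIMARY KEY, csig TEXT NOT NULL, mtime_ns INTEGER NOT NULL)"
        )

    def record(self, node: str, csig: str, mtime_ns: int) -> None:
        # Called as each target finishes; commits just this row instead of
        # rewriting the whole database at the end of the build.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO sigs (node, csig, mtime_ns) VALUES (?, ?, ?)",
                (node, csig, mtime_ns),
            )

    def lookup(self, node: str) -> Optional[Tuple[str, int]]:
        # Reads only the row needed rather than loading everything into memory.
        return self.conn.execute(
            "SELECT csig, mtime_ns FROM sigs WHERE node = ?", (node,)
        ).fetchone()

    def close(self) -> None:
        self.conn.close()
```

Committing per row also means an interrupted build keeps the signatures of everything it finished.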
Process spawning:
•       I add this because we had submitted a subprocess fix for POSIX systems. This affects larger builds more than smaller builds because of forking behavior. I don't believe it has been added to SCons yet.
•       As a side design note, if we did make a multiprocessing setup for SCons, this might be less of an issue, as the "process" workers only need information about the build they run. Changes to node state would have to be synced with the main process via messages, as there would be no fast, efficient way to share the whole tree across all the processes.
•       Another thought is that we might want to look at some nested parallel strategies to make a task-like setup that might allow us to use the TBB Python library to avoid the GIL issue. However, given my time on SCons/Parts, I think changing the taskmaster to go over a builder DAG will have the biggest effect.
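The message-passing design from the second point above might be sketched like this: workers receive only a small, picklable task and send back a state-change message; only the main process owns node state. `BuildTask`, `NodeUpdate`, `run_task` and `build_all` are hypothetical names, and dependency ordering between tasks is omitted.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BuildTask:
    """Everything a worker needs: no Node objects, no environment."""
    target: str
    command: Tuple[str, ...]


@dataclass(frozen=True)
class NodeUpdate:
    """Message sent back to the main process."""
    target: str
    ok: bool


def run_task(task: BuildTask) -> NodeUpdate:
    # Runs in a worker process; only the small task object crosses over.
    result = subprocess.run(task.command)
    return NodeUpdate(task.target, result.returncode == 0)


def apply_updates(state: Dict[str, str], updates: Iterable[NodeUpdate]) -> Dict[str, str]:
    # Only the main process owns and mutates node state.
    for update in updates:
        state[update.target] = "built" if update.ok else "failed"
    return state


def build_all(tasks, state, jobs=4):
    # On spawn-based platforms, call this under `if __name__ == "__main__":`.
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        return apply_updates(state, pool.map(run_task, tasks))
```

Builders that set state in the environment or globally would break this model, which is the design issue raised later in the thread.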

Variable Substitution:
I abuse this in Parts to share data lazily between components. It has been a sore point for me, for the reasons stated below. We have done some work to address these items by reusing state better. I can say there are some issues with the current code that cause memory bloat and wasted time. I don't want to dwell on this, but I will say that this is the second-biggest item in my mind that would have a big impact on overall time for the user. I know I want to change the load logic in Parts to avoid using the substitution engine as much as possible.

Environment creation:
            It is easy to define lots of different environments in a large build. How you do this can be subtle and have a huge effect on build time. Ideally, you always want to clone the "default" environment you have, or pass values into builder calls rather than setting them on the environment. I feel it would be better for SCons to define more of a default environment and have all environments created be clones of it. I would also push to have every Clone be a copy-on-write environment. There are still cases in which the user needs a "clean" environment; however, in my experience, the common case for all the environments I have made in Parts is a small copy-on-write clone of a common base. I think we should have more copy-on-write higher up the stack. At the moment, the classes that do copy-on-write are used in builders, not in Clones.
Configure check performance:
•       So far I try to avoid this feature as much as I can. However, it does have its uses. From using automake, I feel that at the moment the SCons version is faster, but it lacks some common features. The main issue I have seen is that a user can write complex logic that runs slowly. For a project I am porting from automake, the question for me is whether there is a better way to express this in SCons. At the moment it is a lot of code that is easy to break, and I would like a better way to express it. I feel this could help address maintainability issues with configure logic, as well as avoid certain speed issues by making better use of SCons logic to check if we need to

Some last thoughts:

  1.  The big value SCons tends to have for me is the ability to create reproducible environments to do a build, ones that are not broken by the different shells a user might be running in. This ability to reproduce a build exactly from a dumb shell is a huge win. Using SConsign to help store tool state is something I want to improve on in the Parts toolchain improvements. I think this is a win for SCons as well, more so for people using SCons to cross-build. There is start-up time we can avoid with smarter logic that uses what we know about tools. Honestly, tools don't get added or removed as often as we change build files or source files.
  2.  Given that the common case for most devs is building changes in the source, it seems to me that using our cache better to speed this up would have a big effect. We can detect changes in inputs that would require us to load build files. Most of the time the user added/removed code that has no effect on the actions we would call in the end. Even with changes to imports/includes, we don't need to reload build files we have already processed; the Scanner can deal with that for us.
  3.  Being smarter about how we store data could help us reduce what we keep in memory for a non-interactive build. This can help large builds, as loading a 2-3 GB tree takes resources we would rather use on other things. I think we have options to store information, and possibly to use generators, to reduce memory overhead and improve build speed.
  4.  With multiprocessing in mind, the main issue is that we have a large data tree. Sharing this tree across processes will be slow, and we need to avoid it as much as we can. Using processes to do work that is as independent as possible, and passing node state back to the main process, which holds the main data structure, will work much better. This should have a positive effect on builders based on Python code, as they can build independently. For all builders, we have to address the fact that I have seen builders that try to set state in the environment or globally. This state has to be shared or avoided in some way. I am not suggesting how to solve this, but it will be a design issue to address.
  5.  The last item is that no matter how good SCons is, people will want to be able to generate build files for a different system. The current logic for Visual Studio, for example, makes a makefile project that runs SCons. Users really want an MSBuild project, and we should do that. Likewise, we should be better at working with other build systems' projects. Having good middleware to allow building or working with an automake or CMake project will help adoption. CMake is doing well because it is a build generator, and the same goes for Meson. You want to cover your bases with your users, and systems like these make it easy to do so.

When I was at Intel, some of the people helping me made a Python profiler in Intel VTune. I believe they are still working on it. It was useful for finding non-obvious fixes in Parts that improved speed. Since SCons is open source, you can use this tool for free. I would recommend it, as it will give you insight that the default tools do not provide.

Jason

From: Scons-dev [mailto:scons-dev-bounces at scons.org] On Behalf Of Andrew C. Morrow
Sent: Friday, July 21, 2017 10:40 AM
To: SCons developer list <scons-dev at scons.org>
Subject: [Scons-dev] SCons performance investigations

Hi scons-dev -

The following is a revised draft of an email that I had originally intended to send as a follow-up to https://pairlist4.pair.net/pipermail/scons-users/2017-June/006018.html. Instead, Bill Deegan and I took some time to expand on my first draft and add some ideas about how to address some of the issues. We hope to migrate this to the wiki, but wanted to share it here first for feedback.

----

Performance is one of the major challenges facing SCons. When compared with other current options, particularly Ninja, SCons performance can in many cases lag significantly. That said, other options by and large lack the extensibility and many of the features of SCons.

Bill Deegan (SCons project co-manager) and I have been working together to understand some of the issues that lead to poor SCons performance in a real world (and fairly modestly sized) C++ codebase. Here is a summary of some of our findings:

  *   Python code usage: There are many places in the codebase where the code is correct, but its performance under CPython's implementation can be improved by minor changes.

     *   Examples

        *   Using for loops and hashes to uniquify a list. A simple change in the Node class yielded an approximately 15% speedup for a null build
        *   Using if x.find('some character') >= 0 instead of if 'some character' in x (a timeit benchmark shows a 10x speed difference)

     *   Method to address

        *   Profile the code looking for hotspots with cProfile and line_profiler, then look for the best implementations of the code. (Use timeit if useful to compare implementations. There are examples of such in the bench dir; see: https://bitbucket.org/scons/scons/src/68a8afebafbefcf88217e9e778c1845db4f81823/bench/?at=default)
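The two patterns listed above can be compared with timeit along the lines of the sketch below. `uniquify_loop` only illustrates the general loop-and-dict pattern, not the actual Node code that was changed, and relative timings will vary by Python version and input, so none are claimed here.

```python
import timeit


def uniquify_loop(seq):
    # Explicit loop + dict used as a "seen" set (the pattern being replaced).
    seen = {}
    result = []
    for item in seq:
        if item not in seen:
            seen[item] = True
            result.append(item)
    return result


def uniquify_fromkeys(seq):
    # Same result with order preserved: dicts keep insertion order (3.7+),
    # and the loop runs in C instead of interpreted bytecode.
    return list(dict.fromkeys(seq))


def contains_find(text, sub):
    return text.find(sub) >= 0  # method lookup + call + comparison


def contains_in(text, sub):
    return sub in text  # a single operator, no method call


if __name__ == "__main__":
    data = list(range(1000)) * 5
    for fn in (uniquify_loop, uniquify_fromkeys):
        print(fn.__name__, timeit.timeit(lambda: fn(data), number=200))
    line = "-I/usr/include -DNDEBUG " * 20
    for fn in (contains_find, contains_in):
        print(fn.__name__, timeit.timeit(lambda: fn(line, "$"), number=200_000))
```

Checking that both variants return identical results first keeps the benchmark honest.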

  *   Serial DAG traversal: SCons walks the DAG to find out-of-date targets in a serial fashion. Once it finds them, it farms the work out to other threads, but the DAG walk remains serial. Given the proliferation of multicore machines since SCons' initial implementation, a parallel walk of the DAG would yield significant speedup. Likely this would require an implementation using the multiprocessing Python library (instead of threads), since the GIL would otherwise block real parallelism. Packages like Boost, which have many header files, can greatly increase the size of the DAG, exacerbating this issue. There are two serious consequences of the slow DAG walk:

     *   Incremental rebuilds in large projects. Typical developer workflow is to edit a file, rebuild, test. In our modestly sized codebase, we see the incremental time to do an ‘all’ rebuild for a one file change can reach well over a minute. This time is completely dominated by the serial dependency walk.
     *   Inability to saturate distributed build clusters. In a distcc/icecream build, the serial DAG walk is slow enough that not enough jobs can be farmed out in parallel to saturate even a modest (400-CPU) build cluster. In our example, using Ninja to drive a distributed full build results in an approximately 15x speedup, but SCons can only achieve a 2x speedup.
     *   Method to address:

        *   Investigate changing tree walk to generator
        *   Investigate implementing tree walk using multiprocessing library
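As one way to picture "tree walk as a generator", the sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to yield batches of nodes whose dependencies are all finished. `ready_batches` is a hypothetical name and this is not how the SCons Taskmaster works; it only shows the shape of a walk that hands work out as soon as it is ready.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def ready_batches(deps):
    """Yield batches of nodes whose dependencies are all finished.

    deps maps each node to the nodes it depends on. Each batch could be handed
    to parallel workers; the walk resumes (marking the batch done) only when
    the consumer asks for the next batch.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on dependency cycles
    while sorter.is_active():
        batch = tuple(sorted(sorter.get_ready()))  # sorted only for determinism
        yield batch
        sorter.done(*batch)
```

A real scheduler would call `done()` per node as each job finishes rather than per batch, so that dependents can start without waiting for the slowest job in the batch.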

  *   The dependency graph is the python object graph: The target dependency DAG is modeled via python Node Object to Node Object linkages (e.g. a list of child nodes held in a node). As a result, the only way to determine up-to-date-ness is by deeply nested method calls that repeatedly traverse the Python object graph. An attempt is made to mitigate this by memoizing state at the leaves (e.g. to cache the result of stat calls), but this still results in a large number of python function invocations for even the simplest state checks, where a result is already known. Similarly, the lack of global visibility precludes using externally provided change information to bypass scans.

     *   See above re generator
     *   Investigate modeling state separately from the Python Node graph via some sort of centralized scoreboarding mechanism; it seems likely that the function call overhead could be eliminated and that local knowledge could be propagated globally more effectively.
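A centralized scoreboard might look like the sketch below, assuming a hypothetical `Scoreboard` class: node names are mapped to small integers once, state lives in a flat `bytearray`, and externally provided change information (for example, a list of paths from a file watcher) marks entries stale without walking Node objects.

```python
class Scoreboard:
    """Hypothetical central record of node state, kept outside the Node objects."""

    def __init__(self, edges):
        # edges: iterable of (child, parent) pairs, e.g. ("main.c", "main.o").
        self.ids = {}
        self.parents = []
        for child, parent in edges:
            c, p = self._id(child), self._id(parent)
            self.parents[c].append(p)
        self.stale = bytearray(len(self.ids))  # 0 = up to date, 1 = stale

    def _id(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.ids)
            self.parents.append([])
        return self.ids[name]

    def mark_changed(self, changed_paths):
        """Feed in externally provided change information."""
        stack = [self.ids[p] for p in changed_paths if p in self.ids]
        while stack:
            i = stack.pop()
            if not self.stale[i]:
                self.stale[i] = 1
                stack.extend(self.parents[i])

    def is_stale(self, name):
        # An index lookup instead of a chain of nested method calls.
        return bool(self.stale[self.ids[name]])
```

Paths the scoreboard does not know about are ignored, so a watcher can report every change it sees.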

  *   CacheDir: There are some issues listed below. End-to-end caching functionality of SCons, including generated files, object files, shared libraries, whole executables, etc., is one of its great strengths, but its performance has much room for improvement.

     *   Existing bug(s) when combining CacheDir with MD5-Timestamp devalues CacheDir.

        *   Bug: http://scons.tigris.org/issues/show_bug.cgi?id=2980

     *   Performance issues:

        *   CacheDir re-creates signature data when extracting nodes from the Cache, even though it could have recorded the signature when entering the objects into the cache.

     *   Method to address

        *   Store signatures for items in cachedir and then use them directly when copying items from Cache.
        *   Fix the CacheDir / MD5-Timestamp integration bug
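Storing the signature alongside a cached item, so it does not have to be recomputed on extraction, could be sketched with a sidecar file. `cache_push`, `cache_fetch` and the `key` naming are hypothetical and do not reflect SCons's actual CacheDir layout.

```python
import hashlib
import os
import shutil
from typing import Optional


def compute_csig(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def cache_push(cache_dir: str, key: str, built_file: str, csig: str) -> None:
    """Copy a built file into the cache along with the csig already computed."""
    os.makedirs(cache_dir, exist_ok=True)
    entry = os.path.join(cache_dir, key)
    shutil.copy2(built_file, entry)
    with open(entry + ".csig", "w", encoding="ascii") as f:
        f.write(csig)


def cache_fetch(cache_dir: str, key: str, target: str) -> Optional[str]:
    """Copy a cached file to target and return its stored csig (no re-hashing)."""
    entry = os.path.join(cache_dir, key)
    sig_file = entry + ".csig"
    if not (os.path.exists(entry) and os.path.exists(sig_file)):
        return None  # cache miss, or an entry without a stored signature
    shutil.copy2(entry, target)
    with open(sig_file, encoding="ascii") as f:
        return f.read()
```

The file and its sidecar are not written atomically here, which is why a missing sidecar is treated as a miss; a shared cache would want a write-then-rename step.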

  *   SConsign generation: The generation of the SConsign file is monolithic, not incremental. This means that if only one object file changed, the entire database needs to be re-written. It also appears that the mechanism used to serialize it is itself slow. Moving to a faster serialization model would be good, but even better would be to move to a faster serialization model that also admitted incremental updates to single items.

     *   Method to address:

        *   Replace sconsign with something faster than the current implementation, which is based on Pickle.
        *   And/or improve sconsign so that it incrementally writes only what has changed.

  *   Configure check performance: Even cached Configure checks seem slow, and for a complexly configured build this can add significant startup cost. Improvements here would be useful.

     *   Method to address:

        *   Code inspection, look for improvements
        *   Profile

  *   Variable Substitution: Currently variable substitution, which is largely used to create the command lines run by SCons, uses an appreciable percentage (approximately 18% for a null incremental build) of SCons' CPU runtime. By and large, much of this evaluation is duplicate (and thus avoidable) work. For the moderately sized build discussed above, there are approximately 100k calls to evaluate substitutions, yet only 413 unique strings to be evaluated. Consider that the CXXCOM variable is expanded 2412 times for this build. The only variables which are guaranteed unique are SOURCES and TARGETS; all others could be evaluated once and cached.

     *   Prior work on this item:

        *   https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/Discussion

     *   Working doc on current and areas for improvement:

        *   https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/SubstImprovement2017

     *   Method to address:

        *   Consider pre-evaluating Environment() variables where reasonable. This could use some sort of copy-on-write between cloned Environments. This pre-evaluation would skip known target-specific variables (TARGET, SOURCES, CHANGED_SOURCES, and a few others), so at minimum the per-command-line substitution should be faster.
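The pre-evaluation idea can be sketched as follows: expand everything except the per-command variables once per template string, cache the result, then fill in only targets and sources per command. `SubstCache` is hypothetical and handles only a simplified `$NAME` syntax (no `${...}`, no `$$` escapes, no callables); the exact `TARGET_SPECIFIC` set is an assumption, and the environment is assumed not to change after the cache is built.

```python
import re

# Variables whose values differ per command line (assumed set for this sketch).
TARGET_SPECIFIC = frozenset({
    "TARGET", "TARGETS", "SOURCE", "SOURCES", "CHANGED_SOURCES",
    "CHANGED_TARGETS", "UNCHANGED_SOURCES", "UNCHANGED_TARGETS",
})
_VAR = re.compile(r"\$(\w+)")


class SubstCache:
    """Pre-expand environment variables once per template string."""

    def __init__(self, env):
        self.env = dict(env)  # snapshot: later env changes are not seen
        self._pre = {}
        self.misses = 0

    def _expand_static(self, text, depth=0):
        if depth > 50:
            raise ValueError("variable expansion too deep (cycle?)")

        def repl(match):
            name = match.group(1)
            if name in TARGET_SPECIFIC:
                return match.group(0)  # leave for per-command substitution
            return self._expand_static(str(self.env.get(name, "")), depth + 1)

        return _VAR.sub(repl, text)

    def subst(self, template, targets, sources):
        pre = self._pre.get(template)
        if pre is None:
            self.misses += 1
            pre = self._pre[template] = self._expand_static(template)
        per_command = {
            "TARGET": targets[0] if targets else "",
            "TARGETS": " ".join(targets),
            "SOURCE": sources[0] if sources else "",
            "SOURCES": " ".join(sources),
        }
        return _VAR.sub(lambda m: per_command.get(m.group(1), ""), pre)
```

With this shape, expanding CXXCOM for thousands of objects costs one full expansion plus one cheap per-command pass each time.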

Bill and I would appreciate any feedback or thoughts on the above items, or suggestions for other areas to investigate. We are hoping that by addressing some or all of these items, the runtime overhead of SCons could be brought down significantly enough to make it competitive with other build systems again. We hope to begin work on the above items once SCons 3.0 has shipped.

Thanks,
Andrew

_______________________________________________
Scons-dev mailing list
Scons-dev at scons.org
https://pairlist2.pair.net/mailman/listinfo/scons-dev
