Ninja: Ninja should have a way to trigger compiles through IPC instead of CreateProcess

Created on 23 Aug 2019  ·  6 Comments  ·  Source: ninja-build/ninja

CreateProcess on Windows is slow at the best of times, and Chrome developers often find themselves not in the best of times. Some process creation/destruction issues that have been dealt with over the last few years in the Chrome build include:

  1. Chrome builds hit UserCrit lock contention during process destruction causing build slowdowns and significant mouse stuttering during goma builds due to a Windows regression. Fixed in Windows, fix eventually propagated to all relevant Windows 10 builds (https://randomascii.wordpress.com/2017/07/09/24-core-cpu-and-i-cant-move-my-mouse/)
  2. Chrome builds hit UserCrit lock contention during process destruction causing build slowdowns and significant mouse stuttering during goma builds due to gomacc.exe indirectly pulling in gdi32.dll. Fixed in gomacc by avoiding gdi32.dll.
  3. Anti-malware software caused an ~11 ms regression in CreateProcess making it impossible for a full Chrome build to be faster than ~ten minutes, and slowing all builds. Fixed by disabling this anti-malware software.
  4. At least two variations on problem number 3 including other anti-malware software that is either not correctly disabled or still has overhead (MsMpEng.exe) even when disabled.
  5. The high process creation rate in Chrome's build triggered bugs in SCCM that caused zombie processes that eventually led to up to 32 GB of lost memory (https://randomascii.wordpress.com/2018/02/11/zombie-processes-are-eating-your-memory/)
  6. Enabling App Verifier for gomacc.exe to investigate occasional heap corruption hit an O(n^2) performance issue in log-file creation (https://randomascii.wordpress.com/2018/10/15/making-windows-slower-part-2-process-creation/)

Other process creation/destruction issues that didn't affect Chrome builds include:

  1. Chromium's large test binaries hit an O(n^2) CFG performance bug in Windows which caused unit_tests.exe to take approximately 5x as long as it should on Windows 10. Fixed by disabling CFG in test binaries (https://randomascii.wordpress.com/2019/04/21/on2-in-createprocess/)
  2. clang-cl unit tests hit UserCrit lock contention during process destruction causing the tests to take 5x as long as they should on Windows 10 and causing significant mouse stuttering. Fixed by avoiding pulling in gdi32.dll.

Currently all of these issues are mitigated, with the exception of MsMpEng.exe causing non-zero overhead even when everything is in excluded folders. However, ongoing effort is needed to keep all Chrome developers' machines properly configured, and there is some loss of security because of the many exclusions needed to maintain decent build performance. There is also ongoing effort to detect and investigate these issues.

At some point it is no longer worth tilting at the CreateProcess windmill, and it is better to avoid calling CreateProcess so frequently. ninja.exe could be taught how to use IPC to communicate with a local server process that would manage a pool of compile processes (goma and/or clang-cl) in order to avoid 99% of CreateProcess calls during a build.

Doing this would reduce the CPU cost of CreateProcess, would remove CreateProcess from the serialized critical path in ninja, would allow more security monitoring to be enabled with fewer exclusions, and would save us from the costs (investigation time and build time) of future regressions.
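
To make the idea concrete, here is a minimal sketch of a pool of long-lived worker processes that receive jobs over pipes, so that process creation happens once per worker rather than once per compile. It is not the prototype mentioned in this issue and not ninja code: the `WorkerPool` class, the line-delimited JSON protocol, and the worker that merely upper-cases its input are all assumptions made for illustration.

```python
import json
import subprocess
import sys

# Hypothetical worker: reads one JSON job per line and answers on stdout.
# A real worker would run the compiler in-process; this one only upper-cases text.
WORKER_SOURCE = r"""
import json, os, sys
while True:
    line = sys.stdin.readline()
    if not line:
        break  # stdin closed: the pool is shutting down
    job = json.loads(line)
    reply = {"id": job["id"], "output": job["source"].upper(), "pid": os.getpid()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class WorkerPool:
    """Starts a fixed set of worker processes once and reuses them for every job."""

    def __init__(self, size):
        self._workers = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER_SOURCE],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
            for _ in range(size)
        ]
        self._next = 0

    def run(self, job_id, source):
        """Send one job to the next worker (round-robin) and wait for its reply."""
        worker = self._workers[self._next]
        self._next = (self._next + 1) % len(self._workers)
        worker.stdin.write(json.dumps({"id": job_id, "source": source}) + "\n")
        worker.stdin.flush()
        return json.loads(worker.stdout.readline())

    def close(self):
        """Closing stdin ends each worker's read loop, so it exits on its own."""
        for worker in self._workers:
            worker.stdin.close()
            worker.wait()
            worker.stdout.close()


def run_demo():
    """Run six jobs on two workers; only two processes are ever created."""
    pool = WorkerPool(size=2)
    try:
        return [pool.run(i, f"unit{i}.cc") for i in range(6)]
    finally:
        pool.close()
```

This sketch sends jobs one at a time and blocks on each reply. A real server would dispatch to all workers concurrently, and it only helps if the compiler itself can be reused across jobs within one process.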

Alternatively, ninja could call CreateProcess from multiple threads, but this is not as complete a solution.
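
The multi-threaded alternative can be sketched as follows. This is an illustration under assumptions, not ninja code: `launch`, `run_all`, and the stand-in commands are hypothetical names. Each process-creation call is issued from a thread pool, so one slow call no longer holds up the next, but the cost of every individual call is unchanged.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def launch(command):
    """Create one child process and wait for it; called from a worker thread."""
    done = subprocess.run(command, capture_output=True, text=True)
    return done.returncode, done.stdout.strip()


def run_all(commands, max_threads=4):
    """Issue process-creation calls from several threads instead of one."""
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        # map() returns results in the same order as the input commands.
        return list(executor.map(launch, commands))


# Stand-in commands; a real build would pass compiler command lines here.
COMMANDS = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(8)]
```

Calling `run_all(COMMANDS)` still creates eight processes, which is why this approach removes the serialization but not the per-process overhead described above.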

Switching from CreateProcess to IPC in ninja is not clearly the right thing to do, but it is worth discussing. A prototype has been created so test results should be shared here.

All 6 comments

It seems to me like library-ifying Ninja would allow for efforts like this to more easily inject the behavior that they want, such as by deriving from an interface class to customize the process creation behavior.
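
A rough sketch of what such an interface could look like, written in Python for brevity even though Ninja itself is C++. None of these names exist in Ninja: `CommandRunner`, `SubprocessRunner`, `RecordingRunner`, and `build` are hypothetical and only illustrate how process creation could be swapped out behind an interface.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class CommandRunner(ABC):
    """Hypothetical extension point: how the build executes one command."""

    @abstractmethod
    def run(self, command):
        """Run `command` (a list of arguments) and return its exit code."""


class SubprocessRunner(CommandRunner):
    """Default behaviour: create a new process for every command."""

    def run(self, command):
        return subprocess.run(command).returncode


class RecordingRunner(CommandRunner):
    """Stand-in for an IPC-backed runner: records commands instead of spawning."""

    def __init__(self):
        self.seen = []

    def run(self, command):
        self.seen.append(command)
        return 0


def build(commands, runner):
    """Run commands in order, stopping at the first failure."""
    for command in commands:
        if runner.run(command) != 0:
            return False
    return True
```

With this shape, `build(commands, SubprocessRunner())` keeps the existing behaviour, while an IPC-based runner could be passed in without touching the scheduling code.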

Not surprisingly, I strongly disagree with this. Ninja is a simple build system that assumes that the build environment is sane. There are other build systems with different designs that might assume hostile build environments, but ninja is not that system.

It sounds like ninja's current design helped identify lots of bugs that were worth fixing. I consider this a feature.

@nico, I strongly disagree with your opinion on this matter.

To begin with, Ninja started as a tool to improve the compile times of the Chrome project. http://neugierig.org/software/chromium/notes/2011/02/ninja.html

Now you're telling someone who's talking about ways to improve compile times for the Chrome project that their contribution isn't welcome.

Ninja would easily accommodate using IPC to execute build jobs if it was a graph execution library first, and a build tool second. That would allow for the IPC logic to be injected into the library, instead of living in the core code.

Furthermore, as a "maintainer" of a project that rarely sees such sophisticated bug reports (with the implied promise to follow up with actual code), it's rather ridiculous that you would close this one with absolutely no discussion of the topic, yet close none of the 189 other issues that remain open, some as old as 8 years.

And, of course, calling Windows a hostile build environment may well be true, but it's not a valid justification for ignoring a contribution. Windows is still one of the most widely used development operating systems, and ignoring that is disingenuous at best.

@nico I encourage you to step away from the Ninja project. Your only action in months is to dismiss a sophisticated contribution, with heaps of documentation explaining the problem and solutions that have been pursued.

@randomascii I would be happy to see a contribution like what you're talking about here integrated into Ninja. At my last workplace, I strongly suspect that an IPC mechanism like this could have cut build times by at least several minutes per build, which would have been wonderful.

Could you describe in more detail how a pool of compile processes would be managed? Does Windows have a mechanism that allows a process to be re-used to execute something else? Or would this require the compiler process to be reusable?

My understanding is that a colleague has a prototype for ninja-with-IPC.

My main concern about closing this without more discussion is that it misses an opportunity to discuss the security tradeoffs. Most of the ways that we have mitigated the CreateProcess costs/serialization are by disabling various security checks. I'm not an expert on the value of those checks but it could be instructive to have those who are experts weigh in on the benefits of being able to enable them.

I would encourage your colleague to submit a Pull Request here on GitHub with their prototype, so that the implementation can be discussed.
