Ninja consumes all available memory

Created on 27 May 2018  ·  20 Comments  ·  Source: ninja-build/ninja

I have been investigating causes of swapping on my system, and stumbled upon this fragment in ninja code: https://github.com/ninja-build/ninja/blob/03df526e07b62c0c5dfe61720cf9263ae4fb808b/src/ninja.cc#L223-L233

Ninja is used by the Android build system, and since I compile a lot of Android code, its performance strongly affects the usability of my system.

My work PC has a 4-core CPU with up to 8 threads, and my home PC has an 8-core CPU with up to 16 (!!) threads. Both have 8 GB of RAM.

Needless to say, ninja builds quickly consume all available memory and cause heavy swapping.

Right now ninja defaults to running CPU count + 2 parallel jobs, which can easily exhaust OS resources if the amount of available memory does not "match" the number of CPUs. There are a few other programs with this kind of default, but most of those are games, which are optimized to handle fixed assets and conserve memory. Ninja processes external data (software source code), some of which is very memory-heavy (e.g. C++). This is definitely NOT ok. If the current CPU trend continues, we will soon see consumer-targeted computers with 64+ cores. If the current RAM trend continues, most of those computers won't have a matching amount of RAM.

I have seen some discussions about conserving the memory used by compilation by dynamically monitoring memory usage. I don't personally care about that; most of my projects have a predictable compilation footprint.

Instead, I'd like ninja to perform some basic sanity checks and limit its maximum parallelism based on available system memory. If some of the installed CPUs don't have at least 1 GB of RAM each, don't count those CPUs towards the default parallelism setting (see the sketch below). This would keep the number of parallel jobs roughly the same for most systems with <4 CPUs as well as for enterprise Xeon build servers, while providing a more reasonable default for systems with a subpar amount of RAM.
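A rough sketch of that heuristic in Python (illustrative only; guess_parallelism is my name for it, not ninja's actual function, and the "+2" mirrors the bonus ninja currently adds on top of the CPU count):

def guess_parallelism(cpus, ram_bytes):
    # Count a CPU towards the default only if at least 1 GiB of RAM backs it,
    # then keep the "+2" bonus ninja already adds on top of the CPU count.
    ram_backed = ram_bytes // (1 << 30)
    usable = max(1, min(cpus, ram_backed))
    return usable + 2

# Example: 16 hardware threads but only 8 GiB of RAM -> -j10 instead of -j18.
print(guess_parallelism(16, 8 * (1 << 30)))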


All 20 comments

Ninja does not know anything about the processes it runs, and I think the model where each process uses one CPU core is the best fit for most builds.
But memory consumption differs greatly between commands (e.g. a simple Python script vs. linking a large object), and it is difficult to make a common assumption about memory usage for such commands.

If you want to control the parallelism of memory-consuming processes, it is better to specify a lower -j, or to use the pool feature when you know the memory footprint of your build well.
https://ninja-build.org/manual.html#ref_pool
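For example (an illustrative build.ninja fragment, with arbitrary pool name and depth), a pool caps how many jobs of a given rule run at once regardless of -j:

pool link_pool
  depth = 2

rule link
  command = g++ -o $out $in
  pool = link_pool

build app: link foo.o bar.o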

I think the model where each process uses one CPU core is the best fit for most builds.

I believe that the total amount of RAM should also be considered. Swapping to the hard drive certainly does not help real-world performance.

If you want to control the parallelism of memory-consuming processes, it is better to specify a lower -j, or to use the pool feature

I am not a developer of the Android NDK build system, so I don't pass command-line arguments to the ninja command; the Android build system does.

I have set the environment variable MAKEFLAGS="-j4" (which most make-based build systems out there respect), but ninja does not appear to use it.

I speculate that you can provide arguments to ninja directly through CMake from Gradle. From what I can see, the CMake integration in Gradle has an arguments field. Maybe this will work?

@jimon

I understand that you want to help, but that's not the point.

My point:

1) make has safe defaults and allows you to switch to greater parallelism via an environment variable.
2) ninja has unsafe defaults. I am constantly afraid that building some program will make my system hang (I don't want to check what build system each random AUR package uses before building it!). I don't know how to globally change ninja's defaults.

@Alexander--, imagine you have a single executable (a compiler or similar) that simply allocates more RAM than you have (via swap file or other means). You have a build system that just executes that app, and voilà: your system is unresponsive. This thought experiment makes it obvious that the problem is not solvable in the absolute. Now imagine a case where the executable allocates memory gradually over time: ninja has no upfront knowledge of memory usage, and it becomes apparent that RAM is gone only after it is actually gone, at which point there is nothing to be done. The problem is not trivially solvable, and I don't believe it should be in scope for the project.

As for make having safer defaults: I can't imagine how much developer time is wasted simply because devs either are not aware of parallelism settings in make or can't be bothered to set them. I constantly see my colleagues wasting minutes or even hours because they are not aware that make builds in one thread by default.

I don't want to oppose fixing your problem, and I don't want to sound harsh :) But I think the optimal solution is just to override the Gradle behavior and move on, because the issue is very specific to your project/computer and probably doesn't show up on a bigger scale.

Which Android build system are you using? Building Android apps with the NDK, or building the entire platform (ROM)?

Ninja + Gradle + Jack = Hell. (It takes up at least 16 GB of RAM.)
I don't know exactly where the problem lies, but I guess the major problem is the Java build using jack-server. More importantly, Ninja (from Soong) replaces all the legacy machinery and introduces so many small pieces of software that we get lost in the build system and cannot find the problem or a solution.
I miss old Make. Please get me out of this build hell.
Now I should get back to solving this "No Jack server running. Try 'jack-admin start-server'"

To rephrase @atetubou: if your CPU has 64 cores, it is reasonable to assume it is capable of executing 64 programs in parallel. The reason this assumption is a problem for you is that your compilation tasks appear to be larger than an ordinary program. Ninja has a mechanism for communicating this information, via the 'pool' feature. If you read that docs section you'll see it's designed for exactly this problem. https://ninja-build.org/manual.html#ref_pool

I am sympathetic to your problem but I don't see an easy way for Ninja to solve it. You say it should "consider" the total amount of RAM, but what sort of formula could we use?

@av930 jack is going away in P and can be disabled in O builds. I removed the use of jack from O builds, and my 8-core with 12 GB of memory has no problem running with 16+ threads. You can also change the arguments ninja uses during the builds. The problem is jack, not ninja.

Also this change might help you: https://github.com/ninja-build/ninja/pull/1399

Oh my god! Thanks, it works. I finally finished a full AOSP build on a 16 GB RAM machine.
To disable jack: ANDROID_COMPILE_WITH_JACK := false in build/make/core/javac.mk

This is the log:
[ 99% 101694/101695] Install system fs image: out/target/product/taimen/system.img
out/target/product/taimen/system.img+ maxsize=2740531200 blocksize=135168 total=1076289888 reserve=27709440
[100% 101695/101695] Target vbmeta image: out/target/product/taimen/vbmeta.img

build completed successfully (04:43:31 (hh:mm:ss))

I think this might be solvable by telling the OS that, for a group of n running processes, it's okay to suspend up to n-1 of them in an out-of-memory situation. Does anyone know if this can be achieved with cgroups on Linux?
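Perhaps something like the following with cgroup v2 (a sketch, assuming /sys/fs/cgroup is a cgroup2 mount and we have permission to create a child group; note that memory.high throttles the group rather than suspending individual processes, which is the nearest knob I know of):

import os
import subprocess

CGROUP = "/sys/fs/cgroup/buildcap"  # hypothetical group for the whole build

os.makedirs(CGROUP, exist_ok=True)
with open(os.path.join(CGROUP, "memory.high"), "w") as f:
    f.write(str(6 * 1024 ** 3))     # throttle the group once it passes 6 GiB

# Move ourselves into the group; every child of this process inherits it.
with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))
subprocess.run(["ninja"], check=True)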

There are actually two PRs which implement a memory limit: #1354 and #660. Discussion in the latter is interesting.

This mostly sounds like the Android build was holding ninja wrong by not putting jack in a pool.

@nico

Could you point at the specific Ninja version that fixed this issue? I am still seeing Ninja run too many parallel tasks by default whenever I build something from the AUR. It looks like Ninja still attempts to create 18 threads on a machine with 16 cores, regardless of the amount of available RAM.

Ninja doesn't do anything here; it's up to the generator to make sure things that need lots of RAM (or similar) are in an appropriately-sized pool.

it's up to the generator to make sure things that need lots of RAM (or similar) are in an appropriately-sized pool

Is this your personal opinion, or the official position of the Ninja developers? The documentation never mentions such a responsibility, and I haven't heard of generators that actually meddle with Ninja parallelism.

Either way, generator writers aren't in a better position than the user to choose how many parallel processes to run. That's best decided by the build server administrator.

You are saying that there is no (and should never be) a generic way to lower Ninja's resource usage that works regardless of the generator used. I disagree. There are build tools (e.g. GNU Make) that have this functionality, so there is already a precedent for such a feature.

The documentation never mentions such a responsibility, and I haven't heard of generators that actually meddle with Ninja parallelism.

Not explicitly about RAM, but:

"build-time customization of the build. Options belong in the program that generates the ninja files."

"Ninja has almost no features; just those necessary to get builds correct while punting most complexity to generation of the ninja input files."

You are saying that there is no (and should never be) a generic way to lower Ninja's resource usage that works regardless of the generator used.

There are two ways: the -j flag and the -l flag. For example, ninja -j4 -l8 caps the build at four parallel jobs and avoids starting new jobs while the load average is above 8.

I am still seeing Ninja run too many parallel tasks by default whenever I build something from the AUR.

I don't know exactly how the AUR works, but would putting the following script into /usr/local/bin and making it executable via chmod +x /usr/local/bin/ninja work?

#!/usr/bin/env python3

import subprocess
import sys

try:
    # Prepend -j1; a -j given later on the real command line still wins,
    # since ninja uses the last -j it sees.
    subprocess.check_call(['/usr/bin/ninja', '-j1'] + sys.argv[1:])
except subprocess.CalledProcessError as e:
    sys.exit(e.returncode)

-j1 could have been mentioned in the first response.
I have this project which has over time acquired a lot of internal external_projects. That is, they build statically instead of the dynamic build that the common makefile uses, so each has its own flags and sources that get built.
Today I built on this older laptop (I haven't previously used ninja on this system); it's only a 4 GB Windows 7 system with a Core i7 (8 threads). When I started the build, the whole system slowed to a crawl for half an hour while it swapped to and from the disk.
It resulted in bizarre errors:

C:/general/build/mingw64-x64/sack/RelWithDebInfo_out/core/include/SACK/stdhdrs.h:259:24: fatal error: C:/general/build/mingw64-x64/sack/RelWithDebInfo_out/core/include/SACK/loadsock.h: Invalid argument
compilation terminated.

Why is including a file an invalid argument? The file exists...
So the whole build finished, but didn't build very many targets successfully...
Even when I now use

ninja -j1 -v install 

the inner ninja processes still use a -j8

The build ended up being -j8 at the top level, which launched 8 external projects each with -j8, so effectively -j64 maxed out memory until there was none...

Jobserver support might help in that case: #1139
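For context, the GNU make jobserver that PR targets is essentially a pipe pre-loaded with N-1 tokens: every cooperating tool reads a token before starting a job and writes it back afterwards, so nested builds share a single global -j budget. A minimal sketch of the client side (illustrative, not the PR's actual code):

import os

def run_job_with_token(read_fd, write_fd, run_job):
    token = os.read(read_fd, 1)    # blocks until a job slot is free
    try:
        run_job()                  # e.g. spawn one compiler process
    finally:
        os.write(write_fd, token)  # return the slot to the shared pool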

The independent Ninja build-language reimplementation https://github.com/michaelforney/samurai supports this via the environment variable SAMUFLAGS (e.g. SAMUFLAGS="-j4" caps every build at four jobs). Simply use samu anywhere you'd otherwise use ninja, or install samu as your system /usr/bin/ninja (some Linux distros have options to install this competing implementation as the default, or only, ninja).

I advise updating AUR packages that build with ninja to use community/samurai instead.
