Ninja: Tons of include file notices when building with Chinese version of Visual Studio 2010

Created on 5 Jul 2013  ·  32 Comments  ·  Source: ninja-build/ninja

When building Chromium with ninja using the Chinese version of Visual Studio 2010, the console window prints a huge number of include file notices, which greatly slows down the build. The notices look like this:

注意: 包含文件:include file path

The English equivalent is:
Note: including file: .....

Perhaps ninja is stripping the include notices based on the English words only.

bug windows

All 32 comments

On an English install, "Note: including file:%s%s\n" appears as a resource in the string table in VC\bin\1033\clui.dll.

1033 is the locale identifier for "English (United States)".

I'm not aware of any command line option to force cl into 1033 locale unfortunately. I assume if multiple locales are installed it uses system settings to determine which to use.

So, I guess we'd have to add various languages to the prefix search in the /showIncludes parser. :/

I think cmcldeps (the CMake parser of this output) uses a regex, something like "[^:]+: [^:]+: (.*)", to grab all output lines that look like showincludes output. I haven't looked at the code too hard because I'd eventually like to implement something like that and I don't want to violate any copyrights. :)

The tricky part is not confusing showincludes output with warnings. sfcheng, could you paste the Chinese output of Visual Studio cl.exe when showing a warning or error message?

It looks like that ':' is not ASCII 58, so that might add a wrinkle. Maybe the line number "(\d+)" that appears in errors could be a useful signal.

"I'm not aware of any command line option to force cl into 1033 locale unfortunately."

There is no (clean) way to do this, unfortunately. Foreign-language versions of VS keep their locale resources under a different number (e.g. 1041 for JA).
What we learned: always install the EN version of VS, then install a language pack if needed :(

But fortunately, "error Cnnnn" and "warning Cnnnn" never get localized, so we can use them as keys. But as @sgraham said, line numbers look more promising because they would also allow filtering out 'note:' output.

I'm not sure whether the colons are ASCII 58 or not. In the Japanese version, they certainly are ASCII 58.

FWIW, Japanese output would look like this:

C:\cygwin\home\oku>type main.c
#include <stdio.h>
int nah(void){}; /* Trigger "function must return a value */
main(){return nah();}

C:\cygwin\home\oku>cl /showIncludes main.c
Microsoft(R) C/C++ Optimizing Compiler Version 16.00.40219.01 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
メモ: インクルード ファイル:  C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h
メモ: インクルード ファイル:   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\crtdefs.h
メモ: インクルード ファイル:    C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\sal.h
メモ: インクルード ファイル:     c:\program files (x86)\microsoft visual studio10.0\vc\include\codeanalysis\sourceannotations.h
メモ: インクルード ファイル:    C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\vadefs.h
メモ: インクルード ファイル:   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\swprintf.inl
c:\cygwin\home\oku\main.c(2) : warning C4716: 'nah' : 値を返さなければいけません
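
To make the two signals discussed above concrete, here is a minimal sketch of how a line could be classified without looking at any localized text. The helper names LooksLikeDiagnostic and HasLineNumber are hypothetical and are not part of ninja. The sketch assumes that diagnostics always carry "error C" or "warning C" followed by a digit, and a "(line)" marker after the file name, as in the output above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of ninja: returns true if `line` contains
// "error C" or "warning C" immediately followed by a digit. These markers
// are reported in this thread as never being localized.
bool LooksLikeDiagnostic(const std::string& line) {
  static const char* const kMarkers[] = {"error C", "warning C"};
  for (const char* marker : kMarkers) {
    const std::string m(marker);
    std::string::size_type pos = line.find(m);
    while (pos != std::string::npos) {
      const std::string::size_type digit = pos + m.size();
      if (digit < line.size() &&
          std::isdigit(static_cast<unsigned char>(line[digit])))
        return true;
      pos = line.find(m, pos + 1);
    }
  }
  return false;
}

// Hypothetical helper: returns true if `line` contains "(<digits>)", the
// line-number marker that diagnostics carry after the file name.
bool HasLineNumber(const std::string& line) {
  for (std::string::size_type i = 0; i < line.size(); ++i) {
    if (line[i] != '(') continue;
    std::string::size_type j = i + 1;
    while (j < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[j])))
      ++j;
    if (j > i + 1 && j < line.size() && line[j] == ')')
      return true;
  }
  return false;
}
```

Note that "(x86)" in "Program Files (x86)" does not count as a line number because it is not all digits, but a directory literally named "(2)" would fool HasLineNumber, so this is a heuristic at best.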

Related art: https://bugzilla.mozilla.org/show_bug.cgi?id=587372 (approach there: read prefix from an env var, have an autoconf check to verify /showIncludes parsing works. Not great.)

Idea similar to the mozilla bug: There could be a toplevel var msvs_includes_prefix, and generators could write that by compiling a dummy #include "knownheader.h" file with /showIncludes and writing whatever is in front of "knownheader.h" in the cl.exe output into that toplevel var. Ninja would then use msvs_includes_prefix as the /showIncludes prefix.
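
A rough sketch of what such a generator-side probe could do with the captured cl.exe output. DetectShowIncludesPrefix is a hypothetical name, and the sketch assumes that cl.exe prints the header path with exactly the spelling the generator passes in (same case, same separators). Trailing whitespace is trimmed because ninja skips spaces after the prefix anyway.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not ninja or CMake code): scans captured cl.exe
// output for the first line that mentions `known_header_path` and returns
// whatever is in front of it, with trailing spaces and tabs removed.
// Returns "" if no line mentions the header.
// Assumption: cl.exe prints the path exactly as spelled in known_header_path.
std::string DetectShowIncludesPrefix(const std::string& cl_output,
                                     const std::string& known_header_path) {
  std::istringstream lines(cl_output);
  std::string line;
  while (std::getline(lines, line)) {
    const std::string::size_type pos = line.find(known_header_path);
    if (pos == std::string::npos || pos == 0)
      continue;
    std::string prefix = line.substr(0, pos);
    while (!prefix.empty() &&
           (prefix.back() == ' ' || prefix.back() == '\t'))
      prefix.pop_back();
    return prefix;
  }
  return "";
}
```

The returned bytes are whatever the compiler wrote, in whatever encoding it wrote them, so the generator would have to store them unchanged for a byte-wise comparison to work later.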

While CMake configures, the include prefix is read from one dummy build using a regex:
https://github.com/Kitware/CMake/blob/master/Modules/CMakeClDeps.cmake
Later on, the prefix is passed as an argument to the tool,
https://github.com/Kitware/CMake/blob/master/Source/cmcldeps.cxx
where std::string::substr is used for processing the cl.exe output.

I assume ninja also needs to accept an additional (global) argument.
(like nico suggested)

CMake also uses cmcldeps to generate dependencies for .rc files by first "compiling" the .rc file with cl which generates the dependency file, and then processes it with the rc tool.
Not sure if or how this could be integrated into ninja.

https://github.com/martine/ninja/pull/665

Does this work with non-Ascii prefixes?

665 is merged. We might have to iterate on encoding issues a bit still though, so leaving this open until this is verified working.

With a French locale, ninja 1.8.2 and CMake 3.10.2, this still happens...

665 added msvc_deps_prefix. Does cmake set that? @syntheticpp @mathstuf

@nico I see code in CMake which references it.

Still happening on Visual Studio Community 15.9.7...

For the record, this is still happening to me with the following configuration:
CMake 3.14
Ninja 1.8.2 (the one that ships with Visual Studio 2019)
French locale.

EDIT: Better workaround: set VSLANG=1033 in the environment to force CL to output English messages.

Old workaround:
And for those who also hit this issue, my workaround was to comment out the following line in $CMAKE_PATH\share\cmake-3.14\Modules\Platform\Windows-MSVC.cmake:
#set(CMAKE_NINJA_DEPTYPE_${lang} msvc) (line 368 in mine)

This unfortunately causes CMake to generate deps = gcc instead of just removing the deps line, but that hasn't seemed to break my builds. YMMV. This is a workaround.

Setting deps = gcc is probably benign without setting depfile as well.

@DrFrankenstein Would you be willing to try reproducing this after applying this PR to the ninja codebase? https://github.com/ninja-build/ninja/pull/1671

I'll give it a shot this week!

Unfortunately, that didn't fix it.

I built ninja from that branch, then used that version of ninja to build itself again, and it still leaked the include messages to the terminal.

It seems to me like the issue here might be with the MSVC include handler.

Is the grammar for that not recognizing the output from cl.exe correctly?

Well...

It looks like it's a problem with hard-coded English.

https://github.com/ninja-build/ninja/blob/master/src/clparser.cc

string CLParser::FilterShowIncludes(const string& line,
                                    const string& deps_prefix) {
  const string kDepsPrefixEnglish = "Note: including file: ";
  const char* in = line.c_str();
  const char* end = in + line.size();
  const string& prefix = deps_prefix.empty() ? kDepsPrefixEnglish : deps_prefix;
  if (end - in > (int)prefix.size() &&
      memcmp(in, prefix.c_str(), (int)prefix.size()) == 0) {
    in += prefix.size();
    while (*in == ' ')
      ++in;
    return line.substr(in - line.c_str());
  }
  return "";
}

@DrFrankenstein

Do you feel like messing with that English prefix at the top of the function to see if it does you any better?

Ha! I was actually going to look in there tomorrow. Seems like you caught it before I had a chance to.

I just shut my computer down for the night. I'll get back to you tomorrow!

It's worth noting, though, deps_prefix should contain the localized string as set in the rules.ninja file (usually detected and set by CMake). It only uses the hard-coded one if it's absent.

I suspect the logic just after it might be the actual culprit. But as I said, I'll do a proper investigation/debugging session tomorrow.

The encodings don't match. deps_prefix is in Latin-1 (where the NBSP before the colons is 0xA0), and line is in CP437 for some reason (NBSP = 0xFF).

I think that CL itself is outputting CP437, but the CMake-generated rules.ninja is in Latin-1. I'm guessing that some conversion occurred on the CMake side, but that'll require more digging.
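
To illustrate why a single differing byte is enough to break the match, here is a standalone restatement of the prefix test with made-up data. StartsWithBytes and the three constants are hypothetical. The French wording of the prefix is an assumption for illustration only; the only thing that matters is the NBSP byte, 0xA0 in Latin-1 versus 0xFF in CP437.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Standalone restatement of the prefix test, for illustration only: true
// when `line` is longer than `prefix` and starts with exactly the same bytes.
bool StartsWithBytes(const std::string& line, const std::string& prefix) {
  return line.size() > prefix.size() &&
         std::memcmp(line.data(), prefix.data(), prefix.size()) == 0;
}

// Illustrative data only (assumed wording). The NBSP before each colon is
// 0xA0 in the Latin-1 strings and 0xFF in the CP437 string.
const std::string kLatin1Prefix =
    "Remarque\xA0: inclusion du fichier\xA0:";
const std::string kLatin1Line =
    "Remarque\xA0: inclusion du fichier\xA0: C:\\include\\stdio.h";
const std::string kCp437Line =
    "Remarque\xFF: inclusion du fichier\xFF: C:\\include\\stdio.h";
```

With this data, StartsWithBytes(kLatin1Line, kLatin1Prefix) holds while StartsWithBytes(kCp437Line, kLatin1Prefix) does not, even though both lines look identical when rendered in their own code page.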

EDIT: It seems like CL will output in whatever the console's codepage is. (Source 1, Source 2). I'm not sure how we can force it to be something else.

Perhaps we can bring the two together by converting them both to a common encoding such as UTF-8 (or whatever Ninja prefers to use), e.g. by calling MultiByteToWideChar(CP_OEMCP, ...) on the CL output, and MultiByteToWideChar(1252, ...) on the string that comes from rules.ninja.
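
As a portable sketch of one half of that idea, the function below converts ISO-8859-1 (Latin-1) bytes to UTF-8. Latin1ToUtf8 is a hypothetical name. It is only exact for true Latin-1; Windows code page 1252 differs in the 0x80 to 0x9F range, and the CP437 side would need a lookup table or the Windows conversion API mentioned above, neither of which is shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
// Assumption: the input really is Latin-1, not Windows-1252 or an OEM page.
std::string Latin1ToUtf8(const std::string& latin1) {
  std::string utf8;
  utf8.reserve(latin1.size());
  for (unsigned char c : latin1) {
    if (c < 0x80) {
      utf8.push_back(static_cast<char>(c));
    } else {
      utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
      utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
    }
  }
  return utf8;
}
```

For example, the Latin-1 NBSP byte 0xA0 comes out as the two bytes 0xC2 0xA0, so the converted prefix only matches compiler output that has been converted to UTF-8 as well.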

Thinking back on this... this might be CMake's fault. On Windows, the execute_process command seems to convert the output of the command to UTF-8 internally (and accepts an optional ENCODING parameter to specify the output's encoding). It thus writes it back in UTF-8 in the rules.ninja file (where NBSP is 0xA0 and not 0xFF).

I tried changing CMAKE_DETERMINE_MSVC_SHOWINCLUDES_PREFIX to use ENCODING NONE (perform no conversion), but it seemed to break all sorts of things in CMake.

So the question I'm having now is... should ninja's msvc_deps_prefix be a bitwise match of the compiler's output, or should it be in whatever encoding the file is expected to be, in which case it would be Ninja's job to do the proper conversions from the compiler output?

@bradking Thoughts on the encoding and prefix detection here?

Historically ninja has been encoding agnostic (as long as the encoding used the same byte as ASCII for '/'). However, Windows might make that difficult.

Ninja's CLParser::FilterShowIncludes is using memcmp to compare msvc_deps_prefix to lines in MSVC's output so indeed the value needs to be a bitwise match. CMake may need some work to preserve that. CMake currently converts to UTF-8 internally so perhaps all that is missing is converting back to the codepage's encoding when writing the value to build.ninja.

IIRC, MSVC's output encoding can be affected by environment variables and/or flags. That means we may end up with the compiler output in a different encoding than the codepage in which Ninja is operating and using to interpret strings in build.ninja. Such cases may require extra support from Ninja to handle but further investigation would be needed.

I couldn't find any environment variable affecting the codepage used by CL. I think it just uses the codepage associated with the process (which is based on the system's regional settings, or by the console settings if the process is running in one).

However, there _is_ an environment variable that sets the language used by CL, VSLANG, which can be useful as a workaround for users affected by this bug. Setting VSLANG=1033 before generating the ninja files will prevent the bug from happening.

Just to restate my above comment in different words: Ninja treats its input files as (encodingless) bytes, and does encoding-ignorant byte comparisons of strings, to attempt to evade these issues. You need the bytes that appear in the build.ninja file to match the bytes that ninja reads from the process stdout, but ninja doesn't care about encodings.

After CMake generated all the build files, I manually converted rules.ninja, which contains the line msvc_deps_prefix = 注意: 包含文件:, to UTF-8, and that fixed things. (The file used to be in GB2312 encoding, which corresponds to the default code page 936.) I guess CMake could be changed so that it always converts rules.ninja to UTF-8?

I have no experience working on locales other than code page 936 or 65001, so I have no idea whether the solution above is a universal fix, though.

Same problem here. I managed to get rid of this output by using /W2 instead of /W3 in CMAKE_CXX_FLAGS.

This is related to #1766
