Xgboost: Better XGBoost installation on Mac OSX?

Created on 17 May 2019  ·  28 Comments  ·  Source: dmlc/xgboost

Issue

Currently, the process to install the Python package on macOS is as follows.

$ brew install gcc@5
$ export CC=/path/to/gcc-5; export CXX=/path/to/g++-5; pip install xgboost

Question

What I would like to learn from a more experienced contributor is whether there are any plans to simplify this install process. The above is not tenable for an automated install system for any package that depends on xgboost. What would be required to make xgboost compatible with Apple's clang?

1.0.0 Blocking

All 28 comments

Apple’s clang doesn’t support OpenMP out of the box, so Homebrew GCC is needed. So, no, XGBoost will not be compatible with Apple’s clang.

I think we can simplify the process by distributing binary wheels for Mac OSX. The binary wheels will contain pre-built libxgboost.dylib so that the user won’t need to have any compiler. (This is how Windows users don’t need to have Visual Studio installed to use XGBoost.)

However, I’m afraid that the maintainers (including myself) currently are not familiar with binary packaging with Mac OSX, i.e. how to make binaries that would be broadly compatible across multiple versions of OSX. Do you have any suggestions here?

For now, you should consider using conda-forge to automate XGBoost installation on Mac OSX.

@hcho3 thanks for your quick response! Conda certainly is an option but it would be much simpler to use pip. I'll look into what binary packaging on macos would look like. I am also not familiar with binary packaging, so anyone else's input who has experience in that area would be much appreciated.

I've had some difficulty with this issue as the dylib produced by the standard compilation process has a hard dependency on the homebrew gcc's libraries. If anyone has a way of changing that dependency after compilation (or making it generic across gcc versions) that would be great, but I don't think macOS ships with libgomp (which provides the OpenMP support) so we might need to package that as well, which makes life difficult.

@Craigacp @hcho3 Is this something we could consider until a CMakeLists workaround is found? https://github.com/netket/netket/issues/225#issuecomment-502714445 I'm not super familiar with the internals of xgboost; how critical is OpenMP to the performance of the library?

This also seemed promising but I wasn't able to get it to work: https://stackoverflow.com/questions/46414660/macos-cmake-and-openmp.

@adithyabsk @Craigacp OpenMP is very much critical for the performance of XGBoost, since we want to use all available cores of multi-core CPUs commonly available on users' systems. Without OpenMP, you'd be able to use only one CPU core.

IMHO, pip is not designed for handling such external dependencies as libomp. On the other hand, conda is able to handle non-Python dependencies just as easily. See this post: https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/

How Microsoft/LightGBM solves the problem: they ask users to run brew install libomp. I'm not sure if this is any easier than installing GCC or Conda, since you will need to first install Homebrew.

@hcho3 The brew install libomp solution might be better, as it can be provided in pre-install setup scripts, whereas currently one has to separate out xgboost in CI pipelines to specify the appropriate gcc and g++ versions. I definitely agree with you as far as conda goes, and that might end up being the only solution, but I just wanted to explore the other options to see if anything else was possible.
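A pre-install step along these lines could live in a CI setup script. This is a minimal sketch, not a tested recipe: the function names `libomp_present` and `ensure_libomp` are hypothetical, and the default prefix `/usr/local/opt/libomp` is an assumption based on the usual Homebrew location on Intel Macs (`brew --prefix libomp` reports the actual one).

```shell
# Return 0 if a libomp runtime exists under the given prefix.
# The default prefix is an assumption (typical Homebrew location on Intel Macs).
libomp_present() {
    prefix="${1:-/usr/local/opt/libomp}"
    [ -f "$prefix/lib/libomp.dylib" ]
}

# Install libomp through Homebrew only when it is missing.
ensure_libomp() {
    if libomp_present "$@"; then
        echo "libomp already installed"
    else
        brew install libomp
    fi
}
```

A setup script would call `ensure_libomp` once and then run whatever build or install step the project uses.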

Sorry for the silly question, but is OpenMP required at runtime? For example, could we compile the dmlc-core and xgboost with OpenMP installed and then bundle that file into a wheel so that compilation wouldn't be necessary at install time using a tool like auditwheel?

https://stackoverflow.com/a/42106034

@adithyabsk I just tried using brew install libomp and now I'm able to compile XGBoost with the default compiler, Apple Clang:

brew install libomp
mkdir build
cd build
cmake ..
make -j10

What's more, the resulting binary libxgboost.dylib depends only on /usr/local/opt/libomp/lib/libomp.dylib and OSX system libs. (No more dependency on a specific version of GCC! Hooray!) So I suppose brew install libomp is the least painful way to install XGBoost on Mac OSX without Conda.

Distributing pre-compiled binaries is still tricky, however. Even if we were to include libomp.dylib inside the wheel, Mac OSX will not use the file, since the shared library dependency is specified with a full path:

hcho3@localhost: xgboost$ otool -l libxgboost.dylib    # show list of library dependencies

libxgboost.dylib:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x00           6    15       2112 0x00918085
....
Load command 10
          cmd LC_LOAD_DYLIB
      cmdsize 64
         name /usr/local/opt/libomp/lib/libomp.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 5.0.0
compatibility version 5.0.0
Load command 11
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name /usr/lib/libc++.1.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 400.9.0
compatibility version 1.0.0
Load command 12
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Wed Dec 31 16:00:02 1969
      current version 1252.50.4
compatibility version 1.0.0
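
To pull just the dependency paths out of a listing like the one above, a small filter can help. This is a sketch that assumes each `LC_LOAD_DYLIB` command is followed by a `name` line in the format shown; the function name `list_load_dylibs` is hypothetical. (`otool -L` prints a more compact list directly.)

```shell
# Read `otool -l` output on stdin and print the path of every LC_LOAD_DYLIB entry.
list_load_dylibs() {
    awk '
        $1 == "cmd"  { in_load = ($2 == "LC_LOAD_DYLIB") }   # remember the type of the current load command
        $1 == "name" && in_load { print $2 }                 # second field is the library path
    '
}
```

Usage would be `otool -l libxgboost.dylib | list_load_dylibs`; any absolute path outside /usr/lib and /System is one the target machine has to provide at exactly that location.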

On the other hand, Windows is more flexible when it comes to locating shared libraries. I found it sufficient to simply include vcomp140.dll (OpenMP runtime) inside the wheel.

@hetong007 Related note: brew install libomp should also enable multi-threading for CRAN XGBoost on Mac OSX

@hcho3 I think so. The XGBoost R package calls the same backend API and thus should behave the same.

@hcho3 That's an awesome development! We're already moving in the right direction, as I can attest that in a lot of R&D labs installing xgboost is a pain point for those not intimately familiar with its internal requirements.

Following up on this note:

Mac OSX will not use the file, since the shared library dependency is specified with a full path

Maybe we could look more into this particular issue to see if there are any workarounds to get the libomp.dylib into the binary wheel.

@hcho3 Could it also be because of the extension itself? Should we be using .so on macOS as well? This issue thread and Stack Overflow post seem to indicate so.
https://stackoverflow.com/questions/2488016/how-to-make-python-load-dylib-on-osx
https://github.com/MoDeNa-EUProject/MoDeNa/issues/1

@adithyabsk Given the complexity of shipping the runtime library in the wheel (and getting it to load), let's settle on brew install libomp.

  • Homebrew is quite widely used already among power users (I think).
  • With libomp, we can use Apple Clang to compile XGBoost, thus eliminating the hard dependency on a specific version of Homebrew GCC.
  • This method has been verified by other projects, such as LightGBM.

P.S. I'm looking at https://iscinumpy.gitlab.io/post/omp-on-high-sierra/ to understand the use of OpenMP in Apple Clang.

@hcho3

P.S. I'm looking at https://iscinumpy.gitlab.io/post/omp-on-high-sierra/ to understand the use of OpenMP in Apple Clang.

These PRs may help you:
https://github.com/microsoft/LightGBM/pull/1501, https://github.com/microsoft/LightGBM/pull/1923.

@adithyabsk This is one of my priorities. I'd like to make a fix before the 1.0.0 release.

@hcho3 Glad to hear it! I'll see if I can tinker around with this issue as well.

@adithyabsk One subtle problem I've run into with brew install libomp is that XGBoost would be compiled without OpenMP, because CMakeLists.txt was not configured correctly. (I could tell by running a moderately heavy job on my Macbook; without OpenMP, jobs will take 2-3x as long.) I'm trying to revise CMakeLists.txt to properly enable OpenMP.
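
A quicker check than timing a job might be to look at what the built library links against. This is a sketch, assuming that a build with OpenMP enabled records an OpenMP runtime (libomp for Clang, libgomp for GCC) among its dependencies; the function name `links_openmp_runtime` is hypothetical.

```shell
# Read `otool -L` output on stdin; succeed if an OpenMP runtime is listed.
# Assumption: an OpenMP-enabled build links libomp (Clang) or libgomp (GCC).
links_openmp_runtime() {
    grep -E -q '/lib(omp|gomp)[^/]*\.dylib'
}
```

For example, `otool -L libxgboost.dylib | links_openmp_runtime || echo "built without OpenMP?"` would flag a suspicious build. A missing runtime is a strong hint rather than proof, so a timing run is still the final word.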

@StrikerRUS Thanks for the link. Getting the build system working is quite hard, and it helps me a lot to have a point of reference (LightGBM).

@adithyabsk One subtle problem I've run into with brew install libomp is that XGBoost would be compiled without OpenMP, because CMakeLists.txt was not configured correctly. (I could tell by running a moderately heavy job on my Macbook; without OpenMP, jobs will take 2-3x as long.) I'm trying to revise CMakeLists.txt to properly enable OpenMP.

Any luck? The reason I ask is that "pip install xgboost -U" fails even after installing libomp via "brew install libomp".

@wel51x We have not yet modified CMakeLists.txt to get the new solution working. For now, you should follow the instructions in https://xgboost.readthedocs.io/en/latest/build.html.

@adithyabsk @Craigacp I found https://github.com/matthew-brett/delocate. This may be a useful solution to remove hard-coded library dependencies.
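
For reference, rewriting a hard-coded dependency path by hand is usually done with `install_name_tool -change`. The sketch below only prints the command it would run, so it can be inspected first; the function name `relink_cmd` and the `@loader_path`-relative target are illustrative assumptions, not something this project has adopted.

```shell
# Print the install_name_tool command that would point a library at a bundled
# copy of one of its dependencies, located next to the library itself.
# Arguments: <library> <old absolute dependency path>
relink_cmd() {
    lib="$1"
    old="$2"
    new="@loader_path/$(basename "$old")"
    echo "install_name_tool -change $old $new $lib"
}
```

For example, `relink_cmd libxgboost.dylib /usr/local/opt/libomp/lib/libomp.dylib` prints a command that redirects the libomp reference to `@loader_path/libomp.dylib`; the bundled libomp.dylib would then have to sit in the same directory as libxgboost.dylib.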

In case somebody finds it helpful... I understand that it is by no means a mainstream approach, but the latest xgboost with OpenMP support can be installed on MacOS using Nix (https://nixos.org/nix/) as trivially as

$ nix-shell -p python3Packages.xgboost

Hey @hcho3, I created a Homebrew formula for XGBoost to help simplify installation on Mac, so users can run brew install xgboost in the future. It works great, but unfortunately it won't be accepted while it depends on an older version of GCC.

Discussion: https://github.com/Homebrew/homebrew-core/pull/43246

One option is to disable OpenMP, but as you mentioned, it's not great for performance. If you're able to commit the changes to make it work with libomp, I can update the formula and we can push this forward.

Thanks for the updates.

FWIW, I updated the formula so it no longer depends on GCC, but it lacks support for OpenMP. We can update it once support for libomp is released.

The formula was accepted by Homebrew, so Mac users can now do:

brew install xgboost

I used brew install xgboost but I still can't import XGBoost. There's no __init__.py file or anything inside the actual newly installed XGBoost directory, so I can't use any of the XGBoost functions. Is there another step after using brew to install XGBoost?

@bnicholl See https://github.com/dmlc/xgboost/issues/4949#issuecomment-542333666 for a temporary solution.

@hcho3

Thanks for the link. Getting the build system working is quite hard, and it helps me a lot to have a point of reference (LightGBM).

With the upcoming CMake 3.16 release (in RC phase now) it should be even easier: users on Mojave or later will no longer need to pass extra args. Refer to https://gitlab.kitware.com/cmake/cmake/merge_requests/3916.

@adithyabsk @Craigacp #5146 should now let you use OpenMP without installing Homebrew GCC. Now XGBoost will only depend on the libomp Homebrew package.

@ankane Consequently, we should be able to submit the next release of XGBoost (1.0) with OpenMP enabled.
