XGBoost 0.90 Roadmap

Created on 21 Apr 2019  ·  56 Comments  ·  Source: dmlc/xgboost

This thread is to keep track of all the good things that will be included in 0.90 release. It will be updated as the planned release date (~May 1, 2019~ as soon as Spark 2.4.3 is out) approaches.

  • [x] XGBoost will no longer support Python 2.7, since it is reaching its end-of-life soon. This decision was reached in #4379.
  • [x] XGBoost4J-Spark will now require Spark 2.4+, as Spark 2.3 is reaching its end-of-life in a few months (#4377, #4409)
  • [x] XGBoost4J now supports up to JDK 12 (#4351)
  • [x] Additional optimizations for gpu_hist (#4248, #4283)
  • [x] XGBoost as CMake target; C API example (#4323, #4333)
  • [x] GPU multi-class metrics (#4368)
  • [x] Scikit-learn-like random forest API (#4148)
  • [x] Bugfix: Fix GPU histogram allocation (#4347)
  • [x] [BLOCKING][jvm-packages] Fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction (#4388)
  • [x] Roadmap: additional optimizations for hist on multi-core Intel CPUs (#4310)
  • [x] Roadmap: hardened Rabit; see RFC #4250
  • [x] Robust handling of missing values in XGBoost4J-Spark (#4349)
  • [x] External memory with GPU predictor (#4284, #4438)
  • [x] Use feature interaction constraints to narrow split search space (#4341)
  • [x] Re-vamp Continuous Integration pipeline; see RFC #4234
  • [x] Bugfix: AUC, AUCPR metrics should handle weights correctly for learning-to-rank task (#4216)
  • [x] Ignore comments in LIBSVM files (#4430)
  • [x] Bugfix: Fix AUCPR metric for ranking (#4436)
roadmap

All 56 comments

As we are going to have breaking changes like https://github.com/dmlc/xgboost/pull/4349 and https://github.com/dmlc/xgboost/pull/4377, shall we bump the version to 0.9?

@CodingCat Sure, we can bump to 0.90, if the breaking change is significant. Can you do me a favor and write one-paragraph description of why #4349 was needed?

sure,

* Spark 2.3 is reaching its end-of-life in a few months

Is there an official statement on that? They released 2.2.3 in January and 2.3.3 in February. Our vendor (MapR) still ships 2.3.1.

@alexvorobiev https://github.com/dmlc/xgboost/issues/4350, you can check with @srowen from databricks

This is not a question for Databricks but for the Spark project. The default policy is maintenance releases for branches for 18 months: https://spark.apache.org/versioning-policy.html That would put 2.3.x at EOL in about July, so wouldn't expect more 2.3.x releases after that from the OSS project.

@srowen Thanks!

@srowen @CodingCat @alexvorobiev Let's also discuss the possibility of supporting Scala 2.12 / 2.13. Right now, XGBoost4J is compiled for Scala 2.11:
https://github.com/dmlc/xgboost/blob/2c61f02add72cce8f6dc1ba87e016e3c5f0b7ea6/jvm-packages/pom.xml#L38-L39

A user reported that XGBoost4J JARs compiled for Scala 2.11 are not binary compatible with Scala 2.12.

Yeah, 2.11 / 2.12 are still binary-incompatible, and Spark has two distributions. Both are supported in 2.4.x though 2.12 is the default from here on in 2.4.x. 3.0 will drop Scala 2.11 support.

It may just be a matter of compiling two versions rather than much or any code change. If you run into any funny errors in 2.12 let me know because I stared at lots of these issues when updating Spark.

2.13 is still not GA, and I think it will be a smaller change from 2.12->2.13 than 2.11->2.12 (the big difference here is a totally different representation of lambdas).

@hcho3 I assume you wanted to tag @alexvorobiev?

@alexeygrigorev Oops, sorry about that.

The only issue is that we need to introduce a breaking change to the artifact name of xgboost in Maven, xgboost4j-spark => xgboost4j-spark_2.11 / xgboost4j-spark_2.12, like Spark does (https://mvnrepository.com/artifact/org.apache.spark/spark-core). We also need to double-check whether we have any transitive dependency on Scala 2.11 (I think not).
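To make the naming concrete, here is a sketch of what Spark-style suffixed Maven coordinates could look like. The coordinates below are illustrative only, not taken from the actual pom.xml:

```xml
<!-- Hypothetical coordinates if XGBoost4J-Spark adopted Spark-style
     Scala version suffixes; one artifact per Scala binary version. -->
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark_2.12</artifactId>
  <version>0.90</version>
</dependency>
```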

Hi @srowen, though 2.12 is the default from here on in 2.4.x, I checked the branch-2.4 pom.xml; if you don't specify the scala-2.12 profile, you still get a 2.11 build, no?

You could choose to only support 2.12 in 0.9x, and then you don't have to suffix the artifact name. If you support both, yeah, you'd really want to change the artifact name unfortunately and have _2.11 and _2.12 versions.

Yes the default Spark 2.4.x build will be for 2.11; -Pscala-2.12 gets the 2.12 build.

thanks, I'd stay conservative in supporting 2.12 at least for the coming version

as far as I know, most Spark users are still on 2.11, since they tend to follow previous versions of Spark

I may not have the bandwidth to go through every test I have for introducing 2.12 support

I would choose to support 2.12 + 2.11, or 2.12 only, in the 1.0 release...

@hcho3 FYI, I just removed the dense matrix support from the roadmap given the limited bandwidth

@hcho3 Could you take a look at https://github.com/dmlc/dmlc-core/pull/514 when time allows? It might be worth merging before the next release hits.

@trivialfis Will look at it

@CodingCat I think we should push back the release date, as Spark 2.4.1 and 2.4.2 have issues. What do you think?

@srowen Do you know when Spark 2.4.3 would be out?

I think it’s fine to have some slight delay

Okay, let’s wait until Spark 2.4.3 is out

Would there be the last 0.83 release for Spark 2.3.x?

@CodingCat What if we make two parallel releases 0.83 and 0.90, where 0.83 includes all commits just before #4377? The 0.83 version would be only released as JVM packages, and Python and R packages would get 0.90. It won't be any more work for me, since I have to write a release note for 0.90 anyway.

One issue, though, is the user experience with missing value handling. Maybe forcing everyone to use Spark 2.4.x will prevent them from messing up missing values (the issue that motivated #4349).

@hcho3 I am a bit concerned about the inconsistent availability of packages across versions.

I can imagine questions like: "Hey, I found 0.83 in Maven, so I upgraded our Spark package, but why can't I use 0.83 in a notebook when exploring my new model setup on a small amount of data with the Python package?"

I would suggest we either do a full maintenance release on the 0.8x branch or nothing

@CodingCat Got it. We'll do consistent releases for all packages. What's your take on 0.83 release then? Should we do it?

@CodingCat Actually, this will create work for other maintainers; we'll need to ask them first

Short answer, from a personal view: yes in theory, but it might be more than cutting right before a commit (as you said, it will create work for others as well). I am also hesitant to do this because of the limited resources in the community...

Here are my 2 cents on how we should think about maintenance releases like 0.8x:

  1. the reason to have a maintenance release is to bring in critical bug fixes, like https://github.com/dmlc/xgboost/commit/2d875ec0197d5a83e7d585daf472b8201aa97c51 and https://github.com/dmlc/xgboost/commit/995698b0cb1da75f066d7e0531302a3bfa5a49a4

  2. on the other hand, to keep the community sustainable without burning out all the committers, we should drop support for previous versions periodically

  3. innovations and improvements should be brought to users through feature releases (e.g. the jump from 0.8 to 0.9)

if we decide to go with 0.83, we need to collect opinions from @RAMitchell and @trivialfis as well, and use their judgment to see whether there are important bug fixes (mostly about correctness) that they have noticed

and then create a 0.83 branch based on 0.82 and cherry-pick commits... a lot of work, actually

If I understand correctly, 0.9 will not support older versions of Spark; hence the proposal to ship a 0.83 version alongside 0.9, to continue supporting older Spark versions while including bug fixes?

Generally I am against anything that uses developer time. Aren't we busy enough already? I do see some value in having a stable version however.

@CodingCat Is there any way to incorporate bug fixes (2d875ec and 995698b) without upgrading to Spark 2.4.x?

If making maintenance releases is more than just cutting branches (e.g. need to cherry-pick), I would rather not make such commitment.

Generally I am against anything that uses developer time. Aren't we busy enough already?

I agree.

@CodingCat Is there any way to incorporate bug fixes (2d875ec and 995698b) without upgrading to Spark 2.4.x?

@hcho3 unfortunately no; due to breaking changes in the libraries Spark depends on, we can only compile and run XGBoost against a consistent version of Spark

If we are interested in maintenance releases in the future, the workflow (after releasing 0.9) would be:

  1. backport necessary fix to 0.9-branch

  2. release 0.9x every, say, 2 months, or when triggered by an important bug fix

  3. major features and all fixes backported to 0.9x should also be available in master

  4. when releasing 1.0, cut a branch from master......

but again, once we have a big refactor in master and want to backport fixes to 0.9 after that... tons of work

@CodingCat Given the current size of dev community, let's punt on maintenance releases.

@tovbinm Sorry, I don't think we'll be able to do 0.83 release, due to lack of bandwidth. Is upgrading to Spark 2.4.3 feasible to you?

That’s unfortunate. No, not in the short term. We are still on 2.3.x.

What’s the commit that upgraded Spark from 2.3 to 2.4? Perhaps we can cut there (if it’s after 0.82, of course).

@tovbinm You can build XGBoost with commit 711397d6452d596d7acbb68f1052ffebdee3e3af to use Spark 2.3.x.

Great. So why not make a public release from that commit?

As @CodingCat said, maintenance releases are not simply a matter of cutting before a commit. Also, making a public release is an implicit promise of support. I do not think the maintainers are up for supporting two new releases at this point in time.

I'll defer to @CodingCat as to whether we should make a release from 711397d6452d596d7acbb68f1052ffebdee3e3af

External memory with GPU predictor - would this mean the code would no longer crash with `what(): std::bad_alloc: out of memory`? (i.e. temporarily swap into RAM?)

Related issue, I guess: https://github.com/dmlc/xgboost/issues/4184. That one was mainly about temporary bursts of memory; the fitting process itself never requires that much memory.

@hlbkin You'll need to explicitly enable external memory, according to https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html
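For reference, a minimal sketch of the opt-in syntax described in that tutorial. The file name and cache prefix here are made up, and the actual xgboost call is left commented out so the sketch stands alone:

```python
# Sketch of opting in to external memory: appending '#<cache prefix>' to a
# LIBSVM path asks XGBoost to stream the data through an on-disk cache
# instead of loading everything into RAM.
import os
import tempfile

libsvm_path = os.path.join(tempfile.mkdtemp(), "train.libsvm")
with open(libsvm_path, "w") as f:
    f.write("1 0:1.0 2:0.5\n0 1:0.3\n")  # a tiny two-row LIBSVM file

# The '#' suffix is the explicit opt-in switch for external memory.
cache_uri = libsvm_path + "#dtrain.cache"

# import xgboost as xgb
# dtrain = xgb.DMatrix(cache_uri)  # writes dtrain.cache* files on disk
```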

I assume it's not possible to switch without a major version bump (i.e. 1.0), but when you do, could you consider supporting conformant PEP 440 version numbers (i.e. x.y.z), and preferably semantic versioning? The standard interpretation of 0.90 (rather than 0.9.0) is that it is the 90th minor release of the major version 0.x (i.e. pre-stable-release) series, and is no more significant than 0.83. Furthermore, this restricts you to a maximum of 9 point releases per minor version, and creates difficulties for some tools (and people) to interpret. Thanks!
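The sorting concern above can be illustrated with a plain numeric-tuple comparison (a minimal sketch; real packaging tools use full PEP 440 parsing):

```python
# Sketch of the versioning ambiguity: under numeric comparison,
# "0.90" reads as the 90th minor release of 0.x, not as "0.9.0".
def parse(version):
    """Parse a dotted version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

assert parse("0.90") > parse("0.9.0")   # (0, 90) sorts far after (0, 9, 0)
assert parse("0.9.0") < parse("0.83")   # written as 0.9.0, it would sort BEFORE 0.83
```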

+1

@CAM-Gerlach We'll consider it when we release 1.0. On the other hand, we don't want to rush to 1.0. We want 1.0 to be a milestone of some sort, in terms of features, stability, and performance.

Thanks for the explanation, @hcho3 .

You probably want to make sure you set the python_requires argument to '>=3.5' in setup() to ensure users with Python 2 don't get upgraded to an incompatible version accidentally.

@hcho3 External memory is not available with GPU algorithms

@hlbkin You are right. External memory will be available only for GPU predictor, not training.

@rongou @sriramch Am I correct that GPU training isn't available with external memory?

@hcho3 yes, you are correct. We are working on it; the changes are here if you are interested. I'll have to sync this change with master and write some tests.

@sriramch Awesome! Should we aim to include external memory training in the 0.90 release, or should we come back to it after 0.90?

just my two cents: let's hold back on packing many new features into 0.x in a rush, and consider what should go into 1.0 as a milestone version

@CodingCat I agree. FYI, I deleted distributed customized objective from 0.90 roadmap, since there was substantial disagreement in #4280. We'll consider it again after 0.90.

@sriramch Let's consider external memory training after 0.90 release. Thanks a lot for your hard work.

This might be a good time to release the CUDA 9.0 binaries instead of 8.0. I think 9.0 will now be sufficiently supported by users' driver versions. Additionally, the 9.0 binaries will not need to be JIT-compiled for the newer Volta architectures.

@hcho3 are we ready to go?

Almost. I think #4438 should be merged.

All good now. I will go ahead and start working on the next release. ETA: May 16, 2019

  • [x] Require Python 3 in setup.py
  • [x] Change CI to build CUDA 9.0 wheels (#4459)
  • [x] Fix Windows compilation (#4463)
  • [x] Set up a minimal viable CI for Windows with GPU (#4463)

@RAMitchell Should we use CUDA 9.0 or 9.2 for wheel releases?

Let's use 9.2, as that is already set up on CI. The danger is that we require NVIDIA drivers that are too new. For reference, here is the table showing the correspondence between CUDA versions and drivers: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver

As far as I know, this should not impact CPU algorithms in any way. If users begin to report issues, we can address this in the future with better error messages around driver compatibility.

Hmm, in that case I can try downgrading one of the CI workers to CUDA 9.0. Since we are using Docker containers extensively, it should not be too difficult.

I'm going to prepare 0.90 release now. My goal is to have the release note complete by end of this week.

Closed by #4475
