Our docs currently take ~14 minutes to build on CircleCI, whereas they took about 6 minutes in the previous release. The root cause of this slowdown seems to be that woodwork is inferring some categorical variables as text, which then causes AutoML to use the TextFeaturizer. However, even if ww fixes the categorical vs. text inference, the time to build the docs will inevitably increase as we write more documentation. This makes it hard for developers to iterate on the docs locally.
Possible solutions:
Yep. I changed the default automl stopping criterion to max_batches=1 a couple of weeks back as well, which didn't help.
I like the solutions you listed! Plus one of my own:
I recommend we go with option 2, but with option 3 in mind.
I noticed that the docs have been taking much longer to build. I think this is likely because the automl docs were changed in c871f3b to use the fraud dataset instead of the breast cancer dataset (+ elsewhere?) to showcase infer_problem_types, since the breast cancer dataset only has numeric columns.
I suspect this is a different issue / reason for the even-longer build time of docs, from the previous 20 minutes to now >30 minutes, and could be worth mentioning!
@dsherry FYI
Another possible solution is to use multiple processors to build the docs:
https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-j
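As a rough sketch of what that could look like, the helper below assembles a sphinx-build invocation that uses the -j option linked above. The helper name and the docs/source and docs/build/html paths are hypothetical and not taken from our Makefile; only the -b and -j options are real sphinx-build flags.

```python
import shlex


def sphinx_build_command(source_dir, build_dir, builder="html", jobs="auto"):
    """Return an argv list for a parallel sphinx-build run.

    `jobs` is passed to sphinx-build's -j option: either a positive
    integer or "auto", which lets Sphinx use the number of CPU cores.
    """
    if jobs != "auto":
        jobs = int(jobs)
        if jobs < 1:
            raise ValueError("jobs must be a positive integer or 'auto'")
    return ["sphinx-build", "-b", builder, "-j", str(jobs), source_dir, build_dir]


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the docs sources actually live.
    cmd = sphinx_build_command("docs/source", "docs/build/html")
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

The printed command could be pasted into a Makefile target or handed to subprocess.run. Note that parallel builds only help if the extensions in use declare themselves safe for parallel reading and writing.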
Update following discussion with @dsherry.
Adding the -j flag to our Makefile allows the build docs test on circleci to finish faster, as seen here. Unfortunately, ReadtheDocs doesn't run this command, which means that the actual generation of published documentation still takes a while and often errors out.
This is what a successful build looks like for ReadtheDocs, taking a little over 20 minutes to complete. The difference between the HTML and LaTeX build times suggests that building the Jupyter notebooks themselves does not take a lot of time, which is good.
However, we're also finding instances where the build fails like this. We noticed that, for some reason, ReadtheDocs is running the full sequence of commands twice, which makes the build take much longer (well over 30 minutes each to create the HTML and LaTeX files) and causes the doc build to fail. I'll follow up with the ReadtheDocs support team to see why this is happening and how we can fix it, and I'll post the results here when I get feedback.
@bchen1116 contacted support and they said
It looks like the underlying cause of this bug is the number of active versions that you have. I see a few errors in our logs related to this.
To work around this for now, you might reduce the number of active versions that you keep. It looks like you are building versions for individual branches or pull requests, have you tried our pull request building feature? This would help remove the unneeded versions after building, while still keeping the built content.
I believe the "pull request building feature" referenced here is this, confirming.
Update:
We've updated RTD to build from pull requests only, removing the unnecessary builds for the different versions (branches) that we push. Additionally, we've deleted all unnecessary (untagged) versions from RTD (miscellaneous branches that we use for PRs), which seems to have helped the doc builds. We haven't noticed any doc builds timing out since, so we will close this issue tomorrow unless we begin seeing timeouts again.
@bchen1116 is this closeable now?
Closing now, as there's been no issue with slow doc builds.