Gatsby: question - incremental builds support = part II

Created on 16 Apr 2018  ·  54 Comments  ·  Source: gatsbyjs/gatsby

4981

I think @LeKoArts is right. What I mean is: if you generate a site with 2000 pages and deploy it to AWS, and then one of those content pages changes in the CMS, can you generate just that one page and deploy it?

question or discussion

Most helpful comment

Just wanted to update that my team is close to publishing a PR into the Gatsby repo that we think enables incremental builds. We're just taking some time to write a good PR and tighten up the code, but I will update here when we are done (in the next week or so).

All 54 comments

It's not something Gatsby does at the moment, but it is something people have asked for. There's been work in version 2 to improve performance on larger sites, but there's no release date for that yet.

@m-allanson is there a discussion/issue on how to handle this? I didn't see it in the link you listed. I'm curious to hear conversations on how to handle doing this on a host like Netlify & using a CMS like Wordpress/Drupal that currently require a lot of HTTP requests during build.

AFAIK you wouldn't be able to use incremental builds on Netlify because the .cache and public directories are not preserved between builds, so it will always do a clean build.

That's good to know. I'm tossing around a ton of ideas that aren't well thought out. So even if we could eliminate the need for HTTP requests, we still need to make sure the .cache and public directories can be referenced by the build tool which eliminates many of the hosts that lower the bar to entry.

Another use case for incremental building is when you have a very large site that you want to build in parts. I was getting "heap out of memory error" when building ~5k pages at once.

We plan on our site getting very large, so we're testing Gatsby at larger scales. We've tried doing something like this path: './src/pages/${subPath}', where subPath is process.argv[3]. This works nicely when we host parts of our site with gatsby develop. It also circumvents the problems with the memory heap when using gatsby build for a 5k+ page site. For it to really be a solution, it would probably depend on the ability to specify an output subdirectory within the public folder: https://github.com/gatsbyjs/gatsby/pull/4756
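
A minimal sketch of the sub-path idea in the comment above. The helper name `pagesPathFor` and the fallback to the whole `./src/pages` folder are illustrative assumptions; the comment only states that `subPath` is read from `process.argv[3]` and interpolated into the path.

```javascript
// Hypothetical helper: derive the pages directory from the CLI arguments.
// The comment above reads the sub-path from process.argv[3].
function pagesPathFor(argv) {
  const subPath = argv[3];
  // Assumption: with no sub-path given, fall back to the full pages folder.
  return subPath ? `./src/pages/${subPath}` : "./src/pages";
}

// The result would be used as the `path` option of whichever plugin entry
// in gatsby-config.js reads the pages folder.
console.log(pagesPathFor(process.argv));
```

Keeping the lookup in one small function makes it easy to see what a given invocation will build before running it.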

What if another approach is used to achieve the same goal? I wanted to run an idea by you guys and see what people think. Let's say you have a 5k-page website. The initial pages would be generated statically, but each page would have a subcomponent that loads on top of the static content, showing the same content read from static JSON files. This way, if a user wanted to update one page in the CMS in the middle of the day, they could make the update and just that static JSON file would be regenerated and deployed to a CDN. Then you could regenerate the whole site maybe once a day as a nightly process. The static SEO content might not be the most up to date during the day, but I don't see that as a big deal. It would just get updated during the nightly process.

@robertschneiderman we've run into the memory issue as well. We're closer to 1500 pages, but with an insane amount of images (design blog). We've turned off source maps and stopped the build from downloading image files, but ultimately had to edit the build command to increase the memory allocated to the node instance via the --max_old_space_size flag.
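
One way the memory flag mentioned above could be applied is through the `NODE_OPTIONS` environment variable from a small wrapper script. This is a sketch only: the 4096 MB value is an arbitrary example, and `envWithHeapLimit` and `runBuild` are hypothetical names, not part of Gatsby.

```javascript
const { spawnSync } = require("child_process");

// Return a copy of the environment that raises V8's old-space heap limit
// for any Node process started with it.
function envWithHeapLimit(baseEnv, megabytes) {
  const existing = baseEnv.NODE_OPTIONS ? `${baseEnv.NODE_OPTIONS} ` : "";
  return { ...baseEnv, NODE_OPTIONS: `${existing}--max_old_space_size=${megabytes}` };
}

// Assumption: the build is started with `npx gatsby build`; 4096 is an example value.
function runBuild() {
  return spawnSync("npx", ["gatsby", "build"], {
    env: envWithHeapLimit(process.env, 4096),
    stdio: "inherit",
  });
}
```

Calling `runBuild()` would start the build with the larger heap. Raising the limit only postpones the problem if memory use keeps growing with page count.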

One thing that worries me about this feature is schema building. If we don't have every post available for gatsby to build a schema from, our queries will throw errors. It would be really nice if there was a way to pass schemas to gatsby, or at least provide dummy entities during the build to demonstrate the different shapes they may take.

I am considering using Gatsby to build the UI for a content site with over 5000 items, most with interconnected relationships to each other. The data will come from a database-driven CMS.

The benefit to using Gatsby over a standard API-driven React site is that I would spend a fraction of the time building and maintaining the data API and state management system that loads the remote data and stores it. (Since I plan on deploying this application for multiple sites of similar size, this seems like a very valuable benefit.)

The downside to using Gatsby in this case would be the fact that the entire site would need to be rebuilt for even the most insignificant content update. Forgot to add a comma? Rebuild all 5000 pages! Who even knows how long that would take? This is even more of an issue when considering the experience of the CMS users - they're used to seeing changes appear on the site immediately after they save them. With Gatsby, we're looking at a few minutes' wait (at least) before the change appears.

If there were a way to trigger builds for a subset of pages, it would make Gatsby the clear, definitive choice. At this moment, though, it's a tough sell.

BTW, I've been working a lot on improving speeds for larger site builds for v2. On the latest v2 beta — you might be able to build 5000 pages in < 1:30. There'll be more speed improvements coming.

That's amazing @KyleAMathews! I definitely look forward to that! Let me know if you want to test against an image heavy blog

@KyleAMathews 5K is nice but we need 1M 😉

If we want to compile parts of the site separately, we can set flags on build so that gatsby-node knows only to generate the parts of the site specified. We could then add back in the previously generated static files. This works for us as long as we link to the previously generated files with a basic <a href> as opposed to a <Link to >.

I'm wondering if we can make <Link to> work when linking to previously generated files if we merge in some of the previous data.json at build time. Looking into that a bit more at the moment.
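
The build-flag approach described above might look roughly like the following. `BUILD_SECTIONS`, the `section` field, and the sample page list are all hypothetical; the comment does not say how its flags are passed or how pages are grouped.

```javascript
// Hypothetical page list; in a real site this would come from the data source.
const allPages = [
  { path: "/blog/hello/", section: "blog" },
  { path: "/docs/intro/", section: "docs" },
  { path: "/docs/api/", section: "docs" },
];

// Read a comma-separated list of sections from an environment variable.
// BUILD_SECTIONS is an illustrative name, not a Gatsby option.
function requestedSections(env) {
  return env.BUILD_SECTIONS ? env.BUILD_SECTIONS.split(",").map((s) => s.trim()) : null;
}

// Keep only the pages in the requested sections; with no flag, keep everything.
function selectPages(pages, sections) {
  return sections ? pages.filter((page) => sections.includes(page.section)) : pages;
}

// In gatsby-node.js, createPages would then create only the selected pages
// (templatePath is a placeholder for the page component's path):
// exports.createPages = ({ actions }) => {
//   selectPages(allPages, requestedSections(process.env)).forEach((page) =>
//     actions.createPage({ path: page.path, component: templatePath, context: page })
//   );
// };
```

The output of such a partial build would still need to be merged with the previously generated files, which is the part the comment is still investigating.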

I'm less worried about build time than about the volume of static files I need to upload for any update. We launched a large visual portfolio with Gatsby, and the static site to upload is over 150 MB,
mostly images.
This makes the site unavailable for around 40 minutes during an update.
The ability to rebuild part of the site is definitely a feature that would boost Gatsby.
I plan to use Gatsby for a new site, but I will divide the site into a static part and a dynamic part, using a traditional PHP CMS for the news part.

@rbmedia you might want to consider a host that does deployment switching like Netlify so your current site stays running until your new version is ready.

Thanks Matt, I will consider it!
I built some news websites with Drupal in the past, and any update had to be online within a short time (less than 2 minutes). I would love to use Gatsby for this kind of site in the future.

Any news on this issue? We plan a site with around 100k pages and incremental builds would be awesome.

Make another path the default static page folder instead of '/public'.
After running gatsby build, copy ../public/* to that default path.

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

I still don't think this is fixed/supported in Gatsby. Any news @ TeamGatsby?

it's a long-standing issue because it's really hard to fix without thinking heavily about it. @Moocar has an issue open to at least get us a step in the right direction.

Does Gatsby currently track which GraphQL nodes are retrieved on a given page? If so, it would seem viable to add incremental rebuilds based on changes to the data. That’s half of the work, no?

The other chunk of work is providing source plugins with a cache and encouraging plugin developers to only fetch changed data where possible. In many instances this is trivial.

@coreyward Yes, Gatsby tracks every node that is returned for a query (via page-dependency-resolver.js). It's what powers gatsby develop's ability to rerun queries only for changed data. We don't currently save that information to disk, so it's not used for gatsby build yet, but that's definitely the plan.
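
To make the idea concrete, here is a toy model of node-to-page dependency tracking. It is illustrative only and is not the code in Gatsby's page-dependency-resolver.js; the class and method names are made up for this sketch.

```javascript
// Illustrative only: a tiny model of node-to-page dependency tracking.
class DependencyTracker {
  constructor() {
    this.pagesByNode = new Map(); // node id -> Set of page paths that queried it
  }

  // Record that the query for `pagePath` returned the data node `nodeId`.
  record(pagePath, nodeId) {
    if (!this.pagesByNode.has(nodeId)) this.pagesByNode.set(nodeId, new Set());
    this.pagesByNode.get(nodeId).add(pagePath);
  }

  // Given the ids of changed nodes, return the pages whose queries need rerunning.
  dirtyPages(changedNodeIds) {
    const dirty = new Set();
    for (const id of changedNodeIds) {
      for (const page of this.pagesByNode.get(id) || []) dirty.add(page);
    }
    return [...dirty].sort();
  }

  // Serializable form; writing this to disk is what would let a later build reuse it.
  toJSON() {
    return [...this.pagesByNode].map(([id, pages]) => [id, [...pages]]);
  }
}
```

With a map like this saved between runs, a build could in principle limit query reruns to the pages returned by `dirtyPages`.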

I know that for our team this will be the go/no-go decision against using Gatsby for our 2019 rebuild of our flagship site. I'm really hoping it can be released or at least be on the horizon as we start building. We support hundreds of web authors editing various pieces of the site throughout the working day. When they hit save they pretty much expect the content to be updated. It's not uncommon for them to go back just to fix a comma or change the date on the post.

@mattbloomfield we have more customers interested in this so we have this high up on the priority list.

We're implementing Gatsby with a Drupal 8 backend using the gatsby-source-graphql plugin, and performance is not an issue so far, with ~4000 pages built in < 30 seconds. We're pulling all data in gatsby-node as opposed to running thousands of StaticQuerys, and bypassing image processing for now.

```
success run graphql queries — 3.088 s — 4008/4008 1311.56 queries/second
success write out page data — 0.070 s
success write out redirect data — 0.001 s
success Build manifest and related icons — 0.117 s
success onPostBootstrap — 0.127 s

info bootstrap finished - 15.751 s

success Building production JavaScript and CSS bundles — 3.361 s
success Building static HTML for pages — 6.906 s — 4006/4006 609.25 pages/second
info Done building in 26.047 sec
```

I'm currently evaluating using Gatsby to speed up an old Heroku-hosted Rails 3.x site that's slow as molasses. It has about 1 million pages so incremental builds are the only way it would work. Most pages don't change so making them static feels like a huge win, but new pages are constantly added and some old pages get edited. Users expect to see the changes within seconds. My hope was to add just enough code to the Rails app to make it a JSON API server, and generate a new frontend with Gatsby, with static assets hosted somewhere like Netlify or S3.

I was thinking I would be able to do something like run an incremental Gatsby build via a job queue worker. The Rails API server knows when a page gets updated, so it would create an 'update page job' using the page_id (a key in the postgres DB), and the worker would pass that to Gatsby with an ENV var with something like PAGE_ID=1235 gatsby build. I'd use that ENV var within createPages() to look up just what's needed for that one page and build it. The resulting output file(s) would get transferred to the static host (I'm hoping there's a build hook for that). If no PAGE_ID var is set it would build all pages as usual.

If a page is deleted, the Rails API would create a job that either deletes the assets directly from the static host, or maybe there's something needed from Gatsby so I'd still run that with a different ENV variable. (I'm thinking I'd need the page's path at the minimum).
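
The control flow described in this comment could be sketched as below. `PAGE_ID` is taken from the comment; `DELETED_PAGE_PATH`, `planBuild`, `pagesFor`, and the `api` object are hypothetical. This only illustrates the decision logic inside `createPages` and does not by itself make a Gatsby build incremental.

```javascript
// Decide what a worker-triggered build should do, based on environment variables.
// PAGE_ID comes from the comment above; DELETED_PAGE_PATH is a hypothetical addition.
function planBuild(env) {
  if (env.DELETED_PAGE_PATH) {
    return { mode: "delete", path: env.DELETED_PAGE_PATH };
  }
  if (env.PAGE_ID) {
    return { mode: "single", pageId: Number(env.PAGE_ID) };
  }
  return { mode: "full" };
}

// Sketch of how createPages might pick its pages from the plan.
// api.fetchPage and api.fetchAllPages stand in for calls to the Rails JSON API.
async function pagesFor(plan, api) {
  if (plan.mode === "single") return [await api.fetchPage(plan.pageId)];
  if (plan.mode === "full") return api.fetchAllPages();
  return []; // nothing to render for a deletion
}
```

For example, running with `PAGE_ID=1235` set would yield `{ mode: "single", pageId: 1235 }`, and running with neither variable set would fall back to a full build.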

Am I barking up the wrong tree thinking that Gatsby is compatible with this kind of project? Thanks for any help.

We have an alpha version up. It's not incremental builds yet, but it is at least the path forward.
You can try it by running npm install --save gatsby@per-page-manifest

More info:
https://github.com/gatsbyjs/gatsby/pull/13004

@mpoisot for now per page building isn't working yet. I'm not sure what the timeframe you're looking at for this project. If the queries are light, gatsby might be an okay fit for your site even without incremental builds.

cc @KyleAMathews @Moocar to give a better explanation of this.

Pinging this, as it's been a few months since last update and it seems to be the place of action. I see that the breaking down of the page-data.json has been in, and I've been using it.

Is there a more concrete set of requirements and tasks driving this forward? I understand that it's a big problem, but it always helps if it's visibly broken down into smaller problems that can show progress and traction.

@wardpeet @Moocar I'm unsure who's the most appropriate person/list to ping on this, but I see you as both being the last actives from the project on here. Any updates as to the primary goal of this ticket?

Having a good convo with @KyleAMathews about incremental builds and how they might get delivered https://twitter.com/dominicfallows/status/1169152367964643328?s=19

TLDR;

@KyleAMathews confirmed that Gatsby is working on the Gatsby Cloud hosted incremental build features.

Self-hosted/on-premises "Gatsby Enterprise" version, with incremental builds, is possible, but they are not working on it yet....

Dominic Fallows - Sep 4 - Most vendors we choose offer a self-managed/on-premises option, as Gatsby OSS does. We happily pay for those, as we would an on-premises Gatsby Enterprise Cloud solution from you.

Kyle Mathews - Sep 4 - yeah for sure — we have a pretty clear path for supporting onprem versions of what we're doing — it's all Kubernetes so it should be possible — but onprem adds a lot of overhead when we're initially just working on shipping something that works 😅

Dominic Fallows - Sep 4 - Now that is great news to hear! Sorry if I've missed that discussed elsewhere, but that onprem roadmap would be super useful for businesses and developers alike to have sight of.

Kyle Mathews - Sep 4 - It's far enough away right now that I couldn't give a timeline. Definitely not this year and wouldn't want to promise next year either. Depends on how fast we can scale revenue and our engineering team

It's a pity, as it blocks using Gatsby as a tool for publishers, where we are talking about millions of canonical pages and about as many index pages again.

Wouldn't it make sense to "eject" such use case as separate project using same concepts/core?

Make or break feature for 2020 decisions. Seems to be a good place to invest all that VC money 😀

Gatsby does a lot of things right but long build times make it absolutely unusable in larger projects :/ We discussed moving away from the framework this week just because of that.
Please make faster build happen!

Agree with above! Gatsby either gets niched into a quick and easy blogging solution or implements incremental/faster builds and becomes enterprise ready.

Absolutely correct; bumping against this over and over on larger projects. Without incremental builds Gatsby is not an option.

Incremental builds on Gatsby Cloud fixes these issues. You can signup for the private beta here https://www.gatsbyjs.com/builds-beta/

Nothing about that seems to suggest it supports incremental builds though - just that it has the "fastest build times for Gatsby sites".

I'd be concerned about the implication that incremental builds would only be available on a hosted Gatsby service rather than available to be used standalone.

I see what you mean @dwightwatson, there's nothing on the website that says it's "incremental." At Gatsby Days London they demoed builds and it was definitely incremental builds. Not sure how it's done though, or whether it will be a part of the Gatsby package or just a service they provide.

Investors gotta make their money back somehow. 🙄

trying to build very large website 140k+ pages

gatsby build is somewhat good… but doing the deployment is painful (zeit.co)

Unsure how to add a label to this, but I'm putting this down as still an issue.

@gomflo is there a way I can build your site? There might be some low hanging fruit to resolve to improve build times :) No promises.

Nothing about that seems to suggest it supports incremental builds though - just that it has the "fastest build times for Gatsby sites".

I'd be concerned about the implication that incremental builds would only be available on a hosted Gatsby service rather than available to be used standalone.

re this ^: If my gatsby repo is in gitlab and not github, will I be able to use gatsby cloud/build features?

I might have mentioned this before, but regarding the original issue/feature: for publishers, Gatsby will make sense if we can trigger generation of only the new pages, plus maybe an update to the indexes. Hardly any publisher would care about updating old canonical pages.

So will we get a standalone partial update, or is there no chance? Maybe there is some other way to update only a few pages without rebuilding the whole project?

Just wanted to update that my team is close to publishing a PR into the Gatsby repo that we think enables incremental builds. We're just taking some time to write a good PR and tighten up the code, but I will update here when we are done (in the next week or so).

Here's the PR https://github.com/gatsbyjs/gatsby/pull/20785

New PR, focusing in on incremental data changes https://github.com/gatsbyjs/gatsby/pull/21523

With #21523 merged in and Incremental Builds available in Gatsby Cloud, I believe this issue is resolved. It doesn't support all workflows, but I am going to close this for now and it may be better to open a new issue in the future for future endeavors if need be.

Should it really be closed? The optimization was just that - an optimization. It wasn't truly incremental builds. On top of that, whatever is available through Gatsby Cloud is not available through usage of the public package. For the specific intent of this ticket, nothing has been resolved.

Should it really be closed?

Based on https://github.com/gatsbyjs/gatsby/issues/5496#issuecomment-641005662, I don't think this issue should stay closed, and I don't understand why the not stale label was removed.

Has anyone here tried, or does anyone know if it is possible, to tweak the GatsbyJS webpack configuration to simultaneously produce both a development preview and a production build with "gatsby develop"? (Possibly resulting in "incremental builds" at the cost of a forever-running development server.)
