Kibana: [DISCUSS] Kibana index version migrations

Created on 21 Nov 2017  ·  74 Comments  ·  Source: elastic/kibana

Proposal

Introduce a testable migration process that allows developers to incrementally add complex migration steps throughout the development of several minor releases.

Goals

Testable
The process should be easily testable so that at any point a failure to account for a required migration step will be captured. Ex: I almost submitted a PR that removed a field from the kibana mapping. Without a manual migration step, that data would have been lost. This would have gone unnoticed until late in the 7.0 index upgrade testing (if at all).

Incrementally add migration steps
We don't want to push the entire migration burden to the last minute. We should be able to incrementally add migration steps and tests to catch issues early and prevent major releases from being pushed back due to last minute bugs being found.

Flexibility
Right now we assume the entire re-indexing will fit into a painless script, but this won't hold us over for long. As a specific example, I'd like to migrate some data stored as JSON in a text field. Manipulating JSON from painless is not possible currently. I'd bet even more complicated scenarios are right around the corner. Our approach should be flexible to easily accommodate migration steps that won't fit into painless.

Make it easy for developers
Our approach should be un-intimidating so all developers on the team can easily add their own migration steps without requiring too much specific knowledge of the migration code. Making this a simple process will encourage us to fix issues that rely on .kibana index changes, which can help clean up our code. There have been outstanding issues for many months that don't get addressed because they require mapping changes. A very small change (the same PR mentioned above) that incidentally removes the need for a field on the kibana mapping, plus a pretty straightforward (albeit still not possible in painless) conversion, should be easy to throw into the migration.

Single source of conversion truth
There are potentially different areas of the code that need to know how to handle a bwc breaking change. For example, I'd like to introduce a change in a minor release which removes the need for a field on the .kibana index. In order to support backward compatibility, I need to do three things:

  • Make sure objects exported prior to the change can be imported and look as expected.
  • Make sure objects stored in an index that hasn't been migrated can be opened and look as expected.
  • Make sure objects in a migrated index can be opened and look as expected.

I'd have to think more about how/whether all three spots could take advantage of the same code.

Pluggable?
We might want to make this pluggable, so saved object types we are unaware of can register their own migration steps.

Questions

  • Should this process focus only on migrating saved objects, the kibana index as a whole, or cover both?

Implementation details

TODO: have not thought this far ahead yet.

cc @epixa @archanid @tsullivan @tylersmalley

Core discuss enhancement

All 74 comments

Had a short meeting with @tylersmalley and @chrisronline where some of this came up.

Proposal:
One of the recurring themes of the 5.x to 6.x migration is that we don't have absolute control of the kibana index, but expect it to behave in a certain way. We need to be more aggressive and lock down what we can - the kibana server. We know exactly what types are expected, and if we don't enforce that, there's a huge number of states the index can get into that can cause errors.

  • On startup, diff expected mappings vs what's saved in elasticsearch. If anything is unexpected we fatal.
  • Include a CLI that will more or less do what the upgrade assistant did: push new mappings, reindex, prompt the user to update kibana.yml. I'm not a fan of deleting the old index automatically; it adds an unnecessary burden in case a rollback needs to happen, and it adds more steps in the form of backup warnings.

Thoughts on transformations:

  • Repurposing fields adds complexity, so I propose registering transforms from field A to field B and running as part of the CLI tool. Applications get to avoid legacy format checks.
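
For illustration, the field A to field B registration might look something like this sketch (registerTransform and its shape are hypothetical, not an existing API):

// Hypothetical registration of a field-to-field transform, to be run by the
// proposed CLI tool during reindexing. None of these names are an existing API.
registerTransform({
  type: 'visualization',     // the saved object type this applies to
  from: 'savedSearchId',     // legacy field
  to: 'search.id',           // new home for the value
  convert: (value) => value, // identity here; could also reshape the value
});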

This is a quick writeup, I want to think more about it still but thought I'd see if there's any feedback.

I did a little investigation/thinking about this concept and have some very basic code in a working state. We could use this as a platform for more discussion about how to do this.

https://github.com/elastic/kibana/compare/master...chrisronline:enhancement/fix_kibana_index

I like @jbudz's idea of detecting invalid mappings and giving a clear error.

I've used a number of migration tools (in Ruby, Clojure, .NET, and JavaScript). And a long time ago, I wrote a SQL migration tool for .NET because I didn't like EF. I'd be happy to contribute to this effort.

My thought is that we'd do something similar to Rails db migrations:

  • Each migration has an ID (in Rails it's just a timestamp, I think, e.g. 20180306161135)
  • When a migration is run, we store it in some index, so that we don't re-run the same migration
  • You can migrate up, which runs migrations sequentially, in order
  • (Maybe) you can migrate back down to a certain timestamp
  • A migration is nothing more than a pure function that takes the old state and returns the new state
  • Ideally, they're written in JavaScript/TypeScript, though we should support painless for more optimal migrations
// migrations/20180306161135-advanced-bar-chart-color-support.js or somewhere well-defined
import { omit } from "lodash"; // or any helper that removes keys from an object

export const key = "bar-chart"; // The type of data being migrated

// This gets called for each instance of "bar-chart" data that's in the Kibana index.
// If that would be too slow (dunno how big our index is), we can enforce that migrations
// are all done via painless. But I'd personally rather start with a full-fledged language like JS,
// and use painless only as an optimization technique. This is also nicely testable.
export function up(oldState) {
  return {
    ...omit(oldState, ["defaultColor"]),
    style: {
      background: oldState.defaultColor,
      borderColor: oldState.defaultColor,
    },
  };
}

// We may not need to bother with this. Most places I've worked, we only really wrote
// up migrations, and if they caused an issue, we wrote further up migrations to address those.
export function down(newState) {
  return {
    ...omit(newState, ["style"]),
    defaultColor: newState.style.background,
  };
}

Hm. Thinking about it a bit more...

  • Migrations probably need to be more complex than simple pure functions, due to two scenarios:

    • We need to take two (or more) separate record types and merge them into one

    • We need to take one record type and split it into two or more

An example of one of these scenarios is the way the Kibana dashboard stores its plugin data. Right now, there are two records for every widget you see on the dashboard. There's the wrapper/container's record and there's the wrapped visualization's record. We are considering unifying these two records.

A pseudocode migration for this might look something like this:

// migrations/20180306161135-combine-panel-and-child-records.js
// conn is an interface to the Kibana elastic store
export async function up(conn) {
  return conn.each({type: "dashboard-panel"}, async (panel) => {
    const childState = await conn.find({id: panel.childId});
    await conn.upsert({
      ...panel,
      childState,
    });
    await conn.delete(childState);
  });
}

  • The migration system should be transactional, ideally, so we don't end up w/ partially applied migrations

To clarify a bit for @chrisdavies - dashboard will need the ability to parse fields, then separate and/or combine into more or less fields, on the document. Essentially this issue: https://github.com/elastic/kibana/issues/14754

I don't think right now we have any instance of needing to combine documents, though it's an interesting thought. If we did ever have that need, how could it be worked into a migration process? If we ever flattened the dashboard data structure (put all visualizations into a single dashboard object to ease security issues with nested objects) we might need it! But I'm pretty sure we won't ever do that and we are just going to address security issues assuming our nested object environment.

About the transactional part - we'll need to handle this manually ourselves, since ES doesn't support transactions. Copy the old kibana index, then migrate into a new one. Maybe even leave the old one around just in case anything goes wrong, rather than deleting it right away upon success. We've had bugs in the migration process, and retaining that old index is pretty important to avoid data loss in those unexpected cases.

I like the way you're thinking about this @chrisdavies.

One part I have been thinking about is how we handle this from a user's point-of-view. If we detect on start-up that a migration is required, probably due to an upgrade, we should force the user to a page explaining the migration. This will give the user an opportunity to backup/snapshot their Kibana index. The page would provide a button which would perform the migration, re-indexing into a new index and updating the alias. Currently, we don't use aliases for the Kibana index, so this would be new, but it would allow users to revert back (last resort). The Cloud team would need this migration to be exposed as an API call, since they need to avoid presenting this page to the user when they upgrade the cluster.

Having this UI would help prevent issues where someone stands up a newer version of Kibana and points it to an existing Kibana index. This way, the user knows they will be performing a migration and would need to have the permissions to do so. If we automatically did this migration, the existing Kibana index would no longer be compatible with the newer schema.

I am not sure that we need a down function, considering they would have a way to revert the entire index. The migration could first create a new index with the current mappings of all the plugins. You can see how we do this in the mappings mixin here, which is used to create the index template. In the migration script, we could use the scroll API to iterate over the objects, transform them, then bulk insert them. I believe the free-form nature would allow us to combine documents if needed down the road. If they are simple transformations, we could utilize the _reindex API and do it with Painless like we did for the 6.0 upgrade.
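
For the simple-transformation case, a minimal sketch using _reindex with an inline Painless script might look like the following (assuming the legacy elasticsearch JS client; the index names and the removed field are illustrative only):

// A rough sketch of a _reindex with Painless, along the lines of the 6.0 upgrade.
const elasticsearch = require('elasticsearch');
const client = new elasticsearch.Client({ host: 'localhost:9200' });

async function reindexWithPainless() {
  return client.reindex({
    body: {
      source: { index: '.kibana' },
      dest: { index: '.kibana-6.3' },
      script: {
        lang: 'painless',
        source: "ctx._source.remove('someObsoleteField')",
      },
    },
  });
}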

I am thinking that there would only be a single migration possible, per minor release. This should greatly simplify things and the new index would be the kibana.index setting with the minor version appended (ex: .kibana-6.3)

@tylersmalley Makes total sense.

I've only just been introduced to painless (and Elastic, for that matter), but it makes sense that we'd have to support it.

The only thing I'm not sure about is the single migration per minor release, though. That seems a little limiting, unless you mean a single migration per type + minor release? In which case, I think we could safely say that.

After chatting with a few folks, here's the game plan so far:

Migration functions

  • Migrations will be vanilla JS
  • Each migration will consist of two functions:

    • A filter that answers the question: "Does this migration apply to this document?"

    • A transform that takes an old document shape and transforms it to the new shape

  • Ideally, both the filter and transform functions are pure, but this is not enforced

Checking if migration is required

  • We will store a checksum of all migrations that have been applied to an index
  • This will be a checksum of the migration file names (not their content)
  • There will be an API endpoint to check if the Kibana index is out of date
  • When Kibana UI starts up, the browser will hit that endpoint, and if it is out of date, it will show the user a screen indicating that a Kibana index migration is required
  • Users w/ Kibana admin access (or something) will have the ability in the UI to kick off the migration

Migrating

  • There will be an API endpoint that will run the Kibana migration
  • In addition to the checksum, we will also store the id of the latest migration that was applied
  • When running migrations, we will apply any migrations later than this id
  • Migrations are "up" only-- downs are performed by manually changing back to the old Kibana index
  • Migration will consist of these steps:

    • Put the current Kibana index into read-only mode

    • If .kibana is an index, and not an alias:

      • Reindex .kibana to something like .kibana-original, retaining .kibana's cluster settings / metadata

      • Make .kibana-original read-only

      • Create the .kibana alias, pointing to .kibana-original

    • Create a new index, named after the point release (e.g. .kibana-7.0.1), that keeps the current index's metadata/cluster settings

    • Use the scroll API to move all documents from the old index to the new

    • Pass each document through the stream of migrations

    • Bulk insert the transformed documents into the new index (at some configured rate, 100 or so, maybe)

    • When all docs are migrated, flip the .kibana alias to point to the new index
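
For concreteness, here's a minimal sketch of the scroll-and-bulk steps above, assuming the legacy elasticsearch JS client; applyMigrations and the index names are hypothetical placeholders, not the actual implementation:

// A minimal sketch of the scroll-and-reindex loop: read docs in batches, pass
// each through the migrations, and bulk insert into the new index.
const elasticsearch = require('elasticsearch');
const client = new elasticsearch.Client({ host: 'localhost:9200' });

async function migrateIndex(sourceIndex, destIndex, applyMigrations) {
  let response = await client.search({
    index: sourceIndex,
    scroll: '1m',
    size: 100, // the configured batch rate mentioned above
  });

  while (response.hits.hits.length > 0) {
    const body = [];
    for (const hit of response.hits.hits) {
      body.push({ index: { _index: destIndex, _type: hit._type, _id: hit._id } });
      body.push(applyMigrations(hit._source)); // the stream of migrations
    }
    await client.bulk({ body });

    response = await client.scroll({ scrollId: response._scroll_id, scroll: '1m' });
  }
}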

Notes:

Why aren't we using reindex + painless?

  • The largest Kibana index Tyler is aware of is ~10K docs, so not too big
  • Writing migrations in JS/TS allows us to test/debug more easily
  • Allows us to easily stream multiple migrations together

Why aren't we storing a list of applied migrations, and then simply applying any migrations not in that list?

  • Migrations should be consistently and predictably applied
  • For any given point-release, all of the migrations in that release should have been tested in-order
  • If a migration doesn't make it into the next point release, it should be renamed to have a date that is later than the current point release's release date
  • We thought this should be explicitly handled by migration authors, rather than risking running migrations out of sequence

Why don't we delete the old index?

  • We want to keep at least v-1 indexes around in case the user wants to rollback the migration
  • We don't want to complicate the migration system with this (yet), though we may in the future

Why a checksum instead of just comparing the latest migration id in the index with the latest migration id in Kibana source?

  • During development of a release, it's likely that migrations will be merged into master in unpredictable orders, and a checksum allows us to safely detect this and prompt Kibana devs to take the appropriate action...
  • In the unlikely event that out-of-sequence migrations make it to production, this will alert us right away

Will plugin authors be able to hook into the migration system? Or do plugin authors need to maintain their own migrations?

I have so many thoughts about this but not enough time today to weigh in. For now, I'll just weigh in on what @trevan mentioned and say that it is _critical_ that migrations be pluggable from day one.

Yes. Plugin authors will be able to add their own migrations. I'm not sure of the exact details yet, but my current thought is that it's similar to the way web-frameworks typically do it.

A migration would be something like ./migrations/20180308144121-hello-world.js in the Kibana code-base. Plugins such as x-pack or 3rd party plugins would have their own ./migrations folder with migration files in there. This is assuming that plugins are run with full-trust. (Is that a valid assumption?) So, x-pack might have migrations like ./kibana-extra/x-pack-kibana/migrations/20180308144121-foo.js

If we took that approach, the question is how such migrations should be run:

  • Option 1: All merged into one sequential migration stream?
  • Option 2: Core Kibana migrated first, and plugin migrations run sequentially per plugin?

I'm leaning towards option 2, as it's much simpler, but it can be argued either way.

If we go with option 2-- running each folder of migrations independently-- we can get away with not having to do a migration diff in order to determine what migration(s) need to be run.

Plugins are expected to be compatible with whatever version of Kibana they are hosted in. So, plugin migrations might reasonably be expected to assume that core Kibana docs are already in the current version's format. Plus, I suspect that plugins will only be migrating their own documents and not touching core Kibana documents. (Is that true?)

Anyway, if we go with option 2, there will be a checksum and latest migration id per plugin, and the check for "Do we need to run a migration?" would be comparison of the checksums.

Edit: Another question that arises when you think about plugins is: how do we want to handle the scenario where all migrations succeed, except for some migrations for a certain plugin?

My thought is, we'd probably want Kibana to still run, but just have the failed plugin(s) disabled until their indexes are properly migrated. Thoughts?

@chrisdavies, if a visualization is considered "core Kibana documents" then visualization plugins will definitely migrate core Kibana documents.

You need to make sure that a plugin can migrate without Kibana changing its version. Say you have plugin A with version 1.0 that is compatible with Kibana 7.0. Plugin A has an update to version 1.1 which is still compatible with Kibana 7.0 but it requires a migration. Updating plugin A should trigger a migration even though Kibana isn't being upgraded.

@trevan Right. It won't be tied to a Kibana version for exactly that reason. If we detect any unrun migrations, we'll require migrations to be run before Kibana can be considered properly ready to run. So, if someone drops a new migration into a plugin, Kibana will no longer be considered to be initialized.

That's the current thought. The idea being that plugins are as important as core. For example, if someone has a critical security plugin, and that plugin has not been properly migrated, Kibana should not assume it's OK to run.

I have a couple of questions for anyone interested in this:

Question 1: What strategy should be used to run migrations?

  • Take all migrations from all plugins and core Kibana, sort them by timestamp, and run them in order
  • Group migrations by plugin / core, and run them in order within that grouping (e.g. run core first, then run plugin1, then plugin2, etc)

I'm leaning towards the latter option, as it makes migrations more predictable for migration authors.

Question 2: What should we do about this scenario:

  • PluginA has migrations 001-foo.js, 005-bar.js
  • The ops team runs those migrations
  • Later, migration 002-baz.js is dropped into that plugin's folder.
  • This migration was added out-of sequence, e.g. after a later migration had already been applied.

Should this be considered an error needing manual correction? Should we just run it and assume the best (I vote no, but am open to discussion)? Or should we require migrations to have both an "up" and "down" transformation, in which case we could roll back 005-bar.js, run 002-baz.js, then re-run 005-bar.js?

One thing I'm wondering, though, is what strategy should be used to run migrations?

This will create a minor delay, but if we do this all through the new platform instead of relying on the old, then we have a topological sorting of plugin dependencies, which means we can guarantee that migrations are always run in the proper order based on the plugin dependency graph. I don't think either of the options you proposed are reliable enough to be used for something this important.

Should this be considered an error needing manual correction? Should we just run it, and assume the best?

We should never even attempt to apply migrations for a system if something seems out of whack with it. If we identified any issue like this, my expectation is that this would error with an appropriate message and Kibana would fail to start. Overwhelming validation is the story of the day here.

Should we require migrations to have both an "up" and "down" transformation

Up/down migrations aren't practical in this world because we can't enforce lossless migrations, so we may not have all the necessary data to revert back to an older state. This is OK in a traditional migration system because the idea is that an intentional revert in a production database is a nuclear option, so consequences are expected. We're talking about these migrations running all the time, so losing data is unacceptable. To us, rolling back a migration means rolling back Kibana to an older index.

What should we do about this scenario: ...

Let's not rely on file names at all for this. Instead, let's require that plugins define an explicit order of their various migrations directly in their plugin code. Sort of like an entry file to migrations for that plugin. This makes it easy to create unit tests to verify migration orders and stuff without having to fall back to testing the file system. Aside from plugin loading itself, we should really avoid loading arbitrary files from the file system where possible.

This has the added benefit of making migrations just another formal extension point that can be documented alongside any other future plugin contract documentation rather than a special case.

Good points, everyone. Re @epixa, agreed. Further notes based on your comments:

Migration order

We'll rely on the new platform. It's not too much risk, as the new platform is scheduled for merge fairly soon, and migrations are a good few months off, even by optimistic estimates. This solves the migration-order problem.

File system

I think you're right about this. We won't do a file-system-based approach, but here are some points in favor of a file-based migration system, which our system needs to address:

A file-system based migration system provides:

  • Natural unique ids (filename) for each migration
  • Familiarity (all web frameworks I know of use a file-system based migration system)
  • Simple, systematic generation and authoring of migrations, as there is "one" way to structure things that we can support and document

You can fairly easily unit test your migrations if done with a file-system. But a big downside to the file-system approach is that it would disallow transpilation, since it's loaded/run dynamically in production. I think that is a deal-breaker.

We can expose migrations programmatically (e.g. as part of a plugin's interface). We'll use something similar for core Kibana. My thought is that it would be essentially an ordered array of migrations, each of which is a hash/object with three keys/props:

  • id: any - A serializable, unique identifier of this migration (was the filename in the previous model)
  • filter(doc: any) => boolean - A function, ideally pure, that indicates whether this migration applies to the specified doc
  • transform(doc: any) => any - A function, ideally pure, that takes a document, and returns the new shape of that document

This means we need to detect duplicate IDs and error on them, as we can't rely on the filesystem to do this for us. We should also strongly encourage a best-practices convention for authoring migrations. I think we do this via a yarn command: yarn migration:new {name} [{plugin-name}], which will create a new migration file following a well-defined convention. The migration author then fills out the file, possibly converting it to their preferred language, and imports/exposes it in their plugin as mentioned above. This gives us (almost) the best of both worlds, and greatly reduces the odds that a migration makes it to production out of order.
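
As a sketch, a plugin's exposed migrations might look like the following (the shape follows the id / filter / transform contract above; the names and content are illustrative):

// A hypothetical ordered array of migrations exposed by a plugin.
export const migrations = [
  {
    // A serializable, unique identifier (previously the filename)
    id: '20180306161135-advanced-bar-chart-color-support',
    // Does this migration apply to this doc?
    filter: (doc) => doc.type === 'bar-chart' && doc.defaultColor != null,
    // Return the new shape of the doc
    transform: (doc) => {
      const { defaultColor, ...rest } = doc;
      return {
        ...rest,
        style: { background: defaultColor, borderColor: defaultColor },
      };
    },
  },
];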

Error cases

We'll error if:

  • We detect an out-of-sequence change to any set of migrations
  • We detect duplicate ids within a set of migrations

How do we want to handle the scenario where a plugin is added to an existing Kibana installation?

  • Should we run the new plugin's migrations?
  • Should we assume the plugin's migrations don't need to run, as by definition, the plugin hasn't yet created any data?

The latter is preferable for many reasons, but might not be realistic.

Can we dictate that plugin migrations should only apply to docs created by those plugins?

If not, we need to run migrations as soon as a plugin is added, and each migration needs to be intelligent about detecting the shape of the data it is transforming, as that data may be in any number of shapes depending on the version of Kibana / other plugins in the system at the time the new plugin was added.

How do we want to handle the scenario where a plugin is added to an existing Kibana installation?

I think this one is at least partly dependent on how tightly coupled this migration stuff is to the initial mapping creation stuff. Migrations will be able to change the values of stored data, but they can also modify the mappings for their objects, which is something that applies to all future objects as well. If we treat all of this as the same problem, then the initial mappings are essentially just migration zero, and in order to have the mappings properly set up for today, you must run all of the available migrations.

If you can figure out a nice way to handle this that doesn't require running migrations unnecessarily, that would certainly be ideal.

Can we dictate that plugin migrations should only apply to docs created by those plugins?

I think yes, we should lock this down and not allow any plugin to directly mutate the objects of another plugin. Plugins shouldn't have to worry about their own objects changing without their control. If they want to make the structure of their own objects pluggable, then they can expose a custom way for other plugins to modify them.

Can we dictate that plugin migrations should only apply to docs created by those plugins?

I think yes, we should lock this down and not allow any plugin to directly mutate the objects of another plugin. Plugins shouldn't have to worry about their own objects changing without their control. If they want to make the structure of their own objects pluggable, then they can expose a custom way for other plugins to modify them.

One example of objects that might need to be directly mutated by another plugin are visualizations. A plugin can add its own visualization as well as its own agg type. For the first case, the plugin would be migrating a visualization object that is completely owned by itself. For the second case, a visualization from a completely different plugin might have the agg type from the first plugin that needs to be migrated. We do that where we've added a custom agg type (country code) that is used by the region map visualization.

We do that where we've added a custom agg type (country code) that is used by the region map visualization.

It seems that in this scenario, plugin2 is simply a consumer of plugin1's data. It doesn't seem as if plugin2 should be mutating plugin1's data, right? I suspect nothing but dragons lie down that road.

@epixa

I think this one is at least partly dependent on how tightly coupled this migration stuff is to the initial mapping creation stuff.

My initial thought was that seeding / initializing data should be considered separately from migrations, and I'd only focus on migrations here. But, I don't think that's possible, due to this scenario:

pluginA is being upgraded to 2.0, and 2.0 needs a brand new document where it stores certain settings, so it needs to seed that. And that new seed data shouldn't pass through any previous migrations, but it should pass through future migrations. In other words, it's possible (probable?) that a system will require a combination of seeding and migrating over time. And these must necessarily be consistently ordered or else we'll have an unpredictable outcome.

So, yeah. I think it would be beneficial to have this one system handle both seeding of new data and migration of existing data.

If you can figure out a nice way to handle this that doesn't require running migrations unnecessarily, that would certainly be ideal.

I think we might be able to initialize a plugin without requiring a full migration of the Kibana index. Essentially, if we have a brand new plugin and/or brand new system, we may be able to have new/seed documents pass through the migration pipeline and directly into an existing index.

I don't love modifying an existing index, but in this case, it should be relatively harmless, as the old pre-plugin system should be unaffected by any docs created by the new plugin.

It seems that in this scenario, plugin2 is simply a consumer of plugin1's data. It doesn't seem as if plugin2 should be mutating plugin1's data, right? I suspect nothing but dragons lie down that road.

I'm not sure if we are saying the same thing or different things. As an example, the plugin A has a visualization "region_map". It owns that visualization and so it should probably handle all of the migration. But plugin B adds a custom agg type which is available in the UI as a possible option in "region_map". If the user creates a region_map visualization using the custom agg type from plugin B, then the data stored in elasticsearch will be an entry that is owned by plugin A with a sub part of it (the agg type configuration) owned by plugin B. Plugin B might need to edit the visualization object that plugin A owns to migrate the agg type configuration.

@trevan Ah. Thanks for the clarification. That is really complicated.

In that scenario, do we have a consistent, systematic way for plugin B to detect that plugin A's document has data that it owns?

In that scenario, do we have a consistent, systematic way for plugin B to detect that plugin A's document has data that it owns?

I know for this particular situation, plugin B can load all of the visualizations and check if each one has its agg type. I'm not sure there are many existing situations like this but I doubt there is a "consistent, systematic way". I believe you could make it consistent and systematic, though. It could be something along the lines of what @epixa said. Plugin A would expose a mechanism for plugin B to migrate the agg type data that it owns.

I just wanted to make sure that this was taken into account as it is designed.

If we make the migration system reusable, then we can pass it to each plugin to allow it to manage its own pluggable data. Sort of like a nested set of loops: the main migration system kicks off a "loop through all objects and defer to each of the type owners for migration behavior", then inside the migration for each object of a given type, those plugins can choose to iterate further.

The visualizations plugin invokes a migration function for each object of type "visualization". In that migration function, when it detects an agg_type that it doesn't recognize, it loops through all agg type registrations from other plugins and invokes the custom migration code only on the agg_type data that was provided by the third party plugin.

A simplified completely non-functional example of this flow (do not take as a suggested implementation):

// in my plugin init
visualizations.registerAggType({
  type: 'my_agg_type',
  migrate(aggType) {
    // do stuff with aggType
  }
});

// in visualizations plugin
function migrate(obj) {
  const aggType = registeredAggTypes.find((t) => t.ownsAggType(obj));
  if (aggType) {
    aggType.migrate(obj.agg_type);
  }
}

// in global migration system
objects.forEach(obj => {
  const plugin = plugins.find(plugin => plugin.ownsType(obj));
  plugin.migrate(obj);
});

@trevan Thinking about this a bit more, I'm not sure that this is a scenario that a migration system would need to directly address. Here's why:

In this scenario, if PluginB has created a breaking change to its public interface (e.g. in our example, it changes its aggregation data shape in some breaking way), it is up to consumers of PluginB to update their own code to conform to the new PluginB interface.

So, in the scenario we mentioned, if someone is upgrading their system to have PluginB 2.0, they'd also need to update any consumers of PluginB (e.g. PluginA) to work with the new version.

Obviously, in an ideal world, plugins should try to never make breaking changes to their public API, though this is not always possible.

I've started working on this, and was planning on allowing for a "seed" migration type. But I now realize @epixa was referring to Elastic mappings in his previous comment.

Do you think we also need to support seeding data, or can we treat migrations as either an Elastic mapping update or a pure transform of v1 state -> v2 state?

I think we should have a seed type as well. We already hack this together in Kibana through a non-standard process with our advanced settings, which get added as a config document. It probably won't be as common as the other types.

Just wanted to update this thread with the latest info: I've got a working prototype, and am hoping to get it into PR-ready shape this week.

Some scenarios still need to be worked out before merge:

  • What about importing of stored objects / dashboards / etc? These might be old files, containing old document-versions, right? If so, imported data might also need to be sent through a migration process prior to saving it in the index.

  • Kibana-index specific: If we're dealing with a brand-new Kibana installation, there will be no Kibana index, but we may want to run migrations, anyway, if there are any seed documents. In this scenario, is it OK to have migrations run as part of the Kibana bootup process (especially useful when running in dev-mode)?

What about importing of stored objects / dashboards / etc? These might be old files, containing old document-versions, right? If so, imported data might also need to be sent through a migration process prior to saving it in the index.

If we could extend the migration process to these documents, that would be awesome. Importing from older versions is an important workflow that we want to continue to support.

Kibana-index specific: If we're dealing with a brand-new Kibana installation, there will be no Kibana index, but we may want to run migrations, anyway, if there are any seed documents. In this scenario, is it OK to have migrations run as part of the Kibana bootup process (especially useful when running in dev-mode)?

When were you planning to run the migrations otherwise? I figured they would always run on startup.

Regarding imports, I agree. I'll talk to the management folks and update this thread w/ the results of that discussion.

W/ regard to running migrations, I thought we had agreed to not run them automatically, but I can't seem to find that discussion / agreement anywhere. I would definitely prefer to automatically run them at startup. Is there a good reason not to run them automatically? Would this adversely affect other areas (such as cloud, maybe)?

@chrisdavies The advantages of running them automatically on startup are so numerous, I think we should go down that route and only not do it if we have to rule it out for some currently unknown reason.

One consideration though is that automatic migrations must result in a new kibana index. If we do something royally dumb that breaks Kibana post-migration, users have to be able to quickly downgrade their Kibana install to get Kibana running again.

This seems right to me.

Migrations always create a new index, so we should be fairly safe to auto-migrate, I think.

Talked to @chrisronline about the data import scenario. We think the migration system can fairly easily handle the happy path, but there is one hairy edge-case that needs to be worked out:

  • PluginA has migrations [1, 2, 3]
  • The index currently has all 3 of those applied, and the current application code assumes all relevant data conforms to PluginA v3 specification
  • Now, a user tries to import data that came from PluginA v1
  • The import fails, due, say, to a bug in migration PluginA v2
  • The user, in frustration, disables PluginA altogether, and re-runs the import
  • Now, we have PluginA v1 data and PluginA v3 data in the same index-- the index is in an inconsistent state

We can detect this scenario, as we'll know that the index was migrated to PluginA v3 at one point, and that PluginA is now disabled. But the question is, what should we do in this case?

Thinking about this, we could take a different approach to migrations:

  • We'd still allow any number of seed migrations, but only one transform migration per plugin
  • Transform migrations are a function: (doc_v_1-N) => doc_v_N
  • Transforms are responsible for determining: "Is this a doc I understand?" If not, they should return the doc unchanged
  • Transforms should understand any version of applicable docs, and be capable of converting them to the latest version
  • The fastest path through a transform function should be

    • A doc that does not apply, or a doc that is already the latest version

    • In these cases, the doc is returned unchanged

  • We can run migrations on docs on load/save, if we so choose, with minimal perf impact, since

    • There will be relatively few transform functions (at most one per plugin)

    • They will be fast for up-to-date docs, which is the most common case

  • This simplifies the migration system, but increases the complexity burden that is placed on migration authors
  • This means an index with possibly mixed versions can still be successfully migrated
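
A sketch of what a single per-plugin transform might look like under this approach (the doc shapes are illustrative):

// A hypothetical single transform for a plugin: it recognizes any historical
// version of its docs and converts them to the latest shape, and returns
// unrelated or up-to-date docs unchanged (the fast path).
export function transform(doc) {
  if (doc.type !== 'bar-chart') {
    return doc; // not a doc this plugin understands
  }
  if (doc.defaultColor !== undefined) {
    // v1 -> latest: defaultColor was replaced by a style object
    const { defaultColor, ...rest } = doc;
    return { ...rest, style: { background: defaultColor, borderColor: defaultColor } };
  }
  return doc; // already the latest version
}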

With this strategy, the previous problem scenario becomes this:

  • User tries import, fails due to pluginA, disables pluginA, re-runs the import
  • The index now has docs of type 'plugin-a' which are of different versions
  • If pluginA is re-enabled,

    • All docs will get migrated to the latest version

    • The index is consistent again

  • If pluginA is never re-enabled,

    • The migration artifacts it left behind are presumably vestigial

    • If they aren't, that's simply a risk of yanking a plugin out of your system

Changes to the saved object client

Something that @chrisronline @kobelb might want to weigh in on:

When it comes to data-imports, we have a number of choices, two of which I list here:

1. Run transforms on all docs before saving them

This is basically a validation step, which is generally advisable at an API boundary, anyway.

2. Have an optional checksum, which if passed, will transform out-of-date docs before saving them

This adds complexity to both the saved object client API and to the import/export feature, but is more performant than performing a transform before every save.

One last point on the enable / disable plugin scenario: Right now, if you disable or enable a plugin which has migrations defined, the index will be migrated, since the system's migration status won't match the index's migration status... Is this behavior OK? It seems sub-optimal, to say the least, so I can spend time thinking of a workaround, if we think it's worthwhile.

@chrisdavies one of the reasons for having migrations is to change mapping types to change how searching works (ex: changing keyword to text). If we were only able to transform documents on save, and an index could have mixed versions, how would we accomplish this? How would we then manage the mapping if we had mixed document versions?

@epixa @archanid, I discussed this w/ @tylersmalley and @kobelb, and we arrived at a possible solution:

  • We want to keep the original, simple, granular migration system (keeps migrations themselves really simple and testable, etc)
  • We want to avoid data loss when doing import / export and enabling / disabling plugins
  • It's OK for an index to have inconsistent versions of documents, as long as the inconsistent parts are vestigial, that is, associated with a disabled plugin and not used by the current system
  • In order to properly handle this scenario, we'll lock down what properties a plugin owns on a document

    • Plugins will only be allowed to migrate properties that they own

In the following example, two plugins own different parts of this document:

  • A dashboard plugin owns the dashboard property
  • A fanci-tags plugin owns the tags property
  • We store the migration state of the document, which indicates that the first 2 dashboard plugin migrations are applied and the first 3 fanci-tags plugin migrations are applied
// A hypothetical document
{
  _source: {
    type: 'dashboard',
    migrationState: {
      plugins: [{
        id: 'dashboard',
        migrations: 2,
      }, {
        id: 'fanci-tags',
        migrations: 3,
      }],
    },
    dashboard: {
      name: 'whatevz',
    },
    tags: ['a', 'b'],
  },
}

If this document is imported into a system that has a version of fanci-tags that has 5 migrations, the last 2 tag migrations will be applied to it prior to persisting.

If it is imported into a system that has the fanci-tags plugin disabled, it will store its data in such a way that it can be recovered if the plugin is ever re-enabled, possibly something like this:

// A hypothetical document
{
  _source: {
    type: 'dashboard',
    migrationState: {
      plugins: [{
        id: 'dashboard',
        migrations: 2,
      }, {
        id: 'fanci-tags',
        migrations: 3,
        state: "[\"a\",\"b\"]",
      }],
    },
    dashboard: {
      name: 'whatevz',
    },
  },
}

Not saying those formats are final, just some pseudo code to represent the idea. This still feels a bit complex, so we're going to give it some time to see if a better solution presents itself.
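
To make the idea concrete, here's a rough sketch of how those per-plugin counts could drive a migration (registry and the doc shape are illustrative, not the real implementation):

// For each plugin listed in a doc's migrationState, apply only the migrations
// beyond the recorded count, then bump the count. `registry` maps a plugin id
// to that plugin's ordered array of migrations.
function migrateSource(source, registry) {
  const pluginStates = source.migrationState ? source.migrationState.plugins : [];
  for (const entry of pluginStates) {
    const all = registry[entry.id] || [];
    for (const migration of all.slice(entry.migrations)) {
      source = migration.transform(source);
    }
    entry.migrations = all.length; // now up to date for this plugin
  }
  return source; // a real implementation would also persist the updated counts
}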

@epixa currently, a document is defined by a type. The change @chrisdavies described would remove that notion and allow a type, which is just a namespaced piece of data, to contain multiple types per document. I feel this would simplify the migrations and mappings when plugins are extending saved objects, allowing you to easily identify who owns what piece of an object. Thoughts?

Keeping everything straight via this comment thread is getting a bit challenging. I'm going to keep this markdown file up-to-date as a sort of mini-specification:

https://gist.github.com/chrisdavies/8a15622a7b482821711aa24ed40d2efb

Basically, what's new in that gist is:

  • Plugins need to expose their mappings so the migration system can understand what plugins defined what mappings
  • When a plugin is disabled, its data and mappings are not removed (and no migration is necessary)
  • When a plugin is deleted, the migration system can be told to delete all associated mappings / data

    • But it should also be fine to just let it live as vestigial data

  • The data import feature probably should run through a specific import API endpoint, rather than through the saved object client API, as it needs to understand migration version information, and it needs to support dropping invalid data

Just to make sure my understanding of the verbiage is clear, when you say:

In order to properly handle this scenario, we'll lock down what properties a plugin owns on a document

  • Plugins will only be allowed to migrate properties that they own

Does that allow the case where a plugin allows plugins to modify properties that it owns? Take my case above where Kibana plugin owns the agg types but a plugin can create new agg types and might need to update them. In this case, the property is "agg type" and it is owned by Kibana, not by the other plugin. But the other plugin needs to migrate the "agg type" if it owns the specific type.

I haven't yet been able to think about all the new details in here, so apologies for that. I did want to touch on one particular thing that @tylersmalley mentioned:

The change @chrisdavies described would remove that notion and allow a type, which is just a namespaced piece of data, to contain multiple types per document.

I don't see how the notion of type goes away. We still need to associate saved objects with other concrete saved objects (like dashboard -> visualization), which we can only do if there is a definitive type associated with them. Even the high level example shown in the comment prior to that one still included a top level type attribute.

Another thing we're currently relying on type for, for better or for worse, is as an implicit way to establish ownership over an object. I think this notion is important, where no matter what sort of extensions are applied to any given saved object, there's only one definitive plugin that owns that object. So if we ever needed to do something like "uninstall this plugin and nuke everything associated with it", we can do it even when other plugins have extended it.

Yeah. My first stab at this is more raw than we had discussed. So, the notion of types isn't affected. It's possibly too raw, but my thought was that migrations might be used to migrate saved objects from one format to another, and so, maybe they should live at a different abstraction level than saved objects.

After going back and forth, I wrote the current implementation such that transforms receive two arguments: (source, doc), where source is the raw _source value, and doc is the raw document read from Elastic, including the _id property.

Transforms can then return either a new source shape, or an object with _id and _source, the idea being that seeds might want to specify a well-known id and transforms might want to (someday) transform the way ids are being stored... This might be putting too much power into the hands of migration authors, though, and I'm not 100% satisfied with the implementation, either.
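
A tiny sketch of that contract (the doc shape and ids are illustrative):

// Hypothetical examples of the two return shapes a transform can use.
export function transform(source, doc) {
  if (doc._id === 'some-well-known-id') {
    // Returning { _id, _source } lets a seed or transform control the id
    return { _id: 'some-well-known-id', _source: { ...source, seeded: true } };
  }
  // Returning a plain object is treated as the new _source
  return { ...source, migrated: true };
}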

So, the PR is as much a "let's start talking details" as it is a final implementation.

@epixa I want to touch on something you mentioned in this thread which I missed regarding automatically running migrations.

My concern with the automatic migrations is they would render any existing Kibana instances pointed to the same index as inoperable. Someone bringing up a new version of Kibana would, without knowing, affect any existing instances. I think we should either make this a command which should be explicitly run, or something that can be run in the Kibana UI through an API call.

@tylersmalley running multiple versions of Kibana against the same Elasticsearch index is something we've historically had poor support for, and likely will have to get a lot more strict about in the near future. We aren't writing to the .kibana index in a backwards compatible manner consistently these days, and when multiple version of Kibana are running at the same time, we can get some rather inconsistent behavior.

Running two different versions of Kibana behind a load balancer at one time isn't something we can realistically support right now. At the very least, the way we handle config migration can cause data loss in that scenario. Non-sticky sessions (which is what we recommend) can result in random client errors due to mismatched versions. Security privileges will behave strangely.

We may want to support zero downtime rolling upgrades for Kibana one day, but realistically that’s just not safe right now.

Later today, I'm going to move forward with getting the .kibana-specific migration stuff together. My plan right now is to automatically run the migrations when Kibana starts. The reason is that if we run them automatically, devs won't have to manually run migrations every time they launch Kibana. And our existing tests should just work, as migrations will run at start-up, rather than having to be explicitly run.

It's fairly easy to turn off auto-running and just make migrations an explicit script, though, if we want to do that instead.

Bah. I've come around to share the opinion of @tylersmalley that we should not automatically run migrations.

The current approach to migrations (and to import/export) has scattered migration logic into places where it really doesn't belong; although it's approaching "done" status, it feels messy. (This is particularly true of the saved object client changes.)

  • This would be a cleaner approach:

    • Don't auto-run migrations

    • Create a migrations API

    • No special yarn scripts for undo / rollback

    • Create a UI that shows when Kibana is un-migrated, which allows users to

      • Run migrations (optionally force)

      • Rollback to previous version

    • Auto-run migrations as part of test bootstrapping

    • Zero-downtime deploys can be done like so:

      • Stand up a new Kibana instance while the old is still active

      • Kick off migration via the API

      • Wait for completion

      • Flip the load balancer to point to the new Kibana instances

      • Take the old instances offline

      • While migrations are running, old Kibana is still accessible in a read-only state, so some features won't work properly

Secondly...

Import / export is currently planned via a small change to the saved object client. It's a bit of a hack. It would be cleaner, I think, to make import / export part of the aforementioned migrations API instead, and modify the import/export UI to call this.

  • A cleaner import / export approach:

    • Done via an API instead of the hacky saved object client changes

    • Given a list of ids, returns the exported JSON file

    • Given a JSON file, imports all docs, migrating as needed

    • If any docs already exist, these are sent back for confirmation (want to override? Yes, Yes to all, No / Skip, Cancel Import, etc)

    • If any docs fail to migrate for some reason, this is indicated, etc.

The upside is that migration logic resides in only two places: the migration engine itself, and the Kibana index-specific API.

The downside is: it's more work, more code, and it means that object-level security would need to be enforced in the saved-object client and in the migrations API.

@tylersmalley @epixa

We decided to run migrations when Kibana starts.

If you have 2+ Kibana servers, one will run migrations when it starts. The other(s) will not run migrations, as migrations will already be running (in one of the other servers). So, what should these secondary servers do?

  • Fail to start w/ some error "Cannot start until the Kibana index is finished migrating"
  • Go into a holding pattern, waiting for the Kibana index to finish migrating before becoming "green"

    • This seems preferable

    • Should they time out after a while?

What about this scenario:

  • Original, unmigrated index w/ lots of docs from various plugins...
  • Attempt to migrate, but w/ out those plugins enabled or even installed...
  • Currently, migrations will bomb on the docs owned by the missing plugins, as it never knew about those plugins in the first place

I think we need to move mappings over from original indices, but trump them w/ mappings defined by current plugins.

Is there a better alternative?

If you have 2+ Kibana servers, one will run migrations when it starts. The other(s) will not run migrations, as migrations will already be running (in one of the other servers). So, what should these secondary servers do?

A holding pattern probably makes sense. In a lot of ways, it's the same thing as the one running migrations, it's just not actually running them itself. It could still serve a "migrations running" response to http requests or something, just like the server that's doing the migration.

I think we need to move mappings over from original indices, but trump them w/ mappings defined by current plugins. ... Is there a better alternative?

I can't think of one off-hand. This seems like an appropriate solution - basically we're just moving the problem down the road. I do think we should be validating the core _shape_ of these orphaned mappings though. There are assumptions Kibana itself makes about the shape of all documents in its index, with things like a keyword type attribute and an id. While we can't validate the plugin-specific changes, we can certainly validate our core assumptions. What do you think?

@epixa

I think the holding pattern makes sense.

I agree that we need to move the mappings.

I've got mixed opinions about validating mappings / docs. On the one hand, I like having as strong a guarantee as we can that the index is in a state we understand. On the other hand, what do we do if we find mappings / docs that don't meet our expectations? Fail the migration? In that case, how does a customer fix the issue? Should we add a --drop-invalid-docs option or something along those lines?

Jotting down notes from a conversation w/ @tylersmalley and @spalger.

We've decided to explicitly not support deletion of the .kibana index. Migrations ensure the index is created and properly aliased when Kibana starts. This new behavior disrupts any existing code that would have deleted the index (as you can't delete an alias in the same way you would an index).

Instead, we are putting logic into the saved object client which will assert a valid index migration state prior to each write, and fail the write if the index is in an invalid state. This seemed like a reasonable approach to us, as by definition, if people are tinkering with the Kibana index and get it into an unknown state, it's unknown what the consequences of automatic recovery would be.
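
A rough sketch of what that write-time assertion could look like (the marker doc id, its shape, and the callCluster usage are illustrative, not the real implementation):

// Before each write, check a migration-state marker doc and reject the write
// if the index is in an unknown state.
async function assertValidMigrationState(callCluster, index, expectedChecksum) {
  const result = await callCluster('get', {
    index,
    type: 'doc',
    id: 'migration:state',
    ignore: [404],
  });
  if (!result.found || result._source.checksum !== expectedChecksum) {
    throw new Error(`Kibana index ${index} is in an invalid migration state; write rejected`);
  }
}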

This feature is currently in code-review, finalization mode, but I've come across a potential roadblock.

Problem

The current implementation will require

  • Update of all test fixtures

    • Every time a migration is added (even if it doesn't affect the docs in the fixture)

    • (Maybe) every time the Kibana version bumps (the index names contain the Kibana version)

  • The sheer number of breaking tests is a yellow (or maybe red) flag

    • If it takes this much work to get tests working, how will this change affect 3rd parties?

Possible solution

I think it might be possible to have a migration system with a lighter touch and still hit the goals we've set. We could create a system that behaves similarly to the current Kibana version:

  • No migration state per index (except a temporary doc used for optimistic concurrency)
  • No alias
  • No index creation at start up-- only migration of existing indices
  • Saved object client puts an index template prior to any write operations
  • The Kibana index can be blown away and re-created by tests

How it would work:

  • We'd add a "migrationVersion" property to each doc (see details below)
  • When Kibana starts, it makes sure all existing docs are up to date

    • It does this by querying for any docs which have an out of date migrationVersion

  • The saved object client will also transform any docs it reads or writes

    • This allows us to handle scenarios where docs are snuck into the index after migration (e.g. the way our tests do)

  • Import / export is simplified, as the version info is exported / imported with each doc and properly handled by the saved object client
  • If a document has migration info from an unknown plugin (either missing or disabled)

    • We will still persist it, and allow mappings to dictate whether or not the persist succeeds

    • When (if ever) that plugin is enabled, the docs will be migrated if necessary

  • Migrations could be done in a (somewhat) non-destructive way without requiring an alias (if we so choose)

    • Reindex to a backup index prior to migrating the current index

    • Migration of the current index is a simple read, transform, update of each out-of-date doc

  • Initially, we can avoid supporting "seed" migrations, but these are fairly simple to add back in if / when needed
  • The migrationVersion would look something like this:
{
  migrationVersion: {
    pluginA: 3,
    pluginB: 1,
  },
}
  • Explanation:

    • The property keys (pluginA, pluginB) are the ids of any plugins which have data in the document (generally, there will be only one)

    • The numeric value is the index of the latest migration from that plugin which has been applied to the doc
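
A sketch of the startup query for out-of-date docs (the field values and plugin id are illustrative; in practice this would also be restricted to the types that plugin owns):

// Match docs whose migrationVersion for pluginA is either missing or below
// the latest migration index: a must_not range covers both cases.
async function findOutOfDateDocs(client, latestMigrationIndex) {
  return client.search({
    index: '.kibana',
    body: {
      query: {
        bool: {
          must_not: {
            range: { 'migrationVersion.pluginA': { gte: latestMigrationIndex } },
          },
        },
      },
    },
  });
}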

@chrisdavies What's the best way for me to learn about the current state of migrations, so I can better parse your new proposal? I'm looking for a definitive technical overview of how migrations work. Would that be https://github.com/chrisdavies/kibana/blob/feature/migrations-core/packages/kbn-migrations/README.md?

@epixa That readme is currently the best written source outside of the source code itself, so yep! I'm also happy to hop on Slack / Zoom as necessary.

Chatted w/ @spalger about this. Here's the summary of that conversation:

  • Not tracking index-level migration state, but per doc
  • When we start up, we do a query to migrate if needed
  • This is also done at read time and write time
  • Saved object client's find operation is a special case (see the sketch after this list):

    • We'll add an aggregation to check if any docs are in old state

    • If so, we'll do a de-duped migration (de-duped per Kibana instance)

    • Re-executes the find and returns the result after migration completes

    • If migration fails:

      • Kibana should go into red-state

      • Log the id of the doc, the id of the plugin, the id of the migration, and the error

  • Saved objects will continue to function as they do today, with the exception of the migration version check
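
A minimal sketch of the de-dupe, assuming at most one in-flight migration promise per Kibana process (names illustrative):

    // Concurrent find() calls that detect out-of-date docs all await the
    // same promise rather than kicking off parallel migrations.
    let inflightMigration: Promise<void> | null = null;

    function ensureMigrated(runMigration: () => Promise<void>): Promise<void> {
      if (!inflightMigration) {
        inflightMigration = runMigration().finally(() => {
          inflightMigration = null;
        });
      }
      return inflightMigration;
    }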

We both think this change is worth doing now rather than moving forward with the current approach. This per-document approach is more robust in a number of ways, and also eliminates a good deal of complexity (such as the need for a --force option, the need to update test snapshots all over the place, etc).

Chatted w/ @tylersmalley and @jbudz and debated the pros/cons of the various approaches. The decision was to stay the course and fix all of the broken tests, as the current approach has less impact on performance, and ultimately is architecturally simpler even though it's significantly more code.

In a world of automatic migrations, the whole premise behind esArchiver seems flawed. Or at least the extent to which we're relying on esArchiver for tests is flawed.

Rather than importing raw document data through elasticsearch, we should probably be importing saved object data through kibana instead. If I add a migration and a test fails, then that should indicate a regression, but instead it's going to be treated as a blanket reason to rebuild the fixture data with esArchiver.

I can see esArchiver fixtures being useful for testing our migration system itself and/or for testing Kibana running against old known legacy document states if that isn't handled by the migration system automatically. But beyond that all other tests should really be seeding data the way we expect data to be seeded in the real world.

@epixa You and Tyler are on the same wavelength and have convinced me. :)

Good news, @tylersmalley! More changes. @epixa and I talked this morning and identified a scenario that is going to come up in the next release: splitting plugins apart into smaller plugins, and changing which plugins own a type.

This pointed out a flaw in the current implementation, and a change that both facilitates this requirement while also simplifying the implementation.

Currently, we allow migrations to apply to any doc based solely on the filter the migration specifies. We track migrations per plugin, and enforce migration order on a per-plugin basis.

Changes

  • Track migrations on a per type basis
  • Changing the type (e.g. dash -> dashboard) can be done by creating a migration for "dash" that returns a doc of type "dashboard" (see the sketch after this list)
  • Type ownership can be moved between plugins so long as the migrations also move
  • I think this means we no longer need to store mapping information in migrationState itself
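
A sketch of the type-rename case (doc shape simplified, and not the real registration API):

    interface Doc {
      id: string;
      type: string;
      attributes: object;
    }

    // A migration registered for the old "dash" type that hands back a doc
    // of the new "dashboard" type; ownership of "dash" docs moves with it.
    const dashToDashboard = (doc: Doc): Doc => ({ ...doc, type: 'dashboard' });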

@epixa pointed out that the current direction of migrations would allow us to easily make breaking changes to our API, as migrations would make it easy to change the shape of our data willy-nilly, and right now, our raw data is our API in some sense, as we expose it via the saved object client system.

So, we decided to disallow breaking mapping changes during minor version transitions, and relegate such changes to the major version boundaries. Minor versions would allow adding properties to mappings, but not removing properties, nor changing leaf-properties.

@epixa suggested that in minor versions, we don't migrate the index at all, but simply run a transform function (one per type) on all docs coming into / out of the saved object API. So, the index may end up with docs of varying versions, but at read-time, they'd get sanitized and made consistent.

After heading down this road a bit, I think I've come up against a roadblock:

_Problem 1_

  • We can only run transform functions on full documents (e.g. not on partial documents)
  • There appears to be at least one saved object method that produces partial docs (find, and maybe update)
  • This means that any attempts at making the data consistent at read-time are not going to be fully reliable

_Problem 2_

  • Are writes backward compatible, too?

    • If so, how do we know which field wins?

    • Say, we have added a 'foo' field which is hydrated from a JSON blob

    • If the foo field doesn't exist, we hydrate it from JSON, if it does, we hydrate the JSON from the foo field

    • Say a 3rd party reads that doc (including the foo field, which they know nothing about), updates the JSON as they always have, and passes the object back with the stale foo field still attached. Their JSON update then gets obliterated when we re-hydrate from the foo field!

_Possible solutions_

Migrate the index even on minors

  • This allows us to do finds predictably
  • Instead, we can go back to migrating the index at startup, which for minor versions would look like this (a sketch follows this list):

    • In the _meta field, store:

      • Kibana version

      • Checksum of all the types that have transforms defined (allows us to handle plugin enable / disable)

    • When booting, after Elasticsearch becomes available:

      • Compare the _meta with the current Kibana values

      • If index.version has a different major version than kibana.version, throw an error

      • If index.version is more recent than kibana.version, throw an error

        • Maybe allow starting w/ a --downgrade option or something, which allows downgrading within the same major version

      • If the version diff is a minor version bump:

        • Patch the mappings and index template

        • Scan docs in, run them through their transforms, write them back to the same index

        • When done, mark the index as up to date w/ kibana.version

  • In export files, we need only store the Kibana version, and can either store it once, or per doc, whichever is easiest

    • Importing will call the saved object API, passing it the doc(s) along with the Kibana version stored in the file

    • The appropriate transform(s) will be applied based on the Kibana version

  • When the saved object write APIs are not passed a Kibana version, they will apply only the current major version transform per doc
  • When developing a transform,

    • The developer will need to modify the Kibana version to be the appropriate major version for their transform (e.g. 6.x, if developing a 6.x transform)

    • In master, package.json is currently 7.x, so anyone writing 6.x transforms will need to change that temporarily
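
A sketch of that boot-time decision, assuming the semver package for the version comparisons:

    import semver from 'semver';

    type BootAction = 'error' | 'migrate-minor' | 'up-to-date';

    function decideBootAction(indexVersion: string, kibanaVersion: string): BootAction {
      if (semver.major(indexVersion) !== semver.major(kibanaVersion)) {
        return 'error'; // major version mismatch
      }
      if (semver.gt(indexVersion, kibanaVersion)) {
        return 'error'; // index is newer; maybe allow --downgrade
      }
      if (semver.lt(indexVersion, kibanaVersion)) {
        return 'migrate-minor'; // patch mappings, transform docs in place
      }
      return 'up-to-date';
    }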

Modify saved object find to not support source filtering

  • Here, we'd always fetch full documents in the find API

    • Run them through the transforms

    • Perform source filtering logic on our own in Node (see the sketch after this list)

    • Pass the transformed and source-filtered documents over the wire

  • My concern with this is that the added complexity is worse than if we just migrate the index

    • But, it does mean we can punt index migration logic until major versions, which does simplify some things
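
For what it's worth, the Node-side filtering might look something like this simplified sketch (it flattens each requested path to a top-level key, which the real thing probably shouldn't do):

    function pickFields(doc: Record<string, any>, fields: string[]): Record<string, any> {
      const out: Record<string, any> = {};
      for (const path of fields) {
        // walk dotted paths like "attributes.title"
        const value = path.split('.').reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
        if (value !== undefined) {
          out[path] = value;
        }
      }
      return out;
    }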

I have a proposal which, I think, simplifies the migration process and allows for seamless integration with plugins.

Tracking migrations

Every saved object should include a field which records which plugins have an interest in that object and the version of the plugin that last upgraded the object. For instance:

    "versions": ["core:6.3.0", "my-plugin:6.2.0"]

This way, Core can apply all upgrades to migrate the object from 6.3.0 to the current version, regardless of whether my-plugin is installed or not. Later, when my-plugin is installed, a further migration can be run which can apply its own list of upgrades.

This property should be a required value that every plugin needs to provide, so that any object which is missing this value can be considered to come from the last known version without support for this property.

Core and plugins would be passed all objects, and they would be responsible for determining whether they should make changes to the object or not.
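
A sketch of how that lookup could work (the versions field shape is taken from the example above; everything else is illustrative):

    interface Doc {
      versions?: string[]; // e.g. ["core:6.3.0", "my-plugin:6.2.0"]
      [key: string]: any;
    }

    // Returns the version of pluginId that last upgraded this doc, or
    // undefined if the plugin has never recorded an interest in it.
    function lastUpgradedVersion(doc: Doc, pluginId: string): string | undefined {
      const entry = (doc.versions || []).find((v) => v.startsWith(pluginId + ':'));
      return entry ? entry.split(':')[1] : undefined;
    }

    // Core hands every doc to every registered plugin; each plugin compares
    // lastUpgradedVersion(doc, id) against its own version and decides
    // whether to transform the doc.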

Specifying migrations

I don't see the point of using hashes to identify individual migrations - it overcomplicates things. I'd just have a single migration function per version, and deal with version changes when backporting.

Always reindex

I would always run migrations by reindexing to a new index. On the subject of limiting breaking changes, this could be done in minor versions by first applying the existing mapping to the new index before running any upgrades. On major version upgrades, you wouldn't apply the original mapping but instead build the new mapping entirely from core and plugins.

It's possible that an index has been upgraded to (eg) 7.1.0 but still contains plugin data for an uninstalled plugin that comes from 6.3.0. When installing the plugin, the index would need to be migrated in "major" mode.
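
A sketch of the reindex flow, again assuming the legacy elasticsearch-js client; index names and the mappings argument are illustrative:

    import { Client } from 'elasticsearch';

    const client = new Client({ host: 'localhost:9200' });

    async function migrateByReindex(source: string, dest: string, mappings: object): Promise<void> {
      // On a minor, mappings would be the existing mapping; on a major,
      // it would be rebuilt entirely from core and plugins.
      await client.indices.create({ index: dest, body: { mappings } });
      await client.reindex({
        waitForCompletion: true,
        body: { source: { index: source }, dest: { index: dest } },
      });
      // The old index stays put so it can be deleted once the user is
      // satisfied the migration succeeded.
    }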

Require user interaction

I wouldn't run the migration automatically, at least not at first. I would make the index read-only and lock the Kibana interface saying that a migration is required. When the user presses a button, run the migration and report back, saying that they can delete the old index once they're satisfied that the migration has been successful.

@clintongormley Thanks for the feedback. I like your suggestions.

Every saved object includes a field like "versions": ["core:6.3.0", "my-plugin:6.2.0"]

I like this. I had an implementation a few iterations back that did almost exactly this, except it stored it like so: {core: '6.3.0', 'my-plugin': '6.2.0'} which I think I prefer to the array. Is there a reason to prefer the array approach instead?

The downside to storing this on a per-doc basis is that it makes the "is this index up to date?" check more complex. Is there a reason to store this per doc instead of at the index level?

I don't see the point of using hashes to identify individual migrations

Yeah. I don't like the checksum, either, and changed my mind about it after posting my comment; I forgot to update the issue. The original idea behind a checksum was that it allowed us to enforce migration ordering (this was before we tied migrations to specific Kibana versions) and it allowed for a very fast "is this up to date?" check. But I'm with you.

Always reindex

This is also what we were doing originally, and I also prefer this. It allows us to ensure that any changes made by migrations are accurately reflected in the index while also preventing data-loss from happening. I think it's a big enough win that it's worth doing. @epixa should weigh in on this though, as he has differing opinions.

A challenge with this is if folks enable / disable plugins while doing migrations. It could mean running multiple migrations per Kibana version. The original migration implementation managed this by reindexing to an index that looked something like this: .kibana-{VERSION}-{SHA_OF_PLUGINS} so that there would not be conflicting index names, so that's solvable, but:

  • You may end up w/ more indices than you'd like
  • Adds friction to the enable / disable plugin scenario

Require user interaction

I like making migrations an explicit step. It hopefully means that problems with migrations will be detected early on, rather than down the road after they've crept into various parts of the system.

I think the main challenge to a UI approach is, who has permission to click the "Migrate This Thang" button?

Anyone? Migrations are not an x-pack feature, so they can't fully rely on auth, though they can optionally rely on it. I think OSS customers would just have to live with the fact that anyone can click this.

For x-pack customers we could restrict clickability to only those users who have full privileges to the .kibana index (assuming we have such a role / concept).

An alternative solution that we have kicked around is:

  • If migration is required, put Kibana into a "red" state w/ an appropriate message
  • Don't provide a "Migrate" button in the UI, but provide clear operational guidelines on how to run migrations (as a cli, say)
  • Operations folks can run migrations via a terminal command or whatever, maybe starting kibana with a --migrate flag or something

This approach has higher friction, but does get around the permissions issue.

I like this. I had an implementation a few iterations back that did almost exactly this, except it stored it like so: {core: '6.3.0', 'my-plugin': '6.2.0'} which I think I prefer to the array. Is there a reason to prefer the array approach instead?

Only that you use fewer fields. I'd be ok with the object approach too.

The downside to storing this on a per-doc basis is that it makes the "is this index up to date?" check more complex. Is there a reason to store this per doc instead of at the index level?

The thing I liked was that (after we move to this approach) it provides a mechanism for each plugin to indicate its interest in a particular object, which makes it easy for a plugin to decide whether it needs to do anything or not. It'd also be OK to store the upgrade levels in a doc in Kibana which summarises the status for the index. Or, with your field structure, you could use a max aggregation to figure out whether migrations are required.
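
For example, with the numeric per-plugin scheme floated earlier in the thread, that check could be a single max aggregation (field name illustrative):

    // size: 0 because we only care about the aggregation, not the hits.
    const checkBody = {
      size: 0,
      aggs: {
        latestApplied: { max: { field: 'migrationVersion.my-plugin' } },
      },
    };
    // If latestApplied.value is below the plugin's newest migration number,
    // at least one doc still needs migrating.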

A challenge with this is if folks enable / disable plugins while doing migrations. It could mean running multiple migrations per Kibana version. The original migration implementation managed this by reindexing to an index that looked something like this: .kibana-{VERSION}-{SHA_OF_PLUGINS} so that there would not be conflicting index names, so that's solvable, but:

For index names, we don't need the SHA_OF_PLUGINS, we can just have a counter.

  • You may end up w/ more indices than you'd like

Easy to provide a UI function to clear up old indices.

  • Adds friction to the enable / disable plugin scenario

Disable doesn't need to do anything, just enable. And we could use that max-agg to check if anything needs to be done. Obviously that'll only work once all plugins actually record an interest in saved objects.

For x-pack customers we could restrict clickability to only those users who have full privileges to the .kibana index (assuming we have such a role / concept).

I like the idea of doing it through the UI instead of the CLI. It can run as the Kibana user, so doesn't require the user to have admin permissions (although we could provide that option as a setting). If somebody has installed a new Kibana, then the migration needs to be run regardless of who runs it. The only downside of letting any user run it is that the admin doesn't have control over the timing.

I'm sold.

Whether a plugin needs to do anything or not

I think the migration system can automatically determine this, as plugins register migrations on a type + version basis, and we'll know the type + version of docs in the index and in imported files.

That said, the max aggregation is a good solution, and there are other advantages to storing things on a per doc basis. I'll weigh the pros/cons and make a decision, there.

Obviously that'll only work once all plugins actually record an interest in saved objects.

We can do this now, as plugins currently register migrations on a per type basis, so we'll know if we need to migrate or not.

I have the current direction outlined in the readme in this PR, and have also outlined possible changes per @clintongormley's comments.

@epixa and @clintongormley, I think the two of you might want to kick around the conflicting options that have been proposed (see the Outstanding questions section of the readme). I'm going to hold off on moving this forward until that's resolved. Happy to hop on Zoom or similar, if need be.

@chrisdavies Thanks for doing due diligence on the compatibility approach. I'm disappointed it won't work since it was a convenient low-risk stepping stone to migrations, but I can't think of any solution to the two problems you raised that doesn't involve dramatically complicating the whole system.

That does leave us with actually executing migrations.

We keep talking about major versus minor, but migrations need to be possible in patch versions as well. If we introduced some bad bug in a minor that resulted in broken data, we need to be able to release a migration that fixes that problem in a patch version.

Every saved object should include a field which records which plugins have an interest in that object and the version of the plugin that last upgraded the object.

This would mean that things like security and tags will need to upgrade all existing objects in the index as well as add themselves to every new object at write time. I don't necessarily think that's a problem, but this will need to happen in place rather than requiring a full reindex, otherwise simply having security enabled will result in a reindex on every upgrade.

Require user interaction

I think the benefits of making this an active operation don't outweigh the drawbacks. Kibana isn't just a UI, and it's not acceptable for us to block the REST API on a user initiated action like this. As we roll out more and more capabilities at the REST API level, we need to be working toward minimizing downtime on upgrades rather than increasing it.

That aside, we can't simply put things in read-only state and have any expectation that Kibana will behave in a sensible way. Simply loading the Kibana UI requires UI settings which are saved objects, and we can't trust that those are compatible with the current version pre-migration.

It'd also just not be a great experience. Imagine if every time we upgrade demo.elastic.co some random visitor stumbles upon an admin interface asking them to migrate.

Fortunately, with a reindex approach, there shouldn't be any real risk to just running the migration on startup. There'd still be downtime, but it would be as minimal as possible. There's no dealing with a read-only state as we simply don't listen on a port until things are square.

This would mean that things like security and tags will need to upgrade all existing objects in the index as well as add themselves to every new object at write time. I don't necessarily think that's a problem, but this will need to happen in place rather than requiring a full reindex, otherwise simply having security enabled will result in a reindex on every upgrade.

@epixa can you explain more about ^^ - I don't understand why anything needs to happen in place? My understanding is that a doc is retrieved from the old index, funnelled through core migrations, then plugin migrations, then written to the new index. So we'd only need to do this when core or a plugin is upgraded.

@clintongormley and I spoke directly about this to get on the same page, and this is where we landed:

  1. It does look like we need to support full migrations rather than just a compatibility layer. The only way I can think of to address the concern you raised about compatibility layer in find would be to add a ton of complexity to how transformations are defined to give them field level context, and that's just not a rabbit hole worth going down at this time, if ever.

  2. While as a policy we will seek to limit migrations wherever possible, we need to be able to run migrations in _any_ version, including a patch. This is important in the event that a plugin gets installed that requires a migration but also in the event that there exists an egregious bug that results in bad data that we need to address in a patch version.

  3. The proposal to tag objects with the last kibana and plugin versions that ran migrations on them seems pretty solid. It also means that things like object import actually rely on the same mechanism for determining what transformations need to happen as the migration system itself, which is nice.

  4. Always reindexing is the way to go. It's the only definitive way we can help prevent bad data loss from stupid mistakes that Kibana or plugins make.

  5. Migrations should happen at Kibana startup and shouldn't require user interaction. Kibana's more than a UI, and we need to limit the downtime during upgrade as much as possible. There are multiple options for how to make Kibana behave during a migration to make it "nicer" for users, but initially we should just block Kibana from binding to a port until the migration is finished. This way it effectively just looks like a longer startup time for Kibana when a migration runs. We can continue to iterate on this as needed.

@chrisdavies What do you think?

I think it's a solid plan. It's actually pretty close to the original plan, but with some nice simplifications (like tracking migrations based on versioning rather than clunky hashes / migrationState). I agree completely that the consistency is a nice feature of this approach.

So yaya! I'm really happy with this direction.

Thanks, @epixa @clintongormley for hopping on this and hammering it out.

@epixa @tylersmalley

A couple of questions remain that I'm unsure about:

What should the saved object REST API do when it is sent documents that have no migration version information associated with them?

Do we assume the docs are up to date or do we assume they are out of date? It seems that we need to assume they are out of date (e.g. they predate the migrations feature), but:

  • If out-of-date, this means Kibana and all future API callers will need to pass accurate version information to the saved object API

    • Is this a realistic demand for us to make?

    • It means API consumers need to have an intimate knowledge of our versioning scheme

    • In the browser, we may be able to do this automatically via the saved object browser client (a sketch follows this list):

      • When Kibana loads in the browser, we can fetch a dictionary of plugin -> type -> version

      • When the browser writes a saved object, the saved object client assumes it is up to date (unless it is explicitly told otherwise), and it slaps on the version information of the plugin that owns the document's type prior to sending it to the server

      • This assumes that the browser is properly building the object, but we can build a validation layer into the server at some point, if we wish to verify this prior to writing

      • This'll get more complex if we introduce multi-type documents (e.g. docs that have a primary type, but also ancillary data such as ACLs or whatever)

  • If up-to-date, this means we'll just pass them into the index, and hope for the best

    • This seems likely to lead to improperly formed data in the index
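
A sketch of that browser-side decoration (the dictionary shape matches the plugin -> type -> version idea above; names are made up):

    type VersionMap = { [pluginId: string]: { [type: string]: string } };

    // Called by the browser-side saved object client just before a write:
    // stamp the doc with the current version of the plugin that owns its type.
    function withMigrationVersion(
      doc: { type: string; [key: string]: any },
      versions: VersionMap,
      owningPluginId: string
    ) {
      return {
        ...doc,
        migrationVersion: { [owningPluginId]: versions[owningPluginId][doc.type] },
      };
    }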

What do we do if the saved object API receives an "update" call with a document whose migrationVersion is out of date?

I think we have to fail the update, as it seems unrealistic to ask migration authors to write migrations in such a way that they can properly handle partial documents (and update calls will have partial documents).

EDIT: Given the complexity of expecting everyone to pass our API proper migration version info, it may be better / simpler to say that we don't pass migration version info with every saved object write. Instead, we assume docs w/ no migration version are up to date. And, we tell API consumers that if they are calling the Kibana API, they need to update their code to send us up-to-date documents or the resulting behavior is undefined.

What should the saved object REST API do when it is sent documents that have no migration version information associated with them?

I think the saved object API should only deal in terms of the latest document format, which does mean that we shouldn't introduce BWC breaks on the data level within minors, but that was our plan anyway. This is a decision we can change in the future if we deem it necessary.

The _import_ API is the only API that must be able to deal with older objects, and for that API I think we should assume all objects will contain a version or they are treated as crazy-old (this is the precise way to refer to objects that predate this migration system). Of course we'll need to ensure the _export_ API includes the current version on each object.

What do we do if the saved object API receives an "update" call with a document whose migrationVersion is out of date?

As in my previous answer, I don't think these APIs should deal with migration at all for now. I'd take it a step further and say that we should explicitly deny any request (other than import) that even specifies a migrationVersion property, which will prevent people from doing things they don't expect and will give us the ability to introduce a more robust handling of migrationVersion as a feature in a future minor version if we felt it necessary.

I'm going to close this as saved object migrations were merged in #20243
