Kibana: Kibana stays read-only when the ES high disk watermark has been exceeded and disk usage has later dropped back below the limit

Created on 24 Aug 2017  ·  22 Comments  ·  Source: elastic/kibana

Kibana version: 6.0.0-beta1

Elasticsearch version: 6.0.0-beta1

Server OS version: Ubuntu 16.04.2 LTS

Browser version: Chrome 60.0.3112.90

Browser OS version: Windows 10

Original install method (e.g. download page, yum, from source, etc.): Official tar.gz packages

Description of the problem including expected versus actual behavior:

I'm running a single node Elasticsearch instance, logstash and Kibana. Everything runs on the same host in separate docker containers.

If the high disk watermark is exceeded on the ES host, the following is logged in the elasticsearch log:

[2017-08-24T07:45:11,757][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-08-24T07:45:41,760][WARN ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] flood stage disk watermark [95%] exceeded on [CSOifArqQK-7PBZM_keNoA][CSOifAr][/data/elasticsearch/nodes/0] free: 693.8mb[2.1%], all indices on this node will marked read-only

When this has occurred, changes to the .kibana index will of course fail, as the index cannot be written to. This can be observed by trying to change any setting under _Management_->_Advanced Settings_, where a change to e.g. _search:queryLanguage_ fails with the message: Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

(screenshot: index_read_only error in Kibana)
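One way to confirm that the block has actually been applied to the index (a sketch, assuming Elasticsearch is reachable on localhost:9200):

curl -s 'http://localhost:9200/.kibana/_settings?flat_settings=true'
# the response should contain "index.blocks.read_only_allow_delete": "true"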

If more disk space now is made available, ES will log that the node has gone under the high watermark:

[2017-08-24T07:47:11,774][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [one or more nodes has gone under the high or low watermark]
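At this point, one way to double-check how much free space Elasticsearch itself sees is the _cat allocation API (a sketch, again assuming the node is reachable on localhost:9200):

curl -s 'http://localhost:9200/_cat/allocation?v'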

One would now assume that it would be possible to make changes to Kibana settings, but trying to make a settings change still fails with the error message:

Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Steps to reproduce:

  1. Make sure that setting changes can be performed without errors
  2. Fill up the elasticsearch data disk so that the high disk watermark is exceeded (I used fallocate -l9G largefile; see the sketch after these steps)
  3. Verify in the ES log that the high disk watermark has been exceeded and the indices have been marked read-only
  4. Perform a setting change and verify that it fails since writes are prohibited
  5. Resolve the high disk watermark condition (which I did with rm largefile)
  6. Verify that the ES log states that the node has gone under the high disk watermark (and thus should be possible to write to?)
  7. Perform a setting change and it will fail when it actually should succeed.
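A rough shell sketch of steps 2 and 5, assuming the Elasticsearch data disk is mounted under /data (the path and the 9G size are only examples; use whatever pushes your disk past the watermark):

fallocate -l9G /data/largefile   # step 2: fill the disk past the high/flood watermark
# wait ~30-60 seconds and check the ES log for the "flood stage disk watermark ... exceeded" line
rm /data/largefile               # step 5: free the space again
# wait again and check the ES log for "gone under the high or low watermark"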

Most helpful comment

I just got hit by this. It's not just Kibana, all indexes get locked when the disk threshold is reached and never get unlocked when space is freed.

To unlock all indexes manually:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

All 22 comments

So how do I recover from this? .kibana stays read-only no matter what I do. I have tried to snapshot it, delete it and restore it from the snapshot - still read-only...

I just ran into this on a test machine. For the life of me I couldn't continue putting data into the cluster. I finally had to blow away all the involved indices.

I resolved the issue by deleting the .kibana index:
DELETE /.kibana/
I lost certain configurations/visualizations/dashboards, but it unblocked things.

I just got hit by this. It's not just Kibana, all indexes get locked when the disk threshold is reached and never get unlocked when space is freed.

To unlock all indexes manually:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Thanks @xose, I just got hit by this again and was able to recover by using the command you suggested :)

The problem occurred on all indices, not just the .kibana one.

According to the ES logs, the indices were set to read-only due to low disk space on the Elasticsearch host. I run a single host with Elasticsearch, Kibana, and Logstash dockerized together with some other tools. As this problem affects other indices, I think this is more of an Elasticsearch problem and that the problem seen in Kibana is a symptom of another issue.

This bug is stupid. Can you unbreak it for now? At least you should display a warning and list a possible solution. It is really stupid that I had to dig into the JS error log to find this thread!

@saberkun You can unbreak it by following the command @xose posted:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Yes, I did.


Can you provide additional information? Did you receive an error when running the command? Did the indices unlock and now you're getting a new error message? What error messages are you seeing in your log files now?

Thanks, it is fixed by the command. I mean yes, I used it to fix the problem.


+1
Receiving this error after upgrade from 5.5 to 6.0

+1

ELK 6: cleared half the drive and the indices stayed read-only; Logstash was allowed to write again, but Kibana remained read-only.

Managed to solve the issue with the workaround provided by @xose

+1, same error for me.

Same issue for me. It got resolved by the solution given by @xose.

Same here. All hail @xose.

I just upgraded a single-node cluster from 6.0.0 to 6.1.1 (both ES and Kibana). When I started the services back up, Kibana was throwing:

blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Same as last time--I had to delete the .kibana index to get it back up and going. There was also the current logstash index with one of the shards listed as unallocated. I deleted it as well and then got the usual flood of alerts in.

I didn't run out of space--there's ~92 GB out of 120 GB free on this test machine. The storage location is ZFS and a scrub didn't reveal any data corruption.

The only errors in the log appear to be irrelevant:

[2018-01-13T20:48:14,579][INFO ][o.e.n.Node               ] [ripley1] stopping ...
[2018-01-13T20:48:14,597][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:821) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:327) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:320) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:746) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:760) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:428) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:113) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1010) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:825) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1027) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:301) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at org.elasticsearch.http.netty4.Netty4HttpChannel.sendResponse(Netty4HttpChannel.java:146) [transport-netty4-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:491) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:380) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:375) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:91) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:908) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:891) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onClusterServiceClose(ClusterStateObserver.java:310) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:230) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.doStop(ClusterApplierService.java:168) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterService.doStop(ClusterService.java:106) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.stop(Node.java:713) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.close(Node.java:735) [elasticsearch-6.0.0.jar:6.0.0]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:89) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:76) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.elasticsearch.bootstrap.Bootstrap$4.run(Bootstrap.java:185) [elasticsearch-6.0.0.jar:6.0.0]
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] stopped
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] closing ...
[2018-01-13T20:48:14,704][INFO ][o.e.n.Node               ] [ripley1] closed
[2018-01-13T20:48:39,879][INFO ][o.e.n.Node               ] [ripley1] initializing ...
[2018-01-13T20:48:40,054][INFO ][o.e.e.NodeEnvironment    ] [ripley1] using [1] data paths, mounts [[/scratch/elasticsearch (scratch/elasticsearch)]], net usable_space [92.5gb], net total_space [93.6gb], types [zfs]
[2018-01-13T20:48:40,055][INFO ][o.e.e.NodeEnvironment    ] [ripley1] heap size [989.8mb], compressed ordinary object pointers [true]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] node name [ripley1], node ID [TvkaGbQpR5KZ-ZScMZN6AQ]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] version[6.1.1], pid[6942], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/4.10.0-38-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2018-01-13T20:48:40,120][INFO ][o.e.n.Node               ] [ripley1] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [aggs-matrix-stats]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [analysis-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [ingest-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-expression]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-mustache]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-painless]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [mapper-extras]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [parent-join]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [percolator]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [reindex]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [repository-url]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [transport-netty4]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [tribe]
[2018-01-13T20:48:41,321][INFO ][o.e.p.PluginsService     ] [ripley1] no plugins loaded
[2018-01-13T20:48:43,801][INFO ][o.e.d.DiscoveryModule    ] [ripley1] using discovery type [zen]
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] initialized
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] starting ...
[2018-01-13T20:48:44,759][INFO ][o.e.t.TransportService   ] [ripley1] publish_address {192.168.42.40:9300}, bound_addresses {[::]:9300}
[2018-01-13T20:48:44,792][INFO ][o.e.b.BootstrapChecks    ] [ripley1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-13T20:48:47,864][INFO ][o.e.c.s.MasterService    ] [ripley1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}
[2018-01-13T20:48:47,869][INFO ][o.e.c.s.ClusterApplierService] [ripley1] new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}, reason: apply cluster state (from master [master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-13T20:48:47,884][INFO ][o.e.h.n.Netty4HttpServerTransport] [ripley1] publish_address {192.168.42.40:9200}, bound_addresses {[::]:9200}
[2018-01-13T20:48:47,884][INFO ][o.e.n.Node               ] [ripley1] started
[2018-01-13T20:48:48,326][INFO ][o.e.g.GatewayService     ] [ripley1] recovered [6] indices into cluster_state
[2018-01-13T20:49:01,493][INFO ][o.e.c.m.MetaDataDeleteIndexService] [ripley1] [logstash-2018.01.14/D0f_lDkSQpebPFcey6NHFw] deleting index
[2018-01-13T20:49:18,793][INFO ][o.e.c.m.MetaDataCreateIndexService] [ripley1] [logstash-2018.01.14] creating index, cause [auto(bulk api)], templates [logstash-*], shards [5]/[0], mappings []
[2018-01-13T20:49:18,937][INFO ][o.e.c.r.a.AllocationService] [ripley1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2018.01.14][4]] ...]).

+1 same error in 6.1.2

This is a function of Elasticsearch. Per the Elasticsearch error, "all indices on this node will marked read-only".

To revert this for an index, you can set index.blocks.read_only_allow_delete to null.

More information on this can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
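For example, to clear the block on just the .kibana index (a sketch; substitute your own host, port, and index name):

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/.kibana/_settings -d '{"index.blocks.read_only_allow_delete": null}'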

FYI - for anyone still running into this, here's a quick one-liner to fix the indices:
curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'

It grabs a list of all the indices in your cluster, then for each one it sends the command to make it not read-only.
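A slightly simpler variant of the same idea (again only a sketch against a local cluster): ask the _cat API for just the index-name column instead of parsing column 3 with awk.

curl -s 'http://localhost:9200/_cat/indices?h=index' | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" "http://localhost:9200/{}/_settings" -d '{"index.blocks.read_only_allow_delete": null}'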


I too was doing this (the per-index one-liner above) until I found @darkpixel's solution (https://github.com/elastic/kibana/issues/13685#issuecomment-347074533)

You can apply this setting to _all instead of going index by index. In my case it took quite a while to do it for hundreds of indices, while setting it on _all took only a few seconds.

curl -XPUT -H "Content-Type: application/json" https://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'


Thanks a lot for this workaround (deleting the .kibana index). It solved the problem for me.

This worked for me. Both commands were needed to get Kibana working after a fresh install:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": false } }'
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

This did not require deleting the .kibana index. Works perfectly now!

Source:
https://selleo.com/til/posts/esrgfyxjee-how-to-fix-elasticsearch-forbidden12index-read-only
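Note that the first of those two commands disables the disk-based allocation threshold for the whole cluster. Once enough space has been freed, you may want to turn it back on by resetting the transient setting to its default (a sketch, assuming the same localhost setup):

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": null } }'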
