Flynn: cluster dies after rebooting nodes

Created on 2016-11-15  ·  9 comments  ·  Source: flynn/flynn

Hi,

I installed a single-node cluster on AWS (via the cloud installer).
While using it, I ran into some errors when trying to deploy from a GitHub repository.
When I pressed the launch button, I saw this error:

Error getting slugrunner image: controller: resource not found

I went to the AWS EC2 console and rebooted the instance from there; after that, the dashboard no longer runs.

I have followed other issues here, such as #2075 and similar ones, but none of them worked.

flynn-host ps output:

root@...:/home/ubuntu# flynn-host ps
ID                                              STATE    CREATED        CONTROLLER APP  CONTROLLER TYPE  ERROR
ip1003163-59bf627f-4519-43e0-a260-de3c4578ad88  running  7 minutes ago  postgres        postgres         
ip1003163-55de131e-fe88-4282-82b3-1c4e0985010c  running  9 minutes ago  flannel         app              
ip1003163-23e6afa3-3df5-45c9-90b7-4c6e40f6dc96  running  9 minutes ago  discoverd       app 

When I try to scale the dashboard from the CLI:

[alireza@arci]$ ./flynn-cli -a dashboard scale web=1
Get https://controller.2rv1.flynnhub.com/apps/dashboard/release: dial tcp xxx.xxx.xx:443: getsockopt: connection refused

Version:

[alireza@arch flynn-cli]$ ./flynn-cli version
v20161115.0

Debug logs: https :

Most helpful comment

Same problem on DigitalOcean

All 9 comments

@titanous suggested running flynn-host fix --min-hosts 1 --peer-ips 127.0.0.1 in https://github.com/flynn/flynn/issues/2075#issuecomment-155068222

Result:

root@:/home/ubuntu# flynn-host fix --min-hosts 1 --peer-ips 127.0.0.1
INFO[11-15|18:15:58] found expected hosts                     n=1
INFO[11-15|18:15:58] ensuring discoverd is running on all hosts 
INFO[11-15|18:15:58] checking flannel 
INFO[11-15|18:15:58] flannel looks good 
INFO[11-15|18:15:58] waiting for discoverd to be available 
INFO[11-15|18:15:58] checking for running controller API 
INFO[11-15|18:15:58] checking status of sirenia databases 
INFO[11-15|18:15:58] checking for database state              db=postgres
INFO[11-15|18:15:58] checking sirenia cluster status          fn=CheckSirenia service=postgres
INFO[11-15|18:15:58] found running leader                     fn=CheckSirenia service=postgres
INFO[11-15|18:15:58] found running instances                  fn=CheckSirenia service=postgres count=1
INFO[11-15|18:15:58] getting sirenia status                   fn=CheckSirenia service=postgres
INFO[11-15|18:15:58] cluster claims to be read-write          fn=CheckSirenia service=postgres
INFO[11-15|18:15:58] checking for database state              db=mariadb
INFO[11-15|18:15:58] skipping recovery of db, no state in discoverd db=mariadb
INFO[11-15|18:15:58] checking for database state              db=mongodb
INFO[11-15|18:15:58] checking sirenia cluster status          fn=CheckSirenia service=mongodb
INFO[11-15|18:15:58] found running leader                     fn=CheckSirenia service=mongodb
INFO[11-15|18:15:58] found running instances                  fn=CheckSirenia service=mongodb count=1
INFO[11-15|18:15:58] getting sirenia status                   fn=CheckSirenia service=mongodb
INFO[11-15|18:15:58] cluster claims to be read-write          fn=CheckSirenia service=mongodb
INFO[11-15|18:15:58] checking for running controller API 
INFO[11-15|18:15:58] killing any running schedulers to prevent interference 
INFO[11-15|18:15:58] no controller web process running, getting release details from hosts 
INFO[11-15|18:15:58] starting controller web job              job.id=ip1003163-3fbc3084-9ead-497b-8e38-618a325a82cc release=c7359e8b-70ab-4f04-acc3-6c6670f2b0f6
INFO[11-15|18:15:58] waiting for job to start 
18:16:58.779286 host.go:157: discoverd: timed out waiting for instances

Running flynn-host fix --min-hosts=1 gives me the same result.

I just noticed that after rebooting the instance on AWS EC2, the instance's public IP has changed.
But flynn-cli is still trying to use the old IP?

Shouldn't the IP change be handled by the AWS VPC that Flynn creates during cluster initialization?
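
For reference, one way to check which address the CLI is actually pointed at (only a rough sketch, assuming the cluster domain 2rv1.flynnhub.com from the error above; ~/.flynnrc is where the flynn CLI keeps its cluster configuration):

# compare the DNS record behind the controller URL with the instance's new public IP
dig +short controller.2rv1.flynnhub.com

# inspect the controller URL and TLS pin the CLI is configured with
cat ~/.flynnrc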

I just initialized a 3-node cluster on AWS, rebooted the machines via the console, and the same thing happened again.

The dashboard is unreachable, and scaling it results in a connection refused error.
It seems Flynn dies from a simple reboot.

The IPs on the instances are the same as before; nothing has changed since the fresh install.
I can't collect debug logs either; every node gives me the same error:

root@ip-10-0-0-130:/home/ubuntu# flynn-host collect-debug-info
INFO[11-16|13:20:34] uploading logs and debug information to a private, anonymous gist 
INFO[11-16|13:20:34] this may take a while depending on the size of your logs 
INFO[11-16|13:20:34] getting flynn-host logs 
INFO[11-16|13:20:34] getting sirenia metadata 
INFO[11-16|13:20:34] getting scheduler state 
EROR[11-16|13:20:34] error getting scheduler state            err="object_not_found: no leader found"
INFO[11-16|13:20:34] getting job logs 

I ran flynn-host fix --min-hosts 3, and here is the result:

ubuntu@ip-10-0-0-130:~$ flynn-host fix --min-hosts 3
INFO[11-16|13:13:12] found expected hosts                     n=3
INFO[11-16|13:13:12] ensuring discoverd is running on all hosts 
INFO[11-16|13:13:12] checking flannel 
INFO[11-16|13:13:12] flannel looks good 
INFO[11-16|13:13:12] waiting for discoverd to be available 
INFO[11-16|13:13:12] checking for running controller API 
INFO[11-16|13:13:12] checking status of sirenia databases 
INFO[11-16|13:13:12] checking for database state              db=postgres
INFO[11-16|13:13:12] checking sirenia cluster status          fn=CheckSirenia service=postgres
INFO[11-16|13:13:12] found running leader                     fn=CheckSirenia service=postgres
INFO[11-16|13:13:12] found running instances                  fn=CheckSirenia service=postgres count=2
INFO[11-16|13:13:12] getting sirenia status                   fn=CheckSirenia service=postgres
INFO[11-16|13:13:12] cluster claims to be read-write          fn=CheckSirenia service=postgres
INFO[11-16|13:13:12] checking for database state              db=mariadb
INFO[11-16|13:13:12] skipping recovery of db, no state in discoverd db=mariadb
INFO[11-16|13:13:12] checking for database state              db=mongodb
INFO[11-16|13:13:12] checking sirenia cluster status          fn=CheckSirenia service=mongodb
INFO[11-16|13:13:12] no running leader                        fn=CheckSirenia service=mongodb
INFO[11-16|13:13:12] found running instances                  fn=CheckSirenia service=mongodb count=0
INFO[11-16|13:13:12] getting sirenia status                   fn=CheckSirenia service=mongodb
INFO[11-16|13:13:12] killing any running schedulers to prevent interference 
INFO[11-16|13:13:12] getting service metadata                 fn=FixSirenia service=mongodb
INFO[11-16|13:13:12] getting primary job info                 fn=FixSirenia service=mongodb job.id=ip100268-e1451985-a4f2-4c61-925e-1dea329881b2
INFO[11-16|13:13:12] getting sync job info                    fn=FixSirenia service=mongodb job.id=ip1004182-a8beaac1-c540-49d6-b89e-537f4c0470ca
INFO[11-16|13:13:12] terminating unassigned sirenia instances fn=FixSirenia service=mongodb
INFO[11-16|13:13:12] starting primary job                     fn=FixSirenia service=mongodb job.id=ip100268-b40310fa-a8c7-4d28-8392-60609577fb88
INFO[11-16|13:13:12] starting sync job                        fn=FixSirenia service=mongodb job.id=ip1004182-301ba978-37b6-478e-a42c-0ea388d91f50
INFO[11-16|13:13:12] waiting for instance to start            fn=FixSirenia service=mongodb job.id=ip100268-b40310fa-a8c7-4d28-8392-60609577fb88
INFO[11-16|13:13:13] waiting for cluster to come up read-write fn=FixSirenia service=mongodb addr=100.100.7.2:27017
13:18:13.278015 host.go:157: timeout waiting for expected status
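
A variant that might be worth trying (only a sketch, assuming the three private host IPs are 10.0.0.130, 10.0.2.68 and 10.0.4.182 as they appear in the flynn-host.log below) is to pass the peer IPs explicitly, as in the single-host suggestion above:

flynn-host fix --min-hosts 3 --peer-ips 10.0.0.130,10.0.2.68,10.0.4.182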

Here are also the last 100 lines of flynn-host.log:

root@ip-10-0-0-130:/home/ubuntu# tail -100 /var/log/flynn/flynn-host.log  
t=2016-11-16T13:20:35+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:35+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:35+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:35+0000 lvl=info msg="request completed" component=host req_id=7915c235-a0bc-4665-adbd-20d4a41fed1d status=101 duration=16.041734ms
t=2016-11-16T13:20:35+0000 lvl=info msg="request started" component=host req_id=2bbb78c5-656d-4fd5-aed1-24593a5b7211 method=POST path=/attach client_ip=10.0.0.130
t=2016-11-16T13:20:35+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:35+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:35+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:35+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:35+0000 lvl=info msg="request completed" component=host req_id=2bbb78c5-656d-4fd5-aed1-24593a5b7211 status=101 duration=3.740327ms
t=2016-11-16T13:20:35+0000 lvl=info msg="request started" component=host req_id=eceb6ab6-cdd8-4c06-8da5-82d97f763c5b method=POST path=/attach client_ip=10.0.0.130
t=2016-11-16T13:20:35+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:35+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:35+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:35+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:35+0000 lvl=info msg="request completed" component=host req_id=eceb6ab6-cdd8-4c06-8da5-82d97f763c5b status=101 duration=799.573µs
t=2016-11-16T13:20:51+0000 lvl=info msg="request started" component=host req_id=c7c43efe-21c8-43c4-a99d-a61c5513030e method=GET path=/host/jobs client_ip=10.0.4.182
t=2016-11-16T13:20:51+0000 lvl=info msg="request completed" component=host req_id=c7c43efe-21c8-43c4-a99d-a61c5513030e status=200 duration=2.070395ms
t=2016-11-16T13:20:51+0000 lvl=info msg="request started" component=host req_id=2b6ddb84-ebf3-44a5-b3dd-c737fe4cfb48 method=POST path=/attach client_ip=10.0.4.182
t=2016-11-16T13:20:51+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:51+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:51+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:51+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:20:51+0000 lvl=info msg="request completed" component=host req_id=2b6ddb84-ebf3-44a5-b3dd-c737fe4cfb48 status=101 duration=10.004283ms
t=2016-11-16T13:20:51+0000 lvl=info msg="request started" component=host req_id=92b53082-39b6-4c6e-a393-0df3420bd067 method=POST path=/attach client_ip=10.0.4.182
t=2016-11-16T13:20:51+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:51+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:51+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:51+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:20:51+0000 lvl=info msg="request completed" component=host req_id=92b53082-39b6-4c6e-a393-0df3420bd067 status=101 duration=2.816599ms
t=2016-11-16T13:20:51+0000 lvl=info msg="request started" component=host req_id=448c03b2-66e4-4e7f-b30c-c28dd46d0ee4 method=POST path=/attach client_ip=10.0.4.182
t=2016-11-16T13:20:51+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:51+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:51+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:51+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:20:51+0000 lvl=info msg="request completed" component=host req_id=448c03b2-66e4-4e7f-b30c-c28dd46d0ee4 status=101 duration=1.947717ms
t=2016-11-16T13:21:23+0000 lvl=eror msg="error repairing cluster" component=cluster-monitor fn=checkCluster err="discoverd: timed out waiting for instances"
t=2016-11-16T13:21:23+0000 lvl=eror msg="did not find any controller api instances" component=cluster-monitor fn=checkCluster
t=2016-11-16T13:21:23+0000 lvl=eror msg="scheduler is not up" component=cluster-monitor fn=checkCluster
t=2016-11-16T13:21:23+0000 lvl=eror msg="fault deadline reached" component=cluster-monitor fn=checkCluster
t=2016-11-16T13:21:23+0000 lvl=info msg="initiating cluster repair" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:21:23+0000 lvl=info msg="killing any running schedulers to prevent interference" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:21:23+0000 lvl=info msg="request started" component=host req_id=0743a3cd-e054-4b49-a253-0c22353eb32f method=GET path=/host/jobs client_ip=10.0.0.130
t=2016-11-16T13:21:23+0000 lvl=info msg="request completed" component=host req_id=0743a3cd-e054-4b49-a253-0c22353eb32f status=200 duration=1.117635ms
t=2016-11-16T13:21:23+0000 lvl=info msg="checking status of sirenia databases" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:21:23+0000 lvl=info msg="checking for database state" component=cluster-monitor fn=repairCluster db=postgres
t=2016-11-16T13:21:23+0000 lvl=info msg="checking sirenia cluster status" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:21:23+0000 lvl=info msg="found running leader" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:21:23+0000 lvl=info msg="found running instances" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres count=2
t=2016-11-16T13:21:23+0000 lvl=info msg="getting sirenia status" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:21:23+0000 lvl=info msg="cluster claims to be read-write" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:21:23+0000 lvl=info msg="checking for database state" component=cluster-monitor fn=repairCluster db=mariadb
t=2016-11-16T13:21:23+0000 lvl=info msg="skipping recovery of db, no state in discoverd" component=cluster-monitor fn=repairCluster db=mariadb
t=2016-11-16T13:21:23+0000 lvl=info msg="no controller web process running, getting release details from hosts" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:21:23+0000 lvl=info msg="request started" req_id=2aca8a7b-1bbc-44bc-b0ca-82fc7e73d4a9 component=host method=GET path=/host/jobs client_ip=10.0.0.130
t=2016-11-16T13:21:23+0000 lvl=info msg="request completed" req_id=2aca8a7b-1bbc-44bc-b0ca-82fc7e73d4a9 component=host status=200 duration=1.045199ms
t=2016-11-16T13:21:23+0000 lvl=info msg="starting controller web job" component=cluster-monitor fn=repairCluster job.id=ip1004182-0c54a7f7-d138-43cd-a8bc-eff1ee4920fc release=c19d8097-cc4e-4a2f-88c4-4a61acf8b8a9
t=2016-11-16T13:21:23+0000 lvl=info msg="waiting for job to start" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:21:39+0000 lvl=info msg="request started" component=host req_id=652a75ac-18e7-45e0-b523-8726ef985af9 method=GET path=/host/jobs client_ip=10.0.2.68
t=2016-11-16T13:21:39+0000 lvl=info msg="request completed" component=host req_id=652a75ac-18e7-45e0-b523-8726ef985af9 status=200 duration=3.269771ms
t=2016-11-16T13:21:39+0000 lvl=info msg="request started" component=host req_id=71f87425-cf51-48d2-9009-e7dadc7f10c8 method=POST path=/attach client_ip=10.0.2.68
t=2016-11-16T13:21:39+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:21:39+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:21:39+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:21:39+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-9dc574db-8bbf-40e3-b6a3-6fa9c8e55bab
t=2016-11-16T13:21:39+0000 lvl=info msg="request completed" component=host req_id=71f87425-cf51-48d2-9009-e7dadc7f10c8 status=101 duration=9.185237ms
t=2016-11-16T13:21:39+0000 lvl=info msg="request started" component=host req_id=6680f1f7-bad2-4107-a07e-c6b900430b90 method=POST path=/attach client_ip=10.0.2.68
t=2016-11-16T13:21:39+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:21:39+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:21:39+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:21:39+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d4b567a2-02e8-4dd4-a3ca-c485311a29d2
t=2016-11-16T13:21:39+0000 lvl=info msg="request completed" component=host req_id=6680f1f7-bad2-4107-a07e-c6b900430b90 status=101 duration=4.855914ms
t=2016-11-16T13:21:39+0000 lvl=info msg="request started" component=host req_id=1163fbbb-ac24-4f2c-a681-415b21811fa3 method=POST path=/attach client_ip=10.0.2.68
t=2016-11-16T13:21:39+0000 lvl=info msg=starting app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:21:39+0000 lvl=info msg=attaching app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:21:39+0000 lvl=info msg="sucessfully attached" app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:21:39+0000 lvl=info msg=finished app=host pid=2751 host.id=ip1000130 fn=attach job.id=ip1000130-d8e13387-ce07-4f5b-b1e7-d866f2d26706
t=2016-11-16T13:21:39+0000 lvl=info msg="request completed" component=host req_id=1163fbbb-ac24-4f2c-a681-415b21811fa3 status=101 duration=694.225µs
t=2016-11-16T13:22:23+0000 lvl=eror msg="error repairing cluster" component=cluster-monitor fn=checkCluster err="discoverd: timed out waiting for instances"
t=2016-11-16T13:22:23+0000 lvl=eror msg="did not find any controller api instances" component=cluster-monitor fn=checkCluster
t=2016-11-16T13:22:23+0000 lvl=eror msg="scheduler is not up" component=cluster-monitor fn=checkCluster
t=2016-11-16T13:22:23+0000 lvl=eror msg="fault deadline reached" component=cluster-monitor fn=checkCluster
t=2016-11-16T13:22:23+0000 lvl=info msg="initiating cluster repair" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:22:23+0000 lvl=info msg="killing any running schedulers to prevent interference" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:22:23+0000 lvl=info msg="request started" component=host req_id=8988a859-c539-4b5f-8f99-4f88f040bbe0 method=GET path=/host/jobs client_ip=10.0.0.130
t=2016-11-16T13:22:23+0000 lvl=info msg="request completed" component=host req_id=8988a859-c539-4b5f-8f99-4f88f040bbe0 status=200 duration=1.183111ms
t=2016-11-16T13:22:23+0000 lvl=info msg="checking status of sirenia databases" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:22:23+0000 lvl=info msg="checking for database state" component=cluster-monitor fn=repairCluster db=postgres
t=2016-11-16T13:22:23+0000 lvl=info msg="checking sirenia cluster status" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:22:23+0000 lvl=info msg="found running leader" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:22:23+0000 lvl=info msg="found running instances" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres count=2
t=2016-11-16T13:22:23+0000 lvl=info msg="getting sirenia status" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:22:23+0000 lvl=info msg="cluster claims to be read-write" component=cluster-monitor fn=repairCluster fn=CheckSirenia service=postgres
t=2016-11-16T13:22:23+0000 lvl=info msg="checking for database state" component=cluster-monitor fn=repairCluster db=mariadb
t=2016-11-16T13:22:23+0000 lvl=info msg="skipping recovery of db, no state in discoverd" component=cluster-monitor fn=repairCluster db=mariadb
t=2016-11-16T13:22:23+0000 lvl=info msg="no controller web process running, getting release details from hosts" component=cluster-monitor fn=repairCluster
t=2016-11-16T13:22:23+0000 lvl=info msg="request started" req_id=56a0fc7b-90a7-4a31-84e0-7dfb8dcf3938 component=host method=GET path=/host/jobs client_ip=10.0.0.130
t=2016-11-16T13:22:23+0000 lvl=info msg="request completed" req_id=56a0fc7b-90a7-4a31-84e0-7dfb8dcf3938 component=host status=200 duration=1.124783ms
t=2016-11-16T13:22:23+0000 lvl=info msg="starting controller web job" component=cluster-monitor fn=repairCluster job.id=ip1004182-c3d4ad69-288b-48f2-9332-b8988d4d6232 release=c19d8097-cc4e-4a2f-88c4-4a61acf8b8a9
t=2016-11-16T13:22:23+0000 lvl=info msg="waiting for job to start" component=cluster-monitor fn=repairCluster

Same problem on DigitalOcean

Same problem with a manual installation on Ubuntu 16.04. When I try to deploy a new app, it fails with "Error getting slugrunner image: controller: resource not found
exit status 1"
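
In case it helps with debugging, one thing that could be checked (a sketch only; the exact variable names may differ between Flynn releases) is whether the gitreceive app still has its slugrunner image reference set:

flynn -a gitreceive env | grep -i slug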

I tried a 3-node cluster again today and the same thing happened; I don't know why, but it is reproducible.

Simply rebooting the machines makes the cluster unreachable, both from the internet and from inside the cluster nodes.

@Alir3z4 Apologies for the delay, I have proposed a fix for both of these issues in #3711.

Thanks for looking into this, and for the fix and the quick merge/release.
I will start provisioning another cluster again on Monday and see how it goes.

@lmars Is it possible to update the current cluster to the new version, or do I have to reinitialize it again?

@lmars I tried to update but ran into an error.
I opened an issue for it at #3714.
