Cardano-db-sync: Random "DB lookup fail in insertABlock" errors

Created on 2020-09-28  ·  6 comments  ·  Source: input-output-hk/cardano-db-sync

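As requested by @erikd in another issue, I am reopening this as a new issue, because I hit it with the latest 5.0.1 release as packaged in the official hub.docker.com image (inputoutput/cardano-db-sync:5.0.1 [0]).
I must say that the problem has not reappeared since the crashes shown in these logs [1][2].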

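I think this may have been caused by some postgresql connection problem. Either way, IMHO, in the worst case the main process should exit >0 (since it does not recover once the DB thread shuts itself down), or the DB thread should be restarted when it raises these exceptions.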
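
The two options suggested above (exit non-zero, or restart the DB thread) can be pictured with a small Python sketch. This is not db-sync code; `run_db_thread`, `worker`, and `max_restarts` are hypothetical names used only for illustration, and the restart policy is an assumption.

```python
import threading


def run_db_thread(worker, max_restarts=0):
    """Run `worker` in a thread and supervise it.

    Hypothetical helper: restarts the worker up to `max_restarts` times
    if it raises, and returns an exit code (0 = clean, 1 = gave up).
    """
    attempts = 0
    while True:
        failure = []

        def target():
            try:
                worker()
            except Exception as exc:
                # Record the failure so the supervising thread can see it.
                failure.append(exc)

        thread = threading.Thread(target=target, name="db-thread")
        thread.start()
        thread.join()

        if not failure:
            return 0  # worker finished normally
        if attempts >= max_restarts:
            return 1  # no restarts left: report failure to the caller
        attempts += 1
```

A main program could pass the result to `sys.exit`, so that an orchestrator such as Docker or Kubernetes sees the failure and restarts the container instead of leaving it frozen.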

[0] inputoutput/cardano-db-sync@sha256:b09f440d868749135e74c0bfe6154f210d5836bc2d24a44e484c7dbb4b837689
[1]

[db-sync-node:Info:3354] [2020-09-26 04:29:44.87 UTC] insertByronBlock: slot 3025000, block 3023467, hash 43b510c9fa0d1021d4efbfb0a07c1e90f40258c05d7d66b266aee4fb9963f678
[db-sync-node:Error:3354] [2020-09-26 04:29:59.56 UTC] DB lookup fail in insertABlock: block hash f8452c44591e3db7d5534ceaaff56922609d81deed45b953e7220601bfd4ec87
[db-sync-node:Info:3354] [2020-09-26 04:29:59.56 UTC] Shutting down DB thread
[db-sync-node:Error:3357] [2020-09-26 04:29:59.56 UTC] recvMsgRollForward: AsyncCancelled
...
FROZEN FOREVER
...

[2]

Generating PGPASS file
Connecting to network: mainnet
[db-sync-node:Info:4] [2020-09-26 09:01:38.91 UTC] NetworkMagic: 764824073
[db-sync-node:Info:4] [2020-09-26 09:02:11.59 UTC] Initial genesis distribution present and correct
[db-sync-node:Info:4] [2020-09-26 09:02:11.59 UTC] Total genesis supply of Ada: 31112484745.000000
[db-sync-node:Info:4] [2020-09-26 09:02:11.69 UTC] Inserting Shelley Genesis distribution
[db-sync-node:Info:4] [2020-09-26 09:02:11.83 UTC] epochPluginOnStartup: Checking
[db-sync-node:Info:4] [2020-09-26 09:02:11.86 UTC] localInitiatorNetworkApplication: connecting to node via "/node-ipc/node.socket"
[db-sync-node.Handshake:Info:30] [2020-09-26 09:02:11.86 UTC] [String "Send MsgProposeVersions (fromList [(NodeToClientV_1,TInt 764824073),(NodeToClientV_2,TInt 764824073),(NodeToClientV_3,TInt 764824073)])",String "LocalHandshakeTrace",String "ConnectionId {localAddress = LocalAddress {getFilePath = \"\"}, remoteAddress = LocalAddress {getFilePath = \"/ipc/node.socket\"}}"]
[db-sync-node.Handshake:Info:30] [2020-09-26 09:02:11.87 UTC] [String "Recv MsgAcceptVersion NodeToClientV_3 (TInt 764824073)",String "LocalHandshakeTrace",String "ConnectionId {localAddress = LocalAddress {getFilePath = \"\"}, remoteAddress = LocalAddress {getFilePath = \"/ipc/node.socket\"}}"]
[db-sync-node:Info:34] [2020-09-26 09:02:11.87 UTC] Starting chainSyncClient
[db-sync-node:Info:34] [2020-09-26 09:02:13.97 UTC] Cardano.Db tip is at slot 3025325, block 3023792
[db-sync-node:Info:39] [2020-09-26 09:02:13.97 UTC] Running DB thread
[db-sync-node:Info:39] [2020-09-26 09:02:14.45 UTC] Rolling back to slot 3025325, hash f8452c44591e3db7d5534ceaaff56922609d81deed45b953e7220601bfd4ec87
[db-sync-node:Info:39] [2020-09-26 09:02:14.46 UTC] Deleting blocks numbered: []
[db-sync-node:Info:42] [2020-09-26 09:02:14.58 UTC] getHistoryInterpreter: acquired
[db-sync-node:Error:39] [2020-09-26 09:02:42.88 UTC] DB lookup fail in insertABlock: block hash ff6d511e65fb979ac511e8658c20cfe608bdfe7d3e6172f49115e52487812423
[db-sync-node:Info:39] [2020-09-26 09:02:42.88 UTC] Shutting down DB thread
[db-sync-node:Error:42] [2020-09-26 09:02:42.89 UTC] recvMsgRollForward: AsyncCancelled
...
FROZEN FOREVER
...

All 6 comments

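I have seen

DB lookup fail in insertABlock: block hash

before, but in one case postgres had run out of disk space, and in the other I had just performed a potentially unsafe operation on the database during testing.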

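As for the FROZEN FOREVER part, that is not something I have direct control over. My code calls the network code that runs the ChainSync protocol, which in turn calls another piece of my code. In this case my inner code threw an exception, which was caught by the network code and not propagated back to my outer code. There may be a workaround, but it is unlikely to be simple.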
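
The call pattern described here can be modelled with a toy Python example. It only illustrates an exception being swallowed by an intermediate layer; it is not the actual network library or db-sync code, and every name in it (`network_layer`, `run_sync`, `insert_block`) is hypothetical. The out-of-band error list is one conceivable workaround in this toy setting, not a claim about how the real fix would look.

```python
def network_layer(on_block, blocks):
    """Stand-in for library code that drives a callback per block
    and catches whatever the callback raises."""
    for block in blocks:
        try:
            on_block(block)
        except Exception:
            # The error stops the loop but is never re-raised,
            # so the caller cannot tell that anything went wrong.
            return


def run_sync(blocks, insert_block):
    """Outer application code that records callback failures itself."""
    errors = []

    def on_block(block):
        try:
            insert_block(block)
        except Exception as exc:
            errors.append(exc)  # remember it before the library swallows it
            raise

    network_layer(on_block, blocks)
    if errors:
        # Surface the first failure to the outer code.
        raise errors[0]
```

Calling `network_layer` directly with a failing callback returns silently, which corresponds to the frozen state; `run_sync` re-raises the recorded error so the outer code can react.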

My suggestions are:

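  • Make sure the machine has enough disk space (fully synced to epoch 220, my db-sync instance is using 6G of disk).
  • Drop and recreate the database (to make sure any existing corruption is removed).
  • Monitor the sync manually (it should take about 3 hours at most).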
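
The disk-space check in the first suggestion can be automated with a few lines of Python. This is a minimal sketch: `has_enough_disk` is a hypothetical helper, and both the path to check and the amount of headroom required are assumptions that depend on where postgres keeps its data.

```python
import shutil


def has_enough_disk(path, required_bytes):
    """Return True if the filesystem holding `path` has at least
    `required_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= required_bytes
```

For example, `has_enough_disk(pg_data_dir, 6 * 1024**3)` would check for the 6G mentioned above, where `pg_data_dir` stands for the postgres data directory on the machine in question.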

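Sorry, I forgot to mention that it has not happened again since the attached logs; that is why I think it may have been some peculiar problem with my postgres instance (I was playing with HA at the time, IIRC).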

We can leave this open for a while and see whether we can come up with more useful clues.

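JFTR, I resynced mainnet from scratch and found that it actually recovers after a while (at least in the latest version), and that it may be caused by the cardano-node socket being unavailable [0].
The latter problem arises because I share the socket over TCP so that it can be reached by different services running in different k8s pods (no shared ipc), and I am seeing some instability (I am not yet sure exactly where it comes from; I need to do more TCP tuning).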

[0]

[db-sync-node:Info:217] [2020-10-20 08:09:02.03 UTC] epochPluginInsertBlock: epoch 182
[db-sync-node:Info:217] [2020-10-20 08:11:01.01 UTC] insertByronBlock: slot 3955000, block 3952915, hash d5941f934cf5eb5c3105b584c07aa6c872272dbd0d4fbe78efb36682ed59f211
[db-sync-node:Info:217] [2020-10-20 08:15:22.01 UTC] insertByronBlock: slot 3960000, block 3957915, hash 92494b54140f9db989618f93f11be579b1112d66c173d09cdfb5ac8f9ae6d9ce
[db-sync-node:Error:217] [2020-10-20 08:15:54.86 UTC] DB lookup fail in insertABlock: block hash 55f53f6ddf3546aa29079945eb995244c4a5a205bb287ca76c89a85e2f60ef6d
[db-sync-node:Info:217] [2020-10-20 08:15:54.86 UTC] Shutting down DB thread
[db-sync-node:Error:220] [2020-10-20 08:15:54.86 UTC] recvMsgRollForward: AsyncCancelled
[db-sync-node:Error:211] [2020-10-20 08:34:04.83 UTC] ChainSyncWithBlocksPtcl: AsyncCancelled
[db-sync-node.Subscription:Error:207] [2020-10-20 08:34:04.83 UTC] [String "Application Exception: LocalAddress {getFilePath = \"/node-ipc/node.socket\"} MuxError MuxBearerClosed \"<socket: 12> closed when reading data, waiting on next header True\"",String "SubscriptionTrace"]
[db-sync-node.ErrorPolicy:Warning:4] [2020-10-20 08:34:04.83 UTC] [String "ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError MuxBearerClosed \"<socket: 12> closed when reading data, waiting on next header True\"))) 20s 20s",String "ErrorPolicyTrace",String "LocalAddress {getFilePath = \"/node-ipc/node.socket\"}"]
[db-sync-node.Handshake:Info:1879] [2020-10-20 08:34:05.83 UTC] [String "Send MsgProposeVersions (fromList [(NodeToClientV_1,TInt 764824073),(NodeToClientV_2,TInt 764824073),(NodeToClientV_3,TInt 764824073)])",String "LocalHandshakeTrace",String "ConnectionId {localAddress = LocalAddress {getFilePath = \"\"}, remoteAddress = LocalAddress {getFilePath = \"/ipc/node.socket\"}}"]
[db-sync-node.Handshake:Info:1879] [2020-10-20 08:34:05.85 UTC] [String "Recv MsgAcceptVersion NodeToClientV_3 (TInt 764824073)",String "LocalHandshakeTrace",String "ConnectionId {localAddress = LocalAddress {getFilePath = \"\"}, remoteAddress = LocalAddress {getFilePath = \"/ipc/node.socket\"}}"]
[db-sync-node:Info:1883] [2020-10-20 08:34:05.85 UTC] Starting chainSyncClient
[db-sync-node:Info:1883] [2020-10-20 08:34:07.63 UTC] Cardano.Db tip is at slot 3960564, block 3958479
[db-sync-node:Info:1889] [2020-10-20 08:34:07.63 UTC] Running DB thread
[db-sync-node:Info:1889] [2020-10-20 08:34:08.16 UTC] Rolling back to slot 3960564, hash 55f53f6ddf3546aa29079945eb995244c4a5a205bb287ca76c89a85e2f60ef6d
[db-sync-node:Info:1889] [2020-10-20 08:34:08.17 UTC] Deleting blocks numbered: []
[db-sync-node:Info:1889] [2020-10-20 08:37:42.51 UTC] insertByronBlock: slot 3965000, block 3962915, hash c1fa47cd48391aa3774e61c8830b5d1e8bbf62c4720b856a952fdd6d95dcd5e6
[db-sync-node:Info:1889] [2020-10-20 08:41:59.70 UTC] insertByronBlock: slot 3970000, block 3967915, hash 71dcc988afa8fbf6e2f5c88b165f7ee7dbbe8a1a07141df8a5f2c42986494528
[db-sync-node:Info:1889] [2020-10-20 08:46:19.05 UTC] epochPluginInsertBlock: epoch 183
[db-sync-node:Info:1889] [2020-10-20 08:46:43.59 UTC] insertByronBlock: slot 3975000, block 3972915, hash 9be00c8a13eeaa21b24b070fe9254c6b30c3e1953e8625604c86e8c8d5695213
[db-sync-node:Info:1889] [2020-10-20 08:51:42.20 UTC] insertByronBlock: slot 3980000, block 3977915, hash ec3903b66e2a7638315e9d4b9a19ed904b71eae3e5e4cdd412a9b8ed5875b970
[db-sync-node:Info:1889] [2020-10-20 08:57:17.01 UTC] insertByronBlock: slot 3985000, block 3982915, hash 527077e870a8fa8648fb035107c906cc72e3ea2438a674020dc472ec9e2b8d50
[db-sync-node:Info:1889] [2020-10-20 09:01:33.12 UTC] insertByronBlock: slot 3990000, block 3987915, hash 2d6ed2788d5a694d05866d5c08002e46b9e119cab71c2e56d5b76432f37b8eda
[db-sync-node:Info:1889] [2020-10-20 09:05:17.18 UTC] insertByronBlock: slot 3995000, block 3992915, hash e043d137b267ed16b73b41cb92889b37cad381c53c80390598b547dc9fc58892
[db-sync-node:Info:1889] [2020-10-20 09:06:00.19 UTC] epochPluginInsertBlock: epoch 184
[db-sync-node:Info:1889] [2020-10-20 09:09:43.66 UTC] insertByronBlock: slot 4000000, block 3997915, hash a1f93a150ed43de21591124248b27ba0c830bed8c8e81c9a17d7823b5a6cfc97
[db-sync-node:Info:1889] [2020-10-20 09:14:39.01 UTC] insertByronBlock: slot 4005000, block 4002914, hash 3ad70e79c62b3f0cafac5a6bcce66702cac04ea16b7487ceb4627cd61eb9ca42

it may be caused by the cardano-node socket being unavailable

That on its own would not cause a DB lookup failure.

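However, if db-sync is in the middle of a DB lookup when the socket goes away, the resulting exception may terminate the lookup early, which may in turn be reported as a lookup failure.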

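You will notice that when the socket comes back, it rolls back to block hash 55f53f6d...2f60ef6d (the hash whose lookup failed earlier), and this time the lookup must succeed, otherwise chain sync would die right there.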
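
The observation above can be checked mechanically against a log. The following Python sketch pairs each failed lookup with a later rollback to the same hash. The two regular expressions are assumptions based only on the message formats visible in the logs in this thread, and `failed_then_rolled_back` is a hypothetical helper name.

```python
import re

# Patterns derived from the log lines quoted in this thread.
LOOKUP_FAIL = re.compile(
    r"DB lookup fail in insertABlock: block hash ([0-9a-f]+)")
ROLLBACK = re.compile(
    r"Rolling back to slot (\d+), hash ([0-9a-f]+)")


def failed_then_rolled_back(log_lines):
    """Return the hashes whose lookup failed and that were later
    used as a rollback target, in the order the rollbacks appear."""
    pending = set()
    recovered = []
    for line in log_lines:
        fail = LOOKUP_FAIL.search(line)
        if fail:
            pending.add(fail.group(1))
            continue
        rollback = ROLLBACK.search(line)
        if rollback and rollback.group(2) in pending:
            pending.discard(rollback.group(2))
            recovered.append(rollback.group(2))
    return recovered
```

A hash that fails a lookup but never shows up as a rollback target would be left out of the result, which would point to a failure that did not recover this way.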

Is it worth keeping this open? The problem is that, in almost all cases, I cannot fix a problem I cannot reproduce.
