```
[db-sync-node:Info:64] [2020-11-07 15:28:05.27 UTC] Starting chainSyncClient
[db-sync-node:Info:64] [2020-11-07 15:28:05.31 UTC] Cardano.Db tip is at slot 13132763, block 4917327
[db-sync-node:Info:69] [2020-11-07 15:28:05.31 UTC] Running DB thread
[db-sync-node:Info:69] [2020-11-07 15:28:06.16 UTC] Rolling back to slot 13132763, hash c19c89792973fe2fff25e5b715e785d549da9647c2f9b7940aefcd29759dcd70
[db-sync-node:Info:69] [2020-11-07 15:28:06.17 UTC] Deleting slots numbered: []
[db-sync-node:Error:69] [2020-11-07 15:28:08.93 UTC] runDBThread: Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 804642000000) (Coin 804640000000))))]]
CallStack (from HasCallStack):
error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-inplace:Shelley.Spec.Ledger.API.Validation
[db-sync-node:Error:64] [2020-11-07 15:28:08.93 UTC] ChainSyncWithBlocksPtcl: Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 804642000000) (Coin 804640000000))))]]
CallStack (from HasCallStack):
error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-inplace:Shelley.Spec.Ledger.API.Validation
[db-sync-node.Subscription:Error:60] [2020-11-07 15:28:08.93 UTC] [String "Application Exception: LocalAddress {getFilePath = \"/opt/cardano/cnode/sockets/node0.socket\"} Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 804642000000) (Coin 804640000000))))]]\nCallStack (from HasCallStack):\n error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-inplace:Shelley.Spec.Ledger.API.Validation",String "SubscriptionTrace"]
[db-sync-node.ErrorPolicy:Error:4] [2020-11-07 15:28:08.93 UTC] [String "ErrorPolicyUnhandledApplicationException Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 804642000000) (Coin 804640000000))))]]\nCallStack (from HasCallStack):\n error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-inplace:Shelley.Spec.Ledger.API.Validation",String "ErrorPolicyTrace",String "LocalAddress {getFilePath = \"/opt/cardano/cnode/sockets/node0.socket\"}"]
```
Is this mainnet? Are you upgrading from one version of the software to another?
~The NewEpochFailure part of the error message suggests that the db-sync version is incompatible with the node version.~ Version 6.0.x of db-sync is compatible with 1.21.x of the node.
Yes, mainnet. I built a new DB and it is working now with the same version, so not sure what happened.
Closing this.
I've just hit this on the cardano-graphql CI server, without any change of interest such as version updates. Connecting to mainnet with this config.
```
[db-sync-node:Error:62741] [2020-11-12 06:02:09.50 UTC] runDBThread: Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]
CallStack (from HasCallStack):
error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation
[db-sync-node:Error:62736] [2020-11-12 06:02:09.50 UTC] ChainSyncWithBlocksPtcl: Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]
CallStack (from HasCallStack):
error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation
[db-sync-node.Subscription:Error:62732] [2020-11-12 06:02:09.50 UTC] [String "Application Exception: LocalAddress {getFilePath = \"/node-ipc/node.socket\"} Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]\nCallStack (from HasCallStack):\n error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation",String "SubscriptionTrace"]
[db-sync-node.ErrorPolicy:Error:4] [2020-11-12 06:02:09.50 UTC] [String "ErrorPolicyUnhandledApplicationException Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]\nCallStack (from HasCallStack):\n error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation",String "ErrorPolicyTrace",String "LocalAddress {getFilePath = \"/node-ipc/node.socket\"}"]
[db-sync-node.Handshake:Info:62759] [2020-11-12 06:02:17.54 UTC] [String "Send (ClientAgency TokPropose,MsgProposeVersions (fromList [(NodeToClientV_1,TInt 764824073),(NodeToClientV_2,TInt 764824073),(NodeToClientV_3,TInt 764824073)]))",String "LocalHandshakeTrace",String "ConnectionId {localAddress = LocalAddress {getFilePath = \"\"}, remoteAddress = LocalAddress {getFilePath = \"/ipc/node.socket\"}}"]
[db-sync-node.Handshake:Info:62759] [2020-11-12 06:02:17.54 UTC] [String "Recv (ServerAgency TokConfirm,MsgAcceptVersion NodeToClientV_3 (TInt 764824073))",String "LocalHandshakeTrace",String "ConnectionId {localAddress = LocalAddress {getFilePath = \"\"}, remoteAddress = LocalAddress {getFilePath = \"/ipc/node.socket\"}}"]
[db-sync-node:Info:62763] [2020-11-12 06:02:17.54 UTC] Starting chainSyncClient
[db-sync-node:Info:62763] [2020-11-12 06:02:17.55 UTC] Cardano.Db tip is at slot 13564796, block 4938498
[db-sync-node:Info:62768] [2020-11-12 06:02:17.55 UTC] Running DB thread
[db-sync-node:Info:62768] [2020-11-12 06:02:18.01 UTC] Rolling back to slot 13564796, hash 5d8ea0d4cf2d4f46cc91aa48e83c029691f836d7200e11e26402f9a2bcb25987
[db-sync-node:Info:62768] [2020-11-12 06:02:18.01 UTC] Deleting slots numbered: []
[db-sync-node:Error:62768] [2020-11-12 06:02:19.47 UTC] runDBThread: Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]
CallStack (from HasCallStack):
error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation
[db-sync-node:Error:62763] [2020-11-12 06:02:19.47 UTC] ChainSyncWithBlocksPtcl: Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]
CallStack (from HasCallStack):
error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation
[db-sync-node.Subscription:Error:62759] [2020-11-12 06:02:19.47 UTC] [String "Application Exception: LocalAddress {getFilePath = \"/node-ipc/node.socket\"} Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]\nCallStack (from HasCallStack):\n error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation",String "SubscriptionTrace"]
[db-sync-node.ErrorPolicy:Error:4] [2020-11-12 06:02:19.47 UTC] [String "ErrorPolicyUnhandledApplicationException Panic! applyHeaderTransition failed: [[NewEpochFailure (EpochFailure (NewPpFailure (UnexpectedDepositPot (Coin 831366000000) (Coin 831368000000))))]]\nCallStack (from HasCallStack):\n error, called at src/Shelley/Spec/Ledger/API/Validation.hs:92:15 in shelley-spec-ledger-0.1.0.0-3QeazRqhkmeDSfJ73hDh1U:Shelley.Spec.Ledger.API.Validation",String "ErrorPolicyTrace",String "LocalAddress {getFilePath = \"/node-ipc/node.socket\"}"]
```
~After chatting with Rhys on Slack, I suspect that in his case db-sync ran into problems over 1000 blocks before this error, and what he is seeing is a result of it restarting and rotating the logs.~
The problem happens at epoch rollover.
The ones @rhyslbw and @mmahut caught both barfed on the same last applied block, hash 5d8ea0d4cf2d4f46cc91aa48e83c029691f836d7200e11e26402f9a2bcb25987. That is very unlikely to be a coincidence. Highly likely that block is the first block of a new epoch.
@rhyslbw what is the git hash of the db-sync version you are using?
The one @mmahut was using in #404 was commit https://github.com/input-output-hk/cardano-db-sync/commit/6187081a7ea66954c86094578bd37e01bca8aaec which is missing commit https://github.com/input-output-hk/cardano-db-sync/commit/afe68e08cf5f8b3b1b6690e411670908bc0f5942, which contains changes to the ledger-state config. This issue is about it dying in ledger-state related code, but that change should not make a difference on mainnet. The 6.0.0 tag is after the second commit.
This is a HUGE pain in the neck to debug without a fix for https://github.com/input-output-hk/cardano-db-sync/issues/256 .
@erikd I'm using the release tag commit 3e68f3011bb156b9b799ccf056f9a73281479f9c
Did a LOT of work trying to recreate this issue, but it is not deterministic. I am currently running a version of this code that should better catch any errors (and abort immediately). I am hoping there is a chance of triggering this again on the next epoch boundary which happens in about 14 hours from now.
10 nodes, all of them crashed with this specific bug. The MD5 checksums of the lstate files are:
65bce9e6463a324d612d24588afbdecc 13996555.lstate
77b5e894f8a22cb49605b9bfd474588a 13996568.lstate
12c4be3b0fac587d1b6485284e218404 13996582.lstate
f0b29f6768c836e7283f7033799ce146 13996626.lstate
ba72f63cf8185150c8120f3466756479 13996646.lstate
a2b45038665701084196a238b3beb329 13996669.lstate
7e8cccd8f0f1c3ac519ef7471a998ac1 13996713.lstate
ab304c279c8209e4b21a623b1a6dd80f 13996756.lstate
and using git rev 6187081a7ea66954c86094578bd37e01bca8aaec (which is a couple of commits behind the 6.0.0 tag).
Current hypothesis is that ledger state gets corrupted at some point and that the corruption is only noticed at the epoch boundary.
Infinite loop of NewEpochFailure:
99d3e16a319a20ff689ca9582425ddae 13996555.lstate
8205deb9c2b3ad946a99bc7692d4434e 13996568.lstate
8eeb20d372cf5214db7c8287a052707b 13996582.lstate
7133fb72aa8194efa80e95c3fa4af1fb 13996626.lstate
f7199d4a131c6fd4649a76a51167275f 13996646.lstate
faa8d71771e8cc68703fa4f1f08dfce7 13996669.lstate
f6cbf62dad57439dc126f8b56061a863 13996713.lstate
504ea06cb925868c25c100d7d05d6afd 13996756.lstate
cardano-db-sync-extended 6.0.0 - linux-x86_64 - ghc-8.6
git revision 3e68f3011bb156b9b799ccf056f9a73281479f9c
2 of 3 instances threw the NewEpochFailure error.
git revision 3e68f3011bb156b9b799ccf056f9a73281479f9c
0eb144b880dcb07c8347b560ea77db27 13996555.lstate
6ee65fc1f5d47fbb858e92770e109f0f 13996568.lstate
c526b055c731173bb7a94cbf3144855d 13996582.lstate
932f8a4807537c43332a4b9a91c0c4a7 13996626.lstate
95163e7b5351b04ae5909d221a4ee2e2 13996646.lstate
c584485911b8f246d01e37572b0f4175 13996669.lstate
449a7c5b2669288dfec995867507211a 13996713.lstate
ba8c8f7f1657727c826ca07be4f7d2e2 13996756.lstate
The snapshots of the instance not affected have been rolled out.
The fact that the same ledger state file, e.g. 13996756.lstate, has three different hashes is a little unexpected.
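The cross-instance comparison being done by hand above can be sketched as a small script: hash every snapshot file on each instance and flag any file whose hash differs between instances. This is an illustrative sketch only; the helper names and the instance/file layout are made up, not part of db-sync.

```python
# Sketch: spot divergent ledger-state snapshots across instances by
# comparing MD5 checksums, as done manually in the comments above.
# Names and layout here are hypothetical.
import hashlib
from collections import defaultdict

def md5_of(path):
    """MD5 of a file, streamed in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def divergent_snapshots(instances):
    """instances: {instance_name: {filename: md5_hexdigest}}.
    Returns the files whose checksum differs between instances."""
    by_file = defaultdict(dict)
    for inst, sums in instances.items():
        for fname, digest in sums.items():
            by_file[fname][inst] = digest
    # A snapshot file is suspect when instances disagree on its hash.
    return {f: d for f, d in by_file.items() if len(set(d.values())) > 1}
```

Run against the checksum lists posted above, `13996756.lstate` would be flagged, since three instances report three different digests for it.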
If the cause is a corrupted ledger state, this and #405 are probably the same issue.
@SebastienGllmt Yes, that is possible, but I have not even had a chance to look at #405 yet.
2 out of 4 instances went down.
MD5 checksums of a failing instance:
189def79f03972649fdcdcd811def1bb 13996555.lstate
dc6865de6149fdf4879a4659bcf02ef0 13996568.lstate
85bf1965fdadee3ee42c10dfd32e0bdb 13996582.lstate
c452477105dc4041c17d718a32f12056 13996626.lstate
9eb0f1fd0a165a8eff3ec4835f370d6d 13996646.lstate
4d5180ba656234020a71f2d46f1d9d0e 13996669.lstate
521ae28570c1630a18bc721cc4707eaa 13996713.lstate
9a2485e192578c1d3c22059648fba79f 13996756.lstate
(the other failing instance has different hashes)
Healthy instance:
98d46070c972d7b4ec564e4053e29eda 14033709.lstate
5c2443fe558a928a86606136337f3648 14033754.lstate
6c180d350ba7becf0f02d698ac160397 14033800.lstate
d655e560b8a43e064b671266795d262c 14033812.lstate
99bfbf88ec497e7e40865c920f8e8e26 14033839.lstate
bec82d94fe348d843389853cd24a3e5b 14033845.lstate
14faba941a24f02b236d784af85f8d32 14033890.lstate
c96d18f7bdadebed8e3b723b4e6691fc 14033936.lstate
I tried to resync 10 different instances on commit bcd82d0a3eada57fdf7cc71670a46c9b3b80464f (as I need the metadata feature).
2 out of 10 had a corrupted version of the ledger state files (different from the rest). The correct sums are:
/var/lib/csyncdb/14040345.lstate | b1df6bdb2cf6f798d9baf83373a4698f
/var/lib/csyncdb/14040390.lstate | 5694a00b12b47052125178175289ba24
/var/lib/csyncdb/14040394.lstate | b45a5e4b82362def92225a3eec5d1afb
/var/lib/csyncdb/14040412.lstate | 8319589d1b66dffd97f5233c3dcfddd0
/var/lib/csyncdb/14040436.lstate | 9d0fec3f4693bd78f426c34c5aaa5d5d
/var/lib/csyncdb/14040462.lstate | 8afe16d592a57ed5ef79a27adf0803d9
/var/lib/csyncdb/14040481.lstate | bcc2648eb09b0d5503f6397569b33e67
/var/lib/csyncdb/14040527.lstate | 588d787363bfe02cf0fd34ac8f412dd4
/var/lib/csyncdb/14040553.lstate | e1f2f42a1ac49d2ec3a53351d1b267b5
I have also noticed an inconsistency in the files. 14040345.lstate was missing on half of the instances, but these instances had 14040553.lstate instead.
FYI, same issue again on the epoch 230 transition...
Ok, I know what is causing the problem. Fixing this is relatively simple. The fix will not require a db resync unless the ledger state is already corrupt (which will be detected by the fixed version of the software).
The problem is: db-sync is getting blocks that have already been validated by the node, so it seemed sensible to use the fast version.

@erikd Is it possible to make this a config toggle between full checking and fast checking? I'd prefer to run everything in "safe" mode and use extra resources to make sure it stays up.
@CyberCyclone Once the hash is checked, there is nothing else that can go wrong with probability greater than the chance of a 256 bit hash collision. The hash should have been checked. I thought it was being checked. Once it is checked, there is no reason to do more checking.
Awesome, great to hear! The way it was worded sounded like there was a lot more going on. But yeah, hash collisions aren't anything to worry about.
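The integrity argument in the exchange above boils down to a hash-chain check: if each block's previous-hash field is verified against the hash of the block actually applied before it, then re-validating block contents guards only against a hash collision. A minimal sketch of that check, with made-up names and a stand-in hash (the real chain uses a BLAKE2b header hash, and this is not db-sync's actual API):

```python
# Sketch of the chain-integrity check being discussed: verify each
# block's prev_hash against the current tip hash before applying it.
# Illustrative only; block representation and hashing are stand-ins.
import hashlib

def block_hash(block: dict) -> str:
    # Stand-in for the real header hash.
    return hashlib.sha256(repr(sorted(block.items())).encode()).hexdigest()

def apply_blocks(tip_hash: str, blocks: list) -> str:
    """Apply blocks in order, insisting each one extends the tip."""
    for block in blocks:
        if block["prev_hash"] != tip_hash:
            raise ValueError(
                f"hash mismatch: ledger expects {tip_hash} "
                f"but block provides {block['prev_hash']}"
            )
        tip_hash = block_hash(block)
    return tip_hash
```

A block whose `prev_hash` does not match the tip is rejected immediately, which is exactly the kind of mismatch the later `Ledger state hash mismatch` log line reports.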
Since I am using the fast version and my code rolls back to a specified slot, it is possible for it to roll back to the correct slot number, but the wrong block (i.e. slot number correct, but wrong hash and therefore wrong block).
It should _not be possible for it to roll back to the correct slot number, but the wrong block_.
The chain sync instructs to roll back to a specified point on the chain (point being a slot+hash), but this point is guaranteed to exist on the consumer's chain. Yes it's very sensible to check, but if this check were to fail then that indicates a logic bug somewhere.
So I think this will need more investigation before we can call it fixed. Adding an assertion should detect the problem much more promptly at the point where it occurs, rather than much later at the epoch boundary. Adding an assertion is not itself a fix of course.
It should not be possible for it to roll back to the correct slot number, but the wrong block.
It is possible if the rollback only checks the slot number but not the hash.
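In chain-sync terms a rollback target is a point, i.e. a slot number plus a block hash, and the distinction being debated is whether the consumer verifies both fields. A sketch of a rollback that refuses a point whose slot matches but whose hash does not (types and names are illustrative, not the actual db-sync code):

```python
# Sketch: a rollback request carries both slot and hash (a chain
# "point"); verify both before truncating. Checking the slot alone --
# the suspected bug -- would silently accept a rollback onto a
# different block that shares the slot number.
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    slot: int
    hash: str

def rollback(chain: list, point: Point) -> list:
    """Truncate chain (oldest first) to the given point, or refuse."""
    for i, blk in enumerate(chain):
        if blk.slot == point.slot:
            if blk.hash != point.hash:
                # Slot matches but hash does not: wrong block, refuse.
                raise ValueError("rollback point not on our chain")
            return chain[: i + 1]
    raise ValueError("rollback slot not found")
```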
The logging has now produced this:
```
[db-sync-node:Info:39] [2020-11-19 08:47:21.84 UTC] Rolling back to slot 8092720,
hash e1e78605937bb8cfc842d1ee7280b92fa9fce813c26fa66a88eaca74d7af9f05
[db-sync-node:Info:39] [2020-11-19 08:47:21.84 UTC] Deleting slots numbered: [8092760]
Ledger state hash mismatch. Ledger expects 6f1940937d806865a6e96b25a640deb8c1393852fd3d311dbd648e2bfa89056e
but block provides e1e78605937bb8cfc842d1ee7280b92fa9fce813c26fa66a88eaca74d7af9f05.
```
which is a little odd. Restarting it results in:
```
[db-sync-node:Info:34] [2020-11-19 09:06:15.54 UTC] Database tip is at slot 8092720, block 389107
[db-sync-node:Info:39] [2020-11-19 09:06:15.54 UTC] Running DB thread
[db-sync-node:Info:42] [2020-11-19 09:06:15.55 UTC] getHistoryInterpreter: acquired
[db-sync-node:Info:39] [2020-11-19 09:06:15.55 UTC] Rolling back to slot 8092720,
hash e1e78605937bb8cfc842d1ee7280b92fa9fce813c26fa66a88eaca74d7af9f05
[db-sync-node:Info:39] [2020-11-19 09:06:15.56 UTC] Deleting slots numbered: []
```
Need to check the code for this.
I have a temporary workaround for this. The workaround comes from my work-in-progress debugging branch, but has not been fully tested, QAed or released.
If anyone is running the 6.0.0 release and is worried about the epoch rollover taking place in ~12 hours, there is an erikd/tmp-fix-6.0.x branch (commit 3a6e7199c1f2) with the workaround. The workaround detects something going astray and panics; the exception is retried at a higher level and then db-sync continues.
There are no database changes relative to 6.0.0, so no resync is required.
However, running this version may detect an already corrupted ledger state (I am not even sure what that would look like) in which case a resync will be required.
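The control flow of that workaround (detect, panic, catch at a higher level, continue) can be sketched as a simple restart loop. This is purely illustrative of the shape described above; the exception name and the `sync_once` callable are hypothetical, not db-sync's actual code.

```python
# Sketch of the workaround's shape: an inner consistency check raises
# on a ledger/chain mismatch, and an outer loop catches the exception
# and restarts the sync client instead of crashing the whole process.
import time

class LedgerStateMismatch(Exception):
    """Hypothetical stand-in for the panic raised by the inner check."""

def run_with_restart(sync_once, max_restarts=5, delay=0.0):
    restarts = 0
    while True:
        try:
            return sync_once()
        except LedgerStateMismatch:
            restarts += 1
            if restarts > max_restarts:
                raise  # persistent corruption: give up loudly
            time.sleep(delay)  # then re-acquire the tip and continue
```

This matches the restart visible in the logs below: after the `FatalError`, the client logs `Starting chainSyncClient` again and re-reads the database tip.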
After adding a bunch of debug code and then waiting for the problem to be triggered, it turns out this issue is a race condition. From the logs:
```
[2020-11-21 08:27:09.90 UTC] insertShelleyBlock: epoch 230, slot 14380936, block 4978632,
hash f984eead753a149efad752dd58471d0c53c3fcf973d281acf4fdcbc6fda799c7
[2020-11-21 08:27:34.22 UTC] insertShelleyBlock: epoch 230, slot 14380962, block 4978633,
hash bfe35e62b322d397fa6c5080ccd8294c0d2eaca5695e604df59f27f82292227a
[2020-11-21 08:27:36.69 UTC] loadLedgerState: slot 14380962
hash bfe35e62b322d397fa6c5080ccd8294c0d2eaca5695e604df59f27f82292227a
[2020-11-21 08:27:37.15 UTC] insertShelleyBlock: epoch 230, slot 14380964, block 4978634,
hash 1ef4771244b95d35c59371521d19fc145646f89f28bf7a18c4f6c8d7485da2b3
[2020-11-21 08:27:40.01 UTC] Rolling back to slot 14380962,
hash bfe35e62b322d397fa6c5080ccd8294c0d2eaca5695e604df59f27f82292227a
[2020-11-21 08:27:40.02 UTC] Deleting slots numbered: [14380964]
[2020-11-21 08:27:40.35 UTC] ChainSyncWithBlocksPtcl: FatalError {fatalErrorMessage = "Ledger state hash
mismatch. Ledger head is slot 14380964 hash
1ef4771244b95d35c59371521d19fc145646f89f28bf7a18c4f6c8d7485da2b3 but block previous hash is
bfe35e62b322d397fa6c5080ccd8294c0d2eaca5695e604df59f27f82292227a and block current
hash is 136956bd1c6ce536e3c3bb0cef07b3e380441522317c88274f1455a7b11ca2d5."}
[2020-11-21 08:27:41.35 UTC] Starting chainSyncClient
[2020-11-21 08:27:41.36 UTC] Database tip is at slot 14380962, block 4978633
[2020-11-21 08:27:41.36 UTC] Running DB thread
[2020-11-21 08:27:41.54 UTC] Rolling back to slot 14380962,
hash bfe35e62b322d397fa6c5080ccd8294c0d2eaca5695e604df59f27f82292227a
[2020-11-21 08:27:41.54 UTC] Deleting slots numbered: []
[2020-11-21 08:27:42.54 UTC] loadLedgerState: slot 14380962
hash bfe35e62b322d397fa6c5080ccd8294c0d2eaca5695e604df59f27f82292227a
```
Basically what happens is:
The fix is to move the code that rolls back the ledger state from the write end of the queue to the read end.
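The shape of that fix can be sketched with a single-consumer queue: instead of mutating the ledger state at the write end (racing against blocks still sitting in the queue, like block 4978634 in the log above), the rollback is enqueued as an event and applied at the read end, so blocks and rollbacks are processed strictly in arrival order. Event names and the slot-list ledger are illustrative, not the actual db-sync types.

```python
# Sketch of the fix: one consumer drains the queue and applies both
# blocks and rollbacks in order, so a rollback can never race with a
# block that was enqueued before it.
import queue

def db_thread(q: queue.Queue, ledger: list):
    """Single consumer; ledger is a list of applied slot numbers."""
    while True:
        kind, payload = q.get()
        if kind == "block":
            ledger.append(payload)          # apply the next block
        elif kind == "rollback":
            # payload is the slot to roll back to; drop newer slots
            while ledger and ledger[-1] > payload:
                ledger.pop()
        elif kind == "stop":
            return
```

With the buggy ordering, the rollback would run while `("block", 14380964)` was still queued, leaving the ledger head ahead of the chain-sync client's idea of the tip, which is exactly the `Ledger head is slot 14380964 ... but block previous hash is ...` failure above.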
Fixed on master in https://github.com/input-output-hk/cardano-db-sync/pull/413 .
There will also be a 6.0.1 release fixing this.