Cardano-db-sync: Orphan blocks not removed from Block table

Created on 30 May 2020  ·  10 Comments  ·  Source: input-output-hk/cardano-db-sync

The validate program reports:

All transactions for blocks in epoch 195 are present: Failed on block no 4224831: 
expected tx count of 2 but got 3

and a simple query:

select * from block where block_no = 4224831 ; 

results in two rows where only one was expected.

# select id, slot_no, block_no, previous, tx_count, time from block where block_no = 4224831 ;
   id    | slot_no | block_no | previous | tx_count |        time         
---------+---------+----------+----------+----------+---------------------
 4265744 | 4226980 |  4224831 |  4265742 |        2 | 2020-05-29 08:58:11
 4265743 | 4226979 |  4224831 |  4265742 |        3 | 2020-05-29 08:57:51

The block with the lower slot number was obviously orphaned, but the fact that no later block extended that chain was not detected, and hence the block was not removed from the database.

Blocks that are orphaned like this should either be removed from the database or skipped by the validate process.
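For a quick sense of how widespread this is, the duplicated heights can be listed directly from the block table. This is only a diagnostic sketch, assuming nothing beyond the columns already shown in the query above:

-- List every block height that appears more than once, i.e. heights where
-- an orphaned block was kept alongside the adopted one.
select block_no, count(*) as copies, array_agg(id order by slot_no) as block_ids
from block
where block_no is not null
group by block_no
having count(*) > 1
order by block_no;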

Labels: bug, priority high


All 10 comments

The simplest solution is to remove blocks which have been orphaned like this.

As an alternative to removing them, we could move them to a separate orphaned_blocks table, but then we need to decide what we want to do with the txs attached to that block. My initial feeling is they should just be ignored/removed.

Having an orphaned_block table may be useful when we switch to Praos, but it can easily be implemented even now, while db-sync really only supports Byron.
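Purely as an illustration (this is not part of the current db-sync schema, and all names and types here are assumptions), such a table could look roughly like this, mirroring a few columns of the existing block table:

-- Hypothetical orphaned_block table; not the actual db-sync schema.
create table orphaned_block (
  id          bigserial primary key,
  hash        bytea  not null,
  slot_no     bigint,
  block_no    bigint,
  tx_count    bigint not null,
  orphaned_at timestamptz not null default now()
);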

My very first thought was also: clean the orphaned block out, but keep it somewhere.

It could even become an option for the node to store all detected forks, including all blocks from the abandoned side arm, in this table. Purging after x slots is simple, and diagnostics and tracing of what happened become much easier then.

The question about lost TXs is interesting. As a quick idea, without knowing if it's cryptographically possible and 100% safe:

If a (db-sync enabled) node detects TXs from orphaned blocks, it could try to reinsert them into its mempool. Not broadcast them further, but only process them if it is its turn to produce a block.
(Broadcasting shouldn't be necessary because many other pools detect the orphaned block with its TXs and reinsert them into their mempools from that source.)

If the UTxO owner has meanwhile (before the fork is resolved and the orphaned block detected) submitted a second attempt, the previous TX becomes invalid. If instead the TX is still valid, it will automatically appear in a new valid block again.

Wallets then need to be able to handle the situation where they see their TX in a block that is later orphaned, and some blocks later the same TX is placed in another, hopefully valid, block.

If a (db-sync enabled) node detects TXs from orphaned blocks, it could try to reinsert them into its mempool. Not broadcast them further, but only process them if it is its turn to produce a block.

It's important to note that db-sync is separate from the node and simply follows the node's chain.
Specifically, db-sync does not maintain its own copy of the mempool. In the case I detected, the transactions in the orphaned block were almost certainly included in later non-orphaned blocks, so db-sync should just drop them.

The wallet of course has different behavior from db-sync.

If the TXs are almost certainly processed by other block producers, then of course there is no need to "re-inject" them from the database into the node's mempool with some tool.
The question is whether this is the case in all slot/height battle variants. If not, the next question is whether it's useful (intended) to let one of the nodes that detects unprocessed TXs in an orphaned block re-insert them into its mempool, based on the db-sync database.

db-sync should really only be used to view what has happened in the past. The node itself will pull transactions out of orphaned blocks and put them back in its own mempool. db-sync is not intended to be part of that.

However, maintaining a list of blocks orphaned within the last 100 or so slots may be useful for debugging.
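Such a list would also be cheap to prune. A sketch, assuming the hypothetical orphaned_block table from earlier in the thread and a caller-supplied current slot number:

-- Purge recorded orphans more than ~100 slots behind the tip.
-- :current_slot is a psql variable (psql -v current_slot=...); the table
-- itself is hypothetical, not part of the db-sync schema.
delete from orphaned_block
where slot_no < :current_slot - 100;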

I think the block should be removed on rollback, and we should use cascading deletes to delete everything that is logically within the deleted blocks.

There is no point trying to preserve blocks that have been rolled back. This is not a tool for monitoring block propagation within the network: it is only observing at one place, and as a node client, so it would be very unreliable when used in that role.
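For example, with the foreign keys declared as cascading (a sketch only; the actual db-sync table, column and constraint names may differ), rollback handling reduces to deleting the block rows themselves:

-- Assumes tx has a block column referencing block (id); names are illustrative.
alter table tx
  add constraint tx_block_fkey
  foreign key (block) references block (id) on delete cascade;

-- Deleting the orphaned copy then removes its transactions (and, with the
-- same pattern on tx_in/tx_out, everything logically inside the block).
delete from block where id = 4265743;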

I agree we should keep db-sync as pure a reflection of actual chain state as possible.

However, it would greatly simplify my life if ALL blocks made it into db-sync and were then deleted on rollbacks. That way I can trigger on the postgres delete and do what I want with the data.
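For what it's worth, reacting to those deletes is already possible with a trigger plus LISTEN/NOTIFY. A minimal sketch, assuming the block table shown above; the function, trigger and channel names here are made up:

-- Broadcast the height of any block row deleted (e.g. during a rollback)
-- on a NOTIFY channel that a client can LISTEN to.
create or replace function notify_block_delete() returns trigger as $$
begin
  perform pg_notify('block_rollback', coalesce(old.block_no::text, ''));
  return old;
end;
$$ language plpgsql;

create trigger block_delete_notify
  before delete on block
  for each row execute procedure notify_block_delete();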

This is needed by other clients and Sam says it is critical for any db-sync release, including Testnet.

It seems like rollbacks aren't working as expected. We see multiple blocks in the DB with the same block height. Here's an example (on mainnet):

cexplorer=# SELECT * FROM block WHERE block_no=4224831;
id | hash | slot_no | block_no | previous | merkel_root | slot_leader | size | epoch_no | time | tx_count
---------+--------------------------------------------------------------------+---------+----------+----------+--------------------------------------------------------------------+-------------+------+----------+---------------------+----------
4225044 | \x941a71094c7b10243845a82e53dc45959d8fde5f6d87e463efe660aa9611b965 | 4226979 | 4224831 | 4225043 | \x95549f5fcfc370eb4b24c981a6053f909bdef6418ce896828c6be18fa31ab406 | 6 | 1731 | 195 | 2020-05-29 08:57:51 | 3
4225045 | \x275ee8b9ae441e6e32e1f7f36290e6a048722619d91195773a07b06682418209 | 4226980 | 4224831 | 4225043 | \x6eb04497431d77657a94b0eee70dae59e7403980d4fc9d06ba5295436b8f0f54 | 7 | 1418 | 195 | 2020-05-29 08:58:11 | 2
(2 rows)
