This question may be a bit vague, but I'm unsure how to make it more precise.
When using the cluster sharding extension, you have to provide a persistence journal so that the plugin can store its metadata (ShardRegionAllocated, etc.).
This metadata is used when actors are instantiated or moved across nodes, so that they can recover their persisted ("frozen") state.
Suppose that, for whatever reason, the journal becomes corrupted (an entry is lost, an entry is duplicated, and so on). This leads to serious exceptions at the actor's startup (Persistence recovery failure), possibly terminating the whole region if not handled correctly.
What is the best way to handle this scenario? I'm asking for ideas at any level of the stack, from the supervisor's policy to direct intervention on the journal. Thanks.