# Fixing inconsistent `ChangeSet`s in [`PR#1203`](https://github.com/bitcoindevkit/bdk/pull/1203) ## The problem `PR#1203` at commit [`94b9751841b935969bc6031f612cd10d5e872512` ](https://github.com/bitcoindevkit/bdk/pull/1203/commits/94b9751841b935969bc6031f612cd10d5e872512) has a bug where the `KeychainTxOutIndex` constructed from individual changesets may be different than what would be constructed from the aggregate of those `ChangeSet`s. This problem is shown in a test of commit [`0401c9b79f9a00b65770bc3562aa2d3e7a4b916e`](https://github.com/bitcoindevkit/bdk/pull/1428/commits/0401c9b79f9a00b65770bc3562aa2d3e7a4b916e). ```rust /// The index as of commit `94b9751841b935969bc6031f612cd10d5e872512`. pub struct KeychainTxOutIndex<K> { keychains_to_descriptor_ids: BTreeMap<K, DescriptorId>, descriptor_ids_to_keychain: BTreeMap<DescriptorId, K>, // ... other fields excluded ... } pub struct ChangeSet<K> { pub keychains_added: BTreeMap<K, Descriptor>, pub last_revealed: BTreeMap<DescriptorId, u32>, } ``` The `keychains_to_descriptor_ids` field of `KeychainTxOutIndex` and the `keychains_added` field of `ChangeSet` allows for multiple keychains (`K`) to be associated with the same descriptor. However, `descriptor_ids_to_keychain` (which attempts to be reverse index of `keychains_to_descriptor_ids`) does not. Therefore, if a descriptor is associated with multiple keychains, only one keychain will be returned if we do a reverse lookup with a given descriptor. This reverse lookup is used when returning spks/txouts/outpoints from the index (since it is helpful to return the associated keychain of this data). **Why would the behavior of this reverse lookup be inconsistent when we apply changesets in aggregate v.s. one by one?** The reason lies in the order in which `descriptor_ids_to_keychain` is updated with `(K, Descriptor)` pairs from `ChangeSet::keychains_added`. `descriptor_ids_to_keychain` is a map with `DescriptorId` as key, so later-seen descriptors will take precedence. For an aggregated changeset, `(K, Descriptor)` pairs will be introduced purely in order of the `Ord` impl of `K`. For applying individual changesets (of the aforementioned aggregate), the order will be different depending on both the `Ord` of `K` in individual changesets, but also the order of the changesets themselves. ## Constraints for coming up with a solution 1. Applying changesets of a specific order should have the same result as applying the aggregate of those changesets (appended in the same order when aggregating). 2. The changeset structure allows for different keychains to be associated with the same descriptor. So we must allow this behavior. 3. The changeset structure allows for replacing the descriptor for a given keychain. So we must allow this behavior. Constraint `1.` is a must have. Not having it leads to unpredictable behavior. Constraint `2.` can be avoided if we allow for *invalid changeset*s, i.e. changesets that associate multiple keychains with the same descriptor. However, that will be horrible. Constraint `3.` can be avoided if we only apply the first descriptor for a given keychain. If a changeset introduces a different descriptor in the future, the `Append::append` implementation just ignores it. However, this makes changesets harder to reason with. Therefore, we should design a solution that satisfies all three constraints. ## Solution 1. This is the solution presented in [`PR#1428`](https://github.com/bitcoindevkit/bdk/pull/1428). ```rust pub struct KeychainTxOutIndex<K> { keychains_to_descriptor_ids: BTreeMap<K, DescriptorId>, descriptor_ids_to_keychains: HashMap<DescriptorId, BTreeSet<K>>, // ... other fields excluded ... } ``` With this structure, `descriptor_ids_to_keychains` is directly an inverse index of `keychains_to_descriptor_ids` (and we maintain it as such).