
This week was spent implementing the originally proposed MIP solver-based algorithm for the Maximum Coverage Problem. Before jumping into the details of the new approach, I'll explain how the currently implemented greedy algorithm works.

The Greedy Implementation

The maximum_cover function accepts an iterator over objects that implement the operation_pool::MaxCover trait, which is defined as follows:

/// Trait for types that we can compute a maximum cover for.
///
/// Terminology:
/// * `item`: something that implements this trait
/// * `element`: something contained in a set, and covered by the covering set of an item
/// * `object`: something extracted from an item in order to comprise a solution
/// See: https://en.wikipedia.org/wiki/Maximum_coverage_problem
pub trait MaxCover: Clone {
    /// The result type, of which we would eventually like a collection of maximal quality.
    type Object: Clone;
    /// The intermediate object type, which can be converted to `Object`.
    type Intermediate: Clone;
    /// The type used to represent sets.
    type Set: Clone;

    /// Extract the intermediate object.
    fn intermediate(&self) -> &Self::Intermediate;

    /// Convert the borrowed intermediate object to an owned object for the solution.
    fn convert_to_object(intermediate: &Self::Intermediate) -> Self::Object;

    /// Get the set of elements covered.
    fn covering_set(&self) -> &Self::Set;
    /// Update the set of items covered, for the inclusion of some object in the solution.
    fn update_covering_set(&mut self, max_obj: &Self::Intermediate, max_set: &Self::Set);
    /// The quality of this item's covering set, usually its cardinality.
    fn score(&self) -> usize;
}

To calculate the maximum cover for attestations, the trait is implemented for AttMaxCover, which is defined as:

pub struct AttMaxCover<'a, T: EthSpec> {
    /// Underlying attestation.
    pub att: AttestationRef<'a, T>,
    /// Mapping of validator indices and their rewards.
    pub fresh_validators_rewards: HashMap<u64, u64>,
}

The Algorithm

For each aggregation A and each validator V in that aggregation, we check how much reward V can earn. This is represented by the HashMap fresh_validators_rewards, which is unique to each A.

The score of A is defined as the sum of the rewards in its fresh_validators_rewards. Note that a validator will receive the reward for only one attestation; therefore, if it is part of multiple attestations, it is counted only once.

This also means that the scores of two different attestations can only be added directly if they have no attesters in common. If an attester is shared, its reward needs to be removed from one of the attestations before the scores are added.
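To make this concrete, here is a tiny self-contained sketch with made-up reward values (not Lighthouse code) showing why overlapping attesters must be discounted before scores are summed:

use std::collections::HashMap;

/// The score of an attestation: the sum of the rewards of its fresh attesters.
fn score(rewards: &HashMap<u64, u64>) -> u64 {
    rewards.values().sum()
}

fn main() {
    // Attestation A covers validators 1 and 2; attestation B covers 2 and 3.
    let a: HashMap<u64, u64> = HashMap::from([(1, 10), (2, 10)]);
    let mut b: HashMap<u64, u64> = HashMap::from([(2, 10), (3, 10)]);

    // Naively adding the scores counts validator 2 twice.
    assert_eq!(score(&a) + score(&b), 40);

    // Removing the shared attester from B first gives the true combined reward.
    b.retain(|validator, _| !a.contains_key(validator));
    assert_eq!(score(&a) + score(&b), 30);
}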

Therefore, at a high level, the algorithm is as follows (a rough code sketch follows the list):

For i in 1..128:
	1. Find the attestation A with the maximum score and add it to the result
	2. Remove this attestation from the input
	3. For every remaining attestation, remove the rewards of the attesters covered by A
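As a rough, self-contained sketch of this loop (simplified types and names of my own, not the actual maximum_cover implementation), the greedy selection could look like this:

use std::collections::HashMap;

/// Simplified stand-in for an aggregate attestation: just its reward map.
struct Aggregate {
    rewards: HashMap<u64, u64>,
}

impl Aggregate {
    /// The score is the total reward still claimable by this aggregate.
    fn score(&self) -> u64 {
        self.rewards.values().sum()
    }
}

/// Greedy maximum cover: repeatedly pick the best-scoring aggregate and
/// discount the validators it covers from every aggregate that remains.
fn greedy_max_cover(mut input: Vec<Aggregate>, limit: usize) -> Vec<Aggregate> {
    let mut result = Vec::new();
    for _ in 0..limit {
        // 1. Find the aggregate with the maximum score.
        let best_index = match input
            .iter()
            .enumerate()
            .filter(|(_, agg)| agg.score() > 0)
            .max_by_key(|(_, agg)| agg.score())
        {
            Some((i, _)) => i,
            None => break,
        };
        // 2. Remove it from the input and add it to the result.
        let best = input.swap_remove(best_index);
        // 3. Validators covered by `best` can no longer earn rewards elsewhere.
        for agg in &mut input {
            agg.rewards.retain(|v, _| !best.rewards.contains_key(v));
        }
        result.push(best);
    }
    result
}

For attestations, limit would be 128, the maximum number of attestations allowed in a block.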

The MIP Approach

The general mechanism of using a MIP solver is as follows (a small code example follows the list):

  1. Define a set of decision variables that encode a "decision". This could be the selection of a particular element, the selection of a set of elements, etc.
  2. Define the objective function in terms of the inputs and the decision variables, and specify the desired outcome, i.e. whether to maximise or minimise the function.
  3. Define the constraints that apply to the decision variables, under which the objective function is optimised.
  4. Run the MIP solver and profit.
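To illustrate these steps concretely with the good_lp crate (a toy objective and constraint, unrelated to attestations; this assumes a default solver feature such as coin_cbc is enabled):

use good_lp::{constraint, default_solver, variables, Solution, SolverModel};

fn main() {
    // 1. Decision variables, with bounds.
    variables! {
        vars:
            a <= 10;
            2 <= b <= 4;
    }
    // 2. Objective function: maximise 3a + b.
    // 3. Constraint: a + b <= 8.
    // 4. Run the solver.
    let solution = vars
        .maximise(3 * a + b)
        .using(default_solver)
        .with(constraint!(a + b <= 8))
        .solve()
        .expect("failed to solve the toy problem");
    println!("a = {}, b = {}", solution.value(a), solution.value(b));
}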

For the problem of computing the max cover over a set of aggregated attestations, the representation is as follows (a code sketch using good_lp follows the list):

  1. Pre-processing:
    1. Compute the set of unique attesters by taking the union of the attesters in all individual aggregates, then re-assign them indices from 1..N.
  2. Decision Variables:
    1. $x_i \in \{0, 1\}$: whether to select the i-th aggregated attestation.
    2. $y_j \in \{0, 1\}$: whether the j-th validator is covered by the solution.
  3. Objective Function (to maximise):
    1. The sum of the rewards earned by all unique attesters covered by the solution, i.e. $\sum_j w_j y_j$, where $w_j$ is the reward of the j-th validator.
  4. Constraints:
    1. $\sum_i x_i \le 128$ (at most 128 aggregates can be selected).
    2. A validator can only be counted as "covered" if at least one of the aggregates containing it is selected; in other words, $y_j \le \sum_{i : j \in A_i} x_i$ for every validator j, where $A_i$ is the set of attesters in the i-th aggregate.
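Below is a rough stand-alone sketch of how this formulation can be written with good_lp. The simplified input format and the function name are mine (the actual PR implements this generically), and a default solver feature is assumed to be enabled:

use good_lp::{constraint, default_solver, variable, variables, Expression, Solution, SolverModel};
use std::collections::HashMap;

/// Select at most `limit` aggregates (each given as a list of validator indices,
/// re-indexed 0..num_validators) so that the total reward of covered validators
/// is maximised. `rewards` maps a validator index to its reward.
fn mip_max_cover(
    aggregates: &[Vec<usize>],
    rewards: &HashMap<usize, f64>,
    num_validators: usize,
    limit: usize,
) -> Vec<usize> {
    let mut vars = variables!();

    // x_i: whether the i-th aggregate is selected.
    let x: Vec<_> = aggregates
        .iter()
        .map(|_| vars.add(variable().binary()))
        .collect();
    // y_j: whether the j-th validator is covered.
    let y: Vec<_> = (0..num_validators)
        .map(|_| vars.add(variable().binary()))
        .collect();

    // Objective: sum of the rewards of the covered validators.
    let objective: Expression = (0..num_validators)
        .map(|j| rewards.get(&j).copied().unwrap_or(0.0) * y[j])
        .sum();

    let mut model = vars.maximise(objective).using(default_solver);

    // Constraint 1: select at most `limit` aggregates.
    let selected: Expression = x.iter().map(|&xi| Expression::from(xi)).sum();
    model = model.with(constraint!(selected <= limit as f64));

    // Constraint 2: a validator counts as covered only if some selected
    // aggregate contains it, i.e. y_j <= sum of x_i over aggregates containing j.
    for j in 0..num_validators {
        let covering: Expression = aggregates
            .iter()
            .enumerate()
            .filter(|(_, agg)| agg.contains(&j))
            .map(|(i, _)| Expression::from(x[i]))
            .sum();
        model = model.with(constraint!(y[j] <= covering));
    }

    let solution = model.solve().expect("MIP solver failed");
    // Return the indices of the selected aggregates.
    (0..aggregates.len())
        .filter(|&i| solution.value(x[i]) > 0.5)
        .collect()
}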

Implementation

Notice that to define the objective function, we need access to the reward for each individual attester; however, MaxCover's score() only returns the reward of the whole aggregation. After spending a lot of time thinking about the best way to add this capability, I decided to introduce a new trait called MipMaxCover, implemented for the AttMaxCover struct, and defined as follows:

pub trait MipMaxCover<'a> {
    type Element: Clone + Hash + Ord;

    fn covering_set(&'a self) -> &'a Vec<Self::Element>;

    fn element_weight(&self, element: &Self::Element) -> Option<f64>;
}
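As a toy illustration of the contract (the type below is hypothetical and not part of the PR), an implementor just exposes the elements it covers together with a per-element weight:

use std::collections::HashMap;

/// Hypothetical aggregate: the attesters it covers and each attester's reward.
struct ExampleAggregate {
    attesters: Vec<u64>,
    rewards: HashMap<u64, f64>,
}

impl<'a> MipMaxCover<'a> for ExampleAggregate {
    type Element = u64;

    fn covering_set(&'a self) -> &'a Vec<Self::Element> {
        &self.attesters
    }

    fn element_weight(&self, element: &Self::Element) -> Option<f64> {
        self.rewards.get(element).copied()
    }
}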

Any struct implementing this trait should be solvable by the MIP-based approach to the Maximum Coverage Problem. I completed the implementation of a generic MIP solver for the MCWS, tested it against the existing unit tests for the greedy solution, and confirmed that this implementation produces the optimal solution (by coverage). A WIP Pull Request with the implementation can be found here.

Issues with the current implementation

While the implementation produces a solution with optimal coverage, I noticed that for some inputs the solution would include redundant sets. Put another way, the solution contained sets which, if removed, would not affect its coverage.
Looking at the objective function, this makes sense: it does not capture the idea of minimising the number of sets in any way, it simply maximises the coverage as long as the total number of sets is $\le 128$.
I was curious whether this would be a concern, so I discussed it with @paulhauner, who confirmed that it would be: we would be wasting network bandwidth sending redundant attestations, and they would also become a permanent part of the blockchain.

As of now, I don't have a concrete way of solving this issue. Ideally, there would be a way to express this preference directly in the objective function. I'm not certain whether it's possible to specify multiple objectives (a primary and a secondary, with a preference between them), or whether there exists a single mathematical function that can account for both simultaneously without one interfering with the other.
A naive solution that comes to mind is the objective function $f = k_1 \sum_j w_j y_j - k_2 \sum_i x_i$, where $k_1$ and $k_2$ are constants chosen so that the sum of weights dominates, but the solution with the fewest aggregates is preferred whenever the sum of weights (coverage) is the same. For example, with integer rewards, $k_1 = 1$ and any $k_2 < 1/128$ would keep the maximum possible penalty (selecting all 128 aggregates) smaller than the smallest possible gain in coverage.

Random Ideas for Optimisation

While I'm not yet at a stage where I can start thinking about how to optimise the processing time, these are some thoughts that came to mind, recorded here for future reference:

  1. Parallel computation of sub-problems: The papers from Satalia mention a procedure to break the problem down into smaller independent sub-problems and a way to combine their solutions to solve the main problem. While this in itself should bring some improvement, there may also be a way to compute the solution of each sub-problem in parallel (each on a different process?), which could result in further performance gains.
  2. Using a different MIP Solver: The good_lp crate I'm using is a wrapper supporting a variety of MIP solvers, so it could be a good idea to compare the performance of each. Based on the description here, the SCIP solver seems to be a promising alternative.
  3. Vectorise the calculations: Vectorising calculations using something like SIMD tends to result in huge performance gains. I don't have a lot of experience with this and am not sure whether it will be useful here, but it's still something worth thinking about in my opinion.

Plan for next week

  1. Spend some time figuring out how to exclude redundant sets
    1. Modify the objective function to account for this, or
    2. Introduce a post-processing step to filter out redundant sets
  2. Implement the decomposition approach.