# Garbage collection for interned values Tracked at https://github.com/salsa-rs/salsa/issues/597 ## Goal Able to "cap" the number of interned entries and throw out old ones when there are too many. Limits: * Can never throw out an interned entry created in the current revision. ## Overview the plan * When a query `Q` *interns* (creating a query, not reading it) a new value `V` * Hash the input and find the existing slot * If no existing slot, allocate a new slot and record the current revision is recorded in `I` * `Q` records an input edge to `I` * Interned value `I` needs to store * id * revision when value was interned * When a query `Q` *reads* from `I`: * Check the revision matches and then return the data ## Data Given ```rust #[salsa::interned] struct Type<'db> { ... } ``` Today we have: ```rust struct Type<'db>(salsa::Id); ``` but we will need ```rust struct Type<'db>(salsa::Id, Revision); ``` the data for an interned slot will look like this ```rust struct InternedValue<V> { value: V interning_revision: Option<Revision>, accessed_revision: Cell<Revision>, } ``` When you read from `Type<'db>`, you index to find the `InternedValue` and you assert that the revision is equal to what you expect. This assertion should never fail unless people are leaking data outside of salsa's system or have non-determinstic function. Maybe we can limit this to debug builds. ## What we want... The ability to "drop" some arbitary interned value (that was not read in this revision yet) and keep going. This will set the `interning_revision` of `I_X` to `None` (to start). Suppose in Revision R0... * In function F, intern "X" to I_X * In function G, we read from I_X to get "X" In revision R1, we drop I_X and call `G`: * How does `G` get access to `I_X`? * We can't have passed it directly, the `'db` lifetime will prevent us from holding the ref across revisions * Must have been read from the field of a tracked struct, for example * there will be some dependency on F * When we validate F: * we see a read of I_X * because the `interning_revision` is `None`, it will be considered to have changed in `R1` * We have to re-execute F: * which we re-intern "X" and get `I_X1` ... etc. How do you decide what to drop? * LRU * when you access the interned data, you want to promote it in the LRU list * this needs to be very cheap therefore * maybe do LRU-k? https://www.cs.cmu.edu/~natassa/courses/15-721/papers/p297-o_neil.pdf OR: * we are tracking the revisions when things are accessed... so look for something with an old revision? ## Mentoring instructions See tracking issue https://github.com/salsa-rs/salsa/issues/597