Polonius 2021-07-30 sprint ideas

--- tags: "2021-07-03 sprint" --- # Polonius 2021-07-30 sprint ideas ## High-level roadmap * First milestone: **feature complete** * Description: * [ ] Able to run rustc tests and gives the correct errors (the blessed expectations need to be verified) * [ ] Error messages can be terrible * [ ] Performance has to be good enough to run the test in reasonable time :) (ignoring move/init analysis, this is already the case) * Covers: * [ ] Move and initialization analysis (can be bypassed, since these errors are already emitted by rustc) * [x] Subset errors (aka, universal region errors) (modulo [PR #156](https://github.com/rust-lang/polonius/pull/156)) * [ ] Optimization: use Location Insensitive results to pare back the placeholders for which we do propagation * [x] Illegal access errors * [ ] Higher-ranked errors: * have polonius do something simple (e.g, [along these lines](https://ecstaticmorse.net/posts/quantifier-elimination/)) for now * [ ] Optimization * Expects as input: * Required origin relations from type-check * Control-flow graph from MIR * * Next milestone: ## Task list (and assignees) * [x] Review https://github.com/rust-lang/polonius/pull/156 -- nikomatsakis * [ ] Review https://github.com/rust-lang/polonius/pull/157 -- nikomatsakis * [x] Work with Dylan MacKenzie to get a plan around HRTB -- nikomatsakis * [x] Refactor parser to not use lalrpop -- domenicquirl * [ ] Begin work on a standalone visualization tool for Polonius fact generation -- amandas * [ ] continue work on integrating the location-insensitive results with the location-sensitive analyses -- lqd ## Preplanning here's a sample of what's been on my mind (some for the last 3 years) that I thought could be interesting tasks and discussion points for the next sprint: - catch up - update on john harrison discussions: even if it's not yet at the polonius topic and only discussed chalk, we depend on chalk so it's of interest - plan for higher ranked subtyping and chalk interaction: this blocks us on chalk, can we do something about it ? can we do something similar in spirit to what was done to the trait solver to support GAT without waiting on chalk: local changes that sidestep the issue. If this is more about the datalog implementation than the model itself, we can talk about moving off of datalog. - make a rustc branch to update fact generation to current polonius master, so that people can work -> https://github.com/lqd/rust/tree/polonius-master - reviewing and landing the subset errors PR - reviewing the already landed docs PR - talking about the work dylan has done on docs/book - talking about the WIP PR removing the materialization of placeholders' liveness at all points, and possible datafrog improvements to make it less painful - removing the Locations::All hack: an alternative to remy's "outlives_everywhere" relation to fix the fact generation OOMs, and fixing a big efficiency problem since that's >95% of the 860K subsets in clap (33k are regular ones; unknown if these stats are representative or not) especially in the datalog model like all "things are true at each point" are (niko has already started on this) - model: reachability/odisseus. loans are not fundamental IMO, loans and issuing/introducing origins are bijective, origins are not really sets of loans (as we saw with the equality variant) (already in progress) - model: SSA -> arielb's, dylan's explorations - model explorations: equality (tease out the interesting things buried among things that didn't work out in the equality variant, esp to fix the current "polonius imprecision") - cfg compression: removing the need for 2 points per MIR statement (I guess these exist because of datalog constraints on the needed ordering) and check if it's actually needed in to have both in all analyses and if we can avoid that easily (move/init, liveness) - cfg compression (intra-block): limiting intra-block edges to where they matter (old PR experiment) and removing the others - cfg compression (inter-block): computing subsets/loans on cfg blocks/SCCs (thoughts and experiments have been done by remy and dylan) - subset graph compression: computing subsets/loans over subset SCCs - making loc ins a bunch of reachability queries over the cfg and subset condensation graphs (already in progress) - move/init analysis: - debugging move/init analysis (possibly beginner friendly) - completing move/init analysis, partial moves and the likes - make move errors early return in `compute` (beginner friendly) - understanding the current limitations of move/init analysis, making rustc tests pass with it, validating it - maybe bypass move/init + liveness ? going directly to loan analysis (possibly resurrecting some of the initial fact generation code) (possibly beginner friendly) - add move/init facts to test parser (beginner friendly ?) - test parser: move away from lalrpop (beginner friendly) - improve tests and validation: more tests, more features in the test framework, refactor tests (/tests, modules per feature/analysis), make it possible to add tests and run them without compiling using text files with the test facts format (some of this is beginner friendly) (inspiration: the NLL tests from https://github.com/nikomatsakis/borrowck/tree/master/test, and this zulip topic https://rust-lang.zulipchat.com/#narrow/stream/186049-t-compiler.2Fwg-polonius/topic/unit.20testing) - benchmark and stats from amanda -> repo: https://github.com/amandasystems/msc-polonius-fact-study - make amanda's thesis pdf available again: https://rust-lang.zulipchat.com/user_uploads/4715/Lng_BM1wQiSsZVZlLnpNRQ8R/Thesis.pdf - enriching the data from rustc to polonius: similar effort to the shared type library, we're missing a lot of info (SCCs, CFG blocks, dominator info, etc) that is already computed - working on sharing info between LocIns and Naive (for testing purposes), LocIns and Opt in the Hybrid variant (already in-progress; also niko's comment in GH issues) - beginner friendly: fix typos in rules, check that they match the hackmd, number the rules so that they're easier to refer to - beginner friendly: update the "state of ui tests" to the most recent version of polonius and ui test suite (and also for the PR about subset errors) ---------- update 1: - polonius debugging / visualization: either in polonius, or on rustc à la -Zdump-mir=polonius including some facts in the mir dump - more stats and logging/querying of provenance information in datafrog (*some* hacky work has been done about fact logging) - talk about Vytautas' https://github.com/rust-lang/rust/pull/86977 and in general Polonius users like prusti: what kind of stability do we want to maintain for users who rely on polonius implementation details like the facts structure and output. Also in general, should people be using polonius like this, or in other formal methods work (as what somewhat advertised in the recent verification workshop), or the visualization work in another thread: with its known bugs and imprecision, and incompleteness, could we quickly discuss these kinds of things to ensure they're not being built on faulty foundations ? - make polonius take ownership of the facts to stop cloning them (and other assorted fact changes, like changing ordering when the rules expect a certain key first) (beginner friendly) ---------- update 2: - model discussion about features we've at least talked about: interaction with variance (there's probably nothing to be done on our side here), or with member constraints (which are more directly related) - the rustc UI tests seem to show differences with NLL output for some 'static related errors, closures, or some implementations dealing with higher-ranked things: review the existing blessed test expectations in these areas, and try to understand/reduce them to see if the differences are in the rules, the output data, the input data, or in diagnostics (somewhat beginner friendly). ---------- update 3: - this one is for dylan: van Emde Boas layout for static/input relations in datafrog :joy: - some PRs have been opened recently in datafrog - try using soufflé (likely beginner friendly) dylan and rémy have souflé .dl files for the analyses IIRC (or at least a way to have them but could require dumping some of the liveness data from that part of the process) --- Amanda's thesis **6.5** **6.5.1** Illegal Subset Relations Polonius currently does not verify that a subset relationship it finds between prove- nance variables is actually valid in itself. For example, this unsound code would not generate an error in today’s Polonius: ```rust fn pick_one<'x, 'y>(x: &'x [u32], y: &'y [u32]) -> &'x u32 { &y[0] } ``` lqd: this has since been implemented as part of the illegal subset relationship errors ("subset errors" for short; universal region errors in rustc parlance) **6.5.2** Analysis of Higher Kinds ```rust for<'a, 'b> 'a: 'b ``` **6.5.3** Addressing a Provenance Variable Imprecision Bug (I wasn't aware of this) ```rust let mut z: u32; let mut x: &'x u32; let mut y: &'y u32; if something { y = x; // creates `'x subset-of 'y`. } if something { x = &z; // creates {L0} in 'x constraint. // // at this point, we have // `'x subset-of 'y` and `{L0} in `'x`, // so we also have `{L0} in 'y` (wrong). drop(x); } z += 1; // Polonius: false positive error drop(y); ``` ---- dylan's comments: As you could probably guess from https://rust-lang.zulipchat.com/#narrow/stream/186049-t-compiler.2Fwg-polonius/topic/Quantifier.20elimination.20for.20HRTBs, I'm most interested in reaching feature-parity with the in-tree borrow checker. IIUC, the biggest hurdle is HRTBs, but maybe that's just solved by chalk (though I'd like to understand exactly how), and we'd be better off thinking about other things. For example: > - the rustc UI tests seem to show differences with NLL output for some 'static related errors, closures, or some implementations dealing with higher-ranked things: review the existing blessed test expectations in these areas, and try to understand/reduce them to see if the differences are in the rules, the output data, the input data, or in diagnostics (somewhat beginner friendly). Besides that, here's some ideas from @_**lqd**'s list (thanks btw!) that I have thoughts about: - Handle placeholder liveness more efficiently: - @_**lqd** has basically solved this, it just needs sign-off from Niko. - I do think that having a datalog frontend that allows for succinct disjunctive rules would greatly benefit this type of stuff going forward, but a bespoke datalog compiler is a pretty big undertaking. My impression when I first got into Polonius was that it would be quite nice to have an interface to Souffle for faster prototyping. I'm interested in Niko's thoughts on this, because they switched away from Souffle relatively early in the development process. Perhaps it's not as nice as I'm making it out to be? - Removing `Location::All`. - I haven't looked into this more than superficially. Once again, it seems like we already know the correct approach (see #50938), but we need to think through its consequences and make sure they're acceptable. - SSA / region equality - I was exploring an alternative execution model for Polonius where the subset relationships between regions (now origins) were fixed, and instead we created a brand new region for a given variable when its subset relationship changed in the CFG. That new region would be tied to the previous region for the variable based on whether it was live. (There's a few more details, but I'm still loading all this back into cache) - This was inpsired by SSA, but the transformation occurs on regions rather than variables. - This handles equality constraints between regions quite nicely, since they no longer change across the CFG. - However, I couldn't find a way to handle killed loans elegantly, and besides this is just an optimization, AFAICT the optimized variant is fast enough already. I don't think this is high priority. - Move/init analysis - This isn't a top priority for me RN, since it's trying to replicate the existing analyses in `rustc` rather than breaking new ground. However, I think this will change once librarification is further along, because it will be important for analysis/verification tools built on top of Polonius. - As I understand it, datalog version is quite slow at the moment because the set of init/uninit facts is dense. I've looked at a few ways to mitigate this (all of them mentioned by @_**lqd** already). We could ignore uninteresting points in the CFG (this may already be implemented?), incorporate basic blocks into the datalog rules to avoid explicitly computing intrablock init/uninit facts, or extend Datalog with an alternate way of storing facts (sets of tuples are not ideal for dense relations, Souffle has a bespoke data-structure for this). - Hoping to talk with @**Amanda Stjerna** about some of this, since I suspect they have more insight than I do. Mostly, I'm interested in Polonius as a way to implement powerful, lifetime-aware alias/pointer analyses for use in optimization and formal verification. I assume that making Polonius a viable candidate for the actual upstream borrow-checker is an important part of this, but maybe we could be doing more to help the Prusti folks (and others working on formal verification) in the interim?