---
robots: nofollow, noindex
tags: shield
---

# Shield proposal

I have talked with several folks on the team about what we want to see out of Shield and Repo Health in general. This write-up is an attempt to synthesize those points of view into a plan that strikes an acceptable compromise between all opinions. Consider this a starting point, and if you are reading, please give feedback!

## Credit

Thank you to everyone who put together the resources this doc draws on:
- [Shield PM triage](https://microsoft-my.sharepoint-df.com/:w:/r/personal/aneeshak_microsoft_com/_layouts/15/Doc.aspx?sourcedoc=%7B00144E1E-EE94-4F98-9458-C35D4990CAF7%7D&file=Fluent%20UI%20Shield%20PM%20Triage.docx&action=edit&mobileredirect=true) - Aneesha
- [Shield wiki](https://uifabric.visualstudio.com/UI%20Fabric/_wiki/wikis/UI-Fabric.wiki/74/Workflow) - JD
- [Fluent UI Issue triage](https://microsoft-my.sharepoint-df.com/:w:/p/aneeshak/EYp-9Yv_uypFk-skO55vQX0BR_WZBlnH_nw-aLdGmipM3Q?e=d2wl65) - Aneesha
- [Shield cheatsheet](https://hackmd.io/wJwTino2T1i_v0r7RXEMCQ) - Paul
- [Template responses for Shield](https://hackmd.io/IcudV52OS3ufaaMcOq2ktA) - Paul

## Table of contents:

[TOC]

## Super high level purpose of shield

You are here to talk to the customer or partner, understand their issue or question, and help find a fix. Get the issue to the point where either there is a recommended fix or we know we won't fix it until later (a component refresh). Avoid being just a pass-through; if you find yourself only routing things, we can write automation for that.

1. Solve the customer problem, and determine the path forward. **Investigate to turn new issues into fixable issues**.
2. Follow up on PRs from outside the team. Do an initial review yourself, but also **find the engineer who can take point** (nag them a little if needed, raise an issue if you have to nag too much).
3. Help answer questions on Stack Overflow and Teams.
**Try to answer yourself**, but also feel free to @ mention folks that you think can answer quickly.

:::info
From our 9/22 discussion:

We cannot build all the things and we cannot fix all the things ~~- our team is not resourced to do so - so~~ it takes a community to make our product great.

What we fix and what we welcome the community to fix:
1. We fix the unfixable issues - large API pattern changes, rewriting components to support desired a11y patterns, converging two React libraries into one
2. We fix critical a11y issues - supporting our 60 day SLA
3. We root cause issues and determine if there is an acceptable workaround or identify the solution
4. For issues that are root caused and have a solution identified, we welcome the community to contribute the fix if they need it within a desired timeframe - allowing them to unblock themselves - and we label these issues as such
5. We reserve time ~~X number of days for~~ per month for batching up bug fixes which we will address based on:
    1. Customer Impact
    2. Priority
    3. Amount per component
:::

## Goals of Repo health

Why do we care about Repo health and Shield? Really it comes down to supporting customers, fixing issues, and improving the library in collaboration with the community.

1. First priority is to make sure we are effectively tackling issues our customers file.
2. Second is to encourage and support contribution from partners and external customers.

A related topic to repo health and Shield is PR reviews. We need to ensure PRs are moving through review efficiently, especially PRs submitted from outside the team. This is somewhat a means to an end to help us fix issues more efficiently. For a discussion on PR review expectations see this: [Proposal: Pull Request review expectations](/42hfCnfXT96jDsziov9_Ew)

## Core Concepts

Below are the core concepts or principles in this proposal for how we manage issues in our repo.

#### Clear accountability
If an issue has a `Needs: <something>` label then it should be clear who owns the next follow-up (engineer, design, accessibility POC, prioritization, etc.)

#### Track high pri closely
[react](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+-label%3A%22Fluent+UI+react-northstar%22+sort%3Acreated-asc+label%3A%22Priority+1%3A+ASAP%22) | [react-northstar]() | [react-next]() | [web-components]()

Track high priority issues and fit them in, interleaved with regular work

#### Batch issues for efficiency
Group open issues by Component/Area and monitor to determine when we need to schedule specific time to fix the issues

#### Periodic and planned component API/Behavior refresh
Delay significant breaking changes or net new features until we decide to make a pass on the component where we refresh its surface and make multiple breaking changes in bulk (as part of a major release). This batching is distinct from the batched bug fixes described above

#### Use Monitoring to ensure it all is running smoothly
Use automated monitoring to keep an eye on specific areas such as turnaround time and # of issues per component.

#### Don't carry a large inventory of issues
Maintain a minimal backlog that we can reason about. Publish a roadmap so we can explicitly close or delay issues we won't or can't tackle right now.

## Types of issues

To help frame the conversation and build a proposed process, we should first enumerate the types or categories of issues we tend to deal with. From here we can then break down a process that deals with each of these types.

#### Non-breaking bug - `Type: Bug`
Key Challenges: How do we make time to fix these? Shield doesn't have the bandwidth, nor perhaps the component expertise, to fix many of these issues.

- Code is supposed to do something, it doesn't, it's broken.
Fixing it is unlikely to break customers.
- Examples:

#### Breaking bug or feature - `Type: Breaking`
Key Challenges: Unless we can hide the behavior change behind a toggle, these changes will result in breaking customers, and must be made in a major release.

- Proposed change to behavior, DOM structure, API surface
- Code works one way, customer proposes this is not the way it should work, and proposes a change. Fixing this is likely to break customers.
- Examples:

#### Additive feature - `Type: Feature`
Key Challenges: These are often constrained by resourcing. We simply can't find the time to implement the feature.

- Issue is a proposal to add new functionality that is purely additive
- Good examples are new props
- We can take a breaking change and turn it into an additive feature by making a prop to turn it on

#### Totally New Component - `Type: New Component`
Key Challenges: Accepting new components beyond those already on the roadmap has a maintenance cost that we can't take on right now.

- In the future, the answer here is to host your own set of components and add them to the directory
- There are a few possible states here
    - New Component is on our roadmap - Accept contribution
    - New Component is something customers have asked for - Consider contribution
    - New Component is totally new and not on our radar - Close issue

#### Accessibility - `Area: Accessibility`
Key Challenges: These are often urgent (60 day SLA), difficult, and potentially breaking for customers.

- Bug in Accessibility behavior. We pretty much need to fix it, and find a way to mitigate the impact of the change on existing customers.
- If the change for Accessibility could break normal behavior we can use an additional prop to make the behavior toggleable.
- Examples:

## Triage flow, how issues come in

The high level triage flow can be described as:

1. **New Issue** New Issue is filed, starts with `Needs: Triage` label
2. **Daily Triage** `Needs: Triage` issues are triaged daily and labeled appropriately.
    - An issue may need further follow-up, indicated by the addition of a `Needs: <something>` label. The issue gets assigned to the appropriate point of contact for the `Needs: <something>` label.
    - An issue may be closed during triage
    - An issue may be routed to the component owner (only if the repro is well understood and we are ready to fix it)
3. **Follow up** If an issue contains a `Needs:` label, it is the responsibility of the appropriate point of contact for that area, who is also the Assignee (as described below)
4. **Maintain a minimal backlog** Finally, an issue that we would accept a fix for, think is worth fixing, and is ready to be fixed comes to rest on our backlog as a normal issue without any `Needs:` label
5. **Monitor and fix** We will use monitoring of issue counts, particularly issues associated with components, to determine if it's time to stop and fix our backlog. Once we get the backlog under control we can also stay on top of issues by regularly making time in the schedule for bug fixes

### 1. New Issue
When a new issue comes in, the bot will attach the `Needs: Triage` label to it. **If an issue needs a re-consideration by triage the label can be re-added.**

The `Needs: Triage` label indicates that the issue should be triaged as part of the daily issue triage. See step #2 below

### 2. Daily Triage
Each day PM triages all issues marked as `Needs: Triage` and moves them on to the next stage. Shield dev presence is optional for this triage, but recommended.
- Expectations:
    - Zero `Needs: Triage` each day (measured by metric)
    - Repro is confirmed and issue is understood before handing off to component owner
- PM Role: Replying to and labeling issues (label with `Needs:`, `Type:`, `Component:`)
- Dev Role: Confirming repro before hand-off, understanding issues, following up on issues with `Needs: Investigation` label
- If issue is ready to be fixed now, it should have no `Needs: ...` labels and typically be of `Type: Bug`
- If an issue is worth following up on, apply the appropriate `Needs: ...` label and ping the person to follow up on it
    - See: [Deep Dive on Needs label](#Deep-dive-on-Needs-labels)
- If `Type: Breaking` or `Type: Feature`, associated with a component, and not urgent
    - Close for consideration as part of component refresh
    - Add this issue to epic issue for component
- Close issues as appropriate
    - See: [When to close an issue](#When-to-close-an-issue)
- If issue needs more investigation from current shield dev (perhaps through consulting with component owner)
    - Mark as `Needs: Investigation` and leave it assigned to the shield dev
- Exit states:
    - Issue gets closed
    - Issue marked with new `Needs:` label with clear next point of contact for action
    - Issue is properly labeled, prioritized, and ready to be fixed. `Needs:` label is removed and it joins the backlog

### 3. Follow up
- We use the `Needs:` labels to track the follow-up on the issue. A `Needs:` label indicates the issue is not ready to be fixed and needs something to unblock it.
- See section below on [Deep dive on `Needs:` labels](#Deep-dive-on-Needs-labels)

### 4. End state for issues
- Once the issue has been triaged, repro'd, investigated, prioritized, etc., it is ready to be fixed. At this point the issue joins the ranks of all the other issues, and there is no longer a `Needs:` tag on the issue.
- This is the end state for the issue process and these issues are now ready to be fixed.
- Ideally the issues should be in a state where someone with the appropriate expertise could pick up the issue and fix it.
- Also, the issue should be such that if someone does fix it, as long as the fix is valid, we would take the fix. We should not have issues in this final state whose fixes we would reject because they break our customers or because we disagree on the long-term viability of the change.

### 5. Monitor and fix
- A key part of this process is ongoing monitoring of repo health. We should build on top of the existing repo health monitoring Mak does to make sure these methods are working.
- The metrics we should monitor regularly are:
    - Issues with `Needs: Triage`.
        - Goal: bounce off zero daily
    - Issues with `Needs: ...`.
        - Goal: bounce off <25 regularly
    - Open issues on the backlog (no `Needs: ...` label) per component.
        - Goal: (not ever increasing)
    - Open PRs from outside the team.
        - Goal: Bounce off zero

## Deep dive on `Needs:` labels

The way `Needs:` tags should work is that any issue should have only 1 `Needs:` tag. The tag's purpose is to communicate who has the immediate next step to move the issue forward.

The main use of `Needs:` labels today is as follows. This pretty much showcases where we need to make improvements to our process.
- Without any `Needs:` label:
    - [300 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+-label%3A%22Needs%3A+Actionable+Feedback+%3Afemale_detective%3A%22+-label%3A%22Needs%3A+API+Breaking+Change%22+-label%3A%22Needs%3A+Attention+%3Awave%3A%22+-label%3A%22Needs%3A+Author+Feedback%22+-label%3A%22Needs%3A+Backlog+review%22+-label%3A%22Needs%3A+Behavior+Breaking+Change%22+-label%3A%22Needs%3A+Design+%F0%9F%8E%A8%22+-label%3A%22Needs+Dev+Input%22+-label%3A%22Needs%3A+Discussion+%F0%9F%99%8B%22+-label%3A%22Needs%3A+Prototyping%22+-label%3A%22Needs%3A+Triage+%3Amag%3A%22)
- `Needs: Backlog Review`:
    - [268 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Backlog+review%22)
- `Needs: Discussion`:
    - [98 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Discussion+%F0%9F%99%8B%22)
- `Needs: Design`:
    - [37 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Design+%F0%9F%8E%A8%22+)

### Proposed `Needs:` issue labels going forward

**Needs: Author feedback**
Current: [2 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Author+Feedback%22)

We have replied to the author and need their input to move forward.

**Needs: Attention**
Current: [9 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue++label%3A%22Needs%3A+Attention+%3Awave%3A%22+) - We are doing well keeping on top of this

Issue is understood and prioritized to fix now. Assignees are responsible for following up on this issue. It is officially on their plate.

**Needs: Design Input**
Current: [37 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Design+%F0%9F%8E%A8%22+) - We don't have a ton, but we also don't have a good method for resolving these

This issue needs input from our Design point of contact (Angela).
Assign issue to proper POC

**Needs: Accessibility Input**
Current: None - New Tag

This issue needs input from our Accessibility point of contact (or Accessibility Triage). Assign issue to proper POC

**Needs: Investigation**
Current: None - New Tag

Expected follow up: Assignee

**Needs: Backlog review**
Current: [268 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Backlog+review%22) - We are not doing well reviewing these.

Issue needs further consideration in weekly backlog review. Note: Please do not use this to avoid making hard decisions. It is inefficient to just punt everything to a weekly backlog triage meeting.

`[jslone] I propose maybe we shouldn't have this label? It will become very easy over time to just file most issues as needs backlog review. Mostly just calling out this is a trap and dangerous.`

`[mak] I agree here, this seems like an escape hatch we might want to remove.`

**Needs: Actionable Feedback**
Current: [1 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Actionable+Feedback+%3Afemale_detective%3A%22)

~~[jslone] Propose we remove this label and replace with Needs: Author feedback~~

`[mak] We could certainly do that, however, Actionable Feedback is applied when the author didn't fill in the template or has filed an issue that is not relevant to our repo at first glance. It has a more aggressive closing time and we have rules for the bot to input a message so we don't have to spend time there.`

`[jslone] Good point, if we actually use this functionality we can keep it.`

@paulgildea Do you see this label getting used in daily triage?

`[jslone] Paul came back with a reply saying this is useful in triage.
So, let's keep this`

### Removed `Needs:` issue labels

**Needs: Discussion**
Current: [98 Open](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Discussion+%F0%9F%99%8B%22)

Proposal: Remove and replace with `Needs: Investigation`

`[jslone] Propose we remove this and make discussion to determine the right path forward part of the Needs: Investigation and the assignee of the issue`

`[jslone] I'll observe this is another place where issues go to die`

`[mak] I agree on removing this one.`

`[pagildea] I agree. It becomes another list we need to monitor.... poke folks on.`

**Needs: Dev Input** and **Needs: Prototyping**
Current: [9 Open - Dev Input](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs+Dev+Input%22+) --- [3 Open - Prototyping](https://github.com/microsoft/fluentui/issues?q=is%3Aopen+is%3Aissue+label%3A%22Needs%3A+Prototyping%22+)

`[jslone] Propose we combine this with Needs: Attention`

`[mak] I agree here as well`

`[jslone] Paul made a good point that this would make more sense as Needs: Investigation`

**Needs: API Breaking Change** and **Needs: Behavior Breaking Change**
Current: only 1 issue with either of these labels

Proposal: Replace both of these with the label `Type: Breaking`

`[jslone] Propose we remove the "Needs:" prefix here and these become simply additional tags we can apply`

`[mak] These should be renamed to the "Type: Breaking" described above right?`

`[jslone] Yes! Thank you, I added that type after writing this section and didn't make it consistent. I've updated it now. `

## When to close an issue

Deciding when to close an issue is a complex calculation that generally requires judgment. A collection of reasons and templates to close an issue is captured here: [Template responses for Shield](/IcudV52OS3ufaaMcOq2ktA).
Also here: [Fluent UI Issue Triage and Expectations.docx](https://microsoft-my.sharepoint-df.com/:w:/p/aneeshak/EYp-9Yv_uypFk-skO55vQX0BR_WZBlnH_nw-aLdGmipM3Q?e=hSczDv)

Some classic reasons to close an issue:
- Does not align with Roadmap
    - Issue is non-trivial and does not align with current or near future roadmap (and is not something we are looking for a contribution for)
- In legacy area
    - Feature additions to components in maintenance mode, or fixes in end of life browsers such as IE11 and Legacy Edge
- Esoteric Accessibility issues
    - Accessibility issues that don’t repro on Narrator + Edge Chromium (or only repro in Scan Mode), where the component is following the ARIA spec and the error is not flagged in the Accessibility Insights tool.
- Issue is a question (routed to StackOverflow)
- Issue is a duplicate, by design, or does not repro

### Appendix - Scratch
- Outputs
    - Needs: Attention
    - Needs: Author feedback
    - Needs: Backlog review
    - Needs: Design
    - Needs: Discussion
    - Needs: Dev Input
    - Needs: Actionable Feedback
    - Needs: API Breaking Change
    - Needs: Behavior Breaking Change
    - Needs: Prototyping

## Expectations of Shield and our own team

`[aneesha]` (notes from mtg with Paul, Justin, Markus)

**Objective: Engage with customers.**

Outcome: Customers feel heard and feel included. Social coding.

KR (quarterly): PM + Eng collaboration landing issues and routing properly (end state of Shield lifecycle reached).

AOL Metric: Ratio of incoming to fixed of type. Total number of issues.

Monitor:
* % incoming issues closed each week. (are we keeping up?)
* % needs triage should be 0.
* Longer period of time:
    * Issues of type Bug should be driven to 0. (Everything else: tie to Component Epic and close)
    * Median open requests per component.

Types:
* Bug (broken -> fix -> close)
* Feature (ex: new prop request)
    * Put in an Epic Issue for each component -> close so it's out of backlog. Status on Epic Issue: not planned & communicate that it's closed but we'll look at it in the future.
If you need a workaround right now, we can help you come up with one. ([Component Epics](https://github.com/microsoft/fluentui/projects/32))
* Breaking request (bug or feature that will break the DOM etc)
    * Add a prop to allow the reporter to do it in a non-breaking way
* New component proposals
    * Exception possible: high pri (label) partner requests
    * Shield Dev drives this work - if it requires a 2 week project etc. then set that up.
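
### Sketch: computing the Monitor numbers

The "Monitor" items above (% incoming issues closed each week, % needs triage, median open requests per component) could be computed along the lines below. This is only a minimal sketch under stated assumptions: the `Issue` record, its week-index fields, and all function names are hypothetical, and a real script would fill the records from the issue tracker. "% incoming issues closed" is read here as issues closed in a week divided by issues filed in that week, which is one possible interpretation.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Issue:
    """Hypothetical issue record (not a real API shape)."""
    number: int
    component: str
    labels: List[str] = field(default_factory=list)
    opened_week: int = 0                # week index in which the issue was filed
    closed_week: Optional[int] = None   # None while the issue is still open


def is_open(issue: Issue) -> bool:
    return issue.closed_week is None


def pct_incoming_closed(issues: List[Issue], week: int) -> float:
    """Issues closed in `week` as a percentage of issues filed in `week`.

    Assumed reading of "% incoming issues closed each week": 100 or more
    means we closed at least as many issues as came in.
    """
    incoming = sum(1 for i in issues if i.opened_week == week)
    closed = sum(1 for i in issues if i.closed_week == week)
    if incoming == 0:
        return 0.0
    return 100.0 * closed / incoming


def pct_needs_triage(issues: List[Issue]) -> float:
    """Percentage of open issues still labeled `Needs: Triage` (goal: 0)."""
    open_issues = [i for i in issues if is_open(i)]
    if not open_issues:
        return 0.0
    untriaged = sum(
        1 for i in open_issues
        if any(label.startswith("Needs: Triage") for label in i.labels)
    )
    return 100.0 * untriaged / len(open_issues)


def median_open_per_component(issues: List[Issue]) -> float:
    """Median open-issue count across components that have open issues."""
    counts: Dict[str, int] = {}
    for i in issues:
        if is_open(i):
            counts[i.component] = counts.get(i.component, 0) + 1
    return float(median(counts.values())) if counts else 0.0


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Issue(1, "Button", ["Type: Bug"], opened_week=1, closed_week=1),
        Issue(2, "Button", ["Needs: Triage"], opened_week=1),
        Issue(3, "Dropdown", ["Type: Bug"], opened_week=1),
    ]
    print(pct_incoming_closed(sample, week=1))
    print(pct_needs_triage(sample))
    print(median_open_per_component(sample))
```

Matching labels with `startswith` is a deliberate choice so that a label with a trailing emoji suffix still counts; an exact-match check would be stricter but would miss those.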