# Fairlearn Community Call Meeting Notes
## 2022-06-02
- Discussed issues with CircleCI
- Miro raises the need to have proposals for API additions to frontload discussion of enhancements before working on them
- May want to use an .rst doc for a proposal for new API elements
- Based on the issues from the ranking PR... the research paper
## 2022-05-19
- Newer community members
- lqdev joining from the .NET framework, thinking about how the two communities might benefit from working with each other
- Build integration points and components in .NET to support Fairlearn integration
- Are there elements that could be integrated? e.g., MetricFrame data structure
- But many ML.NET users don't want Python dependencies in their pipeline
- Alice wants to start working on learning resources aligned with her internship research
- Miro
- Realization that new technical PRs can sprawl a bit, and may need more planning for conceptual and technical
- Manojit
- Meeting with OpenTeams and Quansight Labs for open source talent network - to find e.g., someone who knows both Sphinx and front-end etc
## 2022-05-12
- Finishing documentation of low-hanging fruit for exponentiated gradient
- Recent papers on fair ranking reconceptualized older metrics, so may be worthwhile to pause and think more about proportional exposure, etc
- Should probably reflect more to make it future proof first
- For milestone release
- Adversarial debiasing
- Model comparison plot
- Construct validity is written more for social scientists than CS...
- Conversely, our fairness metrics are written more for CS / technical audiences than social science oriented readers
- Structure of documentation
## 2022-05-05
- PyCon follow-up
- USDS is interested in measuring interventions to support the "bottom 40%"
- Kevin presenting at the meeting
- Adrin needs feedback on PRs from Miro and Richard
- CI - Richard and Roman can give feedback on
- Miro will look at Lint, provide an example of ranking metrics from Asia and Fernando's ranking paper, and... something about Matplotlib?
- Ranking may not make it into this release
- Presentation from Kevin Kho on using [Fugue](https://fugue.readthedocs.io/en/latest/introduction.html) for PySpark dataframe
- For handling larger datasets with parallel / distributed processing across clusters
- Kevin will open an issue pointing to his notebook to discuss what kinds of examples would be helpful here, e.g.,
- communicating the value of Spark
- how much lift there would be for Fairlearn users to integrate Spark into their pipeline
- Miro - proposals for faculty and students to contribute, specifically around technical/algorithmic research contributions
- Template for proposals from Phil for other Python open source projects
## 2022-04-21
- OpenML changed the website to make it harder to upload datasets
- Now that that's up, Rens can finish the Correlation Remover PR [#1029](https://github.com/fairlearn/fairlearn/pull/1029)
- Ranking metrics
- Discussion about the names (ranking exposure and ranking utility)
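To make the naming discussion concrete, here is a minimal sketch of a group exposure calculation. It assumes a logarithmic position discount (as in DCG); the notes do not state which definition the PR uses, and `mean_exposure_by_group` is a hypothetical helper, not a Fairlearn function.

```python
import math
from collections import defaultdict


def mean_exposure_by_group(ranked_groups):
    """Average exposure per group; ranked_groups[i] is the group at rank i + 1.

    Assumption: exposure at a rank is 1 / log2(1 + rank).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for position, group in enumerate(ranked_groups, start=1):
        totals[group] += 1.0 / math.log2(1 + position)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy ranking: both "a" items are ranked above both "b" items.
ranking = ["a", "a", "b", "b"]
exposure = mean_exposure_by_group(ranking)

# Ratio of the least-exposed to the most-exposed group (1.0 means equal exposure).
ratio = min(exposure.values()) / max(exposure.values())
print(exposure, ratio)
```

A utility-weighted variant would divide each group's exposure by its average relevance, which is one way to read "utility as a special case" of a more general proportional exposure.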
- Other PRs for milestone:
- Need to follow up with Sean and Bram for the remaining PRs (997, 973, 947)
- Adrin will start working on CircleCI docbuild with Richard
- May re-evaluate [sphinx multi-version](https://github.com/Holzhaus/sphinx-multiversion), Richard will update and send out
## 2022-04-14
- Issues with docbuild from CircleCI side (haven't checked Fairlearn build in ADO)
- GitHub actions has an option to publish to GitHub pages
- Nightly builds are green, other ML packages are failing
- Issues with final step to push website build to GitHub pages
- CircleCI builds are failing since the update, so we can't check the webpage in CircleCI, only when we compile it on our end
- v0.8 milestone release
- Ranking metrics may be a can of worms
- scikit-learn for information retrieval
- May want to split metrics from plotting (i.e., more well-defined/scoped from more experimental)
- Some PRs may not make it into the release (adversarial debiasing and ranking metrics)
- Hospital re-admissions dataset uploaded to OpenML (but not yet appearing on the website)
- How do we help scope issues in advance of contributors writing PRs
- e.g., the ACS Public Coverage task was not well-defined, so it's not clear that it'll get merged, even after all the effort was put in
- How do we prevent this from happening in advance?
- May want new tags for issues (e.g., experimental; exploratory; needs scoping, etc)?
## 2022-03-31
- Examples dataset discussion (Miro). Conclusions:
- Open issue for adding the hospital re-admissions dataset to OpenML (including proper referencing for license)
- Open issue for adding hospital re-admissions dataset to fairlearn.datasets (including documentation)
- Which PRs should be included in the release: go through all PRs and distribute to maintainers for reviewing.
- Discussed whether to go with .py or to allow Markdown instead of .rst, to reduce the friction of preparing Jupyter notebooks for inclusion on the Fairlearn website.
- SciPy tutorial was easier in Colab than in .py. Simple route for contributors to get this done.
- Sphinx does not handle MDX as an alternative to .rst.
- JupyText for converting .ipynb to .py? Some of it can even be done with bash. Header macros are tricky, as is converting code sections from Markdown to .rst.
- Suggestion: submit the PR as a .py (in Markdown style), then manually adjust it to .rst format (work can be divided)
## 2022-03-24
- For milestone, should we just set a date and release?
- Are there some PRs that are very close to being ready and we can push an extra week?
- Manojit working on converting PyData Global tutorial
- Sphinx gallery to .py, but not vice versa
- Need to change headers
- Sphinx can support Markdown, but Markdown cells don't
- What is the workflow from going from a notebook to a .py version of a notebook?
- Code and comments - processing comments as .rst
- The tool does everything except comments, which are Markdown, not .rst
- What tool?
- JupyText
- pandoc?
- Ideally, could point contributors (with an existing .ipynb) to a tool, with some pointers to specific things to change
- But headers are hard, back-quotes, etc
- Should create an issue to raise this as a larger discussion
- What is the workflow for converting .ipynb to .rst?
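As a rough illustration of the workflow question above, the sketch below converts notebook JSON into a `.py` script using only the standard library. It is not JupyText and not the project's tooling; `notebook_to_py` is a hypothetical helper. Markdown cells are emitted as comment blocks after a `# %%` separator (a cell marker Sphinx-Gallery accepts), and their content is deliberately left as Markdown. That leaves headers and back-quotes to fix by hand, the same gap noted above.

```python
import json


def notebook_to_py(nb_json: str) -> str:
    """Convert notebook JSON text into a .py script with '# %%' text blocks."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            # Kept verbatim as comments; nothing is translated to .rst here.
            body = "\n".join(("# " + line).rstrip() for line in text.splitlines())
            chunks.append("# %%\n" + body)
        elif cell.get("cell_type") == "code":
            chunks.append(text.rstrip())
    return "\n\n".join(chunks) + "\n"


# Tiny in-memory notebook so the example runs without any files.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some `inline` text"]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
    ]
})
script = notebook_to_py(example)
print(script)
```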
- Examples dataset discussion (Miro)
- Should we just swap out the ACS Income dataset for UCI Adult dataset?
- Or use hospital re-admissions dataset instead?
- Not clear that ACS Income is better, since the task is so artificial
- If we use the hospital admissions dataset, we'd need to use the pre-processed version
- So should we re-upload this pre-processed dataset to OpenML?
- Then, we'd want to use this in the existing documentation (e.g., quickstart, user guide)
- Point users to the tutorial
- And also use this as the primary use case and example for functionality going forward
## 2022-03-17
- Requesting feedback on the EY notebook - need to be clear that this is a semi-synthetic dataset, with synthetic feature
- Question about using JupyterBooks for example notebooks - may be a good compromise between .rst and .ipynb?
- For now, Manojit will convert it to .py
## 2022-03-10
- Manojit has an open PR for the EY Notebook and wants feedback
- Richard just wanted to flag the open PRs for the milestone
- OpenML
- Is this from OpenML to us or vice versa? Or both?
- Question of scope - and question of top-down development of datasets that have fairness issues
- Ask OpenML to have a field to add link to known fairness issues or Fairlearn documentation (or other?)
- Or open an issue about this specifically
## 2022-03-03
- 2 PRs closer to next milestone - Richard and Miro started going through the backlog of PRs
- MetricFrame
- Conversation with Harsha - about how to do uncertainty quantification, ready for a PR
- Paper discussion of Watkins et al., 2021, led by Manojit, with Elizabeth, Mike, and Jiahao who authored the paper
## 2022-02-24
- Getting ready for the milestone release (end of March), some PRs are still in draft form
- CorrelationRemover doc needs review
- ACS dataset docs need review
- Thresholder and RejectOptionClassifier (907) is >2000 lines of code, and needs review, but may take substantial time
- Waiting for a reviewer
- MetricFrame performance - should be ready?
- Python 3.10 merge failed, but should be easy
- Ranking metrics - needs some attention, but contributor said he'd get to it last week
- Adversarial debiasing - needs an update
- Model comparison plot - needs an update
- Error bars - waiting on Miro
- Cost-sensitive classification -
- Let's add explanatory materials (e.g., talks, blog posts) to resources
- Parity AI report
- Praised Fairlearn for not fixating on the 4/5ths rule, but people expect that from fairness toolkits, like AIF360
- So we should add some sort of caveat about these thresholds to our documentation (not just that we don't support it, but why not and maybe what to do otherwise)
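For readers unfamiliar with the threshold being discussed: the 4/5ths rule compares the ratio of the lowest to the highest group selection rate against 0.8. The sketch below computes that ratio in plain Python on made-up data. The helper names are hypothetical, and it is meant only to show what the number is, not to endorse using it as a pass/fail test.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions (label 1) within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        selected[group] += int(pred == 1)
    return {group: selected[group] / totals[group] for group in totals}


def selection_rate_ratio(y_pred, groups):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(y_pred, groups)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")


# Illustrative data only.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = selection_rates(y_pred, groups)
ratio = selection_rate_ratio(y_pred, groups)
print(rates, ratio)
```

The ratio alone says nothing about sample size, measurement quality, or context, which is part of why a documentation caveat is more useful than a hard-coded cutoff.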
## 2022-02-17
- Discussed potential funding for a technical writer from Google's [Season of Docs](https://discord.com/channels/840099830160031744/872866935610703952/943902306917822535)
- We're already scoping and prioritizing documentation for the learning resources anyway, so this could be a nice way to get support for that effort
- They need a proposal by end of March
- Not sure how we feel about Google sponsorship though... (the model is that Google provides $5-15,000 to hire a technical writer (not necessarily from Google))
- **Action items:**
- Reach out to maintainers and advisory board about the question of Google funding
- Scope and prioritize documentation
- Hilde opened new issues for enhancements from AIF360
- May have some student interns to work on them
- We're running into bottlenecks from not having enough technical ML / stats reviewers
- Should we reach out to faculty at universities? Their PhD students?
- How do we support contributors moving from writing PRs to reviewing?
- What expertise do people need - to implement work from a paper into code that actually works
- **Next steps:** Plan how to support engagement before we have the FAccT meetup (w/academics, practitioners, etc)
- **Action item:** Hilde will reach out to others at Eindhoven to see:
- What would the ideal format for that engagement with academics be?
- What would they want to get out of this?
- **Let's revisit this discussion**
- What other forms of expertise do we want more of?
## 2022-02-10
- Next release
- Set a milestone for release
- Release manager (Adrin), pings people for review and sets a deadline for review and merge
- Aiming for 3/31 for v0.8, to get it in before PyCon on 4/27
- Miro went through AIF360 to look at features
- Pre-processing techniques for feature transformation
- May be mostly just implementations of research papers
- Less clear which would be most useful to data scientists
- May need some caveats in their use - "we're just a library"
- Revisit question about joining NumFOCUS
## 2022-02-03
- Recommendations from Denae Ford Robinson for open-source communities to be more welcoming to newcomers (now an issue #[1018](https://github.com/fairlearn/fairlearn/issues/1018))
- No clear path for first time contributors
- Two main channels - from website (contributor guide) or Readme
- No need to reinvent the wheel - can link to existing docs from pandas, Dask
- Three general personas
- First-time *users* of Fairlearn
- Sending users to User Guide
- First-time *contributors* to Fairlearn
- More than just a link - send them to the Code of Conduct and the Contributor Guide
- People just generally interested in Fairlearn (e.g., learning more about fairness)
- Send them to the website landing page
- More prominent Code of Conduct - linked visibly in footer, (maybe not) navbar
- Continue to add and maintain "good first issues"
- Some version issues with Sphinx, documentation clarity, font sizes
- Yogendra can create additional issues
- TODO: add clarity about versions (deprecated versions are no longer supported; the most stable version does not have the most up-to-date content; the current build may need a different pip install)
- Miro will open this (but follow up on 02-10 to check)
- Laura - PyCon Spanish track accepted
- Translation work is continuing, but need the translation tool to accept Fairlearn as open source
- Manojit - PyCon tutorial accepted
- Hilde - new student joining soon
## 2022-01-27
- Adrin working on pipeline for sensitive features with scikit-learn
- May need some work on the Fairlearn side to make it scikit-learn compatible (right now, it relies on some hacks)
- Sean working on reweighting preprocessing technique
- May not fit in preprocessing (since it depends on sample weights)
- Adrin: could build off of imbalanced-learn (resample, rather than transform)
- Miro: Pre-processing may be a misnomer, since it's during training (not deployment), but the terminological debate may not be worth quibbling about - but (Hilde:) may be confusing
- Another option that might clarify, but is in conflict with how people use the terms
- Miro: this is the most basic form of reduction - changing one algorithm into another (other things: resampling, relabeling)
- Adrin: other reductions approach wrap an estimator to do something to the output, but this is changing the input, so not really reductions
- Decision: use pre-processing, but make it super clear in documentation how pre-processing algorithms (e.g., CorrelationRemover) relate to fairness implications of other pre-processing data decisions (e.g., handling missing data)
- Miro: the fairML textbook definition of pre-processing would not consider reweighing pre-processing
- Look at ExponentiatedGradient and other meta-estimators
- Adrin: look at sklearn's [imbalanced-learn](https://imbalanced-learn.org/dev/references/generated/imblearn.FunctionSampler.html)
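The sketch below shows why a reweighting technique sits awkwardly in a transformer-style pre-processing API: its output is a vector of sample weights, not a transformed feature matrix. It assumes the common reweighing formula, weight = P(group) * P(label) / P(group, label). The notes do not say which variant Sean is implementing, and `reweighing_weights` is a hypothetical name.

```python
from collections import Counter


def reweighing_weights(labels, groups):
    """One weight per sample that makes group and label independent when applied."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Illustrative data: group "a" has mostly positive labels, group "b" mostly negative.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
weights = reweighing_weights(labels, groups)
print(weights)
```

The weights would then be passed to an estimator's `fit` via `sample_weight`. That makes the step part of training rather than a `transform` on `X`, and a resampling approach would instead change which rows are present.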
- Richard's MetricFrame PR: Adrin suggests running a benchmark to evaluate performance
- TensorFlow is throwing errors, may need to use scikeras rather than pinning TensorFlow
- Richard: may need to take another look at builds (Adrin will help as needed)
## 2022-01-13
- Dataset documentation
- Interns are realizing we probably shouldn't use these datasets, but at least want to document risks of using them for benchmarking
- For UCI Adult, the code is there, so we want to add documentation
- Conversation with author of the Retiring Adults paper on the issue (#984) was great
- Roman wants to pick up the synthetic dataset PR (#907), and Miro might contribute a code snippet
- Scikit-learn's version doesn't require relationships between features
- Proposal to create a single synthetic dataset _and_ option for users to create their own dataset (starting with make_classification)
- Miro will create an issue for the single dataset creation option, and Roman will work on the PR #907 option
- Other updates
- Richard's PR improves MetricFrame's performance, ready for Adrin's review
- Sean and Bram Anders will both tag people for review if need be
- For some PRs, can mark them as stale, so new contributors can see it's available to work on
- Plan to do a Fairlearn (or open-source fairness toolkit) social event / meet-up at FAccT, maybe after the main conference programming
- For website, Miro just marked #986 (implement navbar outline) as help-wanted (even though the learning resource group is finalizing the specific substructure of Learn)
- For #1002 (add new illustrations to homepage), this might require specific skills, which we may not be able to bring people on for without hiring them
## 2022-01-06
- Other ways to use meetings
- Working sessions, paper discussions
- FolkTables
- Two "micro-interns" working with Roman for the next month on documenting existing datasets, connecting with FolkTables authors
- Roman reconnecting with Kas
- Implementing plots for ROC curves for groups, added tests
- Hilde's students
- Bram's ranking metrics
- Needs input from Miro
- Bram and Sean will finish in February
- Manojit submitted PyCon US proposal
- Will find out at end of January
- Richard has an open PR for performance improvement for MetricFrame
- Figured out how to avoid breaking things, converting lists to numpy arrays - would like reviews
- Community call timing / purpose
- Can ask a question
## 2021-12-16
- Laura is working on translation - sphinx rendering - she'll work with Roman together
- How do we handle more experimental things (like adversarial debiasing PR), that are available in other libraries?
- Maybe create it as a notebook, not a PR
- Maybe demarcating it as an experimental namespace/module
- Something similar to blog post on Boston housing dataset, or like the ThresholdOptimizer (like Roman's PR on documentation for that)
- Maybe use the scikit-learn approach - if these approaches appear in other toolkits, toolkit comparisons, etc, we should incorporate them
- Don't throw a warning (otherwise why is it in the library), but put those caveats and context in the API docs, user guide, etc
- When creating a PR for a new method, you're expected to have documentation in the API docs and user guide
- Docs formatting
- Example notebooks
- Sphinx gallery
- .py files (or .ipynb?), but this has its own syntax that's difficult to grasp
- Hard to download?
- Is this difficult to consume?
- Quickstart
- .rst (restructured text), has code that compiles when the website compiles
- But it's not a notebook, can't download it
- Code blocks don't have context carry over
- For SciPy tutorial:
- It has both worked out examples and prose
- Example notebooks
- Should be self-contained
- For user guide
- .rst may be easier to write, but still requires formatting
- Goal is to make creating and editing the content as easy as possible
- Miro - two options:
- either keep doing what we're doing and provide guidance on when to use which format for what type of content
- Or move everything to a single format (e.g., the executable books that Roman shared)
- Let's not introduce a 3rd format
## 2021-12-09
- Only 3 of us - we decided to cancel
## 2021-12-02
- Add additional documentation to contributor guide
- Sphinx / docstring issue that Allie ran into
- Line length
- New line for each sentence, but cut off at ~79 (except for URLs)
- Does this still make sense for prose?
- Richard has a new PR for MetricFrame (#985) that is ready for feedback
- Harsh is working on error bars (Issue #999)
- Do we have anyone with Spark knowledge?
- May have scalability issues without it
- Can wait til we have an issue posted
## 2021-11-18
- Updates/questions
- Discussion of skorch dependencies
- Discussion
- Translation of documentation into Spanish
- Which parts of the documentation get changed?
- Start with API docs
- Should we add a date it was translated?
- Possibly do it like a snapshot at the release version level?
- Organizationally, should this be a separate project in the same organization?
- No, Laura will create a totally separate repo and we can link to it on the Fairlearn website somehow
- If there's enough activity, then it could be possible to bring in later
## 2021-11-11
- Updates
- Agenda items
- Discussed Fairlearn [learning objectives](https://docs.google.com/spreadsheets/d/1L9oIOUwGcC4nJUPwZqjiRERHfQFrbZwUJ-uQ9sfFe1Y/edit#gid=0) (sections 2 and 3)
## 2021-11-04
- Updates
- PyData Global
- Recording from tutorial will be available in December/January
- Tutorial notebook is up on the repo [here](https://github.com/fairlearn/talks)
- Potential next steps:
- Writing a blog post based on the PyData panel
- Potential lead - Babel.AI
- Richard's MetricFrame performance [PR](https://github.com/fairlearn/fairlearn/pull/985)
## 2021-10-28
- Updates
- Discuss naming of exposure/utility ratio for ranking metric [#974](https://github.com/fairlearn/fairlearn/pull/974)
- Miro's proposal: could use a more general proportional exposure (with utility as a special case)
- PyData Global tutorial today
- Richard working on PR [#985](https://github.com/fairlearn/fairlearn/pull/985) to improve MetricFrame performance
- Agenda
- Discuss Fairlearn learning objectives and proposed structure
- See spreadsheet [here](https://docs.google.com/spreadsheets/d/1L9oIOUwGcC4nJUPwZqjiRERHfQFrbZwUJ-uQ9sfFe1Y/edit?usp=sharing)
- Could feature questions either on:
- a page on the Learn tab
- the homepage (a la Google PAIR Guidebook)
- Maybe FAQ?
## 2021-10-21
- Updates
- PyData tutorial and panel next Thursday
- Learning resources working group will present at next week's community call
- Agenda
- Adversarial debiasing PR [#973](https://github.com/fairlearn/fairlearn/pull/973)
- To be renamed to "adversarial mitigation"
- You want an adversarial network that cannot predict sensitive features
- But how should it handle multiple types of sensitive features? (continuous, binary, categorical)
- Miro: Better to start with an implementation of what they did in the paper, and then future PRs could extend to handling multiple
- In the paper, all sensitive features of the same type (using MSE, norm squared for multivariate)
- Better to focus on a single type of sensitive features (e.g., binary, in the paper and other libraries), have an error message that multiple sensitive features are not supported
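A minimal sketch of the kind of input check suggested above, assuming the first implementation accepts exactly one binary sensitive feature. The function name and error messages are hypothetical and not taken from the PR.

```python
def check_single_binary_sensitive_feature(sensitive_features):
    """Return the feature values, or raise if the input is unsupported.

    Hypothetical validator: accepts a flat sequence of values or a sequence of
    one-element rows; rejects multiple columns and non-binary features.
    """
    rows = [list(r) if isinstance(r, (list, tuple)) else [r] for r in sensitive_features]
    if not rows:
        raise ValueError("sensitive_features is empty.")
    if {len(r) for r in rows} != {1}:
        raise ValueError(
            "Multiple sensitive features are not supported; pass exactly one."
        )
    values = [r[0] for r in rows]
    distinct = set(values)
    if len(distinct) != 2:
        raise ValueError(
            f"Expected a binary sensitive feature, found {len(distinct)} distinct values."
        )
    return values


# A single binary feature passes through unchanged.
ok = check_single_binary_sensitive_feature(["a", "b", "a"])

# Two columns per sample triggers the "not supported" error.
try:
    check_single_binary_sensitive_feature([("a", 1), ("b", 0)])
    raised = False
except ValueError as err:
    raised = True
    message = str(err)
```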
- Some open PRs
- Boston Housing dataset blog post - Roman and Michael recently reviewed, Allie will take a look
- Synthetic dataset creation [#907](https://github.com/fairlearn/fairlearn/pull/907) - Roman will take a look at the latest commits from Corey
- Plotting ROC curves [#869](https://github.com/fairlearn/fairlearn/pull/869) - Roman will follow up with Kas
- Notebooks in the notebooks directory are in .ipynb (IPython notebook) format; now that we have mature notebooks, can we create issues to convert them to .py or .rst format (EY notebook; SciPy tutorial)?
- Would be good to create issues for converting these to .py (so people can download)
- Could be better as examples of how to use the code in the docstrings
- Could delete other notebooks (or triage)
- SciPy notebook - dataset is more authentic, even if the task is a bit more artificial (or at least a reproduction of the task in the Obermeyer et al. paper)
- **Action item:** Could take what we have and do it in .py or .rst so people can download and work on it
- **Action item:** Separate out a smaller section of the notebook (to cover similar tasks as the EY notebook), with some sociotechnical context in a paragraph, then point to tutorial for more detail
- EY case study
- Currently covers assessment and mitigations
- the task is more authentic, but the actual dataset is not available, and the UCI dataset is inappropriate, so how could we handle this?
- create a synthetic dataset?
- ask EY for the set of features and their correlation matrix to supplement the synthetic dataset?
- **Action item:** add caveat/disclaimer (in red) about the inappropriateness of the dataset, and add a call to action to ask if people have a more appropriate dataset to let us know
- **Action item:** would be great to have a blog post on UCI (or German credit dataset) like for the Boston Housing dataset, since that dataset was used in Taiwan
- Possibly, could try the EY case study with the German credit dataset?
## 2021-10-14
- Adrin mentioned https://github.com/scikit-learn/scikit-learn/issues/21324 which might show up in our builds as well
- supporting python 3.10 [[PR](https://github.com/fairlearn/fairlearn/pull/981)]:
- packages don't have binaries out yet, so builds will take ages; instead wait until those are out to merge
- supporting python 3.6 for the next release (or not), [[PR](https://github.com/fairlearn/fairlearn/pull/980)]:
- not enough to warrant a release unless there are other significant changes
- special case in tests doesn't seem to be resolved in 3.7 yet, so we'll have to keep it around a little longer. Alternatively, consider removing the test case altogether.
- new version of `MetricFrame` in [WIP PR](https://github.com/fairlearn/fairlearn/pull/975)
- Who's interested in participating in this workstream? Richard, Adrin, Miro
- Should it be a separate (recurring) call or just an occasional discussion on the community call? Alternatively, start with WIP PR and get feedback.
- any comments/questions/concerns on existing open PRs or issues?
- Allie is looking for confirmation that issues are resolved, and perhaps another round of feedback (this is on the fetch_boston doc PR).
- "Retiring Adult" authors reached out to perhaps add their new datasets to Fairlearn including documentation. They will open an issue to start the discussion if we're okay with that.
- Miro: we should consider adding the dataset from our SciPy tutorial since we already did the work to contextualize it
- Roman to look into setting up channel for agreeing to terms of use to keep bots out. We may open the call up a bit more to encourage users to join with questions and concerns, so the strong moderation capabilities of Discord will come in handy.
## 2021-10-07
- Updates
- Include CI step for scikit-learn's nightly build [#965](https://github.com/fairlearn/fairlearn/issues/965)
- Ranking metric
- Bram working on metric for rankings, and only need y_pred, not y_true
- Came across issue [#756](https://github.com/fairlearn/fairlearn/issues/756)
- Long discussion, not sure what to do to handle that
- Richard is going to try to implement something using pandas `groupby` as a prototype to handle that better
- List sensitive features, control features, columns to evaluate
- Wants flexibility on kwargs to support future use cases
- Questions:
- What should the metric function look like? (natural signature for function)
- How should it be used with MetricFrame?
- Action:
- For Bram, just call it y_pred for now, and can revisit when Richard has a larger fix
- Richard will work on implementing his prototype
- Miro will take a look at the 756 issue
- For future:
- if y_true and y_pred are not provided, could provide sliced sample in MetricFrame
- How to handle this for tasks that aren't supervised learning problems, like clustering, etc
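A rough sketch of what a `groupby`-based prototype could look like for a metric that only needs `y_pred`. This is an illustration under stated assumptions, not Richard's prototype or the MetricFrame API; `by_group` and `selection_rate` are hypothetical helpers and the data is made up.

```python
import pandas as pd


def selection_rate(y_pred: pd.Series) -> float:
    """Fraction of positive predictions; needs y_pred only, no y_true."""
    return float((y_pred == 1).mean())


def by_group(df, metric, column, sensitive_features):
    """Apply `metric` to `column` within each combination of sensitive features."""
    return df.groupby(sensitive_features)[column].apply(metric)


df = pd.DataFrame({
    "y_pred": [1, 1, 1, 0, 1, 0, 0, 0],
    "sex": ["f", "f", "m", "m", "f", "f", "m", "m"],
    "age": ["<40", "<40", "<40", "<40", "40+", "40+", "40+", "40+"],
})

# One value per (sex, age) combination, indexed by a MultiIndex.
result = by_group(df, selection_rate, "y_pred", ["sex", "age"])
print(result)
```

Because the metric receives a slice of whichever column is named, the same pattern extends to settings without `y_true`, which is the flexibility raised in the questions above.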
- Boston housing documentation
- No blockers, just working through comments (not sure how to)
- Reference formatting in bibtex
- There's a bibtex package for Sphinx
- Suggested opportunities to connect to construct validity and abstraction traps
## 2021-09-30
- Updates
- Presentations
- PyGotham presentation coming up
- PyData panel accepted and scheduled
- Internal Microsoft presentation
- Scikit-learn update is out, and triggered issues (correlation-remover)
- Recent submitted PRs
- Adversarial de-biasing [#785](https://github.com/fairlearn/fairlearn/issues/785)
- May be strange to put it in a single fit function, but would make it easier
- What stopping criteria to use? (max_iter, others from stochastic gradient descent (e.g., histogram gradient boost from xgboost))
- May make it difficult for users to print loss function.. one option would be to use partial fit
- Broader question: how are we handling other non-blackbox integration of other packages (e.g., Keras, PyTorch, etc)?
- Allie submitted a PR for fetch_boston documentation [#961](https://github.com/fairlearn/fairlearn/pull/961)
- Website issue scoping
- What has been already done:
- Decided on new website structure
- Cross-check compatibility of the current structure page
- Have a set of UI designs
- [Ideation phase of illustrations]
- Website implementation work
- Implement the proposed website hierarchy (e.g., [#838](https://github.com/fairlearn/fairlearn/issues/838)), i.e., structure the documentation into sections just using the `.rst` / `.py`
- There are no current blockers on this
- Basically anything that's just "content" rather than styling changes
- Implement Vanessa's design proposal
- How best to stage this?
- We need somebody with front-end CSS / HTML / JavaScript expertise + Python/Sphinx
- Low-hanging / bridge UI work (?)
- What can be accomplished without changing the current template
- We need somebody with front-end CSS / HTML / JavaScript expertise + Python/Sphinx
- Some ideas:
- Make our navbar consistent between the landing page and generated pages
- Milestones for front-end contributor
- Global UI ([#833](https://github.com/fairlearn/fairlearn/issues/833))
- Design language ([#846](https://github.com/fairlearn/fairlearn/issues/846))
- Mobile design ([#845](https://github.com/fairlearn/fairlearn/issues/845))
- Responsive grid ([#834](https://github.com/fairlearn/fairlearn/issues/834))
- Next steps:
- Check with Vanessa what's the final website hierarchy (for the final website) - maybe that's just the [Figma](https://www.figma.com/file/wt76z0M87RBDDEoyAEpD2g/OPEN-SOURCE-Fairlearn-Redesign?node-id=2535%3A5899)
- Create issues:
- Refactor the existing structure
- Move things
- Rename sections
- Link sections
- Create new content
- Clean-up:
- Remove stale issues
- Lock stale discussions / transfer into issues if relevant
- Meet w/ Vanessa + front-end engineer
- Scope the front-end work into issues
## 2021-09-23
- Updates
- Manojit talked to maintainer from [Fugue](https://fugue.readthedocs.io/en/latest/introduction.html) about possibility of parallelizing MetricFrame to speed it up on multi-core system
- What would actually need to be done to make this work? Need handlers on back-end?
- How much is this actually needed?
- For large datasets (e.g., images), there might be issues, but it depends whether the input is all the predictions, or the objects themselves
- This may be resolved by writing guidance to pass predictions rather than the objects?
- May be a larger question about dataframe interoperability (e.g., [this RFC](https://data-apis.org/blog/dataframe_protocol_rfc/) or [API](https://github.com/data-apis/dataframe-api))
- Larger tactical question about dependencies that incur potential maintenance costs/debt
- **TODO**: open issue on establishing dataframe interoperability
- May be related to Eric's issues
- Roman wants to clean up issues, but hasn't had a chance
- Many new issues, some common themes
- Clarifying expectations about documentation (like Hilde's (953, 952))
- Clarifying expectations about terminology (e.g., group names, fairness-aware, etc), and how to align with fairness style guide (not just adopting the authors' language)
- **TODO:** write issue for contributor guide to look at style guide first
- How to determine which new features to propose?
- Number of citations of paper (but may be too high a bar, or not the right signal)
- Writing something explaining how it should be used (and why)
- Are there "standard" benchmarks (for e.g., things like PCA)?
- **TODO:**
- Write issue for contributor guide that opening issues for new features should include an example use case (e.g., for face validity reasons)
- "Inclusion criteria" --> can point new contributors to this
- Hilde has a list of new features from AIF360
- Could open issues for benchmarking existing methods
- When do we reach the point when we want to make things computationally optimized, e.g., moving this out of Python to Cython (then compiled to C)?
## 2021-09-16
- Website design discussion
- Overview of the history of the website
- Difference between landing page built by marketing and others
- Hadn't yet built out content thinking about user experience
- Hoping to overhaul landing pages
- Consider dependencies
- Types of tasks
- UI revision
- Content restructuring
- Sphinx-related theme issues
- Fonts, logos, headers
- Updates:
- Thomas, Sean, Bram all working with Hilde on new issues, e.g.:
- [Fairness-aware PCA](https://github.com/fairlearn/fairlearn/issues/956)
- [Fairness metrics for ranking](https://github.com/fairlearn/fairlearn/issues/945)
- Allie looking into documenting Boston housing dataset issues
- Manojit putting together PyData
- Triage issues and PRs, focus on issues/PRs we want for the next release
- Prioritize review of open PRs
## 2021-09-02
- Hilde has interns starting next week... may be able to have them focus on more of the engineering issues (e.g., from AIF360)
- Have them start with good first issues, some issues around warnings first
- May want to revisit open issues before students join
- Some open issues may not be feasible for people (should we remove these?)
- Should break some apart:
- Warnings/documentation
- Code snippets
- Should revisit older issues to see whether it's a bigger project to address
- Could add new ones
- ACTION: Roman will review older issues that may need to be broken up and add them to the calls to discuss, if needed
- Manojit will lead discussion of paper on Retiring Adults dataset
- Hilde may have an assignment where students will evaluate fairness toolkits
- Checked in on several open issues and PRs
## 2021-08-25
- Approved PR for showing pie chart only for count metric [#939](https://github.com/fairlearn/fairlearn/pull/939)
- PR failing due to `flake8` [#938](https://github.com/fairlearn/fairlearn/pull/938) was re-opened; the error was a missing whitespace after a comma, which should be easily fixable
- Versionchanged [#929](https://github.com/fairlearn/fairlearn/pull/929/files) - Roman followed up letting Matthew know that it needs an explanation of what changed for that version (Roman will follow up with Matthew to see if he's blocked)
- Synthetic datasets [#907](https://github.com/fairlearn/fairlearn/pull/907)
- Corey has updated and will be reviewed
- Removed section numbers from user guide (but kept indentation structure)
- Option for Fairlearn sprints
- Roman will post on Discord to ask about people's feelings about sprints (when, requirements needed (e.g., funding for internet connections))
- May want to revisit conversation about timing of weekly calls depending on participation of people in different time zones in sprints
- Possible option to have multiple calls at different times (e.g., an 8pm ET call that might work better for people in east Asian contexts)
- PyData Global - still waiting to hear back
- Panel with DataKind -
## 2021-08-19
- May be worth pinging @coreyharris for PR [#907](https://github.com/fairlearn/fairlearn/pull/907)
- Update on error plotter [#857](https://github.com/fairlearn/fairlearn/pull/857) - Adrin gave feedback, Alex will address before sending it out for feedback from others
- Update on Kas's PR [#869](https://github.com/fairlearn/fairlearn/pull/869) - all there, waiting on her to sync with Roman to be able to commit it
- PyData proposal - can keep editing it as long as we want
- Conference date: 10/28-30
- Outreach - DataKind volunteer summit this weekend and Manojit is speaking at it
- Think about outreach to iHub in Kenya (or other tech hubs in African contexts)
- Update on educational materials - started developing and synthesizing learning objectives
- Considering other audiences (e.g., compliance officers)
- May be touching on multiple areas (privacy, etc), not just fairness
- Could include discussion of using Fairlearn / disaggregated evaluation for non-AI rule-based algorithmic decision systems
- Developing educational resources is more than just code
- Question about format, structure, medium is still TBD
- May not need to live on the Fairlearn website or user guide, per se
- Check in on Allie's issue [#852](https://github.com/fairlearn/fairlearn/issues/852)
- Question about branch naming (doesn't really matter, doesn't get recorded)
- PR squashed into single commit, with commit message set as the PR title, lightly edited
- Her PR will tackle an sklearn estimator error message, but future PRs could tackle others (e.g., [#937](https://github.com/fairlearn/fairlearn/issues/937) around Matplotlib deprecation)
- Possibility to have a multi-day sprint??
- In-person may be tough/impossible during COVID
- Even virtual could be good to try to do
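
On the idea above of applying disaggregated evaluation to non-AI, rule-based decision systems, here is a minimal sketch in plain Python. The rule, the applicant records, and the function names are all hypothetical and chosen only for illustration. The point is that the evaluation only needs the decisions and the group memberships, not a trained model. In Fairlearn itself, `MetricFrame` plays this role once the decisions are passed as `y_pred`.

```python
from collections import defaultdict


def rule_based_decision(income, debt):
    """Hypothetical hand-written rule: approve when debt is under 40% of income."""
    return int(debt < 0.4 * income)


def selection_rate_by_group(decisions, groups):
    """Fraction of positive decisions within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


# Made-up records: (income, debt, group)
applicants = [
    (50_000, 10_000, "A"),
    (40_000, 20_000, "A"),
    (60_000, 12_000, "B"),
    (30_000, 15_000, "B"),
    (45_000, 30_000, "B"),
    (70_000, 14_000, "A"),
]

decisions = [rule_based_decision(income, debt) for income, debt, _ in applicants]
groups = [group for _, _, group in applicants]

# Per-group selection rates for the rule's decisions
rates = selection_rate_by_group(decisions, groups)
print(rates)
```

The same pattern would extend to any per-group metric that can be computed from decisions and, where available, observed outcomes.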
## 2021-08-12
- Check in with Laura (error with merging, will sync with Richard/Adrin), Bram (got great feedback, now implementing), Allie (starting this weekend), Alex:
- Error bars - do we want to support asymmetric errors also?
- Initially scalar, then confidence intervals
- Once we have functionality and see how people are using it, then can figure out how to keep developing it
- Is there an update on website visuals? Nothing yet - should be an update from Vanessa soon
- Where are we with features from AIF360?
- Check in on [#784](https://github.com/fairlearn/fairlearn/issues/784) - may be waiting on an answer to their question
- Check in on [#785](https://github.com/fairlearn/fairlearn/issues/785) - to see if they're stuck and need help
- Any others? Should we advertise for these on weekly calls?
- PyData - if you're interested in reviewing, sign up on the [link](https://docs.google.com/forms/d/e/1FAIpQLSda8dLQ5z4Mv0jbbtzxVNz3Y1R1elYtw2otCu3kUBR2J_Eqgg/viewform) soon (CfP closes on 8/15, and reviews start after that)
- Discussion of PyData Fairlearn panel - wanting institutional diversity, other types of diversity
- Discussion of other outreach efforts - giving talks on Fairlearn at Meetups, etc
- Sharing resources (slide decks, lecture notes, etc)
- How do we share low-stakes, informal talks that people have given, to encourage others to do the same?
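
On the error-bar question raised above (scalar errors first, then confidence intervals, possibly asymmetric), here is a minimal sketch of turning interval bounds into asymmetric error-bar lengths. The helper name `to_asymmetric_yerr` and the numbers are hypothetical and are not taken from the error bars PR. Matplotlib's `Axes.errorbar` accepts `yerr` as a 2 x N array-like of (lower, upper) lengths, which is the shape produced here.

```python
def to_asymmetric_yerr(points, lowers, uppers):
    """Convert point estimates and interval bounds to [lower_lengths, upper_lengths]."""
    if not (len(points) == len(lowers) == len(uppers)):
        raise ValueError("points, lowers, and uppers must have equal length")
    lower_err, upper_err = [], []
    for point, low, high in zip(points, lowers, uppers):
        if not low <= point <= high:
            raise ValueError(
                f"estimate {point} lies outside its interval [{low}, {high}]"
            )
        # Error-bar lengths are distances from the estimate, never negative
        lower_err.append(point - low)
        upper_err.append(high - point)
    return [lower_err, upper_err]


# Made-up per-group metric values with asymmetric intervals
points = [0.80, 0.65]
yerr = to_asymmetric_yerr(points, lowers=[0.70, 0.60], uppers=[0.85, 0.75])
```

A symmetric scalar error is the special case where both lengths are equal, so an API that starts with scalars could later accept this two-row form without changing the plotting call.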
## 2021-08-05
- Update on abstraction traps [PR](https://github.com/fairlearn/fairlearn/pull/809) - needs review from Michael and Hilde
- Bram looking for a way to get more involved - was working on error handling
- Pointer to [issue labels](https://github.com/fairlearn/fairlearn/labels)
- Update from educational materials working group (working on learning objectives - will present them at the community call soon)
- PyData global workshop
- Technical demo of Fairlearn, and presentation from practitioners using it (e.g., Triveni)
- Will follow up on Discord once they have a better idea of the types of topics
- Deadline to apply is 08/15
- Workshop itself is end of October
- Host meetings on Discord?
- We will try this out next week (8/12), using the "weekly-community-calls" voice channel on Discord
- CircleCI is failing... Hilde and Adrin will follow up with the issue
- Package on documentation in the AI lifecycle (model cards, datasheets, impact assessments)
- Practices and packages for documentation
- Where should these live? Their own package?
- Model cards could be used to monitor models (offline, not in real-time)
- Tutorials on how to use other open-source projects like those
- Two orgs contacted Manojit about using RAI documentation and could be user testers
## 2021-07-29
- Other opportunities to evangelize Fairlearn: KDD, PyData (Manojit)
- Potential use cases - Women's World Banking using Fairlearn to assess bias towards women in lending scenario
- Consider potential for blog post
- Educational materials working group meeting next Wednesday (8/3) at 12pm ET
- Get comments in on the error bars PR [#857](https://github.com/fairlearn/fairlearn/pull/857) ASAP before the end of Alex's internship
- Discussion about where/how to include API reference text for experimental packages (e.g., fairlearn.experimental)
- Eventually put it in metrics when we're happy with it
- Discussion of risks of multiple hypothesis testing and whether/how we might communicate those risks and provide resources (e.g., post-hoc corrections)
- Proposed to include this in learning goals (if not creating materials ourselves, maybe pointing to other resources)
- With the caveat from Manojit that intersectional tests won't be independent, and thus may need a different approach from post-hoc corrections that assume independence (e.g., FDR control via Benjamini-Hochberg)
- Synthetic dataset PR - can we convey possible risks of generating synthetic data divorced from social context while still allowing users to quickly test Fairlearn functionality
- Idea for connecting educational resources to functionality - printing output at different points when relevant
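
To make the multiple-testing discussion above concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, with an optional Benjamini-Yekutieli correction that stays valid under arbitrary dependence at the cost of being more conservative. The function name and the p-values are hypothetical. Whether either correction suits overlapping intersectional groups is the open question in the notes, so this is an illustration and not a recommendation.

```python
def benjamini_hochberg(p_values, alpha=0.05, arbitrary_dependence=False):
    """Return a list of booleans, True where the hypothesis is rejected.

    With arbitrary_dependence=True, thresholds are divided by the harmonic
    sum 1 + 1/2 + ... + 1/m (the Benjamini-Yekutieli correction).
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1)) if arbitrary_dependence else 1.0

    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / (m * c_m)
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            max_rank = rank

    # Reject every hypothesis ranked at or below that k
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values, e.g. one per group comparison
p_values = [0.039, 0.001, 0.20, 0.009, 0.041]
bh = benjamini_hochberg(p_values)
by = benjamini_hochberg(p_values, arbitrary_dependence=True)
```

With these made-up values the dependence-robust variant rejects fewer hypotheses than the standard procedure, which is the trade-off the caveat above points at.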
## 2021-07-22
- Nils wants to work on [#720](https://github.com/fairlearn/fairlearn/issues/720), but hasn't been able to start
- SciPy sprint smaller than PyCon
- Synthetic dataset PR [#907](https://github.com/fairlearn/fairlearn/pull/907#discussion_r671987655) is open, and want to get feedback and move it forward; keep it smaller and can expand later
- Error bars PR [#857](https://github.com/fairlearn/fairlearn/pull/857) - can we re-use plotting functionality from pandas?
- Outreach - [EuroPython](https://ep2021.europython.eu/events/sprints/), [Data Umbrella](https://www.youtube.com/c/dataumbrella), possible podcast episode with Vincent, [Python Bytes](https://pythonbytes.fm/)
## 2021-07-15
- SciPy reflection
- How to better incorporate discussion and breakout rooms next time?
- Potential next tutorials?
- PyData??
- Abstraction trap PR
- Resolve suggestions
- Discuss Framing and Formalism trap explanations
- Don't need to add new material, let's just clean up what's there and save the rest for future PRs
- Could reach out to others on Discord to chat about examples and/or future issues or PRs that involve code examples (if possible, without falling into a solutionist trap ourselves)
- Pinning dev dependencies
- How do we handle when the website breaks? Particularly when it fails silently and rollbacks are difficult
- Pinning may block PRs until someone
- Keep with status quo: not pinning
- Should revisit when sphinx has a new version
- Alex's [PR](https://github.com/fairlearn/fairlearn/pull/857) would benefit from some comments
- How do other libraries (e.g., matplotlib, seaborn, pandas, etc) test plotting?
- SciPy sprints this weekend, but no updates from the organizers yet