# Fairlearn: Seed ideas for scenarios

*If you're trying to find something tangible to collaborate on or contribute, search for "next step"*

These scenarios are seed ideas for coming up with tangible and concrete deployment contexts where we can work through fairness questions. These seed scenarios can help us work towards making:

- A. **fairlearn example notebooks**: Jupyter notebooks that illustrate the value of the Fairlearn Python library, possibly using synthetic datasets (a minimal code sketch of what such a notebook's core computation could look like appears below, after the Next steps section). This can help us show people why the project is valuable, rather than just telling them.
- B. **sociotechnical "talking points"**: bullet points that illustrate the work of approaching fairness as a sociotechnical challenge, in a way that is approachable to developers and data scientists who are new to thinking about fairness.

This hackpad evolved from https://github.com/fairlearn/fairlearn/pull/491, with Kevin adding initial scenarios, Roman adding additional ones, and group discussion that is recorded here as notes and questions. See the bottom of this hackpad for more historical notes and links.

Our original intention was to use these for **fairlearn example notebooks**, and so that's what this hackpad focused on. We've also discovered that some of these scenarios won't make for good examples of the Fairlearn Python library. But they may be helpful seeds for [Sociotechnical "talking points"](https://hackmd.io/nDiDafJ6TMKi2cYDHnujtA).

## Contributing example notebooks

See https://fairlearn.github.io/contributor_guide/contributing_example_notebooks.html, which is pasted below for convenience.

> A good example notebook exhibits the following attributes:
>
> 1. **Deployment context**: Describes a real deployment context, not just a dataset.
> 2. **Real harms**: Focuses on real harms to real people. See [Blodgett et al. (2020)](https://arxiv.org/abs/2005.14050).
> 3. **Sociotechnical**: Models the Fairlearn team's value that fairness is a sociotechnical challenge. Avoids abstraction traps. See [Selbst et al. (2020)](https://andrewselbst.files.wordpress.com/2019/10/selbst-et-al-fairness-and-abstraction-in-sociotechnical-systems.pdf).
> 4. **Substantiated**: Discusses trade-offs and compares alternatives. Describes why using particular Fairlearn functionalities makes sense.
> 5. **For developers**: Speaks the language of developers and data scientists. Considers real practitioner needs. Fits within the lifecycle of real practitioner work. See [Holstein et al. (2019)](https://arxiv.org/pdf/1812.05239.pdf), [Madaio et al. (2020)](http://www.jennwv.com/papers/checklists.pdf).
>
> Please keep these in mind when creating, discussing, and critiquing examples.

## Next steps

So where do we go from here for Fairlearn example notebooks? One path is that we:

1. Finish a brief walk-through and **vote as a group on the top seed scenarios** that are worth working through further to create an example notebook that illustrates Fairlearn's value proposition for reducing real harm.
2. From there, we can **individually work through one or two deployment contexts offline** and see where we get in terms of the contributing guidelines.
3. If we find that no one votes for any of these seed scenarios, then we can **individually brainstorm and generate ten more seed scenarios offline**, and then try again to discuss and vote as a group. Sources include: personal experience, stories from people we know, news articles, research papers (eg, [Barocas and Selbst (2016)](https://www.cs.yale.edu/homes/jf/BarocasSelbst.pdf)), etc.
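As a concrete anchor for item A above, here is a minimal sketch of the kind of core computation an example notebook might build around: train an ordinary classifier on a synthetic dataset and disaggregate its metrics by a sensitive feature. Everything here is a placeholder (the data, the `group_a`/`group_b` labels, and the metric choices), and the `MetricFrame` API shown is the one in recent Fairlearn releases, so details may differ by version.

```python
# Minimal sketch (illustrative only): disaggregated metrics on synthetic data.
# Assumes a recent Fairlearn release that provides MetricFrame; the dataset,
# group labels, and metrics below are placeholders, not a real scenario.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from fairlearn.metrics import MetricFrame, selection_rate

# Synthetic data plus an arbitrary binary sensitive feature.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
sensitive = np.random.RandomState(0).choice(["group_a", "group_b"], size=len(y))

X_train, X_test, y_train, y_test, s_train, s_test = train_test_split(
    X, y, sensitive, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Disaggregate metrics by group and inspect the largest gaps.
mf = MetricFrame(
    metrics={"recall": recall_score, "selection_rate": selection_rate},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=pd.Series(s_test, name="group"),
)
print(mf.by_group)
print(mf.difference())  # per-metric maximum difference between groups
```

A real notebook would replace the synthetic data with data grounded in a scenario's deployment context, and would tie the choice of metrics (and any mitigation) to the harms discussed for that scenario.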
## Seed scenarios

#### 1a. Identifying potential tax fraud XXX

You're a member of an analytics team in a European country, and have been brought in to consult on a project that has already started to scale the deployment of models for predicting which tax returns may require further investigation for fraud. The team has used a model trained in other jurisdictions by a large predictive analytics supplier, and hopes that they can leverage this at a lower cost than would be required to invest in the capability in-house. [Veale et al. (2018)](https://arxiv.org/pdf/1802.01029.pdf)

- stakeholders: everyone filing a tax return, data scientist, auditors
- real harms: False Positive = audit on someone who made no mistake (perhaps burden for them? waste of time/money for auditor); False Negative = fraud undetected
- collaborators: Michael Veale
- questions: Could this lead to feedback loops if people find out what criteria cause audits? What percentage of returns can be audited? Can we find a dataset for this?
- kevin: In the EU context mentioned in Veale et al. (2018), use of protected characteristics in model development would be unconstitutional; according to a "lead of analytics at a national tax agency... if someone wanted to use gender, or age, or ethnicity or sexual preference into a model, [they] would not allow that — it's grounded in constitutional law." Even when legally cleared, analysts do not use these features because they would also have to explain to citizens that they were used to trigger an investigation, and there are ethical norms in the agency against this.

*Next step: The seed scenario from Veale et al. (2018) is a good candidate for sociotechnical talking points, particularly focusing on the situation with the portability trap described in the paper.*

#### 1b. Identifying tax fraud, adapted to US context

(exploring adapting scenario #1 into a US context)

- kevin: Electronic fraud detection has been used for decades in the US ([source](https://www.treasury.gov/tigta/auditreports/2015reports/201520093fr.html)), alongside decades of large-scale fraud (eg, [Panama papers](https://www.icij.org/tags/us-panama-papers-case)). IRS funding, staffing for "fraud technical analysts", and thus fraud referrals have declined dramatically (>50%) over the last decade or so, and in 2018 the "audit rate for individual returns was 0.59%." ([source](https://news.bloombergtax.com/daily-tax-report/insight-the-irss-renewed-focus-on-fraud-implications-for-tax-practitioners)). In 2019, the IRS reported 1500 fraud investigations, with ~60% recommended for prosecution ([annual report](https://www.irs.gov/pub/irs-utl/2019_irs_criminal_investigation_annual_report.pdf)). Steps of the investigation process are [described here](https://www.irs.gov/compliance/criminal-investigation/how-criminal-investigations-are-initiated).
- kevin: Recent [IRS contracts](https://src.bna.com/C76) were granted [to Palantir](https://news.bloombergtax.com/daily-tax-report/palantir-deal-may-make-irs-big-brother-ish-while-chasing-cheats). Since ~2019, there has been an anticipated increase in enforcement action ("the IRS is quite vocal about its increasingly specialized ability to analyze data in order to help it direct tax enforcement resources and develop criminal cases, touting its use of data analytics programs that can access and search over 9.5 billion records."). Contracts indicate this includes social network analysis, text and email communication, and other forms of non-financial data.
- kevin: The IRS has an office of civil rights, but I didn't find any reporting on real harms here related to over-investigation that I could connect to current Fairlearn capabilities. I'm assuming adoption is driven primarily by internal cost-savings, and that there's a clear natural equilibrium since "cost of fraud" is clear to express financially and to trade off with "cost of preventing fraud." It's challenging to make real harms of fraud tangible in human terms since downstream impact is so diffuse (ie, it doesn't directly translate to reductions in specific services).
- kevin: Other references: [IRS Criminal investigations](https://www.irs.gov/compliance/criminal-investigation/program-and-emphasis-areas-for-irs-criminal-investigation), [Artificial Intelligence: Entering the world of tax (Deloitte, 2019)](https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Tax/dttl-tax-artificial-intelligence-in-tax.pdf), [Advanced Analytics for Better Tax Administration: Putting Data to Work (OECD, 2016)](http://www.oecd.org/publications/advanced-analytics-for-better-tax-administration-9789264256453-en.htm)

*Next step: Table it, unless we find out more about how to express real harms in human terms.*

#### 2. Debit card fraud investigation XXX

You're a data scientist at a Dutch financial services company, and your manager asks you to join an existing team. This team has deployed a model trained on historical transaction data, and new debit transaction data is now arriving. For each new transaction, the model predicts whether it is potentially fraudulent and, if so, triggers an alert and inspection by human analysts. The output that matters for the company is the final decision by the human analyst of whether to block the transaction, allow it but flag it for further investigation by another team, or flag the transaction as normal. [Weerts et al. (2019)](https://arxiv.org/abs/1907.03334)

- stakeholders: customers, data scientists, analysts
- real harms: False negatives mean clients can't get their money back (eg, in a phishing scheme), while false positives may overwhelm the team of human analysts or disrupt clients making legitimate purchases.
- collaborators: Hilde Weerts
- questions: debit/credit card usage varies by country, which changes what costs are associated with FP/FN; Can we find a dataset for this?
- kevin: concerned about a natural equilibrium because of funding incentives within the organization. Couldn't uncover real harms, since there are strong recourse and contestability procedures.

*Next step: To move forward, find reporting on real harms to real people (eg, increasing the costs of fraud prevention would create barriers to entry for people in the Netherlands to use debit cards, or real harms from failure of contestability and recourse procedures related to fraud).*

#### 3. Measuring brand sentiment

You're a member of a team trying to measure brand sentiment from online comments and reviews. The team hopes to use an existing language model, a third-party service for flagging abusive comments, and then train a more targeted sentiment classifier for your brand on top. [Hutchinson et al. (2020)](https://arxiv.org/pdf/2005.00813.pdf)

- questions: we have concerns about sentiment classification in general; Fairlearn may not want to focus on deep learning for text tasks at this point
- How does the system behave if/when the third-party service for abusive comments updates its model?
- How multilingual do we need to be? What about different dialects of the same language?
#### 4. Candidate screening X X X

A potential client asks you if ML can help with predicting job candidates' suitability for jobs based on a combination of personality tests and body language. [Raghavan et al. (2019)](https://arxiv.org/pdf/1906.09208.pdf)

- collaborators: Solon is one of the authors!
- notes: application looks sketchy, need to talk to Solon, perhaps this could be rewritten to be about qualifications rather than, for example, body language.
- lisa: I find some of the job screening/posting scenarios interesting given the significance of some of these algorithms/models in our current world where many people are and will be searching for new jobs.

*Next steps: a) Work towards an example Fairlearn notebook, see [Fairlearn: Candidate screening example](https://hackmd.io/GMli82s7SxORABkabCgw8Q), b) Work this seed into sociotechnical "talking points".*

#### 5. Advertising jobs to potential candidates X

You work as a data scientist for an online job platform where people search for new jobs and exchange professional content and updates. You are in charge of the system that decides to whom to recommend which positions. [Upturn report](https://www.upturn.org/static/reports/2018/hiring-algorithms/files/Upturn%20--%20Help%20Wanted%20-%20An%20Exploration%20of%20Hiring%20Algorithms,%20Equity%20and%20Bias.pdf) & [The Guardian article](https://www.theguardian.com/technology/2015/jul/08/women-less-likely-ads-high-paid-jobs-google-study)

- stakeholders: users of the job platform, employers advertising on the platform, data scientist working for the job platform
- harms: recommending certain jobs only to certain groups of people increases the likelihood that employers won't get a diverse set of applicants, and job seekers may have no chance of seeing certain kinds of jobs
- notes: somewhat complex setup compared to the simple regression/classification scenarios supported by Fairlearn
- lisa: I find some of the job screening/posting scenarios interesting given the significance of some of these algorithms/models in our current world where many people are and will be searching for new jobs.
- What is 'success' in the training data? That a candidate applied? Was interviewed? Was hired? Does this match the definition of success for the system (especially since applied -> interviewed -> hired is a very leaky pipe)?
- Do we define fairness relative to the applicants in our user pool or in the broader population?

#### 6. Rankings for image search

You're on a team working on improving an image search system after receiving some complaints from users related to fairness. Users often use this system to find a selection of stock images to use when making multimedia presentations. In this system, requests start with some information about the context and user creating the query, and your team is trying to incorporate ideas about fairness like diversity and inclusion into how search results are ranked. [Mitchell et al. (2020)](https://arxiv.org/pdf/2002.03256.pdf)

- stakeholders: users of the search system, people who may (or may not) be shown in search results, data scientist
- harms: over- or underrepresentation
- notes: Fairlearn doesn't support ranking at the moment, but this may be a good application in the future.

#### 7. Sales leads for car loans X

You work at CarCorp, a company that collects special financing data: information on people who need car financing but have either low credit scores or limited credit histories, and sells this data to auto dealers as sales leads. CarCorp serves dealers across the United States. A new project manager asks about leveraging data science to "improve the quality" of leads so that dealers do not churn. CarCorp has a large amount of historical lead data (2 million leads in 2017 alone), but relatively less data on which leads had been approved for special financing (let alone why the loan was approved). [Passi and Barocas (2019)](https://arxiv.org/ftp/arxiv/papers/1901/1901.02547.pdf)

- notes: There are concerns around predatory terms on such loans, perhaps best to avoid this
- collaborators: paper by Solon

#### 8. Predictive policing

You are a contractor working with the police department in a large city. One of the project leaders in the department would like to construct a risk score for people who are known gang members engaging in knife crime. It's important to them that they can understand what the model is doing, and they are wary that any model will pick up on protected characteristics. [Veale et al. (2018)](https://arxiv.org/pdf/1802.01029.pdf)

- stakeholders: police officers, everyone in the community (especially people who may be more affected by predictive policing than others), data scientist
- harms: overpolicing of neighborhoods can have a disproportionate effect on communities in these neighborhoods (perhaps exacerbated by a feedback loop)
- notes: Feedback loops! Prediction of behavior based on circumstances; perhaps useful for aggregate observations about behavior but less so for individual predictions
- Dangerous to assume that 'crime' is a single phenomenon. There are different kinds of crime, so we also need to make sure that the data match the crime being predicted.

#### 9. Scheduling maintenance within a factory

You work within a manufacturing company, and are starting a new project that will create a schedule assigning employees to check and update certain components of the machinery to prevent critical operation failures. The component assignment is based on data that show how often different components have worn out and broken down in the past. [Kyung Lee (2018)](https://journals.sagepub.com/doi/full/10.1177/2053951718756684)

- questions: need to provide details on the link between the preventative maintenance model and fairness in scheduling, which seem to be separate; also need to figure out how/if fairness in shift scheduling is measurable

#### 10. Child protective services hotline

You're collaborating with the child protective services agency as part of a county government in the US. The agency is redesigning the intake flow for reports of potential child abuse or neglect, and wants to discuss if a predictive analytics system could help them improve this system. [Brown et al. (2019)](https://www.andrew.cmu.edu/user/achoulde/files/accountability_final_balanced.pdf) and [Chouldechova et al. (2018)](http://proceedings.mlr.press/v81/chouldechova18a/chouldechova18a.pdf)

- stakeholders: children, parents, agency employees, data scientists
- collaborators: Alexandra Chouldechova
- notes: dataset? very high stakes
- notes: see [Measuring the predictability of life outcomes with a scientific mass collaboration (Salganik et al. 2020)](https://www.pnas.org/content/117/15/8398) for a large-scale study of predictive systems for child life outcomes using a longitudinal dataset. The authors specifically focus on child welfare outcomes, and find that "despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate...these results suggest practical limits to the predictability of life outcomes."
#### 11. Compliance in customer service calls

You work on a team within financial services that is building a system to reduce the company's compliance risk from customer service phone calls. Compliance risk includes cases where a company employee breaches confidentiality or engages in misrepresentation or fraud. Another team has leveraged third-party services to transcribe call audio into text, and then extract features for each call related to the presence of specific keywords. It's your team's role to take that vector of binary features and build a system to estimate the compliance "risk score" for each call. A team of internal analysts will use these risk scores to triage which calls to investigate further. [vendor blog post (2020)](https://customers.microsoft.com/en-us/story/754840-kpmg-partner-professional-services-azure)

- note: might be singling out people in the call center
- Do call-centre demographics roughly match the caller demographics? That gives extra avenues for miscommunication causing risk.
- How fine-grained is the risk score? How well are the boundaries between different risk levels defined?
- What is the follow-up process for an 'at risk' call (both for the caller and callee)?
- How are callers allocated to the call centre operators?
- Do the questions in the calls systematically vary with shift patterns (e.g. placing stock trades in the morning, cancelling credit cards in the evening)?

#### 12. Facial verification of taxi drivers

Your team in a taxi company is collaborating on a new feature, "selfies for security," which asks drivers to periodically take pictures of themselves in between rides. The intention is to reduce the company's risk in providing taxis that are driven by someone who the company has not screened and approved. These photos will be taken within taxi cars on cell phones, in a wide range of conditions with uncontrolled lighting throughout the day. Another team in your company will generate the signal to "request a selfie," and your team is standing up a new service to process the photos through a third-party facial verification vendor that returns a confidence score for how well the driver photo matches the last photo of the driver. Your team's service then decides whether to allow the driver to start picking up riders, or to block the driver's account and flag it for investigation by a small team of analysts. [taxi company blog post (2017)](https://eng.uber.com/real-time-id-check/) and [vendor blog post (2019)](https://customers.microsoft.com/en-us/story/731196-uber)

- richard: concern on facial recognition, image quality might be an issue
- hanna: avoid this one, maybe not want to build it as a notebook but as "talking points" because of the landscape around facial recognition
- varoon: fairlearn is hard here because the communities you are impacting and the people deploying the algorithms have misaligned values and ideals about surveillance, etc. in their communities. Large sociotechnical barriers for developers, so it might not be the best example. A bit tangentially, this may be appropriate for auditors (or "evaluators" or "journalists") who are given a system because trade secrecy was waived.
- solon: would like not to avoid tricky cases, that's where people need the most guidance. Could include some where we think the right answer is to not build, since there is no reasonable way to mitigate.
- richard: could do "talking points"
- miro: yes, but zero use cases now.
- Exactly what question are we trying to answer? Is it that the photo the driver sends matches the one on file? Or that the photo matches the one sent at the start of the shift? With accuracy <100%, those are not quite the same thing.
- How do we cope with a photo taken under Sodium-D lighting, especially of someone with darker skin?
- What about the drivers' privacy?

#### 13. Financial services product recommendations X X

You work at a Canadian financial services company that makes financial product recommendations for consumers. Other financial product providers describe their offerings and store them with your company. Users come to the app, agree to share their credit history, and then after their identity is authenticated, your team builds a model to rank the financial products that are the best fits. [financial services (2020)](https://customers.microsoft.com/en-us/story/734799-borrowell-financial-services-azure-machine-learning-devops-canada)

- miro: earlier examples might be better. recommendations are tricky
- fairlearn currently has binary classification and regression (but ranking could be implemented as scoring)
- hanna: don't want to lose track of this, if we do put fair ranking work in the project down the line
- Based _just_ on their credit history?
- What sort of recommendations? Back in 2008, one of the scandals was that people with 'prime' credit scores were steered towards subprime loans.

#### 14. Customer Service triage, consulting XX

You work at a consulting company. One of the services your company provides is setting up a single mailbox to receive incoming customer emails. Your role is to collaborate with the company to create a classification system for labeling emails as one of six categories. The output of your system is then used to route the email to the correct department head. To do this, you're using a third-party keyword extraction system that the company has already set up, which can extract ~1000 binary features from an email. [consulting blog post (2020)](https://customers.microsoft.com/en-us/story/774221-securex-professional-services-m365)

- hanna: i like this, because i've worked with emails, i like the diversity of this as well, it's not just finance or other things but a newer area
- miro: fairlearn supports binary classification and regression. if we want to support this, we'd have to present it as a score for each of the six categories, rather than six-way classification. people do that a lot in practice so it could be okay. (a rough sketch of this per-category scoring follows this scenario)
- What is the fairness issue? Presumably a misclassified message would simply be rerouted by the recipient. How much delay (i.e. harm) does that add to the processing of a message (especially as compared to resolution time once correctly routed)?
- How often can the underlying model be updated with examples of incorrectly routed messages (with the correct labels manually applied)?
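To make miro's point above concrete, here is a rough sketch of how a six-way routing task could be recast as six one-vs-rest scores, so that Fairlearn's binary metrics can be computed per category. The data, the six categories, and the `group_a`/`group_b` sensitive feature are all made up for illustration; this is not an implementation of the scenario.

```python
# Sketch only: recasting a six-way classification as six per-category scores,
# then disaggregating binary metrics per category with Fairlearn.
# The data, categories, and "group_a"/"group_b" feature are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from fairlearn.metrics import MetricFrame, selection_rate, true_positive_rate

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=3000, n_features=50, n_informative=10,
                           n_classes=6, random_state=0)
sensitive = rng.choice(["group_a", "group_b"], size=len(y))

# One binary scorer per category instead of a single six-way classifier.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
scores = ovr.predict_proba(X)  # shape (n_samples, 6): one score per category

for k in range(scores.shape[1]):
    y_true_k = (y == k).astype(int)               # "belongs to category k"
    y_pred_k = (scores[:, k] >= 0.5).astype(int)  # thresholded per-category score
    mf = MetricFrame(
        metrics={"selection_rate": selection_rate, "recall": true_positive_rate},
        y_true=y_true_k, y_pred=y_pred_k, sensitive_features=sensitive)
    print(f"category {k}:", mf.difference().to_dict())
```

In the email triage setting, any per-category gaps would still need to be tied back to a concrete harm (eg, whose messages get delayed), which is exactly the open question raised in the bullets above.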
#### 15. Job recommendations X

You work for a job recommendation product. Background processes gather job postings and submit them to a third-party search indexing service. When a user comes to the website and uploads their resume, the resume is processed and a set of job skills is extracted. Your team works on the service that takes the set of job skills in a resume, and searches the job posting index service managed by a third-party vendor. Your team then provides the ranking of job postings that is ultimately shown to the user. [company blog post](https://azure.microsoft.com/en-us/blog/using-azure-search-custom-skills-to-create-personalized-job-recommendations/)

- How do we recognise 'job skills'?
- How do we cope with unusual qualifications (could be as simple as attending university overseas)?

#### 16. Alerting for first responder police officers

You work at a company providing a service to police officers that accompanies queries typically run when a police officer is a first responder. Three types of queries are run: driver's license information, license plate information, and vehicle identification numbers. When an officer presses a button on their radio and speaks a license plate number, within seconds they hear an alert tone that classifies whether the queries returned information that is low priority, sensitive but not urgent, or high priority (eg, a prior arrest record or a stolen vehicle). The system relies on a third-party language system to parse the audio and extract the license plate number, and then runs those queries through police department systems. You work on the team building the classification system that chooses which of the three alert tones to play through the officer's radio. [company blog post](https://customers.microsoft.com/en-us/story/792324-motorola-solutions-manufacturing-azure-bot-service)

- note: hanna says no

#### 17. Choosing new retail sites X

You work at a clothing company, as an analyst working to select the locations for three new physical stores that will be opened in the next six months. You're collaborating with a third-party vendor to estimate potential revenues at new site locations. You've gathered data on past store openings and shared it with the vendor, who has created a model that can estimate the potential revenue for the first two years of operation at new sites. The vendor's model relies on data you've provided about your company's past openings, and other undisclosed data sources about retail sales, real-estate prices, foot traffic, etc. [company blog post](https://customers.microsoft.com/en-us/story/816179-carhartt-retailers-azure)

#### 18. Streaming music recommendations XX

You're a member of a team working on the music recommendation system of a music streaming platform. Previously, your team has primarily focused on optimizing recommendations for user satisfaction, which is measured implicitly as time spent listening on the platform. The company has received complaints from several artists that their music is not getting enough exposure, many of whom belong to groups that are historically underrepresented in the music industry. Your team decides to work on improving the recommendation system to allow for more diverse recommendations. [Ferraro et al. (2019)](https://arxiv.org/pdf/1911.04827.pdf)

#### 19. Deciding the credit card limit

You work for a bank as a data scientist. You're tasked with building a system that decides the credit limit for new credit cards. Inspired by [Apple Card](https://hbswk.hbs.edu/item/gender-bias-complaints-against-apple-card-signal-a-dark-side-to-fintech)

- stakeholders: credit card holders, bank (employees)
- harms: receiving a lower credit limit may restrict the opportunities of the credit card holder by preventing them from being able to afford things
- questions: Is this how it works in real life? Or is this part of the application itself? Need to consult with subject matter experts.
#### 20. School choice X

You work as a data scientist for a large school district. Your task is to create a system that assigns children to schools based on their (parents') preferences. Inspired by [Edweek](https://www.edweek.org/ew/articles/2013/12/04/13algorithm_ep.h33.html)

- stakeholders: children, parents, data scientists
- note: see the section on NYC school assignment in the [AI Now 2019 Report](https://ainowinstitute.org/ads-shadowreport-2019.pdf) for more history on this in NYC, with links to critiques of racial and socioeconomic segregation, subsequent legislation, a task force around algorithmic transparency, etc. See also [High School Choice in New York City: A Report on the School Choices and Placements of Low-Achieving Students (Nathanson et al. 2013)](https://research.steinhardt.nyu.edu/scmsAdmin/media/users/ggg5/HSChoiceReport-April2013.pdf) for a critique of an older high school assignment algorithm in NYC, which is overlaid over a [longer history of segregation](https://civilrightsproject.ucla.edu/research/k-12-education/integration-and-diversity/ny-norflet-report-placeholder/Kucsera-New-York-Extreme-Segregation-2014.pdf).

#### 21. Hate speech detection

You work for a social network as a data scientist. Your task is to build a system to identify hate speech so that the network can notify/warn users before they read the hate speech, or potentially block it. Inspired by [TheRegister](https://www.theregister.com/2019/10/11/ai_black_people/); perhaps somewhat related is toxicity, inspired by [Medium](https://medium.com/@carolinesinders/toxicity-and-tone-are-not-the-same-thing-analyzing-the-new-google-api-on-toxicity-perspectiveapi-14abe4e728b3)

- stakeholders: users of the social network (both content creators and consumers), data scientist
- harms: false positives mean that benign posts are flagged as hate speech, false negatives mean that actual hate speech isn't flagged as such
- notes: There's a lot of overhead for the social network to manually decide what is hate speech. Hate speech detection itself is very much an NLP task, but it's possible that disparities between groups could be mitigated by postprocessing probabilities (see the sketch after this scenario).
- note: the scenario mentioned in the article uses a third-party service, Google's Perspective API, rather than developing a system in-house. Various kinds of fairness audits have been conducted and written about it, and the Perspective API itself has a [public model card](https://medium.com/the-false-positive/increasing-transparency-in-machine-learning-models-311ee08ca58a)
- There's another open source MSR project called [CheckList](https://github.com/marcotcr/checklist) that might be applicable for some of these low-level kinds of fairness checks (eg, particular identity statements like "I am gay" returning negative sentiment).
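As a rough illustration of the postprocessing idea mentioned in the notes above, the sketch below uses Fairlearn's `ThresholdOptimizer` to pick group-specific thresholds on top of a fixed classifier. The data and the "dialect" groups are synthetic stand-ins, not a real hate-speech system, and the `predict_method` argument assumes a recent Fairlearn release.

```python
# Sketch only: group-aware threshold postprocessing with ThresholdOptimizer.
# Placeholder data; "dialect_a"/"dialect_b" is a synthetic stand-in for the
# group disparities discussed in the linked articles.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import MetricFrame, false_positive_rate

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
sensitive = rng.choice(["dialect_a", "dialect_b"], size=len(y))

# A fixed scorer standing in for a "hate speech" classifier.
base = LogisticRegression(max_iter=1000).fit(X, y)

# Choose per-group thresholds so error rates are balanced across groups.
postproc = ThresholdOptimizer(estimator=base,
                              constraints="equalized_odds",
                              prefit=True,
                              predict_method="predict_proba")
postproc.fit(X, y, sensitive_features=sensitive)
y_pred = postproc.predict(X, sensitive_features=sensitive)

# Compare false positive rates (benign posts flagged) across groups.
mf = MetricFrame(metrics=false_positive_rate, y_true=y, y_pred=y_pred,
                 sensitive_features=sensitive)
print(mf.by_group)
```

Whether this kind of mitigation is appropriate depends on the points raised above: who labels hate speech, what a false positive costs different groups, and whether the system is built in-house or bought as a third-party service.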
#### 22. Predicting who needs special attention in healthcare X X X

You work for a hospital network to create a system that should predict to which patients healthcare professionals should pay special attention. Inspired by [TheVerge](https://www.theverge.com/2019/10/24/20929337/care-algorithm-study-race-bias-health)

- stakeholders: patients, healthcare professionals, data scientists
- harms: False positive = somebody who doesn't need special attention gets special attention (perhaps unnecessary effort) and this care is potentially taken away from somebody else who needs it; false negative = somebody who needed special attention doesn't get it (potentially severe health consequences)
- questions: need to find out what percentage of patients actually get special attention, and overall more details on such an application; Can we somehow get a dataset?
- sociotechnical: "special attention" needs to be more concrete; also, does implementing the algorithm provide more budget for additional staffing? Does funding for the algorithmic system come with funding for increased capacity for "special attention," or does this function as a new kind of pressure for how to allocate staffing attention that will have to compete with other existing pressures?
- sociotechnical: see [Yang et al. (2016)](https://www.researchgate.net/publication/292320820_Investigating_the_Heart_Pump_Implant_Decision_Process_Opportunities_for_Decision_Support_Tools_to_Help) for a case study of a heart pump implant decision that found "lack of perceived need for and trust of machine intelligence, as well as many barriers to computer use at the point of clinical decision-making," or [Estimate the hidden deployment cost of predictive models to improve patient care (Morse et al. 2020)](https://www.nature.com/articles/s41591-019-0651-8) for some caution on what it takes to actually deploy these kinds of models in a way that impacts patient outcomes. [Sendak et al. (2020)](https://www.nature.com/articles/s41746-020-0253-3.pdf) describes "Model Facts," a medical variant on model cards.
- sociotechnical: More broadly, digitization of human service data is often [incredibly challenging](https://www.motherjones.com/politics/2015/10/epic-systems-judith-faulkner-hitech-ehr-interoperability/), with [data quality](https://fortune.com/longform/medical-records/) and [overhead of data entry](https://www.selecthub.com/medical-software/emr/electronic-medical-records-future-emr-trends/) as core issues. Common assumptions about scalable cost efficiencies in research papers (eg, [Rajkomar et al. (2018)](https://arxiv.org/ftp/arxiv/papers/1801/1801.07860.pdf)) have not been validated by field research (eg, [Soron and Collins (2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5596299/)).
- note: see [Baker et al. (2020)](https://alexhanna.github.io/algo-identity/) for some interactives on "administrative violence" in the healthcare system (eg, related to gender identity).
- note: example of a [fairness analysis](https://storage.googleapis.com/covid-external/COVID-19ForecastFairnessAnalysis.pdf) [whitepaper](https://storage.googleapis.com/covid-external/COVID-19ForecastWhitePaper.pdf) re: COVID forecasting. It starts by citing existing disparate impact, and focuses on absolute errors by subgroups, binned into quartiles of counties (partially because of data sources). It also alludes in passing to differential costs of over- and under-prediction (described in papers but hidden from "prediction" CSVs or UIs).
- See [ml4health](https://ml4health.github.io/)
- specific example to build on: https://ai.googleblog.com/2020/08/using-machine-learning-to-detect.html

*Next step: Start with either a) finding a specific deployment context and writing it into a paragraph, or b) finding where there are significant real harms in healthcare, and explore in that direction.*

----

## Historical notes and links

Here's what's happened so far.

1. **Research paper examples**. The initial research papers used abstracted datasets, and ran some experiments to demonstrate the approach empirically. These included the "UCI credit card" dataset and the "COMPAS" dataset, but these didn't engage with sociotechnical context.
2. **Critiques**. We tried exploring some of the sociotechnical context around those initial examples (eg, https://github.com/fairlearn/fairlearn/issues/413, and then more detailed explorations of credit card applications in https://github.com/fairlearn/fairlearn/issues/418, consumer lending in https://github.com/fairlearn/fairlearn/pull/492, and pre-trial detention in https://github.com/fairlearn/fairlearn/issues/478). The conclusion has been mostly that these deployment contexts may not be the best illustration of the project's core value that fairness is a sociotechnical challenge.
3. **"How to talk and write about fairness"**. We wrote up a document with aspirations for how the team would talk about fairness on the project ([link](https://fairlearn.github.io/contributor_guide/how_to_talk_about_fairness.html)). This was difficult to use, and in practice the document wasn't influencing how we were talking or writing. We also found that the existing example notebooks were not reflecting these kinds of project values.
4. **"Contributing example notebooks"**. We developed a [microrubric for critiques](https://github.com/fairlearn/fairlearn/pull/490) to try to make a more concise and usable checklist. This became part of the contributor guide for [Contributing example notebooks](https://fairlearn.github.io/contributor_guide/contributing_example_notebooks.html).
5. **"Seed scenarios"**. To explore other deployment contexts where we might illustrate the value of the Fairlearn Python library, we created and discussed a set of "seed scenarios" as potential candidates (see https://github.com/fairlearn/fairlearn/pull/491). This led to productive discussion about fairness as a sociotechnical challenge, but also to concerns about whether Fairlearn was an appropriate choice for reducing real harms in any of these deployment contexts. It also raised questions about whether the team would be able to do this kind of work on its own, without other kinds of interdisciplinary collaboration.
