# SoTA CF, XAI & RAI

Papers on CF & XAI:

- [CFs survey, '22](https://dl.acm.org/doi/pdf/10.1145/3527848): informative table comparing existing approaches, see p. 95:5
- [CFs & skewed intuition, '22](https://arxiv.org/abs/2205.06241)
- [Deficits to rectify CFs, '22](https://arxiv.org/pdf/2103.01035.pdf): argues that the choices of metrics, similarity, and nearness criteria are not "intuitive" for the user; calls for better testing and more focus on the user
- [Rule based explanations and CFs, '22](https://arxiv.org/abs/2210.17071)
- [CFs review, '22](https://link.springer.com/article/10.1007/s10618-022-00831-6)
- [CFDB, '22](https://dl.acm.org/doi/10.1145/3514221.3520162): I cannot access the PDF, so I watched this [YouTube video](https://www.youtube.com/watch?v=-KD9iOkcb7M&ab_channel=WomeninDataScience) instead
- [Quality Counterfactual Explanations in Real Time, '21](https://arxiv.org/abs/2101.01292)
- [A survey of algorithmic recourse: definitions, formulations, solutions, and prospects, '21](https://arxiv.org/abs/2010.04050)
- [Harm & ML](https://arxiv.org/pdf/1901.10002.pdf)
- [Feature attribution & CF, '21](https://arxiv.org/pdf/2011.04917.pdf)
- [CFs & challenges, '20](https://ceur-ws.org/Vol-2301/paper_20.pdf)
- [impossibility theorem fairness ML](https://arxiv.org/abs/2007.06024)
- [FACE, '19](https://arxiv.org/abs/1909.09369)

Other resources:

- [Molnar's interpretable ML book chapter](https://christophm.github.io/interpretable-ml-book/counterfactual.html)
- [Google tutorial](https://sites.google.com/view/kdd-2021-counterfactual?pli=1#h.cat78h636w5)
- [what if tool](https://pair-code.github.io/what-if-tool/learn/tutorials/walkthrough/) and [github](https://github.com/pair-code/what-if-tool)

Books, docs on XAI:

- [dataiku](https://pages.dataiku.com/oreilly-responsible-ai)
- [h2o](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/explain.html)

Tools for XAI:

- LIME, SHAP
- Captum
- AIX360
- [alibi](https://github.com/SeldonIO/alibi)

For CF, there's also:

- [DICE](https://github.com/interpretml/dice)
- [refs from Molnar's chapter](https://christophm.github.io/interpretable-ml-book/counterfactual.html#example-software)
- [carla, for benchmarking, '21](https://github.com/carla-recourse/CARLA)
- [CF & SAI](https://www.alignmentforum.org/s/pcdHisDEGLbxrbSHD), see also [causal incentives working group](https://causalincentives.com/)
- [CF conf](https://www.causalaiconference.com/)
- [causalens whitepapers](https://www.causalens.com/white-papers/?_gl=1*vavrvn*_up*MQ..&gclid=Cj0KCQjwnrmlBhDHARIsADJ5b_lfmkVoN)
- [PyMC-experimental](https://www.pymc.io/projects/examples/en/latest/causal_inference/interventional_distribution.html) and [CausalPy](https://github.com/pymc-labs/CausalPy)
- [causalpython](https://causalpython.io/#holiday-resources)
- [causal ai lab](https://causalai.net/) with a recent tutorial on [causal fairness analysis](https://fairness.causalai.net/)
- [CF fairness at the turing institute](https://www.turing.ac.uk/research/research-projects/counterfactual-fairness)
- [the institute of ethical ai](https://ethical.institute/index.html)
- [causality & time series workshop](https://sites.google.com/view/ci4ts2023/accepted-papers)
- [causality school](https://quarter-on-causality.github.io/tools/)

Kaggle training:

- [explainable ML](https://www.kaggle.com/learn/machine-learning-explainability)
- [intro ethics & AI](https://www.kaggle.com/learn/intro-to-ai-ethics)

For RAI:

- [dataiku RAI](https://academy.dataiku.com/page/responsible-ai) or [tutorials](https://knowledge.dataiku.com/latest/ml-analytics/responsible-ai/index.html#tutorials)
- [fairness book](https://fairmlbook.org/) with [a chapter on causality](https://fairmlbook.org/causal.html)
- [fairness tree](http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/)

## Interpretability

- [all sorts of colab notebooks](https://alignmentjam.com/interpretability)

## AI Observability & Monitoring

- [evidentlyAI](https://www.evidentlyai.com/); [tutorial on ML model cards in evidently](https://www.evidentlyai.com/blog/ml-model-card-tutorial)
- [whylabs](https://whylabs.ai/)
- [dataheroes](https://dataheroes.ai/)
- [uptrain](https://github.com/uptrain-ai/uptrain)
- [censius](https://censius.ai/)
- [(partial) landscape](https://ai-infrastructure.org/ai-infrastructure-landscape/)

## Ethics

- [openai blogpost](https://openai.com/blog/how-should-ai-systems-behave?utm_source=tldrai)
- [openai and education](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4354422)
- [ethics fastai](https://ethics.fast.ai/syllabus/index.html)
- [anthropic and safety](https://www.anthropic.com/index/core-views-on-ai-safety)
- [meta blogpost](https://ai.facebook.com/blog/responsible-ai-progress-meta-2022/)

## Safety & AI

- [Toward Trustworthy AI Development, '20](https://arxiv.org/pdf/2004.07213.pdf)
- [Harald Ruess & Simon Burton, '22](https://arxiv.org/pdf/2201.10436.pdf)
- [standard proposal UL4600](https://www.ecr.ai/ul4600/)
- [AI incident database](https://incidentdatabase.ai/): "indexes the collective history of harms or near harms realized in the real world by the deployment of artificial intelligence systems"
- [OECD AI incidents monitor](https://oecd.ai/en/incidents)
- [owasp](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [resources on AI risks](https://www.simeon.ai/resources-on-ai-risks)
- [an analysis of AI act](https://publications.jrc.ec.europa.eu/repository/handle/JRC132833)
- [far.ai](https://far.ai/research/publications/)
- [metr](https://github.com/METR)
- [chai](https://humancompatible.ai/research)
- [tools & metrics from oecd.ai](https://oecd.ai/en/catalogue/overview)
- [alignmentforum](https://www.alignmentforum.org/)
- [Center for AI Safety](https://www.safe.ai/work/research)
- [AI Standards Lab](https://www.aistandardslab.org/)
- [Applied AI institute](https://transferlab.ai/aois/trustworthy-and-interpretable-ml/)
- [centre pour la securite de l'IA](https://www.securite-ia.fr/)
- [giskard](https://www.giskard.ai/)
- [safeguarded ai, davidad](https://www.aria.org.uk/programme-safeguarded-ai/)
- [guaranteed safe ai](https://arxiv.org/abs/2405.06624)
- [rigorllm](https://arxiv.org/abs/2403.13031)
- [Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation](https://arxiv.org/abs/2311.03348)
- [ignore prev prompt, '22](https://arxiv.org/abs/2211.09527)
- [awesome ai safety](https://github.com/Giskard-AI/awesome-ai-safety)

## Fairness

- [awesome fairness in ai](https://github.com/datamllab/awesome-fairness-in-ai)
- [awesome fairness papers](https://github.com/uclanlp/awesome-fairness-papers)
- [bias reduction in LLMs](https://github.com/cyp-jlu-ai/ba-lora) with [paper](https://paperswithcode.com/paper/bias-aware-low-rank-adaptation-mitigating)

## LLMs

- [awesome llms](https://github.com/Hannibal046/Awesome-LLM)
- [Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses](https://arxiv.org/pdf/2406.01288)
- [ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs](https://arxiv.org/pdf/2402.11753) with [code](https://github.com/uw-nsl/ArtPrompt)
- [harmbench](https://www.harmbench.org/)

## MLSecOps

- [awesome mlsecops](https://github.com/RiccardoBiosas/awesome-MLSecOps)

## Programming LLMs

- [DSPy](https://github.com/stanfordnlp/dspy)

## Data anonymization, synthetic data

- [neosync](https://github.com/nucleuscloud/neosync)
- [gretel](https://gretel.ai/blog/how-to-create-high-quality-synthetic-data-for-fine-tuning-llms)
- [scaling synthetic data](https://arxiv.org/abs/2406.20094)
- [private synthetic data generation](https://arxiv.org/abs/2401.18024)

## AI safety jobs

- [ai safety jobs](https://www.aisafety.com/jobs)

## AI safety landscape map

- [ai safety landscape map](https://www.aisafety.com/landscape-map)
