# SoTA CF, XAI & RAI
Papers on counterfactual explanations (CF) & explainable AI (XAI):
- [CFs survey, '22](https://dl.acm.org/doi/pdf/10.1145/3527848) informative table comparing existing approaches, see pg 95:5
- [CFs & skewed intuition, '22](https://arxiv.org/abs/2205.06241)
- [Deficits to rectify CFs, '22](https://arxiv.org/pdf/2103.01035.pdf): argues that the usual choices of metrics, similarity measures, and nearest-neighbour criteria are not "intuitive" for the user, and calls for better testing and a stronger focus on the user.
- [Rule based explanations and CFs, '22](https://arxiv.org/abs/2210.17071)
- [CFs review, '22](https://link.springer.com/article/10.1007/s10618-022-00831-6)
- [CFDB, '22](https://dl.acm.org/doi/10.1145/3514221.3520162) I could not access the PDF; this [YouTube video](https://www.youtube.com/watch?v=-KD9iOkcb7M&ab_channel=WomeninDataScience) covers the work.
- [Quality Counterfactual Explanations in Real Time, '21](https://arxiv.org/abs/2101.01292)
- [A survey of algorithmic recourse: definitions, formulations, solutions, and prospects, '21](https://arxiv.org/abs/2010.04050)
- [Harm & ML, '19](https://arxiv.org/pdf/1901.10002.pdf)
- [Feature attribution & CF, '21](https://arxiv.org/pdf/2011.04917.pdf)
- [CFs & challenges, '20](https://ceur-ws.org/Vol-2301/paper_20.pdf)
- [impossibility theorem of fairness in ML, '20](https://arxiv.org/abs/2007.06024)
- [FACE, '19](https://arxiv.org/abs/1909.09369)
Other resources:
- [Molnar's interpretable ML book chapter](https://christophm.github.io/interpretable-ml-book/counterfactual.html)
- [Google tutorial](https://sites.google.com/view/kdd-2021-counterfactual?pli=1#h.cat78h636w5)
- [What-If Tool](https://pair-code.github.io/what-if-tool/learn/tutorials/walkthrough/) and its [GitHub repo](https://github.com/pair-code/what-if-tool)
Books, docs on XAI:
- [dataiku](https://pages.dataiku.com/oreilly-responsible-ai)
- [h2o](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/explain.html)
Tools for XAI:
- [LIME](https://github.com/marcotcr/lime), [SHAP](https://github.com/shap/shap) (a minimal SHAP sketch follows this list)
- [Captum](https://github.com/pytorch/captum)
- [AIX360](https://github.com/Trusted-AI/AIX360)
- [alibi](https://github.com/SeldonIO/alibi)
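A minimal SHAP usage sketch, not taken from any of the sources above: it assumes `shap` and `scikit-learn` are installed, and the dataset/model choice is purely illustrative.

```python
# Illustrative sketch: per-feature attributions for a tree model with SHAP.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)              # fast, exact attribution for tree ensembles
shap_values = explainer.shap_values(X.iloc[:100])  # shape (100, n_features): contribution of each feature
shap.summary_plot(shap_values, X.iloc[:100])       # beeswarm summary of global feature importance
```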
CF-specific tools and related causal-AI resources:
- [DiCE](https://github.com/interpretml/DiCE) (see the counterfactual-generation sketch after this list)
- [refs from Molnar's chapter](https://christophm.github.io/interpretable-ml-book/counterfactual.html#example-software)
- [carla, for benchmarking, '21](https://github.com/carla-recourse/CARLA)
- [CF & SAI](https://www.alignmentforum.org/s/pcdHisDEGLbxrbSHD), see also [causal incentives working group](https://causalincentives.com/)
- [Causal AI Conference](https://www.causalaiconference.com/)
- [causalens whitepapers](https://www.causalens.com/white-papers/)
- [PyMC-experimental](https://www.pymc.io/projects/examples/en/latest/causal_inference/interventional_distribution.html) and [CausalPy](https://github.com/pymc-labs/CausalPy)
- [causalpython](https://causalpython.io/#holiday-resources)
- [causal ai lab](https://causalai.net/) with a recent tutorial on [causal fairness analysis](https://fairness.causalai.net/)
- [CF fairness at the turing institute](https://www.turing.ac.uk/research/research-projects/counterfactual-fairness)
- [The Institute for Ethical AI & Machine Learning](https://ethical.institute/index.html)
- [causality & time series workshop](https://sites.google.com/view/ci4ts2023/accepted-papers)
- [causality school](https://quarter-on-causality.github.io/tools/)
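For a feel of what the CF libraries above do, here is a minimal DiCE sketch, not taken from its docs verbatim: it assumes `dice-ml` and `scikit-learn` are installed, and the dataset/model are illustrative stand-ins.

```python
# Illustrative sketch: generating counterfactual examples with DiCE (dice-ml).
import dice_ml
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
df = data.frame                                   # features plus a 'target' column
model = RandomForestClassifier(random_state=0).fit(df.drop(columns="target"), df["target"])

d = dice_ml.Data(dataframe=df,
                 continuous_features=list(data.feature_names),
                 outcome_name="target")
m = dice_ml.Model(model=model, backend="sklearn")
exp = dice_ml.Dice(d, m, method="random")

# Ask for 3 counterfactuals that flip the model's prediction for one query instance.
query = df.drop(columns="target").iloc[0:1]
cfs = exp.generate_counterfactuals(query, total_CFs=3, desired_class="opposite")
cfs.visualize_as_dataframe(show_only_changes=True)
```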
Kaggle training:
- [explainable ML](https://www.kaggle.com/learn/machine-learning-explainability) (a permutation-importance sketch follows this list)
- [intro ethics & AI](https://www.kaggle.com/learn/intro-to-ai-ethics)
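The Kaggle explainability course centers on model-agnostic techniques such as permutation importance; below is a minimal scikit-learn sketch of that idea (data and model are illustrative, not taken from the course).

```python
# Illustrative sketch: permutation importance — shuffle a feature and measure the score drop.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean, result.importances_std),
                key=lambda t: -t[1])
for name, mean, std in ranked:
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```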
For responsible AI (RAI):
- [dataiku RAI](https://academy.dataiku.com/page/responsible-ai) or [tutorials](https://knowledge.dataiku.com/latest/ml-analytics/responsible-ai/index.html#tutorials)
- [fairness book](https://fairmlbook.org/) with [a chapter on causality](https://fairmlbook.org/causal.html)
- [fairness tree](http://www.datasciencepublicpolicy.org/our-work/tools-guides/aequitas/) (a toy group-fairness computation follows this list)
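Both the fairness book and the Aequitas fairness tree revolve around group-fairness criteria; as a toy illustration (synthetic numbers, plain NumPy, not from either resource), demographic parity compares positive-prediction rates across groups.

```python
# Toy illustration: demographic parity difference |P(yhat=1 | A=0) - P(yhat=1 | A=1)|.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1_000)                       # protected attribute A, synthetic
y_pred = rng.binomial(1, np.where(group == 0, 0.30, 0.45))   # deliberately biased synthetic predictions

rate_0 = y_pred[group == 0].mean()    # positive-prediction rate for group A=0
rate_1 = y_pred[group == 1].mean()    # positive-prediction rate for group A=1
print(f"P(yhat=1|A=0)={rate_0:.2f}  P(yhat=1|A=1)={rate_1:.2f}  "
      f"demographic parity difference={abs(rate_0 - rate_1):.2f}")
```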
## Interpretability
- [all sorts of colab notebooks](https://alignmentjam.com/interpretability)
## AI Observability & Monitoring
- [evidentlyAI](https://www.evidentlyai.com/); [tutorial on ML model cards in Evidently](https://www.evidentlyai.com/blog/ml-model-card-tutorial) (a minimal drift-report sketch follows this list)
- [whylabs](https://whylabs.ai/)
- [dataheroes](https://dataheroes.ai/)
- [uptrain](https://github.com/uptrain-ai/uptrain)
- [censius](https://censius.ai/)
- [(partial) landscape](https://ai-infrastructure.org/ai-infrastructure-landscape/)
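A minimal data-drift sketch with Evidently (roughly the 0.4.x `Report` API; the reference/current split below is an illustrative stand-in for training vs. production data):

```python
# Illustrative sketch: data-drift report with Evidently's Report API (~0.4.x).
from sklearn.datasets import load_diabetes
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

X, _ = load_diabetes(return_X_y=True, as_frame=True)
reference, current = X.iloc[:300], X.iloc[300:]    # stand-ins for "training" vs "production" data

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")         # shareable HTML dashboard
```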
## Ethics
- [openai blogpost](https://openai.com/blog/how-should-ai-systems-behave)
- [openai and education](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4354422)
- [ethics fastai](https://ethics.fast.ai/syllabus/index.html)
- [anthropic and safety](https://www.anthropic.com/index/core-views-on-ai-safety)
- [meta blogpost](https://ai.facebook.com/blog/responsible-ai-progress-meta-2022/)
## Safety & AI
- [Toward Trustworthy AI Development, '20](https://arxiv.org/pdf/2004.07213.pdf)
- [Harald Ruess & Simon Burton, '22](https://arxiv.org/pdf/2201.10436.pdf)
- [standard proposal UL4600](https://www.ecr.ai/ul4600/)
- [AI incident database "indexes the collective history of harms or near harms realized in the real world by the deployment of artificial intelligence systems"](https://incidentdatabase.ai/)
- [OECD AI incidents monitor](https://oecd.ai/en/incidents)
- [owasp](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [resources on AI risks](https://www.simeon.ai/resources-on-ai-risks)
- [an analysis of AI act](https://publications.jrc.ec.europa.eu/repository/handle/JRC132833)
- [far.ai](https://far.ai/research/publications/)
- [metr](https://github.com/METR)
- [chai](https://humancompatible.ai/research)
- [tools&metrics from oecd.ai](https://oecd.ai/en/catalogue/overview)
- [alignmentforum](https://www.alignmentforum.org/)
- [Center for AI Safety](https://www.safe.ai/work/research)
- [AI Standards Lab](https://www.aistandardslab.org/)
- [Applied AI institute](https://transferlab.ai/aois/trustworthy-and-interpretable-ml/)
- [Centre pour la Sécurité de l'IA (French Center for AI Safety)](https://www.securite-ia.fr/)
- [giskard](https://www.giskard.ai/)
- [safeguarded ai, davidad](https://www.aria.org.uk/programme-safeguarded-ai/)
- [guaranteed safe ai](https://arxiv.org/abs/2405.06624)
- [rigorllm](https://arxiv.org/abs/2403.13031)
- [Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation](https://arxiv.org/abs/2311.03348)
- [Ignore Previous Prompt, '22](https://arxiv.org/abs/2211.09527)
- [awesome ai safety](https://github.com/Giskard-AI/awesome-ai-safety)
## Fairness
- [awesome fairness in ai](https://github.com/datamllab/awesome-fairness-in-ai)
- [awesome fairness papers](https://github.com/uclanlp/awesome-fairness-papers)
- [bias reduction in LLMs](https://github.com/cyp-jlu-ai/ba-lora) with [paper](https://paperswithcode.com/paper/bias-aware-low-rank-adaptation-mitigating)
## LLMs
- [awesome llms](https://github.com/Hannibal046/Awesome-LLM)
- [Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses](https://arxiv.org/pdf/2406.01288)
- [ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs](https://arxiv.org/pdf/2402.11753) with [code](https://github.com/uw-nsl/ArtPrompt)
- [harmbench](https://www.harmbench.org/)
## MLSecOps
- [awesome mlsecops](https://github.com/RiccardoBiosas/awesome-MLSecOps)
## Programming LLMs
- [DSPy](https://github.com/stanfordnlp/dspy)
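A minimal DSPy sketch (assumes roughly DSPy 2.5+ and an OpenAI API key in the environment; the model string is an illustrative choice):

```python
# Illustrative sketch: declaring an LLM program with DSPy instead of hand-writing prompts.
import dspy

lm = dspy.LM("openai/gpt-4o-mini")   # illustrative model; any supported provider/model string works
dspy.configure(lm=lm)

# A signature ("question -> answer") declares inputs/outputs; DSPy builds the actual prompt.
qa = dspy.ChainOfThought("question -> answer")
result = qa(question="Why might a counterfactual explanation be infeasible to act on?")
print(result.answer)
```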
## Data anonymization, synthetic data
- [neosync](https://github.com/nucleuscloud/neosync)
- [gretel](https://gretel.ai/blog/how-to-create-high-quality-synthetic-data-for-fine-tuning-llms)
- [scaling synthetic data](https://arxiv.org/abs/2406.20094)
- [private synthetic data generation](https://arxiv.org/abs/2401.18024) (a toy differential-privacy sketch follows)
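Differentially private synthetic data (the last link above) builds on primitives like the Laplace mechanism; a toy sketch of that primitive, not taken from the paper:

```python
# Toy Laplace mechanism: release a count with epsilon-differential privacy.
import numpy as np

def dp_count(values, epsilon, rng):
    sensitivity = 1.0                             # adding/removing one record changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(values) + noise

rng = np.random.default_rng(0)
records = list(range(1_000))                      # stand-in for a sensitive dataset
print(dp_count(records, epsilon=0.5, rng=rng))    # more privacy, noisier answer
print(dp_count(records, epsilon=5.0, rng=rng))    # less privacy, closer to the true count of 1000
```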
## AI safety jobs
- [ai safety jobs](https://www.aisafety.com/jobs)
## AI safety landscape map
- [ai safety landscape map](https://www.aisafety.com/landscape-map)