Canva:
https://www.canva.com/design/DAG15819YvY/f2FaiC8FRwcmD4hZnQnrhw/edit?utm_content=DAG15819YvY&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
Paper Reading:
- Analyzing 16,193 LLM Papers for Fun and Profits
- https://arxiv.org/html/2504.08619v1
- Survey on Evaluation of LLM-based Agents
- https://arxiv.org/pdf/2503.16416
- Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering
- https://dl.acm.org/doi/abs/10.1145/3728963
- Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
- https://openreview.net/forum?id=PNRznmmWP7
- ICML 2025
ICLR 2025:
https://openreview.net/group?id=ICLR.cc/2025/Conference#tab-accept-oral
- Limits to Scalable Evaluation at the Frontier: LLM as Judge Won’t Beat Twice the Data
- https://openreview.net/forum?id=NO6Tv6QcDs
- ICLR 2025 (Oral)
- RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
- https://iclr.cc/virtual/2025/poster/27674
- ICLR 2025 (Poster)
- Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
- https://iclr.cc/virtual/2025/33388
- ICLR 2025 (Workshop): Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation