ShareFolder

- Canva: https://www.canva.com/design/DAG15819YvY/f2FaiC8FRwcmD4hZnQnrhw/edit?utm_content=DAG15819YvY&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
- Notion: https://obtainable-aardwolf-d41.notion.site/Evaluate-Sharing-2bf31c02fb3f80d6a1e5f2fe51a59207?source=copy_link
- Paper Reading:
  - Analyzing 16,193 LLM Papers for Fun and Profits - https://arxiv.org/html/2504.08619v1
  - Survey on Evaluation of LLM-based Agents - https://arxiv.org/pdf/2503.16416
  - Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering - https://dl.acm.org/doi/abs/10.1145/3728963
  - Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge - https://openreview.net/forum?id=PNRznmmWP7 - ICML 2025
- ICLR 2025: https://openreview.net/group?id=ICLR.cc/2025/Conference#tab-accept-oral
  - Limits to Scalable Evaluation at the Frontier: LLM as Judge Won’t Beat Twice the Data - https://openreview.net/forum?id=NO6Tv6QcDs - ICLR 2025 (Oral)
  - RocketEval: Efficient Automated LLM Evaluation via Grading Checklist - https://iclr.cc/virtual/2025/poster/27674 - ICLR 2025 (Poster)
  - Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates - https://iclr.cc/virtual/2025/33388 - ICLR 2025 Workshop: Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation