  1. NeurIPS Social Evaluating Agentic Systems: Bridging Research …

    Dec 5, 2025 · Through lightning talks, panel discussions, and networking, the event fosters an interactive exchange on how to meaningfully evaluate and benchmark the next generation of agentic …

  2. TRiSM for Agentic AI: A Review of Trust, Risk, and Security …

    2 days ago · The review concludes with a research roadmap for the responsible development and deployment of Agentic AI, highlighting key directions to align emerging systems with TRiSM …

  3. SEA Workshop @ NeurIPS 2025

    Please use the NeurIPS 2025 LaTeX style file; it includes a preprint option for non‑anonymous preprints posted online (see additional formatting details here). Submissions should be PDFs of ≤ 9 pages …

  4. Key Takeaways from NeurIPS: Agentic AI

    Mar 4, 2025 · The paper emphasizes the importance of addressing self-recognition bias to ensure unbiased LLM evaluations and enhance overall AI safety while also calling for further research to …

  5. NeurIPS Workshop 2024 - ML Safety

    This workshop aims to clarify key questions on the safety of agentic AI systems and foster a community of researchers working in this area. To this end, we have prepared a diverse and comprehensive …

  6. Licence to Scale: A Microservice Simulation Environment for ...

    Dec 2, 2025 · We use these characteristics to develop a microservice simulation environment that models the causal relations between CPU usage, memory usage, resource limits, and latency in …

  7. Evaluation and Safety of Agentic Systems - AgentForge Hub

    Oct 21, 2025 · This article breaks down emerging benchmarks, outlines practical safety drills, and shares the instrumentation strategy we recommend for every production-grade agent.

  8. MathWorks Showcases AI for Safety Critical Systems at NeurIPS 2025

    Dec 2, 2025 · Drawing on the company's experience working with engineers and scientists in developing AI-enabled safety-critical systems, he’ll demonstrate how to verify robustness, detect out-of …

  9. safety-for-agentic-ai/README.md at main - GitHub

    Start with model evaluation using garak vulnerability scanning with curated risk prompts, benchmarking against enterprise thresholds. Then, post-train using recipes and safety datasets to close critical …

  10. Evaluating Agentic AI Systems: A Deep Dive into Agentic Metrics ...

    Apr 14, 2025 · In this post, we explore the latest Agentic metrics introduced in the Azure AI Evaluation library, a Python library designed to assess generative AI systems with both traditional NLP metrics …