AI-ML·Importance 7·2026. 05. 28.·r/MachineLearning

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems [R]

── KO ──────────────────

에이전트의 수명 엔지니어링에 대한 연구 결과와 AgingBench 도구를 소개합니다.

이 논문은 배포된 에이전트들이 시간에 따라 어떻게 노화하는지를 연구한 결과를 제공합니다. AgingBench라는 새로운 벤치마크는 코딩 에이전트의 장기적인 성능을 측정하여, 모델 전환 시 성능 감소를 보여줍니다. Sonnet 4.6에서 Opus 4.7로의 전환이 평균 15%의 PyTest 통과율 감소를 나타냈으며, 이는 모델의 성능보다 메모리 정책에 따른 영향을 강조하고 있습니다. 결과적으로, '신규 모델 교체'가 항상 안전한 업그레이드 전략이 아님을 알립니다.

── EN ──────────────────

The study introduces agent lifespan engineering and the AgingBench benchmark for deployed systems.

This paper examines how deployed agents age over time and presents findings from a longitudinal study. Its new benchmark, AgingBench, measures the long-term performance of coding agents and shows that performance can fall when the underlying model is swapped: moving from Sonnet 4.6 to Opus 4.7 lowered the PyTest pass rate by 15% on average. It emphasizes that memory policy has a larger effect than model performance alone. Ultimately, it warns that swapping in a newer model is not always a safe upgrade strategy.
