AI-ML·Importance 6·2026-05-16·Dev.to

I Tested 6 Local Models on Real Agent Tasks. The Best Scored 50%.

── KO ──────────────────

Six local models were tested, and SmolLM3-3B scored 50%.

This article covers the results of testing six local AI models on real agent tasks. SmolLM3-3B scored 93.3% on the code quality benchmark but was rated at 50% on agent tasks. Phi-4-mini scored 90% on code but 17% on agent tasks. The gap shows up in agent tasks, which evaluate a model's ability to follow a protocol.

── EN ──────────────────

Six local models were tested; SmolLM3-3B scored 50% on agent tasks.

This article reports on a test of six local AI models on real agent tasks. SmolLM3-3B scored 93.3% on code quality benchmarks but only 50% on agent tasks. Phi-4-mini scored 90% on code but only 17% on agent tasks. The gap highlights the difference between generating correct code and following complex protocols in agent tasks.
