Tag: llm-evaluation

All the articles with the tag "llm-evaluation".

DeepEval: An LLM eval harness that brings coding-agent evaluation into the development loop

May 14, 2026

A deep dive into DeepEval, which tests LLM apps the way Pytest tests code. What the 4.0 release shows is not just an evaluation tool but a feedback loop for coding agents.