Evals Will Break and You Won't See It Coming
Source: News.Ycombinator
Published:
<p>We're good at evaluating the models we have. We're much worse at evaluating the models we're about to build, especially if they cross into a new capability regime.</p> <p>Most benchmarks, safety evals, and red-teaming protocols implicitly assume the next model is a stronger version of the current one. If i