SWE-Bench Pro
Source: labs.scale.com
Published:
<p>Evaluating challenging long-horizon software engineering tasks in public open-source repositories</p> <p>SWE-Bench Pro is a benchmark designed to provide a rigorous and realistic evaluation of AI agents for software engineering. It was developed to address several limitations in existing benchmar