openai / mle-bench
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.
15K lines of main code (382 files)
0.3K lines of test code (7 files)
14K lines of other code (422 files)
age: <1y (208 days)
main code touched: 100%, 1 year (15K LOC)
new main code: 100%, 1 year (15K LOC)
12K py
2.3K yaml
0.05K toml
0.03K mjs


generated by sokrates.dev (configuration) on 2025-05-04