lm_eval/tasks/leaderboard/musr/_musr.yaml
group: leaderboard_musr
task:
- leaderboard_musr_murder_mysteries
- leaderboard_musr_object_placements
- leaderboard_musr_team_allocation
aggregate_metric_list:
- metric: acc_norm
  aggregation: mean
  weight_by_size: true
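With `weight_by_size: true`, the group-level `acc_norm` is understood to be a mean over the three MuSR subtasks in which each subtask is weighted by its number of examples, rather than a plain average of the three scores. The Python sketch below illustrates that arithmetic. It assumes the harness computes `sum(score * size) / sum(size)`. The function names `size_weighted_mean` and `unweighted_mean` are hypothetical and are not lm-eval's API, and any scores or sizes you feed them are illustrative, not real MuSR results.

```python
from typing import Mapping


def size_weighted_mean(scores: Mapping[str, float], sizes: Mapping[str, int]) -> float:
    """Hypothetical helper: average per-subtask scores, weighting each by its example count.

    Assumed to mirror `aggregation: mean` combined with `weight_by_size: true`.
    """
    if scores.keys() != sizes.keys():
        raise ValueError("scores and sizes must cover the same subtasks")
    total = sum(sizes.values())
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    # Each subtask contributes in proportion to how many examples it has.
    return sum(scores[name] * sizes[name] for name in scores) / total


def unweighted_mean(scores: Mapping[str, float]) -> float:
    """Hypothetical helper: plain average, as if `weight_by_size` were false."""
    return sum(scores.values()) / len(scores)


# Illustrative (made-up) per-subtask acc_norm values and subtask sizes.
scores = {
    "leaderboard_musr_murder_mysteries": 0.60,
    "leaderboard_musr_object_placements": 0.40,
    "leaderboard_musr_team_allocation": 0.50,
}
sizes = {
    "leaderboard_musr_murder_mysteries": 200,
    "leaderboard_musr_object_placements": 100,
    "leaderboard_musr_team_allocation": 100,
}

print(size_weighted_mean(scores, sizes))
print(unweighted_mean(scores))
```

With these made-up numbers the weighted result is (0.60·200 + 0.40·100 + 0.50·100) / 400 = 0.525, while the plain average is 0.5. The larger subtask pulls the group score toward its own accuracy, which is the effect `weight_by_size: true` is meant to capture.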