lm_eval/tasks/leaderboard/musr/_musr.yaml (9 lines of code) (raw):

group: leaderboard_musr
task:
  - leaderboard_musr_murder_mysteries
  - leaderboard_musr_object_placements
  - leaderboard_musr_team_allocation
aggregate_metric_list:
  - metric: acc_norm
    aggregation: mean
    weight_by_size: true
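
The `aggregation: mean` and `weight_by_size: true` settings suggest that the group's `acc_norm` is a mean over the three subtasks, weighted by each subtask's number of examples. The sketch below illustrates that arithmetic only. The function name `aggregate_mean` and the scores and sizes are hypothetical, chosen for illustration, and the harness's actual implementation may differ.

```python
from typing import Sequence


def aggregate_mean(
    scores: Sequence[float],
    sizes: Sequence[int],
    weight_by_size: bool = True,
) -> float:
    """Combine per-subtask scores into one group score (hypothetical helper).

    With weight_by_size=True each score is weighted by its subtask's
    example count; otherwise every subtask counts equally.
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if not scores:
        raise ValueError("at least one subtask is required")

    if weight_by_size:
        total = sum(sizes)
        if total <= 0:
            raise ValueError("total size must be positive")
        return sum(s * n for s, n in zip(scores, sizes)) / total

    return sum(scores) / len(scores)


# Made-up acc_norm values and example counts for three subtasks.
example_scores = [0.5, 0.25, 0.75]
example_sizes = [100, 200, 100]

weighted = aggregate_mean(example_scores, example_sizes, weight_by_size=True)
unweighted = aggregate_mean(example_scores, example_sizes, weight_by_size=False)
print(weighted, unweighted)
```

With these made-up numbers the larger second subtask pulls the weighted result below the plain average, which is the practical effect of enabling `weight_by_size`.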