# lmms_eval/tasks/mmbench/mmbench_cn_dev.yaml
task: "mmbench_cn_dev"
test_split: "dev"
metric_list:
  - metric: gpt_eval_score
    aggregation: !function cn_utils.mmbench_aggregate_dev_results_eval
    higher_is_better: true
  - metric: submission
    aggregation: !function cn_utils.mmbench_aggregate_dev_results
    higher_is_better: true
include: _default_template_mmbench_cn_yaml
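
The `!function` tag is not standard YAML: the evaluation harness registers a custom constructor that turns dotted strings like `cn_utils.mmbench_aggregate_dev_results_eval` into imported callables, and the `include:` key is likewise merged by the harness rather than by the YAML parser. Below is a minimal sketch of how such a resolver could work, assuming PyYAML and `module.attribute` paths; the actual lmms_eval loader may resolve modules relative to the task directory and differ in detail.

```python
# Sketch of a "!function" YAML resolver (illustrative assumption, not the
# real lmms_eval implementation). It maps "module.attribute" strings onto
# imported callables so configs can reference aggregation hooks directly.
import importlib

import yaml


def _function_constructor(loader: yaml.SafeLoader, node: yaml.ScalarNode):
    # "cn_utils.mmbench_aggregate_dev_results_eval" splits into the module
    # ("cn_utils") and the attribute looked up on it. Note that cn_utils
    # lives in the task directory, so it must be importable from sys.path.
    dotted = loader.construct_scalar(node)
    module_name, _, attr = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


yaml.SafeLoader.add_constructor("!function", _function_constructor)

with open("mmbench_cn_dev.yaml") as f:
    config = yaml.load(f, Loader=yaml.SafeLoader)

# Each aggregation entry is now a real callable rather than a string.
for entry in config["metric_list"]:
    print(entry["metric"], entry["aggregation"])
```

Deferring the import until load time lets each task config point at helpers in its own directory (here `cn_utils.py`) without the harness hard-coding those dependencies.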