lmms_eval/tasks/seedbench/seedbench.yaml:

dataset_path: lmms-lab/SEED-Bench
dataset_kwargs:
  token: True
task: "seedbench"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.seed_doc_to_visual
doc_to_text: !function utils.seed_doc_to_text
doc_to_target: "answer"
generation_kwargs:
  until:
    - "ASSISTANT:"
  image_aspect_ratio: original
# The return value of process_results will be used by metrics
process_results: !function utils.seed_process_result
# Note that the metric name can be either a registered metric function (such as the case for GQA) or a key name returned by process_results
metric_list:
  - metric: seed_image
    aggregation: !function utils.seed_aggregation_result
    higher_is_better: true
  - metric: seed_video
    aggregation: !function utils.seed_aggregation_result
    higher_is_better: true
  - metric: seed_all
    aggregation: !function utils.seed_aggregation_result
    higher_is_better: true
metadata:
  - version: 0.0
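
For reference, each `!function` entry above resolves to a Python callable in the task's utils.py. Below is a minimal hypothetical sketch of those callables under lmms-eval's usual task conventions: doc_to_text builds the prompt from a dataset document, process_results maps each example onto the metric keys named in metric_list (which is how seed_image, seed_video, and seed_all each receive their own record stream), and the shared aggregation function turns the collected records into a score. The function bodies and dataset field names (question, choice_a..choice_d, data_type, answer, question_id, image) are assumptions for illustration, not the repository's actual code.

def seed_doc_to_visual(doc):
    # Return the visual inputs for this example; the "image" field name is
    # an assumed HF dataset column, not confirmed from the schema.
    return [doc["image"]]


def seed_doc_to_text(doc):
    # Build a multiple-choice prompt from the question and its four options.
    options = "\n".join(f"{key}. {doc[f'choice_{key.lower()}']}" for key in "ABCD")
    return f"{doc['question']}\n{options}\nAnswer with the option's letter."


def seed_process_result(doc, results):
    # Compare the first generated letter against the gold answer, then route
    # the record to its per-modality metric and to the combined "seed_all" key.
    pred = results[0].strip().upper()[:1]
    record = {"pred": pred, "answer": doc["answer"], "question_id": doc["question_id"]}
    metric_key = "seed_image" if doc["data_type"] == "image" else "seed_video"
    return {metric_key: record, "seed_all": record}


def seed_aggregation_result(results):
    # Accuracy over all records that process_results routed to this metric.
    correct = sum(1 for r in results if r["pred"] == r["answer"])
    return correct / len(results) if results else 0.0

Because every metric in metric_list points at the same aggregation function, the split between image, video, and overall accuracy is driven entirely by which keys process_results emits for each document.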