openai / simple-evals
File Size

The distribution of file sizes, measured in lines of code.
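
As a rough illustration, the bucketed percentages below could be reproduced with a short script like the one that follows. This is a minimal sketch, not the report generator: it counts physical lines, whereas the report's "lines of code" metric presumably excludes blanks and comments. The per-extension view further down is the same computation filtered by file suffix.

    # Minimal sketch: bucket *.py files by size and report each bucket's
    # share of the total line count. Bucket boundaries match the legend.
    from pathlib import Path

    BUCKETS = [("1001+", 1001, float("inf")), ("501-1000", 501, 1000),
               ("201-500", 201, 500), ("101-200", 101, 200), ("1-100", 1, 100)]

    def line_count(path: Path) -> int:
        # Physical line count; a real analyzer would skip blanks/comments.
        return sum(1 for _ in path.open(encoding="utf-8", errors="ignore"))

    def size_distribution(root: str, suffix: str = ".py") -> dict:
        sizes = [line_count(p) for p in Path(root).rglob(f"*{suffix}")]
        total = sum(sizes) or 1  # avoid division by zero on an empty tree
        return {label: 100.0 * sum(s for s in sizes if lo <= s <= hi) / total
                for label, lo, hi in BUCKETS}

    if __name__ == "__main__":
        for label, pct in size_distribution(".").items():
            print(f"{label:>9}: {pct:.0f}%")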

File Size Overall

File size (lines) | % of lines of code
1001+             | 0%
501-1000          | 0%
201-500           | 28%
101-200           | 35%
1-100             | 35%


File Size per Extension

Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py        | 0%    | 0%       | 28%     | 35%     | 35%
File Size per Logical Decomposition (primary)

Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
ROOT      | 0%    | 0%       | 34%     | 42%     | 22%
sampler   | 0%    | 0%       | 0%      | 0%      | 100%
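
The "primary" decomposition groups files by top-level folder: files at the repository root form ROOT, and files under sampler/ form the sampler component. A hedged sketch of that grouping (helper name lines_per_component is illustrative):

    # Sketch: total lines per top-level component, mirroring the "primary"
    # decomposition above (ROOT vs. sampler). Bucketing per component then
    # proceeds exactly as in the overall distribution.
    from collections import defaultdict
    from pathlib import Path

    def lines_per_component(root: str) -> dict:
        totals = defaultdict(int)
        base = Path(root)
        for p in base.rglob("*.py"):
            rel = p.relative_to(base)
            component = rel.parts[0] if len(rel.parts) > 1 else "ROOT"
            totals[component] += sum(1 for _ in p.open(errors="ignore"))
        return dict(totals)
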
Longest Files (Top 15)

File                                      | # lines | # units
simple_evals.py (in root)                 | 224     | 1
drop_eval.py (in root)                    | 220     | 20
common.py (in root)                       | 168     | 11
mgsm_eval.py (in root)                    | 144     | 6
run_multilingual_mmlu.py (in root)        | 134     | 1
mmlu_eval.py (in root)                    | 113     | 2
simpleqa_eval.py (in root)                | 86      | 3
browsecomp_eval.py (in root)              | 83      | 5
responses_sampler.py (in sampler)         | 71      | 5
claude_sampler.py (in sampler)            | 68      | 5
chat_completion_sampler.py (in sampler)   | 64      | 5
gpqa_eval.py (in root)                    | 61      | 2
o_chat_completion_sampler.py (in sampler) | 52      | 5
math_eval.py (in root)                    | 47      | 2
types.py (in root)                        | 22      | 2
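
A "unit" in this report is roughly one function or method. The "# units" column could be approximated with the standard-library ast module, as in the sketch below; the report's own unit definition may differ in detail, so treat this as an assumption.

    # Approximate the "# units" column: count function and method
    # definitions per file with ast, then list files by length.
    import ast
    from pathlib import Path

    def count_units(path: Path) -> int:
        tree = ast.parse(path.read_text(encoding="utf-8", errors="ignore"))
        return sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
                   for n in ast.walk(tree))

    def longest_files(root: str, top: int = 15):
        stats = [(sum(1 for _ in p.open(errors="ignore")), count_units(p), p)
                 for p in Path(root).rglob("*.py")]
        for lines, units, p in sorted(stats, reverse=True)[:top]:
            print(f"{p}: {lines} lines, {units} units")
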
Files With Most Units (Top 15)

File                                      | # lines | # units
drop_eval.py (in root)                    | 220     | 20
common.py (in root)                       | 168     | 11
mgsm_eval.py (in root)                    | 144     | 6
browsecomp_eval.py (in root)              | 83      | 5
claude_sampler.py (in sampler)            | 68      | 5
o_chat_completion_sampler.py (in sampler) | 52      | 5
responses_sampler.py (in sampler)         | 71      | 5
chat_completion_sampler.py (in sampler)   | 64      | 5
simpleqa_eval.py (in root)                | 86      | 3
types.py (in root)                        | 22      | 2
gpqa_eval.py (in root)                    | 61      | 2
mmlu_eval.py (in root)                    | 113     | 2
math_eval.py (in root)                    | 47      | 2
run_multilingual_mmlu.py (in root)        | 134     | 1
simple_evals.py (in root)                 | 224     | 1
Files With Long Lines (Top 1)

There is only one file with lines longer than 120 characters; in total there is a single long line.

File                       | # lines | # units | # long lines
simpleqa_eval.py (in root) | 86      | 3       | 1
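
The long-line check itself is straightforward to reproduce: flag any physical line longer than 120 characters (the threshold the report states). A minimal sketch:

    # Flag files containing lines longer than 120 characters.
    from pathlib import Path

    LIMIT = 120

    for p in Path(".").rglob("*.py"):
        with p.open(errors="ignore") as f:
            long_lines = [i for i, line in enumerate(f, start=1)
                          if len(line.rstrip("\n")) > LIMIT]
        if long_lines:
            print(f"{p}: {len(long_lines)} long line(s), first at line {long_lines[0]}")
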
Correlations

File Size vs. Commits (all time): 15 points

File                                 | commits (all time) | lines of code
sampler/responses_sampler.py         | 7                  | 71
simple_evals.py                      | 22                 | 224
mgsm_eval.py                         | 6                  | 144
browsecomp_eval.py                   | 6                  | 83
simpleqa_eval.py                     | 10                 | 86
run_multilingual_mmlu.py             | 7                  | 134
common.py                            | 9                  | 168
drop_eval.py                         | 3                  | 220
gpqa_eval.py                         | 7                  | 61
math_eval.py                         | 6                  | 47
mmlu_eval.py                         | 5                  | 113
sampler/o_chat_completion_sampler.py | 2                  | 52
sampler/chat_completion_sampler.py   | 1                  | 64
sampler/claude_sampler.py            | 1                  | 68
types.py                             | 1                  | 22

lines of code: min 22.0 | average 103.8 | 25th percentile 61.0 | median 83.0 | 75th percentile 144.0 | max 224.0
commits (all time): min 1.0 | average 6.2 | 25th percentile 2.0 | median 6.0 | 75th percentile 7.0 | max 22.0
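
The x-values in these scatter plots are per-file commit counts. They could be gathered with plain git, as in the sketch below; it assumes it runs inside a checkout, and the since argument produces the 30- and 90-day variants shown further down.

    # Sketch: commits touching each file, via `git rev-list --count`.
    # Pass since="30 days ago" or since="90 days ago" for windowed counts.
    import subprocess
    from pathlib import Path

    def commits_touching(path, since=None):
        cmd = ["git", "rev-list", "--count", "HEAD"]
        if since:
            cmd.append(f"--since={since}")
        cmd += ["--", str(path)]
        return int(subprocess.check_output(cmd, text=True).strip())

    for p in sorted(Path(".").rglob("*.py")):
        loc = sum(1 for _ in p.open(errors="ignore"))
        print(f"{p}: x={commits_touching(p)} commits, y={loc} lines")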

File Size vs. Contributors (all time): 15 points

File                                 | contributors (all time) | lines of code
sampler/responses_sampler.py         | 2                       | 71
simple_evals.py                      | 6                       | 224
mgsm_eval.py                         | 5                       | 144
browsecomp_eval.py                   | 2                       | 83
simpleqa_eval.py                     | 5                       | 86
run_multilingual_mmlu.py             | 3                       | 134
common.py                            | 5                       | 168
drop_eval.py                         | 3                       | 220
gpqa_eval.py                         | 6                       | 61
math_eval.py                         | 5                       | 47
mmlu_eval.py                         | 4                       | 113
sampler/o_chat_completion_sampler.py | 2                       | 52
sampler/chat_completion_sampler.py   | 1                       | 64
sampler/claude_sampler.py            | 1                       | 68
types.py                             | 1                       | 22

lines of code: min 22.0 | average 103.8 | 25th percentile 61.0 | median 83.0 | 75th percentile 144.0 | max 224.0
contributors (all time): min 1.0 | average 3.4 | 25th percentile 2.0 | median 3.0 | 75th percentile 5.0 | max 6.0
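
Contributor counts are the analogous measurement: unique commit authors per file. A sketch using author emails as the identity key (the report may deduplicate authors differently):

    # Sketch: unique author emails per file, the analogue of the commit
    # count above; the same --since filter yields the windowed variants.
    import subprocess

    def contributors(path, since=None):
        cmd = ["git", "log", "--format=%ae"]
        if since:
            cmd.append(f"--since={since}")
        cmd += ["--", str(path)]
        out = subprocess.check_output(cmd, text=True)
        return len(set(out.split()))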

File Size vs. Commits (30 days): 5 points

File                         | commits (30d) | lines of code
sampler/responses_sampler.py | 7             | 71
simple_evals.py              | 8             | 224
mgsm_eval.py                 | 3             | 144
browsecomp_eval.py           | 6             | 83
simpleqa_eval.py             | 2             | 86

lines of code: min 71.0 | average 121.6 | 25th percentile 77.0 | median 86.0 | 75th percentile 184.0 | max 224.0
commits (30d): min 2.0 | average 5.2 | 25th percentile 2.5 | median 6.0 | 75th percentile 7.5 | max 8.0

File Size vs. Contributors (30 days): 5 points

File                         | contributors (30d) | lines of code
sampler/responses_sampler.py | 2                  | 71
simple_evals.py              | 4                  | 224
mgsm_eval.py                 | 2                  | 144
browsecomp_eval.py           | 2                  | 83
simpleqa_eval.py             | 2                  | 86

lines of code: min 71.0 | average 121.6 | 25th percentile 77.0 | median 86.0 | 75th percentile 184.0 | max 224.0
contributors (30d): min 2.0 | average 2.4 | 25th percentile 2.0 | median 2.0 | 75th percentile 3.0 | max 4.0

File Size vs. Commits (90 days): 5 points

File                         | commits (90d) | lines of code
sampler/responses_sampler.py | 7             | 71
simple_evals.py              | 10            | 224
mgsm_eval.py                 | 3             | 144
browsecomp_eval.py           | 6             | 83
simpleqa_eval.py             | 2             | 86

lines of code: min 71.0 | average 121.6 | 25th percentile 77.0 | median 86.0 | 75th percentile 184.0 | max 224.0
commits (90d): min 2.0 | average 5.6 | 25th percentile 2.5 | median 6.0 | 75th percentile 8.5 | max 10.0

File Size vs. Contributors (90 days): 5 points

File                         | contributors (90d) | lines of code
sampler/responses_sampler.py | 2                  | 71
simple_evals.py              | 5                  | 224
mgsm_eval.py                 | 2                  | 144
browsecomp_eval.py           | 2                  | 83
simpleqa_eval.py             | 2                  | 86

lines of code: min 71.0 | average 121.6 | 25th percentile 77.0 | median 86.0 | 75th percentile 184.0 | max 224.0
contributors (90d): min 2.0 | average 2.6 | 25th percentile 2.0 | median 2.0 | 75th percentile 3.5 | max 5.0