amazon-research / datatuner

File Size

The distribution of file sizes (measured in lines of code).

Intro

File size measurements show how file sizes are distributed.
Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
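
The size categories above can be expressed as a small lookup. This is a minimal sketch of the thresholds as stated in this report, not the reporting tool's own implementation; the function name `size_category` is hypothetical.

```python
def size_category(lines_of_code: int) -> str:
    """Map a file's line count to one of the five size categories of this report."""
    if lines_of_code < 1:
        # The categories start at 1 line; empty files are outside the stated ranges.
        raise ValueError("lines_of_code must be at least 1")
    if lines_of_code <= 100:
        return "very small"
    if lines_of_code <= 200:
        return "small"
    if lines_of_code <= 500:
        return "medium size"
    if lines_of_code <= 1000:
        return "long"
    return "very long"
```

For example, under these thresholds a file of 857 lines falls in the "long" category and a file of 102 lines in the "small" category.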

File Size Overall

There are 35 files with 4,158 lines of code.

0 very long files (0 lines of code)
2 long files (1,451 lines of code)
4 medium size files (1,169 lines of code)
5 small files (736 lines of code)
24 very small files (802 lines of code)
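
The per-category totals above can be reproduced from the per-file line counts listed in the "Longest Files" table of this report. The following is a sketch only: the names `summarize`, `UPPER_BOUNDS`, and `LABELS` are hypothetical, and the line counts are copied from that table rather than measured from the repository.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four categories; anything above is "very long".
UPPER_BOUNDS = [100, 200, 500, 1000]
LABELS = ["very small", "small", "medium size", "long", "very long"]

def summarize(line_counts):
    """Return {category: (number of files, total lines of code)}."""
    summary = {label: (0, 0) for label in LABELS}
    for loc in line_counts:
        # bisect_left returns the index of the first bound that is >= loc.
        label = LABELS[bisect_left(UPPER_BOUNDS, loc)]
        files, total = summary[label]
        summary[label] = (files + 1, total + loc)
    return summary

# Line counts of the 35 files, as listed in the "Longest Files" table.
LINE_COUNTS = [
    857, 594, 379, 327, 246, 217, 177, 168, 168, 121, 102, 84,
    77, 72, 69, 62, 57, 55, 45, 43, 42, 41, 36, 36,
    28, 15, 13, 10, 9, 3, 1, 1, 1, 1, 1,
]

if __name__ == "__main__":
    for label, (files, total) in summarize(LINE_COUNTS).items():
        print(f"{files} {label} files ({total:,} lines of code)")
```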

Legend: 1001+ | 501-1000 | 201-500 | 101-200 | 1-100

File Size per Extension

[Chart: file size distribution per extension, using the size categories 1001+, 501-1000, 201-500, 101-200, 1-100]

File Size per Logical Decomposition

primary

[Chart: file size distribution per component, using the size categories 1001+, 501-1000, 201-500, 101-200, 1-100]

Longest Files (Top 35)

File	# lines	# units
evaluate.py in src/datatuner/lm	857	12
run_classifier.py in src/datatuner/classification	594	6
train.py in src/datatuner/lm	379	1
prepare_mturk.py in paper/experiments/mturk	327	6
data_loader.py in src/datatuner/lm	246	6
classify_generated.py in src/datatuner/classification	217	4
distractors.py in src/datatuner/classification	177	9
metrics.py in src/datatuner/lm	168	10
preprocess.py in paper/experiments/webnlg	168	6
cross_entropy.py in src/datatuner/lm	121	10
model_loader.py in src/datatuner/lm	102	5
preprocess.py in paper/experiments/viggo	84	2
utils.py in src/datatuner/lm	77	8
reranker.py in src/datatuner/lm	72	5
utils.py in src/datatuner	69	13
novograd.py in src/datatuner/lm	62	3
preprocess.py in paper/experiments/ldc	57	2
custom_tokenizer.py in src/datatuner/lm	55	1
text_stats.py in paper/experiments/mturk	45	2
consistency_processor.py in src/datatuner/classification	43	7
environment.yml in root	42	-
special_token_generator.py in src/datatuner/lm	41	4
process_json.py in src/datatuner/lm	36	3
consistency_classifier.py in src/datatuner/classification	36	3
custom_gpt2.py in src/datatuner/lm	28	1
webnlg_utils.py in paper/experiments/webnlg	15	2
setup.py in root	13	-
launch_tokenizer.py in src/datatuner/lm	10	1
mlflow.py in src/datatuner/ops	9	2
converters.py in src/datatuner/lm	3	1
__init__.py in src/datatuner/lm	1	-
__init__.py in src/datatuner	1	-
__init__.py in paper/experiments/viggo	1	-
__init__.py in paper/experiments/ldc	1	-
__init__.py in paper/experiments/webnlg	1	-

Files With Most Units (Top 20)

File	# lines	# units
utils.py in src/datatuner	69	13
evaluate.py in src/datatuner/lm	857	12
metrics.py in src/datatuner/lm	168	10
cross_entropy.py in src/datatuner/lm	121	10
distractors.py in src/datatuner/classification	177	9
utils.py in src/datatuner/lm	77	8
consistency_processor.py in src/datatuner/classification	43	7
data_loader.py in src/datatuner/lm	246	6
run_classifier.py in src/datatuner/classification	594	6
preprocess.py in paper/experiments/webnlg	168	6
prepare_mturk.py in paper/experiments/mturk	327	6
reranker.py in src/datatuner/lm	72	5
model_loader.py in src/datatuner/lm	102	5
special_token_generator.py in src/datatuner/lm	41	4
classify_generated.py in src/datatuner/classification	217	4
process_json.py in src/datatuner/lm	36	3
novograd.py in src/datatuner/lm	62	3
consistency_classifier.py in src/datatuner/classification	36	3
mlflow.py in src/datatuner/ops	9	2
preprocess.py in paper/experiments/viggo	84	2

Files With Long Lines (Top 3)

There are 3 files with lines longer than 120 characters. In total, there are 5 long lines.

File	# lines	# units	# long lines
evaluate.py in src/datatuner/lm	857	12	3
model_loader.py in src/datatuner/lm	102	5	1
classify_generated.py in src/datatuner/classification	217	4	1
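
A long-line count like the one in the table above can be approximated with a simple check against the 120-character limit. This is a sketch under assumptions: the function name `count_long_lines` is hypothetical, and how the reporting tool treats tabs or trailing whitespace is not stated in this report, so its counts may differ.

```python
def count_long_lines(source: str, limit: int = 120) -> int:
    """Count lines whose length (excluding the line terminator) exceeds `limit`."""
    return sum(1 for line in source.splitlines() if len(line) > limit)

if __name__ == "__main__":
    # Synthetic input: one line exactly at the limit, one over it, one short line.
    sample = "x" * 120 + "\n" + "y" * 121 + "\nshort line\n"
    print(count_long_lines(sample))
```

A line of exactly 120 characters is not counted, since the report describes long lines as those longer than 120 characters.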