amazon-research / datatuner
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how the code is distributed across files of different sizes.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown so large that it is hard to work with.
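The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the report tool's implementation: the names `SIZE_CATEGORIES` and `size_category` are hypothetical, and only the thresholds come from the list above.

```python
# Lower bound of each category, largest first. The thresholds come from the
# report's category list; the names used here are illustrative assumptions.
SIZE_CATEGORIES = [
    (1001, "very long"),
    (501, "long"),
    (201, "medium size"),
    (101, "small"),
    (1, "very small"),
]


def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given lines of code."""
    for lower_bound, label in SIZE_CATEGORIES:
        if lines_of_code >= lower_bound:
            return label
    raise ValueError("a file needs at least one line of code to be classified")
```

For example, `size_category(857)` places `evaluate.py` in the "long" category, and `size_category(102)` places `model_loader.py` in "small".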
File Size Overall
  • There are 35 files with 4,158 lines of code.
    • 0 very long files (0 lines of code)
    • 2 long files (1,451 lines of code)
    • 4 medium size files (1,169 lines of code)
    • 5 small files (736 lines of code)
    • 24 very small files (802 lines of code)
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 34% | 28% | 17% | 19%
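The percentages in the overall bar can be reproduced from the per-category line counts listed above. The sketch below assumes the shares are truncated to whole percent (integer division); the report does not state its rounding rule, and this is only inferred from the published figures, which sum to 98% rather than 100%. The variable names are illustrative.

```python
# Lines of code per size category, copied from the "File Size Overall" list.
lines_per_category = {
    "1001+": 0,
    "501-1000": 1451,
    "201-500": 1169,
    "101-200": 736,
    "1-100": 802,
}

total_lines = sum(lines_per_category.values())

# Assumption: shares are truncated, not rounded (inferred from the figures).
shares = {
    category: lines * 100 // total_lines
    for category, lines in lines_per_category.items()
}

print(" | ".join(f"{share}%" for share in shares.values()))
```

Under that assumption the printed line matches the bar in the report, and `total_lines` equals the stated 4,158 lines of code.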


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 35% | 28% | 17% | 18%
yml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
src/datatuner/lm | 0% | 37% | 27% | 17% | 17%
src/datatuner/classification | 0% | 55% | 20% | 16% | 7%
paper/experiments/mturk | 0% | 0% | 87% | 0% | 12%
paper/experiments/webnlg | 0% | 0% | 0% | 91% | 8%
paper/experiments/viggo | 0% | 0% | 0% | 0% | 100%
src/datatuner | 0% | 0% | 0% | 0% | 100%
paper/experiments/ldc | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
src/datatuner/ops | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 35)
File | # lines | # units
evaluate.py (in src/datatuner/lm) | 857 | 12
run_classifier.py (in src/datatuner/classification) | 594 | 6
train.py (in src/datatuner/lm) | 379 | 1
prepare_mturk.py (in paper/experiments/mturk) | 327 | 6
data_loader.py (in src/datatuner/lm) | 246 | 6
classify_generated.py (in src/datatuner/classification) | 217 | 4
distractors.py (in src/datatuner/classification) | 177 | 9
metrics.py (in src/datatuner/lm) | 168 | 10
preprocess.py (in paper/experiments/webnlg) | 168 | 6
cross_entropy.py (in src/datatuner/lm) | 121 | 10
model_loader.py (in src/datatuner/lm) | 102 | 5
preprocess.py (in paper/experiments/viggo) | 84 | 2
utils.py (in src/datatuner/lm) | 77 | 8
reranker.py (in src/datatuner/lm) | 72 | 5
utils.py (in src/datatuner) | 69 | 13
novograd.py (in src/datatuner/lm) | 62 | 3
preprocess.py (in paper/experiments/ldc) | 57 | 2
custom_tokenizer.py (in src/datatuner/lm) | 55 | 1
text_stats.py (in paper/experiments/mturk) | 45 | 2
consistency_processor.py (in src/datatuner/classification) | 43 | 7
environment.yml (in root) | 42 | -
special_token_generator.py (in src/datatuner/lm) | 41 | 4
process_json.py (in src/datatuner/lm) | 36 | 3
consistency_classifier.py (in src/datatuner/classification) | 36 | 3
custom_gpt2.py (in src/datatuner/lm) | 28 | 1
webnlg_utils.py (in paper/experiments/webnlg) | 15 | 2
setup.py (in root) | 13 | -
launch_tokenizer.py (in src/datatuner/lm) | 10 | 1
mlflow.py (in src/datatuner/ops) | 9 | 2
converters.py (in src/datatuner/lm) | 3 | 1
__init__.py (in src/datatuner/lm) | 1 | -
__init__.py (in src/datatuner) | 1 | -
__init__.py (in paper/experiments/viggo) | 1 | -
__init__.py (in paper/experiments/ldc) | 1 | -
__init__.py (in paper/experiments/webnlg) | 1 | -
Files With Most Units (Top 20)
File | # lines | # units
utils.py (in src/datatuner) | 69 | 13
evaluate.py (in src/datatuner/lm) | 857 | 12
metrics.py (in src/datatuner/lm) | 168 | 10
cross_entropy.py (in src/datatuner/lm) | 121 | 10
distractors.py (in src/datatuner/classification) | 177 | 9
utils.py (in src/datatuner/lm) | 77 | 8
consistency_processor.py (in src/datatuner/classification) | 43 | 7
data_loader.py (in src/datatuner/lm) | 246 | 6
run_classifier.py (in src/datatuner/classification) | 594 | 6
preprocess.py (in paper/experiments/webnlg) | 168 | 6
prepare_mturk.py (in paper/experiments/mturk) | 327 | 6
reranker.py (in src/datatuner/lm) | 72 | 5
model_loader.py (in src/datatuner/lm) | 102 | 5
special_token_generator.py (in src/datatuner/lm) | 41 | 4
classify_generated.py (in src/datatuner/classification) | 217 | 4
process_json.py (in src/datatuner/lm) | 36 | 3
novograd.py (in src/datatuner/lm) | 62 | 3
consistency_classifier.py (in src/datatuner/classification) | 36 | 3
mlflow.py (in src/datatuner/ops) | 9 | 2
preprocess.py (in paper/experiments/viggo) | 84 | 2
Files With Long Lines (Top 3)

There are 3 files with lines longer than 120 characters. In total, these files contain 5 long lines.

File | # lines | # units | # long lines
evaluate.py (in src/datatuner/lm) | 857 | 12 | 3
model_loader.py (in src/datatuner/lm) | 102 | 5 | 1
classify_generated.py (in src/datatuner/classification) | 217 | 4 | 1
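A check like the one behind this table can be approximated as follows. This is a rough sketch: the function name `count_long_lines` is hypothetical, and it assumes a line's length is its raw character count without the trailing newline, which the report does not specify. Only the 120-character limit comes from the text above.

```python
def count_long_lines(text: str, max_length: int = 120) -> int:
    """Count lines in `text` that are longer than `max_length` characters.

    Assumption: length is the raw character count of each line, excluding
    the line terminator; the report tool may measure this differently.
    """
    return sum(1 for line in text.splitlines() if len(line) > max_length)


# A line of exactly 120 characters is not counted; 121 characters is.
sample = "x" * 120 + "\n" + "y" * 121 + "\n" + "short line\n"
long_line_count = count_long_lines(sample)
```

To apply it to a file, read the file's contents as text and pass them to `count_long_lines`.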