facebookresearch / SpanBERT
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how lines of code are distributed across files of different sizes.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files); a classification sketch follows this list.
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
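A minimal sketch of the bucketing described above, not the report generator itself: it counts non-blank lines per Python file and assigns each file to one of the five size categories (the exact line-counting rule used by the report is an assumption here).

```python
# A minimal sketch: count non-blank lines per Python file and assign each file
# to one of the five size categories used in this report. The report's exact
# line-counting rule is an assumption.
from pathlib import Path

BUCKETS = [
    (1, 100, "very small"),
    (101, 200, "small"),
    (201, 500, "medium size"),
    (501, 1000, "long"),
    (1001, float("inf"), "very long"),
]


def classify(loc: int) -> str:
    """Return the size category for a file with `loc` lines of code."""
    for low, high, label in BUCKETS:
        if low <= loc <= high:
            return label
    return "empty"


def lines_of_code(path: Path) -> int:
    """Approximate LOC as the number of non-blank lines."""
    return sum(1 for line in path.read_text(errors="ignore").splitlines() if line.strip())


for py_file in sorted(Path(".").rglob("*.py")):
    loc = lines_of_code(py_file)
    print(f"{py_file}: {loc} lines -> {classify(loc)}")
```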
File Size Overall
  • There are 83 files with 11,541 lines of code.
    • 0 very long files (0 lines of code)
    • 6 long files (4,260 lines of code)
    • 10 medium size files (2,975 lines of code)
    • 17 small files (2,346 lines of code)
    • 50 very small files (1,960 lines of code)
Share of lines of code by file size category:
1001+: 0% | 501-1000: 36% | 201-500: 25% | 101-200: 20% | 1-100: 16%


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 36% | 25% | 20% | 16%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
code | 0% | 81% | 15% | 3% | 0%
pretraining/fairseq/models | 0% | 81% | 0% | 0% | 18%
code/pytorch_pretrained_bert | 0% | 52% | 23% | 14% | 9%
pretraining/fairseq | 0% | 0% | 61% | 32% | 6%
pretraining/fairseq/data | 0% | 0% | 56% | 41% | 2%
pretraining | 0% | 0% | 45% | 31% | 22%
pretraining/fairseq/modules | 0% | 0% | 25% | 44% | 29%
pretraining/fairseq/optim | 0% | 0% | 0% | 33% | 66%
pretraining/fairseq/tasks | 0% | 0% | 0% | 60% | 39%
pretraining/fairseq/criterions | 0% | 0% | 0% | 0% | 100%
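A per-component breakdown like the table above can be approximated with a grouping pass as sketched below; grouping by each file's parent directory and counting non-blank lines are assumptions, since the report defines its own components and LOC rule.

```python
# A minimal sketch of a per-component breakdown: group Python files by their
# parent directory (an assumption; the report defines its own components) and
# split each component's lines of code across the five size buckets.
from collections import defaultdict
from pathlib import Path

ORDER = ("1001+", "501-1000", "201-500", "101-200", "1-100")


def bucket(loc: int) -> str:
    """Map a line count to its size bucket label."""
    if loc > 1000:
        return "1001+"
    if loc > 500:
        return "501-1000"
    if loc > 200:
        return "201-500"
    if loc > 100:
        return "101-200"
    return "1-100"


totals = defaultdict(lambda: defaultdict(int))  # component -> bucket -> LOC
for py_file in Path(".").rglob("*.py"):
    loc = sum(1 for line in py_file.read_text(errors="ignore").splitlines() if line.strip())
    totals[str(py_file.parent)][bucket(loc)] += loc

for component, by_bucket in sorted(totals.items()):
    total = sum(by_bucket.values()) or 1
    shares = " | ".join(f"{100 * by_bucket[b] // total}%" for b in ORDER)
    print(f"{component}: {shares}")
```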
Longest Files (Top 50)
File | Location | # lines | # units
run_squad.py | code | 988 | 25
run_mrqa.py | code | 806 | 16
run_glue.py | code | 770 | 59
modeling.py | code/pytorch_pretrained_bert | 615 | 60
pair_bert.py | pretraining/fairseq/models | 558 | 59
hf_bert.py | pretraining/fairseq/models | 523 | 55
run_tacred.py | code | 487 | 14
span_bert_dataset.py | pretraining/fairseq/data | 350 | 17
utils.py | pretraining/fairseq | 302 | 26
options.py | pretraining/fairseq | 298 | 17
train.py | pretraining | 287 | 9
tokenization.py | code/pytorch_pretrained_bert | 274 | 20
trainer.py | pretraining/fairseq | 264 | 18
multihead_attention.py | pretraining/fairseq/modules | 253 | 20
masking.py | pretraining/fairseq/data | 242 | 20
indexed_dataset.py | pretraining/fairseq/data | 218 | 33
preprocess.py | pretraining | 200 | 5
no_nsp_span_bert_dataset.py | pretraining/fairseq/data | 200 | 13
downsampled_multihead_attention.py | pretraining/fairseq/modules | 197 | 8
file_utils.py | code/pytorch_pretrained_bert | 175 | 11
dictionary.py | pretraining/fairseq/data | 149 | 19
span_bert.py | pretraining/fairseq/tasks | 145 | 11
adaptive_softmax.py | pretraining/fairseq/modules | 136 | 10
progress_bar.py | pretraining/fairseq | 131 | 27
iterators.py | pretraining/fairseq/data | 131 | 22
bert_adam.py | pretraining/fairseq/optim | 123 | 9
fp16_optimizer.py | pretraining/fairseq/optim | 122 | 17
data_utils.py | pretraining/fairseq/data | 113 | 8
tokenizer.py | pretraining/fairseq | 112 | 7
bidirectional_multihead_attention.py | pretraining/fairseq/modules | 105 | 7
meters.py | pretraining/fairseq | 103 | 21
distributed_utils.py | pretraining/fairseq | 103 | 8
mrqa_official_eval.py | code | 101 | 8
optimization.py | code/pytorch_pretrained_bert | 100 | 6
fairseq_model.py | pretraining/fairseq/models | 98 | 20
mlm_nsp_sbo_loss.py | pretraining/fairseq/criterions | 85 | 3
spanbert_loss.py | pretraining/fairseq/criterions | 72 | 3
adam.py | pretraining/fairseq/optim | 67 | 5
bert_loss.py | pretraining/fairseq/criterions | 63 | 3
cosine_lr_scheduler.py | pretraining/fairseq/optim/lr_scheduler | 61 | 4
legacy_distributed_data_parallel.py | pretraining/fairseq | 59 | 5
fairseq_task.py | pretraining/fairseq/tasks | 59 | 12
multiprocessing_train.py | pretraining | 59 | 6
sinusoidal_positional_embedding.py | pretraining/fairseq/modules | 56 | 5
adaptive_input.py | pretraining/fairseq/modules | 55 | 4
adaptive_inputs.py | pretraining/fairseq/modules | 55 | 4
distributed_fairseq_model.py | pretraining/fairseq/models | 52 | 1
composite_loss.py | pretraining/fairseq/criterions | 52 | 8
bpe_tokenize.py | pretraining | 50 | 4
mlm_loss.py | pretraining/fairseq/criterions | 50 | 3
Files With Most Units (Top 20)
File | Location | # lines | # units
modeling.py | code/pytorch_pretrained_bert | 615 | 60
run_glue.py | code | 770 | 59
pair_bert.py | pretraining/fairseq/models | 558 | 59
hf_bert.py | pretraining/fairseq/models | 523 | 55
indexed_dataset.py | pretraining/fairseq/data | 218 | 33
progress_bar.py | pretraining/fairseq | 131 | 27
utils.py | pretraining/fairseq | 302 | 26
run_squad.py | code | 988 | 25
iterators.py | pretraining/fairseq/data | 131 | 22
meters.py | pretraining/fairseq | 103 | 21
tokenization.py | code/pytorch_pretrained_bert | 274 | 20
fairseq_model.py | pretraining/fairseq/models | 98 | 20
multihead_attention.py | pretraining/fairseq/modules | 253 | 20
masking.py | pretraining/fairseq/data | 242 | 20
dictionary.py | pretraining/fairseq/data | 149 | 19
trainer.py | pretraining/fairseq | 264 | 18
options.py | pretraining/fairseq | 298 | 17
fp16_optimizer.py | pretraining/fairseq/optim | 122 | 17
span_bert_dataset.py | pretraining/fairseq/data | 350 | 17
run_mrqa.py | code | 806 | 16
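The "# units" column counts functions and methods. One rough way to approximate such a count for Python files is to parse them with the standard-library ast module, as sketched below; the report's own unit definition may differ in detail.

```python
# A minimal sketch that approximates a per-file "unit" count by parsing each
# Python file and counting def/async def nodes with the standard-library ast
# module. The report's own unit definition may differ.
import ast
from pathlib import Path


def count_units(path: Path) -> int:
    """Count function and method definitions in a Python source file."""
    tree = ast.parse(path.read_text(errors="ignore"))
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


for py_file in sorted(Path(".").rglob("*.py")):
    try:
        print(f"{py_file}: {count_units(py_file)} units")
    except SyntaxError:
        pass  # skip files that the current interpreter cannot parse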
Files With Long Lines (Top 18)

There are 18 files with lines longer than 120 characters; in total, there are 47 such long lines. A detection sketch follows the table below.

File | Location | # lines | # units | # long lines
pair_bert.py | pretraining/fairseq/models | 558 | 59 | 10
masking.py | pretraining/fairseq/data | 242 | 20 | 8
span_bert_dataset.py | pretraining/fairseq/data | 350 | 17 | 4
bpe_tokenize.py | pretraining | 50 | 4 | 3
hf_bert.py | pretraining/fairseq/models | 523 | 55 | 3
modeling.py | code/pytorch_pretrained_bert | 615 | 60 | 2
run_squad.py | code | 988 | 25 | 2
run_mrqa.py | code | 806 | 16 | 2
span_bert.py | pretraining/fairseq/tasks | 145 | 11 | 2
multihead_attention.py | pretraining/fairseq/modules | 253 | 20 | 2
no_nsp_span_bert_dataset.py | pretraining/fairseq/data | 200 | 13 | 2
run_tacred.py | code | 487 | 14 | 1
run_glue.py | code | 770 | 59 | 1
optimization.py | code/pytorch_pretrained_bert | 100 | 6 | 1
options.py | pretraining/fairseq | 298 | 17 | 1
polynomial_decay_schedule.py | pretraining/fairseq/optim/lr_scheduler | 43 | 5 | 1
utils.py | pretraining/fairseq | 302 | 26 | 1
train.py | pretraining | 287 | 9 | 1
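A check like the one summarized in the table above can be reproduced with a few lines of Python; the sketch below flags every line longer than 120 characters and prints a per-file count.

```python
# A minimal sketch of the long-line check summarized above: flag every line
# longer than 120 characters in each Python file and print a per-file count.
from pathlib import Path

LIMIT = 120

for py_file in sorted(Path(".").rglob("*.py")):
    long_lines = [
        (number, len(line))
        for number, line in enumerate(
            py_file.read_text(errors="ignore").splitlines(), start=1
        )
        if len(line) > LIMIT
    ]
    if long_lines:
        print(f"{py_file}: {len(long_lines)} long line(s)")
        for number, length in long_lines:
            print(f"  line {number}: {length} characters")
```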