facebookresearch / DisCo
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
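The size categories above can be sketched as a small lookup. This is only an illustration of the thresholds listed in the intro, not the analysis tool's own code; the function name `size_category` is hypothetical.

```python
def size_category(lines_of_code: int) -> str:
    """Map a file's line count to one of the report's five size categories.

    Hypothetical helper: the thresholds come from the intro above,
    the function itself is not part of the analysis tool.
    """
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    if lines_of_code <= 100:
        return "very small"
    if lines_of_code <= 200:
        return "small"
    if lines_of_code <= 500:
        return "medium size"
    if lines_of_code <= 1000:
        return "long"
    return "very long"


# Line counts taken from the "Longest Files" table in this report.
for name, loc in [("transformer.py", 566), ("bert_seq2seq.py", 496), ("preprocess.py", 200)]:
    print(name, loc, size_category(loc))
```

Note that the boundaries are inclusive on the upper end, so a 200-line file such as preprocess.py still counts as small.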
File Size Overall
  • There are 153 files with 14,239 lines of code.
    • 0 very long files (0 lines of code)
    • 1 long file (566 lines of code)
    • 16 medium size files (4,874 lines of code)
    • 35 small files (5,192 lines of code)
    • 101 very small files (3,607 lines of code)
Size category: 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
Share of lines of code: 0% | 3% | 34% | 36% | 25%
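The percentages above are shares of the total lines of code, not shares of the file count. A minimal sketch that recomputes them from the line counts in the list above follows; it assumes the report rounds down to whole percent, which is an inference from the numbers (566 of 14,239 lines is about 3.97%, shown as 3%) rather than documented behavior.

```python
# Lines of code per size category, as listed under "File Size Overall".
bucket_lines = {
    "1001+": 0,
    "501-1000": 566,
    "201-500": 4874,
    "101-200": 5192,
    "1-100": 3607,
}

total = sum(bucket_lines.values())

# Assumption: shares are floored to whole percent (integer division),
# which is why the displayed values need not add up to 100%.
shares = {bucket: lines * 100 // total for bucket, lines in bucket_lines.items()}

print(total)
print(" | ".join(f"{share}%" for share in shares.values()))
```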
File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 4% | 34% | 37% | 24%
cpp | 0% | 0% | 0% | 0% | 100%
lua | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
fairseq/models | 0% | 32% | 55% | 0% | 11%
fairseq | 0% | 0% | 64% | 21% | 13%
fairseq/data | 0% | 0% | 37% | 45% | 17%
fairseq/modules | 0% | 0% | 20% | 51% | 28%
fairseq/optim | 0% | 0% | 18% | 34% | 46%
ROOT | 0% | 0% | 18% | 68% | 13%
fairseq_cli | 0% | 0% | 24% | 69% | 5%
fairseq/tasks | 0% | 0% | 0% | 92% | 7%
scripts | 0% | 0% | 0% | 25% | 75%
fairseq/criterions | 0% | 0% | 0% | 0% | 100%
fairseq/strategies | 0% | 0% | 0% | 0% | 100%
fairseq/clib | 0% | 0% | 0% | 0% | 100%
fairseq/fb_tbmf_wrapper | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
transformer.py (in fairseq/models) | 566 | 36
bert_seq2seq.py (in fairseq/models) | 496 | 29
sequence_generator.py (in fairseq) | 442 | 14
options.py (in fairseq) | 423 | 19
utils.py (in fairseq) | 409 | 28
indexed_dataset.py (in fairseq/data) | 392 | 62
trainer.py (in fairseq) | 373 | 23
checkpoint_utils.py (in fairseq) | 265 | 11
multihead_attention.py (in fairseq/modules) | 256 | 14
disco_transformer.py (in fairseq/models) | 244 | 9
dictionary.py (in fairseq/data) | 241 | 28
fp16_optimizer.py (in fairseq/optim) | 233 | 32
fb_elmo_token_embedder.py (in fairseq/modules) | 231 | 10
train.py (in fairseq_cli) | 222 | 7
train.py (in root) | 222 | 7
fairseq_model.py (in fairseq/models) | 215 | 39
file_utils.py (in fairseq) | 210 | 12
preprocess.py (in root) | 200 | 5
preprocess.py (in fairseq_cli) | 200 | 5
progress_bar.py (in fairseq) | 197 | 36
downsampled_multihead_attention.py (in fairseq/modules) | 197 | 8
iterators.py (in fairseq/data) | 193 | 34
lightweight_convolution.py (in fairseq/modules) | 192 | 13
dynamic_convolution.py (in fairseq/modules) | 182 | 11
generate_disco.py (in root) | 179 | 5
search.py (in fairseq) | 172 | 13
language_pair_self_dataset_mask.py (in fairseq/data) | 171 | 13
adam.py (in fairseq/optim) | 164 | 9
eval_lm.py (in fairseq_cli) | 163 | 5
eval_lm.py (in root) | 163 | 5
data_utils.py (in fairseq/data) | 155 | 9
masked_multihead_attention.py (in fairseq/modules) | 148 | 1
generate.py (in root) | 144 | 3
transformer_sentence_encoder.py (in fairseq/modules) | 144 | 4
generate.py (in fairseq_cli) | 144 | 3
translation_self.py (in fairseq/tasks) | 142 | 7
interactive.py (in fairseq_cli) | 139 | 4
interactive.py (in root) | 139 | 4
translation.py (in fairseq/tasks) | 134 | 10
bmuf.py (in fairseq/optim) | 133 | 20
distributed_utils.py (in fairseq) | 133 | 9
adaptive_softmax.py (in fairseq/modules) | 133 | 10
adafactor.py (in fairseq/optim) | 131 | 10
character_token_embedder.py (in fairseq/modules) | 121 | 7
gpt2_bpe.py (in fairseq/data/encoders) | 120 | 11
language_pair_dataset.py (in fairseq/data) | 120 | 11
fairseq_task.py (in fairseq/tasks) | 118 | 20
rm_pt.py (in scripts) | 109 | 4
hub_utils.py (in fairseq) | 105 | 3
fb_bidirectional_multihead_attention.py (in fairseq/modules) | 104 | 8
Files With Most Units (Top 20)
File | # lines | # units
indexed_dataset.py (in fairseq/data) | 392 | 62
fairseq_model.py (in fairseq/models) | 215 | 39
transformer.py (in fairseq/models) | 566 | 36
progress_bar.py (in fairseq) | 197 | 36
iterators.py (in fairseq/data) | 193 | 34
fp16_optimizer.py (in fairseq/optim) | 233 | 32
bert_seq2seq.py (in fairseq/models) | 496 | 29
utils.py (in fairseq) | 409 | 28
dictionary.py (in fairseq/data) | 241 | 28
trainer.py (in fairseq) | 373 | 23
fairseq_task.py (in fairseq/tasks) | 118 | 20
bmuf.py (in fairseq/optim) | 133 | 20
options.py (in fairseq) | 423 | 19
fairseq_optimizer.py (in fairseq/optim) | 57 | 15
sequence_generator.py (in fairseq) | 442 | 14
multihead_attention.py (in fairseq/modules) | 256 | 14
meters.py (in fairseq) | 47 | 13
search.py (in fairseq) | 172 | 13
lightweight_convolution.py (in fairseq/modules) | 192 | 13
language_pair_self_dataset_mask.py (in fairseq/data) | 171 | 13
Files With Long Lines (Top 15)

There are 15 files with lines longer than 120 characters. In total, there are 28 long lines.
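
The long-line count can be approximated with a few lines of Python. This is a sketch only: the 120-character limit comes from the sentence above, while the helper name `count_long_lines` and the choice to measure length with `len()` (so a tab counts as one character) are assumptions; the report does not say how the tool measures line length.

```python
def count_long_lines(text: str, limit: int = 120) -> int:
    """Count the lines in `text` that are longer than `limit` characters.

    Hypothetical helper. Line endings are excluded from the length
    because str.splitlines() drops them.
    """
    return sum(1 for line in text.splitlines() if len(line) > limit)


# Tiny synthetic example: one 121-character line, one 120-character line,
# and one short line. Only the first exceeds the limit.
sample = "a" * 121 + "\n" + "b" * 120 + "\nshort"
print(count_long_lines(sample))
```

To check a real file, read it first, for example `count_long_lines(open(path, encoding="utf-8").read())`, where `path` points at a source file of your choice.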

File | # lines | # units | # long lines
generate_disco.py (in root) | 179 | 5 | 5
average_checkpoints.py (in scripts) | 102 | 3 | 4
transformer.py (in fairseq/models) | 566 | 36 | 3
mask_predict.py (in fairseq/strategies) | 52 | 4 | 2
strategy_utils.py (in fairseq/strategies) | 27 | 7 | 2
language_pair_self_dataset_mask.py (in fairseq/data) | 171 | 13 | 2
label_smoothed_cross_entropy.py (in fairseq/criterions) | 62 | 6 | 2
options.py (in fairseq) | 423 | 19 | 1
easy_first.py (in fairseq/strategies) | 84 | 4 | 1
adam.py (in fairseq/optim) | 164 | 9 | 1
disco_transformer.py (in fairseq/models) | 244 | 9 | 1
bert_seq2seq.py (in fairseq/models) | 496 | 29 | 1
dynamic_convolution.py (in fairseq/modules) | 182 | 11 | 1
simple_sinusoidal_positional_embedding.py (in fairseq/modules) | 18 | 2 | 1
average_best_checkpoints.py (in scripts) | 19 | 2 | 1