amazon-research / fact-check-summarization
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how the lines of code are distributed across files of different sizes.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
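The classification rule above can be sketched in Python, the repository's main language. This is a minimal illustration, not the analysis tool's own code; the function name `size_category` and the label strings are hypothetical.

```python
# Size categories from the intro: (inclusive upper bound in lines of code, label).
# Anything above the last bound is a "very long" file.
SIZE_CATEGORIES = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def size_category(loc: int) -> str:
    """Return the size category for a file with `loc` lines of code."""
    for upper_bound, label in SIZE_CATEGORIES:
        if loc <= upper_bound:
            return label
    return "very long"


print(size_category(1130))  # the largest file in this repository
```

Checking the bounds in ascending order means each file lands in the first category whose upper bound it does not exceed, so the boundaries 100, 200, 500 and 1000 belong to the smaller category, matching the ranges listed above.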
File Size Overall
  • There are 289 files with 33,591 lines of code.
    • 1 very long file (1,130 lines of code)
    • 6 long files (3,792 lines of code)
    • 40 medium size files (11,886 lines of code)
    • 70 small files (9,965 lines of code)
    • 172 very small files (6,818 lines of code)
Share of lines of code per size category:
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
3% | 11% | 35% | 29% | 20%
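The percentages in the overall bar are shares of lines of code, not of file counts (the single very long file is 0.3% of the 289 files but 3% of the code). The sketch below recomputes them from the reported numbers. Flooring each share reproduces the bar exactly, while rounding would give 30% for the 101-200 category; that the report truncates rather than rounds is an inference from these numbers, not something it documents.

```python
# Lines of code per size category, copied from the overall summary above.
loc_per_category = {
    "1001+": 1_130,
    "501-1000": 3_792,
    "201-500": 11_886,
    "101-200": 9_965,
    "1-100": 6_818,
}

total_loc = sum(loc_per_category.values())  # 33,591

# Integer division floors each share; rounding would turn 29.67% into 30%.
shares = {
    category: 100 * loc // total_loc
    for category, loc in loc_per_category.items()
}

print(" | ".join(f"{share}%" for share in shares.values()))
# prints "3% | 11% | 35% | 29% | 20%"
```

Because every share is floored, the five percentages add up to 98% rather than 100%.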

File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 3% | 11% | 35% | 28% | 20%
cu | 0% | 0% | 64% | 35% | 0%
cpp | 0% | 0% | 0% | 64% | 35%
pyx | 0% | 0% | 0% | 74% | 25%
cuh | 0% | 0% | 0% | 0% | 100%
lua | 0% | 0% | 0% | 0% | 100%
h | 0% | 0% | 0% | 0% | 100%
toml | 0% | 0% | 0% | 0% | 100%
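The per-extension rows apply the same idea within each group of files. A minimal sketch, assuming each row gives the share of that extension's lines of code per size category (floored, as in the overall bar); `distribution_by_extension` is a hypothetical helper, and the input is a small subset of files taken from the tables in this report rather than the whole repository.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# (inclusive lower bound in lines of code, label), checked from largest to smallest.
CATEGORIES = [(1001, "1001+"), (501, "501-1000"), (201, "201-500"), (101, "101-200"), (1, "1-100")]


def category(loc):
    """Return the size-category label for a file with `loc` lines of code."""
    for lower_bound, label in CATEGORIES:
        if loc >= lower_bound:
            return label
    return "1-100"  # treat empty files as very small


def distribution_by_extension(files):
    """Map each extension to {category: floored % of that extension's lines of code}.

    `files` is an iterable of (path, lines_of_code) pairs.
    """
    loc = defaultdict(lambda: defaultdict(int))
    for path, n in files:
        extension = PurePosixPath(path).suffix.lstrip(".")
        loc[extension][category(n)] += n

    result = {}
    for extension, per_category in loc.items():
        total = sum(per_category.values())
        if total == 0:
            continue  # only empty files: nothing to distribute
        result[extension] = {
            label: 100 * per_category.get(label, 0) // total
            for _, label in CATEGORIES
        }
    return result


# A six-file subset from the tables in this report (not the full repository).
sample = [
    ("fairseq/models/transformer.py", 744),
    ("fairseq/trainer.py", 543),
    ("fairseq/options.py", 482),
    ("fairseq/modules/lightconv_layer/lightconv_cuda_kernel.cu", 285),
    ("fairseq/clib/libnat_cuda/edit_dist.cu", 284),
    ("fairseq/modules/cuda_utils.cu", 169),
]
print(distribution_by_extension(sample))
```

For this sample the call reports 77% (201-500) and 22% (101-200) for `cu`, and 72% (501-1000) and 27% (201-500) for `py`. These differ from the rows above because the sample omits most of the repository's files.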
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
preprocess | 36% | 42% | 0% | 17% | 3%
fairseq/models | 0% | 24% | 48% | 18% | 8%
fairseq | 0% | 13% | 54% | 22% | 9%
fairseq/tasks | 0% | 0% | 52% | 42% | 4%
fairseq/data | 0% | 0% | 36% | 25% | 37%
fairseq/modules | 0% | 0% | 33% | 31% | 35%
fairseq_cli | 0% | 0% | 47% | 40% | 11%
fairseq/optim | 0% | 0% | 18% | 35% | 45%
fairseq/clib | 0% | 0% | 43% | 45% | 11%
fairseq/logging | 0% | 0% | 45% | 54% | <1%
scripts | 0% | 0% | 0% | 70% | 29%
fairseq/criterions | 0% | 0% | 0% | 44% | 55%
ROOT | 0% | 0% | 0% | 53% | 46%
fairseq/benchmark | 0% | 0% | 0% | 0% | 100%
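Several files in the tables below sit in directories that have no row of their own (for example `fairseq/models/nat` or `fairseq/data/legacy`). A plausible reading, which is an assumption rather than something the report states, is that each file rolls up into the longest listed component that is a prefix of its directory, with top-level files going to ROOT. The hypothetical `component_of` helper sketches that rule; it compares whole path segments so that `fairseq_cli` is not mistaken for `fairseq`.

```python
from pathlib import PurePosixPath

# Components listed in the "primary" decomposition above (ROOT is the fallback).
COMPONENTS = [
    "preprocess", "fairseq/models", "fairseq", "fairseq/tasks", "fairseq/data",
    "fairseq/modules", "fairseq_cli", "fairseq/optim", "fairseq/clib",
    "fairseq/logging", "scripts", "fairseq/criterions", "fairseq/benchmark",
]


def component_of(path: str) -> str:
    """Return the longest listed component containing `path`, else "ROOT"."""
    directory_parts = PurePosixPath(path).parent.parts
    best, best_length = "ROOT", 0
    for component in COMPONENTS:
        component_parts = PurePosixPath(component).parts
        # Match whole segments: ("fairseq_cli",) never matches ("fairseq",).
        if (directory_parts[:len(component_parts)] == component_parts
                and len(component_parts) > best_length):
            best, best_length = component, len(component_parts)
    return best


print(component_of("fairseq/models/nat/levenshtein_transformer.py"))
# prints "fairseq/models"
```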
Longest Files (Top 50)
File | Directory | # lines | # units
sm_inference_asum.py | preprocess | 1130 | 16
evaluate_hypo.py | preprocess | 792 | 13
transformer.py | fairseq/models | 744 | 39
wav2vec.py | fairseq/models | 629 | 25
lightconv.py | fairseq/models | 554 | 29
trainer.py | fairseq | 543 | 29
data_prepro_clean.py | preprocess | 530 | 16
sequence_generator.py | fairseq | 499 | 22
options.py | fairseq | 482 | 21
lstm.py | fairseq/models | 459 | 23
fconv.py | fairseq/models | 455 | 31
fconv_self_att.py | fairseq/models | 448 | 26
indexed_dataset.py | fairseq/data | 398 | 62
levenshtein_transformer.py | fairseq/models/nat | 381 | 15
multihead_attention.py | fairseq/modules | 370 | 10
preprocess.py | fairseq_cli | 368 | 8
checkpoint_utils.py | fairseq | 360 | 13
utils.py | fairseq | 354 | 38
translation.py | fairseq/tasks | 331 | 14
nonautoregressive_transformer.py | fairseq/models/nat | 324 | 19
semisupervised_translation.py | fairseq/tasks | 316 | 11
fp16_optimizer.py | fairseq/optim | 298 | 37
lightconv_cuda_kernel.cu | fairseq/modules/lightconv_layer | 285 | -
edit_dist.cu | fairseq/clib/libnat_cuda | 284 | -
dictionary.py | fairseq/data | 282 | 29
denoising_dataset.py | fairseq/data | 281 | 18
fairseq_model.py | fairseq/models | 265 | 45
model.py | fairseq/models/roberta | 265 | 22
multilingual_masked_lm.py | fairseq/tasks | 257 | 10
progress_bar.py | fairseq/logging | 257 | 38
multilingual_translation.py | fairseq/tasks | 256 | 19
masked_lm.py | fairseq/models | 250 | 13
train.py | fairseq_cli | 244 | 8
transformer_layer.py | fairseq/modules | 243 | 9
transformer_lm.py | fairseq/models | 237 | 12
iterative_refinement_generator.py | fairseq | 235 | 4
translation_with_unlikelihood.py | fairseq/tasks | 234 | 12
fairseq_task.py | fairseq/tasks | 232 | 21
language_pair_dataset.py | fairseq/data | 229 | 13
model.py | fairseq/models/bart | 226 | 12
iterators.py | fairseq/data | 225 | 37
insertion_transformer.py | fairseq/models/nat | 215 | 16
levenshtein_utils.py | fairseq/models/nat | 211 | 9
file_utils.py | fairseq | 210 | 12
block_pair_dataset.py | fairseq/data/legacy | 210 | 13
lightweight_convolution.py | fairseq/modules | 209 | 14
dynamic_convolution.py | fairseq/modules | 201 | 12
meters.py | fairseq/logging | 199 | 40
hub_utils.py | fairseq | 197 | 23
downsampled_multihead_attention.py | fairseq/modules | 197 | 8
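A ranking like the one above can be approximated with a short script. The report does not say exactly how it counts lines of code, so this sketch assumes a simplified rule (non-blank lines that are not whole-line `#` comments; docstrings still count), and its numbers will not match the table exactly. `count_loc` and `longest_files` are hypothetical names.

```python
import heapq
from pathlib import Path


def count_loc(text: str) -> int:
    """Count non-blank lines that are not whole-line '#' comments (a simplification)."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def longest_files(root: str, pattern: str = "*.py", top: int = 50):
    """Return up to `top` (lines_of_code, path) pairs under `root`, largest first."""
    sizes = (
        (count_loc(path.read_text(encoding="utf-8", errors="replace")), str(path))
        for path in Path(root).rglob(pattern)
        if path.is_file()
    )
    return heapq.nlargest(top, sizes)
```

For example, `for loc, path in longest_files("fairseq", top=10): print(loc, path)` prints the ten largest Python files under `fairseq`. `heapq.nlargest` keeps only the top entries instead of sorting every file.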
Files With Most Units (Top 20)
File | Directory | # lines | # units
indexed_dataset.py | fairseq/data | 398 | 62
fairseq_model.py | fairseq/models | 265 | 45
meters.py | fairseq/logging | 199 | 40
transformer.py | fairseq/models | 744 | 39
utils.py | fairseq | 354 | 38
progress_bar.py | fairseq/logging | 257 | 38
fp16_optimizer.py | fairseq/optim | 298 | 37
iterators.py | fairseq/data | 225 | 37
fconv.py | fairseq/models | 455 | 31
trainer.py | fairseq | 543 | 29
lightconv.py | fairseq/models | 554 | 29
dictionary.py | fairseq/data | 282 | 29
fconv_self_att.py | fairseq/models | 448 | 26
wav2vec.py | fairseq/models | 629 | 25
hub_utils.py | fairseq | 197 | 23
bmuf.py | fairseq/optim | 145 | 23
lstm.py | fairseq/models | 459 | 23
sequence_generator.py | fairseq | 499 | 22
model.py | fairseq/models/roberta | 265 | 22
options.py | fairseq | 482 | 21
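The report does not define a "unit". A common reading for Python is a function or method definition; under that assumption, the units in a file can be counted with the standard `ast` module. `count_units` is a hypothetical helper; it handles Python source only (the `.cu` files above have no unit count), and the analysis tool may count differently, for example with respect to lambdas or nested functions.

```python
import ast


def count_units(source: str) -> int:
    """Count function and method definitions, including nested and async ones.

    Lambdas are not counted (an assumption about what a "unit" is).
    """
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


example = """
def f():
    pass

class A:
    def g(self):
        return lambda x: x

    async def h(self):
        pass
"""
print(count_units(example))  # f, A.g and A.h are units; the lambda is not
```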
Files With Long Lines (Top 20)

There are 25 files with lines longer than 120 characters. In total, there are 79 long lines.

File | Directory | # lines | # units | # long lines
lightconv.py | fairseq/models | 554 | 29 | 14
transformer.py | fairseq/models | 744 | 39 | 12
data_prepro_clean.py | preprocess | 530 | 16 | 9
transformer_lm.py | fairseq/models | 237 | 12 | 5
average_checkpoints.py | scripts | 105 | 3 | 5
evaluate_hypo.py | preprocess | 792 | 13 | 4
fconv.py | fairseq/models | 455 | 31 | 3
label_smoothed_cross_entropy_with_multitask.py | fairseq/criterions | 193 | 8 | 3
preprocess.py | fairseq_cli | 368 | 8 | 3
model.py | fairseq/models/bart | 226 | 12 | 2
model.py | fairseq/models/roberta | 265 | 22 | 2
token_block_utils_fast.pyx | fairseq/data | 153 | 1 | 2
round_robin_zip_datasets.py | fairseq/data | 69 | 10 | 2
sm_inference_asum.py | preprocess | 1130 | 16 | 2
semisupervised_translation.py | fairseq/tasks | 316 | 11 | 1
lightconv_cuda_kernel.cu | fairseq/modules/lightconv_layer | 285 | - | 1
cuda_utils.cu | fairseq/modules | 169 | - | 1
dynamic_convolution.py | fairseq/modules | 201 | 12 | 1
wav2vec.py | fairseq/models | 629 | 25 | 1
hub_interface.py | fairseq/models/bart | 164 | 10 | 1
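The long-line check behind this table can be sketched as follows, assuming a line is "long" when it has more than 120 characters, not counting the line terminator. How the tool treats tabs or multi-byte characters is not stated, so exact counts may differ. `long_lines` and `summarize` are hypothetical helpers; `summarize` returns the two totals quoted above the table (files with long lines, and long lines overall).

```python
def long_lines(text: str, limit: int = 120) -> list[int]:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


def summarize(files: dict[str, str], limit: int = 120) -> tuple[int, int]:
    """Return (files with at least one long line, total long lines).

    `files` maps each path to its full text.
    """
    per_file = {path: len(long_lines(text, limit)) for path, text in files.items()}
    offending = {path: count for path, count in per_file.items() if count}
    return len(offending), sum(offending.values())
```

Returning line numbers rather than just a count makes it easy to point a reviewer at the exact lines to wrap.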