pytorch / fairseq
Unit Size

The distribution of unit sizes, measured in lines of code.

Intro
  • Unit size measurements show the distribution of the size of code units (methods, functions, etc.); a minimal measurement sketch follows this list.
  • Units are classified into five categories based on their size in lines of code: 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units).
  • You should aim to keep units small (fewer than 20 lines). Long units can become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
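For orientation, the sketch below shows how such a measurement could be reproduced with Python's standard ast module. This is a minimal illustration, not the report's actual implementation: it counts raw line spans per function, whereas the report likely excludes blank and comment lines.

    import ast
    from collections import Counter
    from pathlib import Path

    # The report's five size buckets; 101+ is the catch-all.
    BUCKETS = [(10, "1-10"), (20, "11-20"), (50, "21-50"), (100, "51-100")]

    def bucket(n_lines: int) -> str:
        for upper, label in BUCKETS:
            if n_lines <= upper:
                return label
        return "101+"

    def unit_sizes(source: str):
        """Yield (name, line count) for every function/method in a module."""
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # end_lineno requires Python 3.8+
                yield node.name, node.end_lineno - node.lineno + 1

    dist = Counter()
    for path in Path("fairseq").rglob("*.py"):
        try:
            units = list(unit_sizes(path.read_text(encoding="utf-8")))
        except SyntaxError:
            continue  # skip files the parser cannot handle
        for _name, n_lines in units:
            dist[bucket(n_lines)] += n_lines  # LOC per bucket, as charted below
    total = sum(dist.values())
    for label in ("101+", "51-100", "21-50", "11-20", "1-10"):
        print(f"{label:>7}: {100 * dist[label] / total:.0f}%")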
Unit Size Overall
  • There are 3,834 units, containing 41,101 lines of code (62.7% of all code).
    • 24 very long units (3,531 lines of code)
    • 81 long units (5,522 lines of code)
    • 389 medium size units (12,107 lines of code)
    • 533 small units (7,613 lines of code)
    • 2,807 very small units (12,328 lines of code)
Share of unit lines of code per size category:
  101+: 8% | 51-100: 13% | 21-50: 29% | 11-20: 18% | 1-10: 29%
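These percentages are each category's share of the 41,101 lines of code in units; for example, very long units account for 3,531 / 41,101 ≈ 8.6%. A quick check in Python:

    # Lines of code per size category, from the overall stats above.
    loc = {"101+": 3_531, "51-100": 5_522, "21-50": 12_107,
           "11-20": 7_613, "1-10": 12_328}
    total = sum(loc.values())  # 41,101 lines of code in units
    for label, n in loc.items():
        print(f"{label:>7}: {100 * n / total:.1f}%")
    # -> 8.6%, 13.4%, 29.5%, 18.5%, 30.0%, matching the chart up to rounding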
Unit Size per Extension
Share of unit lines of code per size category (101+ | 51-100 | 21-50 | 11-20 | 1-10):
  py:    8% | 13% | 29% | 18% |  30%
  cpp:   0% | 25% | 12% | 34% |  26%
  lua:   0% |  0% | 34% | 55% |  10%
  pyx:   0% |  0% |  0% |  0% | 100%
Unit Size per Logical Component
Primary logical decomposition; share of unit lines of code per size category (101+ | 51-100 | 21-50 | 11-20 | 1-10):
  fairseq/models:           9% | 13% | 33% | 16% | 26%
  fairseq:                 20% |  8% | 23% | 14% | 32%
  fairseq/tasks:           16% | 16% | 32% | 15% | 18%
  fairseq_cli:             45% | 24% | 12% |  5% | 11%
  fairseq/data:             2% |  6% | 25% | 24% | 41%
  fairseq/criterions:       0% | 29% | 42% | 14% | 14%
  scripts:                  0% | 53% | 29% | 13% |  3%
  fairseq/model_parallel:   0% | 25% | 33% | 14% | 26%
  fairseq/optim:            0% | 14% | 23% | 21% | 40%
  fairseq/modules:          0% |  6% | 32% | 25% | 35%
  fairseq/distributed:      0% | 24% | 26% | 24% | 24%
  ROOT:                     0% | 67% |  0% | 15% | 17%
  fairseq/clib:             0% | 28% | 14% | 39% | 18%
  fairseq/dataclass:        0% |  0% | 48% | 27% | 23%
  fairseq/benchmark:        0% |  0% | 41% |  7% | 50%
  scripts/constraints:      0% |  0% | 87% |  0% | 12%
  fairseq/logging:          0% |  0% |  4% | 14% | 80%
  fairseq/scoring:          0% |  0% |  0% | 38% | 61%
Longest Units
Top 20 longest units:
  Unit                    Location                                                 # lines  McCabe index  # params
  def _main()             fairseq_cli/generate.py                                      281            62         2
  def train_step()        fairseq/trainer.py                                           231            19         3
  def load_dataset()      fairseq/tasks/semisupervised_translation.py                  196            23         4
  def add_args()          fairseq/data/multilingual/multilingual_data_manager.py       185             2         1
  def load_dataset()      fairseq/tasks/multilingual_language_modeling.py              183            28         5
  def add_args()          fairseq/models/lightconv_lm.py                               175             1         1
  def generate()          fairseq/iterative_refinement_generator.py                    172            39         5
  def add_args()          fairseq/models/lightconv.py                                  168             1         1
  def main()              fairseq_cli/interactive.py                                   161            43         1
  def add_args()          fairseq/models/roberta/model.py                              157             9         1
  def load_dataset()      fairseq/tasks/multilingual_denoising.py                      133            20         5
  def main()              fairseq_cli/train.py                                         131            30         1
  def load_dataset()      fairseq/tasks/multilingual_masked_lm.py                      128            17         5
  def save_checkpoint()   fairseq/checkpoint_utils.py                                  126            44         4
  def add_args()          fairseq/models/speech_to_speech/s2s_transformer.py           121             1         1
  def add_args()          fairseq/models/speech_to_text/convtransformer.py             119             1         1
  def add_args()          fairseq/models/speech_to_text/s2t_transformer.py             115             1         1
  def __init__()          fairseq/models/wav2vec/wav2vec.py                            113            17         2
  def __init__()          fairseq/trainer.py                                           108            26         6
  def generate()          fairseq/sequence_scorer.py                                   107            22         4
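The McCabe index is cyclomatic complexity, the number of independent paths through a unit; note that several of the longest units (the add_args() methods) have an index of 1, i.e. they are long but trivially linear. As a rough sketch, a similar listing could be produced with the third-party radon library; this is an assumption for illustration, not necessarily the tool behind this report.

    from pathlib import Path
    from radon.complexity import cc_visit  # pip install radon

    results = []
    for path in Path("fairseq").rglob("*.py"):
        try:
            blocks = cc_visit(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files the parser cannot handle
        for block in blocks:
            # each block exposes name, lineno, endline and a McCabe complexity score
            results.append((block.endline - block.lineno + 1,
                            block.complexity, block.name, str(path)))

    # Sort by unit length, longest first, and show the top 20 as in the table above.
    for n_lines, mccabe, name, path in sorted(results, reverse=True)[:20]:
        print(f"{n_lines:4d} lines  McCabe {mccabe:3d}  {name}()  in {path}")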