pytorch / text
Unit Size

The distribution of unit sizes, measured in lines of code.

Intro
  • Unit size measurements show the distribution of the sizes of units of code (methods, functions...).
  • Units are classified into five categories based on their size in lines of code: 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units); a sketch of this classification follows below.
  • You should aim to keep units small (20 lines or fewer). Long units may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
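The thresholds above can be written out as a minimal sketch, assuming the simple bucketing the report describes (unit_size_category is a hypothetical helper, not part of the report's tooling):

    def unit_size_category(loc):
        # Thresholds mirror the report's five size categories.
        if loc <= 10:
            return "1-10 (very small)"
        if loc <= 20:
            return "11-20 (small)"
        if loc <= 50:
            return "21-50 (medium)"
        if loc <= 100:
            return "51-100 (long)"
        return "101+ (very long)"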
Unit Size Overall
  • There are 323 units, totaling 3,535 lines of code (52.3% of all code).
    • 0 very long units (0 lines of code)
    • 9 long units (614 lines of code)
    • 30 medium size units (917 lines of code)
    • 67 small units (958 lines of code)
    • 217 very small units (1,046 lines of code)
Distribution of unit lines of code by category:
101+: 0% | 51-100: 17% | 21-50: 25% | 11-20: 27% | 1-10: 29%
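These percentages follow directly from the per-category line counts above; a quick check in Python (assuming the report truncates fractional percentages, which reproduces its figures exactly):

    # Per-category line counts from the overall numbers above.
    category_lines = {"101+": 0, "51-100": 614, "21-50": 917, "11-20": 958, "1-10": 1046}
    total = sum(category_lines.values())  # 3,535 lines of code in units
    for category, lines in category_lines.items():
        print(f"{category}: {100 * lines // total}%")  # 0%, 17%, 25%, 27%, 29%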
Unit Size per Extension
Extension | 101+ | 51-100 | 21-50 | 11-20 | 1-10
py | 0% | 21% | 22% | 26% | 28%
cpp | 0% | 0% | 36% | 30% | 33%
h | 0% | 0% | 53% | 0% | 46%
Unit Size per Logical Component (primary logical decomposition)
Component | 101+ | 51-100 | 21-50 | 11-20 | 1-10
benchmark | 0% | 39% | 41% | 13% | 5%
torchtext/datasets | 0% | 28% | 16% | 49% | 5%
torchtext/vocab | 0% | 37% | 0% | 30% | 31%
torchtext/models | 0% | 28% | 19% | 16% | 35%
torchtext/data | 0% | 15% | 17% | 29% | 37%
torchtext | 0% | 14% | 22% | 25% | 37%
torchtext/experimental | 0% | 13% | 16% | 19% | 50%
torchtext/csrc | 0% | 0% | 37% | 27% | 34%
torchtext/nn | 0% | 0% | 69% | 0% | 30%
build_tools/setup_helpers | 0% | 0% | 48% | 26% | 24%
ROOT | 0% | 0% | 0% | 78% | 21%
torchtext/_internal | 0% | 0% | 0% | 0% | 100%
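Both breakdowns above are the same computation with a different grouping key. A minimal sketch, reusing the unit_size_category helper from the intro and assuming a hypothetical list of (file_path, unit_loc) pairs as input (the report's actual decomposition rules are not shown):

    import os
    from collections import defaultdict

    def size_breakdown(units, group_key):
        # units: hypothetical (file_path, unit_loc) pairs.
        totals = defaultdict(int)
        per_category = defaultdict(lambda: defaultdict(int))
        for path, loc in units:
            group = group_key(path)
            totals[group] += loc
            per_category[group][unit_size_category(loc)] += loc
        return {group: {cat: 100 * loc // totals[group] for cat, loc in cats.items()}
                for group, cats in per_category.items()}

    def by_extension(path):
        return os.path.splitext(path)[1].lstrip(".")  # "py", "cpp", "h"

    def by_component(path):
        # First two path segments, e.g. "torchtext/datasets"; files in
        # the repository root fall under "ROOT". One plausible mapping.
        parts = path.split("/")[:-1]
        return "/".join(parts[:2]) if parts else "ROOT"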
Longest Units
Top 20 longest units
Unit | File | # lines | McCabe index | # params
def cache() | torchtext/vocab/vectors.py | 90 | 18 | 5
def benchmark_mha_block() | benchmark/mha_block.py | 82 | 28 | 0
def forward() | torchtext/models/roberta/modules.py | 78 | 5 | 4
def get_tokenizer() | torchtext/data/utils.py | 72 | 21 | 2
def IWSLT2016() | torchtext/datasets/iwslt2016.py | 68 | 17 | 4
def benchmark_experimental_vocab() | benchmark/benchmark_pytext_vocab.py | 60 | 6 | 0
def IWSLT2017() | torchtext/datasets/iwslt2017.py | 60 | 9 | 4
def extract_archive() | torchtext/utils.py | 53 | 16 | 3
def WMT14() | torchtext/experimental/datasets/raw/wmt14.py | 51 | 18 | 4
def benchmark_new_vocab_lookup() | benchmark/benchmark_vocab.py | 49 | 9 | 2
std::vector GPT2BPEEncoder::BPE_() | torchtext/csrc/gpt2_bpe_tokenizer.cpp | 46 | 15 | 1
def GloVe() | torchtext/experimental/vectors.py | 41 | 8 | 6
def forward() | torchtext/nn/modules/multiheadattention.py | 41 | 18 | 7
def _run_benchmark_lookup_jit_for_loop() | benchmark/benchmark_pytext_vocab.py | 39 | 16 | 4
def _build_third_party() | build_tools/setup_helpers/extension.py | 38 | 3 | 1
def _run_benchmark_lookup() | benchmark/benchmark_pytext_vocab.py | 35 | 18 | 3
— | — | 35 | 7 | 3
Vocab _build_vocab_from_text_file() | torchtext/csrc/vocab.cpp | 33 | 2 | 4
Vocab _load_vocab_from_file() | torchtext/csrc/vocab.cpp | 31 | 2 | 3
def add_token() | torchtext/functional.py | 31 | 17 | 3
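For the Python files in this table, comparable per-unit numbers can be gathered with the standard ast module. A rough sketch (line and parameter counting conventions may differ from the report's tool; the McCabe index needs a dedicated complexity walker, e.g. the third-party mccabe package, and is omitted here):

    import ast

    def python_unit_stats(source):
        # Length in lines and parameter count for every function in a
        # Python source file; requires Python 3.8+ for end_lineno.
        stats = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                n_lines = node.end_lineno - node.lineno + 1
                n_params = (len(node.args.posonlyargs) + len(node.args.args)
                            + len(node.args.kwonlyargs))
                stats.append((node.name, n_lines, n_params))
        return sorted(stats, key=lambda s: s[1], reverse=True)

    with open("torchtext/vocab/vectors.py") as f:  # first file in the table
        for name, n_lines, n_params in python_unit_stats(f.read())[:5]:
            print(name, n_lines, n_params)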