amazon-research / datatuner
Unit Size

The distribution of unit sizes (measured in lines of code).

Intro
  • Unit size measurements show how units of code (methods, functions, ...) are distributed by size.
  • Units are classified in five categories based on their size (lines of code): 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units).
  • You should aim to keep units small (at most 20 lines). Long units may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
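The bucketing described above can be sketched in a few lines of Python. This is only an illustration, not the report tool's own code: the function names `classify_unit` and `size_distribution` are hypothetical, and the boundaries are taken from the report's legend (1-10, 11-20, 21-50, 51-100, 101+).

```python
from collections import Counter

# Inclusive upper bounds for each category, following the report's legend.
# The labels match the ones used in the "Unit Size Overall" list.
CATEGORIES = [
    (10, "very small"),
    (20, "small"),
    (50, "medium size"),
    (100, "long"),
]


def classify_unit(lines_of_code: int) -> str:
    """Return the size category for a unit with the given line count."""
    if lines_of_code < 1:
        raise ValueError("a unit must have at least one line of code")
    for upper_bound, label in CATEGORIES:
        if lines_of_code <= upper_bound:
            return label
    return "very long"  # anything above 100 lines


def size_distribution(unit_sizes):
    """Count how many units fall into each category."""
    return Counter(classify_unit(n) for n in unit_sizes)


# A few unit sizes taken from the longest-units table in this report.
sizes = [350, 286, 80, 42, 38]
distribution = size_distribution(sizes)
print(distribution)
```

With these five sizes, two units are classified as very long, one as long, and two as medium size.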
Unit Size Overall
  • There are 135 units containing 3,129 lines of code (75.3% of all code).
    • 6 very long units (1,289 lines of code)
    • 8 long units (536 lines of code)
    • 14 medium size units (444 lines of code)
    • 33 small units (477 lines of code)
    • 74 very small units (383 lines of code)
Share of unit lines of code by unit size:
101+: 41% | 51-100: 17% | 21-50: 14% | 11-20: 15% | 1-10: 12%
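The percentages can be reproduced from the per-category line counts listed under "Unit Size Overall". The sketch below assumes each share is a category's lines divided by the total lines in units, rounded to a whole percent; the report does not state its rounding rule, so that part is an assumption.

```python
# Lines of code per size category, copied from "Unit Size Overall".
lines_per_category = {
    "101+": 1289,
    "51-100": 536,
    "21-50": 444,
    "11-20": 477,
    "1-10": 383,
}

# Total lines of code in units; matches the 3,129 reported above.
total_lines = sum(lines_per_category.values())

# Share of each category, rounded to a whole percent (assumed rounding rule).
shares = {
    category: round(100 * lines / total_lines)
    for category, lines in lines_per_category.items()
}

for category, share in shares.items():
    print(f"{category}: {share}%")
```

Because each share is rounded independently, the five values add up to 99% rather than 100%.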
Unit Size per Extension
Extension | 101+ | 51-100 | 21-50 | 11-20 | 1-10
py | 41% | 17% | 14% | 15% | 12%
Unit Size per Logical Component
primary logical decomposition
Component | 101+ | 51-100 | 21-50 | 11-20 | 1-10
src/datatuner/lm | 38% | 19% | 13% | 16% | 12%
src/datatuner/classification | 46% | 17% | 11% | 16% | 8%
paper/experiments/mturk | 78% | 0% | 6% | 9% | 5%
paper/experiments/webnlg | 0% | 55% | 0% | 20% | 23%
paper/experiments/viggo | 0% | 0% | 100% | 0% | 0%
paper/experiments/ldc | 0% | 0% | 88% | 0% | 11%
src/datatuner | 0% | 0% | 0% | 19% | 80%
src/datatuner/ops | 0% | 0% | 0% | 0% | 100%
Longest Units
Top 20 longest units
Unit | File | # lines | McCabe index | # params
def train() | src/datatuner/lm/train.py | 350 | 64 | 0
def run() | src/datatuner/lm/evaluate.py | 286 | 54 | 0
def main() | src/datatuner/classification/run_classifier.py | 265 | 55 | 1
def score() | paper/experiments/mturk/prepare_mturk.py | 166 | 37 | 4
def train() | src/datatuner/classification/run_classifier.py | 115 | 48 | 4
def prepare() | paper/experiments/mturk/prepare_mturk.py | 107 | 37 | 5
def get_data_loaders() | src/datatuner/lm/data_loader.py | 80 | 25 | 3
def evaluate() | src/datatuner/classification/run_classifier.py | 80 | 23 | 4
def parse() | paper/experiments/webnlg/preprocess.py | 77 | 14 | 4
def get_dataset_from_file() | src/datatuner/lm/data_loader.py | 70 | 27 | 5
def load_and_cache_examples() | src/datatuner/classification/run_classifier.py | 60 | 32 | 4
def clean_beam() | src/datatuner/lm/evaluate.py | 59 | 25 | 4
def setup() | src/datatuner/lm/evaluate.py | 56 | 13 | 1
def tokenize() | src/datatuner/lm/custom_tokenizer.py | 54 | 20 | 3
def bleu() | src/datatuner/lm/metrics.py | 42 | 16 | 6
def step() | src/datatuner/lm/novograd.py | 42 | 14 | 2
def preprocess() | paper/experiments/viggo/preprocess.py | 40 | 8 | 3
def preprocess() | paper/experiments/ldc/preprocess.py | 39 | 8 | 5
def get_e2e_metrics() | src/datatuner/lm/metrics.py | 38 | 9 | 2
def get_distractors() | src/datatuner/classification/distractors.py | 38 | 13 | 7
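A ranking like the one above could be approximated for a single Python file with the standard `ast` module. This is a rough sketch, not the method used to produce this report: the helper name `unit_sizes` is hypothetical, and it counts the raw line span of each function (including blank lines and comments), so its numbers may differ from the report's.

```python
import ast


def unit_sizes(source: str):
    """Return (name, line_span, param_count) tuples, longest unit first."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Raw span from the "def" line to the last line of the body.
            # Assumption: blank lines and comments inside the span are counted.
            n_lines = node.end_lineno - node.lineno + 1
            # Named parameters only; *args and **kwargs are not counted here.
            n_params = (
                len(node.args.posonlyargs)
                + len(node.args.args)
                + len(node.args.kwonlyargs)
            )
            results.append((node.name, n_lines, n_params))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Small illustrative source, not taken from the analyzed repository.
example = '''
def short(a):
    return a + 1

def longer(a, b):
    total = a
    total += b
    return total
'''

ranked = unit_sizes(example)
for name, n_lines, n_params in ranked:
    print(f"def {name}() | {n_lines} lines | {n_params} params")
```

The sketch needs Python 3.8 or later, since it relies on the `end_lineno` attribute of AST nodes.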