Intro

Unit size measurements show how the size of units of code (methods, functions, ...) is distributed.
Units are classified in four categories based on their size (lines of code): 1-20 (small units), 21-50 (medium-size units), 51-100 (long units), 101+ (very long units). The legend splits small units further into 1-10 and 11-20.
Aim to keep units small (20 lines or fewer). Long units may become "bloaters": code that has grown to such proportions that it is hard to work with.
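The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the report tool's implementation: the names `BUCKETS`, `size_bucket` and `distribution` are hypothetical, and weighting each bucket by lines of code (rather than by number of units) is an assumption. The sample sizes are the ten largest line counts from the "Top 20 longest units" table.

```python
from collections import Counter

# Upper bounds (inclusive) and labels, taken from the legend.
BUCKETS = [(10, "1-10"), (20, "11-20"), (50, "21-50"), (100, "51-100")]


def size_bucket(lines: int) -> str:
    """Return the legend bucket for a unit with the given number of lines."""
    if lines < 1:
        raise ValueError("a unit has at least one line")
    for upper, label in BUCKETS:
        if lines <= upper:
            return label
    return "101+"


def distribution(unit_sizes):
    """Share of total lines of code per bucket (assumed weighting)."""
    totals = Counter()
    for lines in unit_sizes:
        totals[size_bucket(lines)] += lines
    grand_total = sum(totals.values())
    return {label: totals[label] / grand_total for label in totals}


# Line counts of the ten longest units listed in the report.
sizes = [350, 286, 265, 166, 115, 107, 80, 80, 77, 70]
for label, share in distribution(sizes).items():
    print(f"{label}: {share:.1%}")
```

Weighting by lines of code makes a single 350-line function count for more than many small helpers, which is usually what matters when judging how much of a codebase is hard to work with.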

Unit Size Overall

Legend (lines of code): 101+, 51-100, 21-50, 11-20, 1-10

Unit Size per Extension

Unit Size per Logical Component

primary logical decomposition

Longest Units

Top 20 longest units

Unit	# lines	McCabe index	# params
def train() in src/datatuner/lm/train.py	350	64	0
def run() in src/datatuner/lm/evaluate.py	286	54	0
def main() in src/datatuner/classification/run_classifier.py	265	55	1
def score() in paper/experiments/mturk/prepare_mturk.py	166	37	4
def train() in src/datatuner/classification/run_classifier.py	115	48	4
def prepare() in paper/experiments/mturk/prepare_mturk.py	107	37	5
def get_data_loaders() in src/datatuner/lm/data_loader.py	80	25	3
def evaluate() in src/datatuner/classification/run_classifier.py	80	23	4
def parse() in paper/experiments/webnlg/preprocess.py	77	14	4
def get_dataset_from_file() in src/datatuner/lm/data_loader.py	70	27	5
def load_and_cache_examples() in src/datatuner/classification/run_classifier.py	60	32	4
def clean_beam() in src/datatuner/lm/evaluate.py	59	25	4
def setup() in src/datatuner/lm/evaluate.py	56	13	1
def tokenize() in src/datatuner/lm/custom_tokenizer.py	54	20	3
def bleu() in src/datatuner/lm/metrics.py	42	16	6
def step() in src/datatuner/lm/novograd.py	42	14	2
def preprocess() in paper/experiments/viggo/preprocess.py	40	8	3
def preprocess() in paper/experiments/ldc/preprocess.py	39	8	5
def get_e2e_metrics() in src/datatuner/lm/metrics.py	38	9	2
def get_distractors() in src/datatuner/classification/distractors.py	38	13	7
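
A ranking like the one above can be approximated with Python's standard `ast` module. The sketch below is an assumption about how such a measurement could be done, not the method used to produce this table: `longest_units` is a hypothetical helper, and it counts physical lines from the `def` line to the last line of the body, including blank lines and comments, so its numbers may differ from the "# lines" column.

```python
import ast


def longest_units(source: str, top: int = 20):
    """Return (name, line_count) pairs for the longest functions in source."""
    tree = ast.parse(source)
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available from Python 3.8 onwards.
            length = node.end_lineno - node.lineno + 1
            units.append((node.name, length))
    # Longest first; keep only the requested number of entries.
    return sorted(units, key=lambda unit: unit[1], reverse=True)[:top]


example = '''
def short():
    return 1

def longer(a, b):
    total = a + b
    total *= 2
    return total
'''

print(longest_units(example))
```

To apply it to a file, read its contents and pass the text in, for example `longest_units(open(path).read())`, where `path` points to a Python source file of your choosing.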