amazon-research / datatuner
Conditional Complexity

The distribution of unit complexity (measured with the McCabe index).

Intro
  • Conditional complexity (also called cyclomatic complexity) is a measure of the complexity of software. It refers to the number of possible paths through a program function. A higher value often means higher maintenance and testing costs (infosecinstitute.com).
  • Conditional complexity is calculated by counting all conditions in the program that can affect the execution path (e.g. if statements, loops, switches, and/or operators, try and catch blocks).
  • Conditional complexity is measured at the unit level (methods, functions...).
  • Units are classified into five categories based on the measured McCabe index: 1-5 (very simple units), 6-10 (simple units), 11-25 (medium complex units), 26-50 (complex units), 51+ (very complex units).
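The counting rule described above can be illustrated with a minimal sketch built on Python's standard `ast` module. This is not the tool that produced this report, and its exact counting rules may differ; the function names `mccabe_index` and `_decision_points` and the choice of which nodes count as decision points are assumptions made for illustration.

```python
import ast

# Node types treated as one decision point each (an assumption for this sketch).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def _decision_points(func: ast.AST) -> int:
    """Count constructs inside `func` that can change the execution path."""
    count = 0
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths: one per operator.
            count += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # One for the implicit loop, plus one per `if` filter.
            count += 1 + len(node.ifs)
    return count


def mccabe_index(source: str) -> dict:
    """Return an approximate McCabe index per function: 1 + decision points.

    Simplification: code in nested functions is also counted for the outer function.
    """
    tree = ast.parse(source)
    return {
        node.name: 1 + _decision_points(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = '''
def classify(x, y):
    if x > 0 and y > 0:
        return "both"
    for i in range(3):
        try:
            x / i
        except ZeroDivisionError:
            pass
    return "other"
'''

print(mccabe_index(SAMPLE))  # {'classify': 5}
```

The sample function scores 5: one base path, plus one each for the `if`, the `and`, the `for` loop, and the `except` handler.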
Conditional Complexity Overall
  • There are 135 units containing 3,129 lines of code (75.3% of all code).
    • 3 very complex units (901 lines of code)
    • 5 complex units (518 lines of code)
    • 10 medium complex units (544 lines of code)
    • 17 simple units (357 lines of code)
    • 100 very simple units (809 lines of code)
Share of lines of code by McCabe index range:
51+: 28% | 26-50: 16% | 11-25: 17% | 6-10: 11% | 1-5: 25%
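As a rough check, the percentages in the bar can be reproduced from the line counts in the overall list. The sketch below assumes that the five legend ranges map, in order, to the five categories in that list, and that the report rounds percentages down (plain rounding would give 29% and 26% for the first and last ranges). The names `category` and `shares` are hypothetical helpers, not part of the analyzed code or of the reporting tool.

```python
# Lines of code per McCabe range, copied from the overall list in this report.
LINES_BY_RANGE = {
    "51+": 901,    # very complex units
    "26-50": 518,  # complex units
    "11-25": 544,  # medium complex units
    "6-10": 357,   # simple units
    "1-5": 809,    # very simple units
}


def category(mccabe: int) -> str:
    """Map a McCabe index to the legend range it falls into."""
    if mccabe >= 51:
        return "51+"
    if mccabe >= 26:
        return "26-50"
    if mccabe >= 11:
        return "11-25"
    if mccabe >= 6:
        return "6-10"
    return "1-5"


def shares(lines_by_range: dict) -> dict:
    """Return each range's share of the total lines, as a whole percent rounded down."""
    total = sum(lines_by_range.values())
    return {name: 100 * loc // total for name, loc in lines_by_range.items()}


print(sum(LINES_BY_RANGE.values()))  # 3129
print(shares(LINES_BY_RANGE))
# {'51+': 28, '26-50': 16, '11-25': 17, '6-10': 11, '1-5': 25}
print(category(64), category(48), category(9))  # 51+ 26-50 6-10
```

Because each share is rounded down, the displayed percentages sum to 97% rather than 100%.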
Conditional Complexity per Extension
Extension | 51+ | 26-50 | 11-25 | 6-10 | 1-5
py | 28% | 16% | 17% | 11% | 25%
Conditional Complexity per Logical Component
primary logical decomposition
Component | 51+ | 26-50 | 11-25 | 6-10 | 1-5
src/datatuner/lm | 38% | 4% | 21% | 9% | 26%
src/datatuner/classification | 32% | 21% | 14% | 7% | 24%
paper/experiments/mturk | 0% | 78% | 0% | 5% | 15%
paper/experiments/webnlg | 0% | 0% | 55% | 0% | 44%
paper/experiments/viggo | 0% | 0% | 0% | 100% | 0%
paper/experiments/ldc | 0% | 0% | 0% | 100% | 0%
src/datatuner | 0% | 0% | 0% | 19% | 80%
src/datatuner/ops | 0% | 0% | 0% | 0% | 100%
Most Complex Units
Top 20 most complex units
Unit | File | # lines | McCabe index | # params
def train() | src/datatuner/lm/train.py | 350 | 64 | 0
def main() | src/datatuner/classification/run_classifier.py | 265 | 55 | 1
def run() | src/datatuner/lm/evaluate.py | 286 | 54 | 0
def train() | src/datatuner/classification/run_classifier.py | 115 | 48 | 4
def prepare() | paper/experiments/mturk/prepare_mturk.py | 107 | 37 | 5
def score() | paper/experiments/mturk/prepare_mturk.py | 166 | 37 | 4
def load_and_cache_examples() | src/datatuner/classification/run_classifier.py | 60 | 32 | 4
def get_dataset_from_file() | src/datatuner/lm/data_loader.py | 70 | 27 | 5
def get_data_loaders() | src/datatuner/lm/data_loader.py | 80 | 25 | 3
def clean_beam() | src/datatuner/lm/evaluate.py | 59 | 25 | 4
def evaluate() | src/datatuner/classification/run_classifier.py | 80 | 23 | 4
def tokenize() | src/datatuner/lm/custom_tokenizer.py | 54 | 20 | 3
def bleu() | src/datatuner/lm/metrics.py | 42 | 16 | 6
def step() | src/datatuner/lm/novograd.py | 42 | 14 | 2
def parse() | paper/experiments/webnlg/preprocess.py | 77 | 14 | 4
def setup() | src/datatuner/lm/evaluate.py | 56 | 13 | 1
def get_distractors() | src/datatuner/classification/distractors.py | 38 | 13 | 7
def read_special_tokens() | src/datatuner/lm/model_loader.py | 16 | 12 | 3
def is_middle_token() | src/datatuner/lm/utils.py | 21 | 10 | 3
def get_e2e_metrics() | src/datatuner/lm/metrics.py | 38 | 9 | 2