facebookresearch / TransCoder
Conditional Complexity

The distribution of complexity of units (measured with McCabe index).

Intro
  • Conditional complexity (also called cyclomatic complexity) is a metric for the complexity of software. It refers to the number of linearly independent paths through a program function. A higher value often means higher maintenance and testing costs (infosecinstitute.com).
  • Conditional complexity is calculated by counting all conditions in the program that can affect the execution path (e.g. if statements, loops, switches, and/or operators, try and catch blocks...).
  • Conditional complexity is measured at the unit level (methods, functions...).
  • Units are classified in five categories based on the measured McCabe index: 1-5 (very simple units), 6-10 (simple units), 11-25 (medium complex units), 26-50 (complex units), 51+ (very complex units).
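The counting rule described above can be sketched for Python units with the standard `ast` module. This is a minimal approximation, not the tool that produced this report: the set of node types counted as decision points and the helper names `mccabe_index` and `classify` are assumptions made for illustration, and the category boundaries follow the five ranges used in this report's legend (1-5, 6-10, 11-25, 26-50, 51+).

```python
import ast

# Node types treated as decision points in this sketch (an assumption;
# real analyzers may count slightly differently).
_DECISION_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp,
    ast.ExceptHandler, ast.comprehension,
)


def mccabe_index(source: str) -> int:
    """Approximate McCabe index: 1 plus the number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" has three operands, i.e. two extra branches.
            complexity += len(node.values) - 1
    return complexity


def classify(index: int) -> str:
    """Map a McCabe index to the category ranges used in the legend."""
    if index <= 5:
        return "very simple"
    if index <= 10:
        return "simple"
    if index <= 25:
        return "medium complex"
    if index <= 50:
        return "complex"
    return "very complex"


example = """
def f(x, y):
    if x > 0 and y > 0:
        return 1
    for i in range(x):
        if i % 2:
            y += 1
    return y
"""

index = mccabe_index(example)
print(index, classify(index))
```

The example function has two `if` statements, one `and` operator, and one loop, so the sketch counts four decision points on top of the base value of 1.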
Conditional Complexity Overall
  • There are 3,674 units with 61,680 lines of code in units (73.2% of code).
    • 1 very complex unit (76 lines of code)
    • 7 complex units (702 lines of code)
    • 86 medium complex units (2,894 lines of code)
    • 515 simple units (9,755 lines of code)
    • 3,065 very simple units (48,253 lines of code)
Share of lines of code per McCabe index range:
51+ | 26-50 | 11-25 | 6-10 | 1-5
<1% | 1% | 4% | 15% | 78%
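As a rough check, the percentages above can be recomputed from the lines-of-code counts listed for each category. The sketch below assumes that shares are truncated to whole percents and that any non-zero share below 1% is shown as "<1%"; that rounding rule is an assumption, chosen because it reproduces the figures shown here. The names `loc_per_range` and `share_label` are illustrative.

```python
# Lines of code per McCabe index range, taken from the overall summary.
loc_per_range = {
    "51+": 76,
    "26-50": 702,
    "11-25": 2894,
    "6-10": 9755,
    "1-5": 48253,
}

total_loc = sum(loc_per_range.values())


def share_label(loc: int, total: int) -> str:
    """Format a share of lines of code (assumed rule: truncate, '<1%' for tiny shares)."""
    pct = 100 * loc / total
    if pct == 0:
        return "0%"
    if pct < 1:
        return "<1%"
    return f"{int(pct)}%"


labels = [share_label(loc, total_loc) for loc in loc_per_range.values()]
print(total_loc)
print(" | ".join(labels))
```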
Conditional Complexity per Extension
Extension | 51+ | 26-50 | 11-25 | 6-10 | 1-5
py | <1% | 5% | 16% | 30% | 46%
cpp | 0% | 0% | 2% | 15% | 81%
java | 0% | 0% | 1% | 10% | 88%
Conditional Complexity per Logical Component
primary logical decomposition
Component | 51+ | 26-50 | 11-25 | 6-10 | 1-5
XLM/src/data | 11% | 0% | 31% | 15% | 41%
XLM/src | 0% | 18% | 25% | 27% | 28%
XLM/src/model | 0% | 29% | 37% | 6% | 26%
XLM/src/evaluation | 0% | 32% | 31% | 21% | 14%
preprocessing/src | 0% | 5% | 25% | 36% | 32%
data/evaluation/geeks_for_geeks_successful_test_scripts/cpp | 0% | 0% | 2% | 15% | 81%
data/evaluation/geeks_for_geeks_successful_test_scripts/python | 0% | 0% | 7% | 35% | 57%
data/evaluation/geeks_for_geeks_successful_test_scripts/java | 0% | 0% | 1% | 10% | 88%
XLM | 0% | 0% | 23% | 0% | 76%
ROOT | 0% | 0% | 38% | 47% | 13%
preprocessing | 0% | 0% | 0% | 30% | 69%
Most Complex Units
Top 20 most complex units
Unit | File | # lines | McCabe index | # params
def check_data_params() | XLM/src/data/loader.py | 76 | 65 | 1
def build_model() | XLM/src/model/__init__.py | 106 | 50 | 2
def generate_beam() | XLM/src/model/transformer.py | 110 | 39 | 8
def evaluate_mt() | XLM/src/evaluation/evaluator.py | 134 | 37 | 7
def __init__() | XLM/src/trainer.py | 69 | 33 | 3
(not captured) | (not captured) | 129 | 29 | 7
(not captured) | (not captured) | 72 | 27 | 4
def extract_functions_cpp() | preprocessing/src/code_tokenizer.py | 82 | 27 | 1
def run_all_evals() | XLM/src/evaluation/evaluator.py | 27 | 23 | 2
def extract_functions_java() | preprocessing/src/code_tokenizer.py | 69 | 23 | 1
def load_mono_data() | XLM/src/data/loader.py | 35 | 22 | 2
def tokenize_python() | preprocessing/src/code_tokenizer.py | 66 | 21 | 2
def check_model_params() | XLM/src/model/__init__.py | 53 | 20 | 1
def tokenize() | preprocessing/src/javalang_tokenizer.py | 52 | 20 | 2
def main() | XLM/train.py | 50 | 18 | 1
string f_gold() | data/evaluation/geeks_for_geeks_successful_test_scripts/cpp/DECODE_STRING_RECURSIVELY_ENCODED_COUNT_FOLLOWED_SUBSTRING.cpp | 51 | 18 | 1
def shuf_order() | XLM/src/utils.py | 23 | 17 | 3
def detokenize_cpp() | preprocessing/src/code_tokenizer.py | 49 | 17 | 1
def f_gold() | data/evaluation/geeks_for_geeks_successful_test_scripts/python/CHECK_GIVEN_SENTENCE_GIVEN_SET_SIMPLE_GRAMMER_RULES.py | 27 | 17 | 1
def optimize() | XLM/src/trainer.py | 30 | 16 | 2