facebookresearch / TransCoder
Unit Size

The distribution of unit sizes (measured in lines of code).

Intro
  • Unit size measurements show how the sizes of code units (methods, functions, and similar blocks) are distributed.
  • Units are classified into five categories based on their size in lines of code: 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), and 101+ (very long units).
  • You should aim to keep units small (20 lines or fewer). Long units can become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
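The five size bands in the "Unit Size Overall" breakdown can be expressed as a simple lookup. The following is a minimal Python sketch. The band edges follow this report, and `size_category` is a hypothetical helper, not part of the analysis tool.

```python
# Size bands used in "Unit Size Overall": (inclusive upper bound, label).
SIZE_BANDS = [
    (10, "very small (1-10)"),
    (20, "small (11-20)"),
    (50, "medium (21-50)"),
    (100, "long (51-100)"),
]


def size_category(lines_of_code: int) -> str:
    """Return the size band for a unit with the given number of lines of code."""
    if lines_of_code < 1:
        raise ValueError("a unit has at least one line of code")
    for upper, label in SIZE_BANDS:
        if lines_of_code <= upper:
            return label
    # Anything above the last bound is very long.
    return "very long (101+)"


if __name__ == "__main__":
    # The first two line counts come from the "Longest Units" table;
    # "example_helper" is a hypothetical unit added for contrast.
    for name, loc in [("get_parser", 165), ("check_data_params", 76), ("example_helper", 12)]:
        print(f"{name}: {loc} lines -> {size_category(loc)}")
```

Inclusive upper bounds keep the bands from overlapping. A 20-line unit is small, and a 21-line unit is medium.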
Unit Size Overall
  • There are 3,674 units containing 61,680 lines of code (73.2% of all code).
    • 6 very long units (767 lines of code)
    • 76 long units (4,614 lines of code)
    • 940 medium size units (27,118 lines of code)
    • 1,565 small units (22,340 lines of code)
    • 1,087 very small units (6,841 lines of code)
Share of lines of code in units, by size (101+ | 51-100 | 21-50 | 11-20 | 1-10): 1% | 7% | 43% | 36% | 11%
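The percentages in the size bar can be recomputed from the line counts listed under "Unit Size Overall". The sketch below does this. These numbers suggest the report truncates rather than rounds, but that is an inference and the report does not state it.

```python
import math

# Lines of code per size band, copied from "Unit Size Overall".
LINES_PER_BAND = {
    "101+": 767,
    "51-100": 4_614,
    "21-50": 27_118,
    "11-20": 22_340,
    "1-10": 6_841,
}


def band_shares(lines_per_band: dict[str, int]) -> dict[str, float]:
    """Return each band's share of all lines in units, as a percentage."""
    total = sum(lines_per_band.values())
    return {band: 100 * loc / total for band, loc in lines_per_band.items()}


if __name__ == "__main__":
    total = sum(LINES_PER_BAND.values())
    print(f"total lines in units: {total:,}")
    for band, pct in band_shares(LINES_PER_BAND).items():
        # Truncating (not rounding) reproduces 1% | 7% | 43% | 36% | 11%;
        # rounding would turn the 21-50 share (about 43.97%) into 44%.
        print(f"{band:>7}: {pct:6.2f}%  (truncated: {math.floor(pct)}%)")
```

The five band totals add up to 61,680 lines, which matches the overall figure.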
Unit Size per Extension
Share of lines of code in units, by size (101+ | 51-100 | 21-50 | 11-20 | 1-10):
  • py: 5% | 8% | 24% | 36% | 24%
  • java: <1% | 11% | 68% | 13% | 6%
  • cpp: 0% | <1% | 15% | 73% | 10%
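The per-extension figures split each extension's lines in units across the size bands. The sketch below shows that grouping. It assumes each unit is known as a (file path, line count) pair and that the extension comes from the file suffix. `SAMPLE_UNITS` is made-up data, not measurements from TransCoder.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Band labels in the same order as the report's legend.
BANDS = ["101+", "51-100", "21-50", "11-20", "1-10"]

# Hypothetical units: (file path, lines of code in the unit).
SAMPLE_UNITS = [
    ("pkg/a.py", 120),
    ("pkg/a.py", 15),
    ("pkg/b.py", 15),
    ("Main.java", 40),
    ("Main.java", 10),
]


def band_of(loc: int) -> str:
    """Map a unit's line count to its size band."""
    if loc > 100:
        return "101+"
    if loc > 50:
        return "51-100"
    if loc > 20:
        return "21-50"
    if loc > 10:
        return "11-20"
    return "1-10"


def shares_per_extension(units: list[tuple[str, int]]) -> dict[str, dict[str, float]]:
    """Group units by file extension and return, per extension, each band's
    share (in percent) of that extension's lines in units."""
    lines = defaultdict(lambda: dict.fromkeys(BANDS, 0))
    for path, loc in units:
        ext = PurePosixPath(path).suffix.lstrip(".")
        lines[ext][band_of(loc)] += loc
    result = {}
    for ext, per_band in lines.items():
        total = sum(per_band.values())
        result[ext] = {band: 100 * loc / total for band, loc in per_band.items()}
    return result


if __name__ == "__main__":
    for ext, shares in shares_per_extension(SAMPLE_UNITS).items():
        print(f"{ext}: " + " | ".join(f"{shares[b]:.0f}%" for b in BANDS))
```

Weighting by lines rather than by unit count is why a single long unit can dominate a small extension's bar.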
Unit Size per Logical Component
Primary logical decomposition. Share of lines of code in units, by size (101+ | 51-100 | 21-50 | 11-20 | 1-10):
  • XLM/src/model: 29% | 31% | 14% | 10% | 14%
  • XLM: 76% | 0% | 23% | 0% | 0%
  • XLM/src/evaluation: 32% | 0% | 53% | 11% | 2%
  • XLM/src: 8% | 25% | 40% | 14% | 9%
  • data/evaluation/geeks_for_geeks_successful_test_scripts/java: <1% | 11% | 68% | 13% | 6%
  • preprocessing/src: 0% | 18% | 42% | 24% | 14%
  • XLM/src/data: 0% | 19% | 38% | 29% | 13%
  • data/evaluation/geeks_for_geeks_successful_test_scripts/cpp: 0% | <1% | 15% | 73% | 10%
  • data/evaluation/geeks_for_geeks_successful_test_scripts/python: 0% | 0% | 13% | 50% | 35%
  • ROOT: 0% | 0% | 86% | 13% | 0%
  • preprocessing: 0% | 0% | 82% | 17% | 0%
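One rule fits every component listed under "Unit Size per Logical Component" and every path in the "Longest Units" table: a file belongs to its parent directory, and top-level files go under ROOT. The sketch below encodes that rule. It is an inference from this report, not the documented behavior of the analysis tool, and `example.py` is a hypothetical top-level file.

```python
import posixpath


def logical_component(path: str) -> str:
    """Assumed rule: a file belongs to its parent directory; files at the
    repository root belong to the ROOT component."""
    parent = posixpath.dirname(posixpath.normpath(path))
    return parent or "ROOT"


if __name__ == "__main__":
    # Paths from the "Longest Units" table, plus one hypothetical top-level file.
    for path in [
        "XLM/train.py",
        "XLM/src/evaluation/evaluator.py",
        "XLM/src/model/__init__.py",
        "preprocessing/src/code_tokenizer.py",
        "example.py",
    ]:
        print(f"{path} -> {logical_component(path)}")
```

Under this rule, `XLM/train.py` lands in XLM and not in XLM/src. That is consistent with get_parser() (165 lines) accounting for the large 101+ share of the XLM component.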
Longest Units
Top 20 longest units
Unit | File | # lines | McCabe index | # params
def get_parser() | XLM/train.py | 165 | 2 | 0
def evaluate_mt() | XLM/src/evaluation/evaluator.py | 134 | 37 | 7
(not shown) | (not shown) | 129 | 29 | 7
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/SORT_ARRAY_APPLYING_GIVEN_EQUATION.java | 123 | 7 | 1
def generate_beam() | XLM/src/model/transformer.py | 110 | 39 | 8
def build_model() | XLM/src/model/__init__.py | 106 | 50 | 2
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K_1.java | 88 | 3 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K.java | 88 | 3 | 1
def extract_functions_cpp() | preprocessing/src/code_tokenizer.py | 82 | 27 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/FUNCTION_COPY_STRING_ITERATIVE_RECURSIVE_1.java | 79 | 5 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/STOOGE_SORT.java | 79 | 5 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/SORT_EVEN_PLACED_ELEMENTS_INCREASING_ODD_PLACED_DECREASING_ORDER.java | 79 | 5 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/SPLIT_ARRAY_ADD_FIRST_PART_END.java | 79 | 5 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/FIND_MAXIMUM_SUM_POSSIBLE_EQUAL_SUM_THREE_STACKS.java | 77 | 3 | 1
public static void main() | data/evaluation/geeks_for_geeks_successful_test_scripts/java/CHECK_TWO_GIVEN_CIRCLES_TOUCH_INTERSECT.java | 77 | 3 | 1
def check_data_params() | XLM/src/data/loader.py | 76 | 65 | 1
def init_distributed_mode() | XLM/src/slurm.py | 72 | 15 | 1
(not shown) | (not shown) | 72 | 27 | 4
def __init__() | XLM/src/trainer.py | 69 | 33 | 3
def extract_functions_java() | preprocessing/src/code_tokenizer.py | 69 | 23 | 1
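For Python units, the three metrics in the table (lines, McCabe index, number of parameters) can be approximated with the standard-library `ast` module. This is a sketch under stated assumptions, not the tool's method. It counts physical lines from the `def` line to the last line of the body, including blank lines and comments, so its totals will not match the report exactly. Its set of decision points is a common approximation of cyclomatic complexity. Counting `self`, `*args`, and `**kwargs` as parameters is likewise a choice made here. `SAMPLE` is made-up code.

```python
import ast
from dataclasses import dataclass


@dataclass
class UnitStats:
    name: str
    lines: int   # physical lines from `def` to the last line, incl. blanks/comments
    mccabe: int  # 1 + number of decision points (approximation)
    params: int  # all declared parameters, including self, *args, **kwargs


def _mccabe(func: ast.AST) -> int:
    """Approximate cyclomatic complexity as 1 + decision points.
    ast.walk also descends into nested functions, so their branches are
    counted toward the enclosing unit in this sketch."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.AsyncFor,
                             ast.While, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.comprehension):
            # One for the loop, plus one for each `if` filter.
            complexity += 1 + len(node.ifs)
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra branches.
            complexity += len(node.values) - 1
    return complexity


def _param_count(args: ast.arguments) -> int:
    count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
    count += (args.vararg is not None) + (args.kwarg is not None)
    return count


def longest_units(source: str, top: int = 20) -> list[UnitStats]:
    """Return the `top` longest functions/methods in `source`, longest first."""
    units = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            units.append(UnitStats(
                name=node.name,
                lines=node.end_lineno - node.lineno + 1,
                mccabe=_mccabe(node),
                params=_param_count(node.args),
            ))
    units.sort(key=lambda u: u.lines, reverse=True)
    return units[:top]


SAMPLE = '''
def short(x):
    return x

def branchy(a, b, *rest, flag=False):
    if a and b:
        return 1
    for item in rest:
        if item:
            return 2
    return 0
'''

if __name__ == "__main__":
    for unit in longest_units(SAMPLE):
        print(f"{unit.name} | {unit.lines} | {unit.mccabe} | {unit.params}")
```

Here `branchy` spans 7 lines. Its McCabe index is 5: one base path, plus two `if` statements, one `for` loop, and one extra branch from `a and b`. It declares 4 parameters.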