facebookresearch / CodeGen
Unit Size

The distribution of size of units (measured in lines of code).

Intro
  • Unit size measurements show how units of code (methods, functions, and so on) are distributed by size.
  • Units are classified into five categories based on their size (lines of code): 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units).
  • You should aim to keep units small (fewer than 20 lines). Long units may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
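The size categories can be sketched as a small lookup. This is a minimal illustration, not the report tool's implementation: the names `BUCKETS`, `classify_unit`, and `size_distribution` are hypothetical, and the boundaries are taken from the report's legend (1-10, 11-20, 21-50, 51-100, 101+).

```python
from collections import Counter

# Inclusive upper bounds for each category, taken from the report's legend.
# The names used here are illustrative, not part of any tool's API.
BUCKETS = [
    (10, "very small"),
    (20, "small"),
    (50, "medium"),
    (100, "long"),
]


def classify_unit(lines_of_code: int) -> str:
    """Return the size category for a unit with the given number of lines."""
    if lines_of_code < 1:
        raise ValueError("a unit must have at least one line of code")
    for upper_bound, label in BUCKETS:
        if lines_of_code <= upper_bound:
            return label
    # Anything above the last bound falls into the 101+ category.
    return "very long"


def size_distribution(unit_sizes):
    """Count how many units fall into each size category."""
    return Counter(classify_unit(size) for size in unit_sizes)


# Example with four unit sizes from the "Longest Units" list.
distribution = size_distribution([592, 301, 96, 92])
```

With these four sizes, two units land in "very long" and two in "long".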
Unit Size Overall
  • There are 3,981 units with 67,644 lines of code in units (71.7% of code).
    • 16 very long units (2,918 lines of code)
    • 91 long units (5,814 lines of code)
    • 968 medium size units (27,997 lines of code)
    • 1,614 small units (23,008 lines of code)
    • 1,292 very small units (7,907 lines of code)
Lines of code per unit: 101+ | 51-100 | 21-50 | 11-20 | 1-10
Share of lines of code in units: 4% | 8% | 41% | 34% | 11%
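The percentages can be reproduced from the line counts listed under Unit Size Overall. The sketch below is an illustration only: the function name `percentage_shares` is hypothetical, and truncating (rather than rounding) is an assumption inferred from the numbers, since 5,814 / 67,644 is about 8.6% but is reported as 8%.

```python
# Lines of code per size category, copied from the "Unit Size Overall" list.
LINES_PER_CATEGORY = {
    "101+": 2918,
    "51-100": 5814,
    "21-50": 27997,
    "11-20": 23008,
    "1-10": 7907,
}


def percentage_shares(lines_per_category):
    """Return each category's share of the total lines, as a whole percent."""
    total = sum(lines_per_category.values())
    # Floor division is an assumption: it matches the truncated figures
    # shown in the report, which sum to 98% rather than 100%.
    return {
        label: 100 * lines // total
        for label, lines in lines_per_category.items()
    }


shares = percentage_shares(LINES_PER_CATEGORY)
print(" | ".join(f"{pct}%" for pct in shares.values()))
# prints: 4% | 8% | 41% | 34% | 11%
```

The category totals add up to 67,644 lines, which matches the overall figure for lines of code in units.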
Unit Size per Extension
Extension | 101+ | 51-100 | 21-50 | 11-20 | 1-10
py | 15% | 12% | 21% | 28% | 22%
java | <1% | 11% | 68% | 13% | 6%
cpp | 0% | <1% | 15% | 73% | 10%
perl | 0% | 0% | 0% | 77% | 22%
Unit Size per Logical Component
primary logical decomposition
Component | 101+ | 51-100 | 21-50 | 11-20 | 1-10
codegen_sources/model/src | 14% | 28% | 27% | 15% | 14%
codegen_sources/model | 81% | 0% | 16% | 1% | <1%
CodeXGLUE/Text-Code/text-to-code | 39% | 7% | 34% | 9% | 8%
CodeXGLUE/Code-Text/code-to-text | 38% | 6% | 24% | 16% | 13%
CodeXGLUE/Text-Code/NL-code-search-Adv | 56% | 0% | 27% | 5% | 10%
CodeXGLUE/Code-Code/Clone-detection-BigCloneBench | 52% | 11% | 12% | 10% | 12%
data/transcoder_evaluation_gfg/java | <1% | 11% | 68% | 13% | 6%
codegen_sources/preprocessing/lang_processors | 16% | 37% | 19% | 11% | 15%
codegen_sources/preprocessing/obfuscation | 16% | 32% | 17% | 11% | 22%
codegen_sources/preprocessing/dataset_modes | 0% | 14% | 48% | 7% | 29%
codegen_sources/preprocessing | 0% | 24% | 28% | 22% | 23%
data/transcoder_evaluation_gfg/cpp | 0% | <1% | 15% | 73% | 10%
data/transcoder_evaluation_gfg/python | 0% | 0% | 13% | 50% | 35%
codegen_sources/wrappers | 0% | 0% | 27% | 48% | 23%
codegen_sources/preprocessing/bpe_modes | 0% | 0% | 0% | 58% | 41%
codegen_sources/model/tools | 0% | 0% | 0% | 0% | 100%
Longest Units
Top 20 longest units
Unit | File | # lines | McCabe index | # params
def get_parser() | codegen_sources/model/train.py | 592 | 2 | 0
def main() | CodeXGLUE/Code-Text/code-to-text/code/run.py | 301 | 76 | 0
def check_data_params() | codegen_sources/model/src/data/loader.py | 281 | 135 | 1
def main() | CodeXGLUE/Text-Code/text-to-code/code/run.py | 172 | 28 | 0
def main() | CodeXGLUE/Text-Code/NL-code-search-Adv/code/run.py | 169 | 32 | 0
def main() | CodeXGLUE/Code-Code/Clone-detection-BigCloneBench/code/run.py | 167 | 30 | 0
def __init__() | codegen_sources/model/src/trainer.py | 150 | 54 | 4
def train() | CodeXGLUE/Text-Code/text-to-code/code/run.py | 144 | 49 | 6
def st_step() | codegen_sources/model/src/trainer.py | 139 | 28 | 5
public static void main() | data/transcoder_evaluation_gfg/java/SORT_ARRAY_APPLYING_GIVEN_EQUATION.java | 123 | 7 | 1
def run_all_evals() | codegen_sources/model/src/evaluation/evaluator.py | 123 | 41 | 2
def extract_functions() | codegen_sources/preprocessing/lang_processors/cpp_processor.py | 121 | 32 | 2
def obfuscate() | codegen_sources/preprocessing/obfuscation/javalang_obfuscator.py | 114 | 45 | 1
def train() | CodeXGLUE/Text-Code/NL-code-search-Adv/code/run.py | 111 | 37 | 4
def train() | CodeXGLUE/Code-Code/Clone-detection-BigCloneBench/code/run.py | 110 | 40 | 5
def main() | codegen_sources/model/train.py | 101 | 28 | 1
def getMangledName() | codegen_sources/preprocessing/obfuscation/bobskater_obfuscator.py | 96 | 28 | 4
def vizualize_do_files() | codegen_sources/model/src/utils.py | 93 | 13 | 4
def create_reference_files() | codegen_sources/model/src/evaluation/evaluator.py | 92 | 13 | 1
def deobfuscate_by_variable() | codegen_sources/model/src/trainer.py | 92 | 27 | 6