amazon-research / statistical-byte-pair-encoding
Unit Size

The distribution of unit sizes (measured in lines of code).

Intro
  • Unit size measurements show how units of code (methods, functions, ...) are distributed by size.
  • Units are classified into five categories based on their size (lines of code): 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units).
  • Aim to keep units small (< 20 lines). Long units may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
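The size classification can be sketched in a few lines of Python. This is only an illustration, not the report tool's own code: it assumes the five ranges shown in the report's legend (1-10, 11-20, 21-50, 51-100, 101+), and the names `BOUNDS`, `LABELS` and `size_category` are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four size ranges, taken from the
# report's legend; anything above the last bound falls into "101+".
BOUNDS = [10, 20, 50, 100]
LABELS = ["1-10", "11-20", "21-50", "51-100", "101+"]


def size_category(lines_of_code: int) -> str:
    """Return the size range label for a unit with the given line count."""
    if lines_of_code < 1:
        raise ValueError("a unit must have at least one line of code")
    # bisect_left gives the index of the first bound >= lines_of_code,
    # so a unit with exactly 20 lines still lands in "11-20".
    return LABELS[bisect_left(BOUNDS, lines_of_code)]


if __name__ == "__main__":
    for loc in (10, 20, 43, 101):
        print(loc, size_category(loc))
```

A sorted list of bounds with a binary search keeps the range edges in one place, which makes it easy to adjust the thresholds if a project uses different limits.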
Unit Size Overall
  • There are 49 units with 531 lines of code in units (96.9% of code).
    • 0 very long units (0 lines of code)
    • 0 long units (0 lines of code)
    • 8 medium size units (240 lines of code)
    • 11 small units (164 lines of code)
    • 30 very small units (127 lines of code)
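The percentage split reported for these counts can be rechecked with a short calculation. The sketch below assumes that each share is the category's lines of code divided by the 531 lines in units, and that the result is truncated to a whole percent. The truncation rule is an inference from the numbers, not something the report states. The names `lines_per_category` and `share_percent` are hypothetical.

```python
# Lines of code per size category, copied from the list above.
lines_per_category = {
    "101+": 0,
    "51-100": 0,
    "21-50": 240,
    "11-20": 164,
    "1-10": 127,
}

total_lines = sum(lines_per_category.values())


def share_percent(lines: int, total: int) -> int:
    """Share of `lines` in `total` as a whole percent, truncated (assumed rule)."""
    return lines * 100 // total


shares = {
    label: share_percent(lines, total_lines)
    for label, lines in lines_per_category.items()
}

if __name__ == "__main__":
    print(total_lines)
    print(" | ".join(f"{label}: {pct}%" for label, pct in shares.items()))
```

Under the truncation assumption the shares do not need to add up to 100%, because the fractional part of each share is dropped.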
101+: 0% | 51-100: 0% | 21-50: 45% | 11-20: 30% | 1-10: 23%
Unit Size per Extension
py | 101+: 0% | 51-100: 0% | 21-50: 45% | 11-20: 30% | 1-10: 23%
Unit Size per Logical Component
primary logical decomposition
ROOT | 101+: 0% | 51-100: 0% | 21-50: 45% | 11-20: 30% | 1-10: 23%
Longest Units
Top 20 longest units
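Unit                   | File         | # lines | McCabe index | # params
def learn_bpe()        | learn_bpe.py | 43      | 11           | 10
(name missing)         | (missing)    | 41      | 2            | 1
def pop_max()          | learn_bpe.py | 36      | 9            | 1
def sift_down()        | heap.py      | 27      | 13           | 2
def _update_stats()    | learn_bpe.py | 25      | 12           | 7
(name missing)         | (missing)    | 24      | 12           | 2
(name missing)         | (missing)    | 22      | 6            | 1
def __init__()         | learn_bpe.py | 22      | 8            | 3
def __init__()         | heap.py      | 20      | 3            | 7
def main()             | learn_bpe.py | 20      | 5            | 0
(name missing)         | (missing)    | 18      | 9            | 3
def sift_up()          | heap.py      | 16      | 6            | 2
def _swap_pos()        | heap.py      | 14      | 3            | 3
def __init__()         | learn_bpe.py | 14      | 6            | 3
(name missing)         | (missing)    | 14      | 4            | 3
def insert()           | heap.py      | 13      | 2            | 2
def remove()           | heap.py      | 12      | 4            | 2
(name missing)         | (missing)    | 12      | 6            | 2
def dfs()              | heap.py      | 11      | 4            | 2
(name missing)         | (missing)    | 10      | 1            | 3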