microsoft / near-duplicate-code-detector
Unit Size

The distribution of unit sizes (measured in lines of code).

Intro
  • Unit size measurements show the distribution of size of units of code (methods, functions...).
  • Units are classified in five categories based on their size (lines of code): 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), 101+ (very long units).
  • Aim to keep units small (< 20 lines). Long units may become "bloaters": code that has grown so large that it is hard to work with.
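
The size classification above can be sketched in a few lines. This is a minimal illustration, not the report tool's own implementation: the function name `size_category` is hypothetical, and the bucket boundaries are taken from the legend used in this report.

```python
from bisect import bisect_left

# Inclusive upper bounds of the bounded categories, as listed in the legend.
UPPER_BOUNDS = [10, 20, 50, 100]
LABELS = ["1-10", "11-20", "21-50", "51-100", "101+"]


def size_category(lines_of_code: int) -> str:
    """Return the size bucket label for a unit (hypothetical helper)."""
    if lines_of_code < 1:
        raise ValueError("a unit must have at least one line of code")
    # bisect_left finds the first bound >= lines_of_code, so a unit with
    # exactly 20 lines lands in "11-20" and one with 21 lines in "21-50".
    return LABELS[bisect_left(UPPER_BOUNDS, lines_of_code)]
```

For example, a 39-line unit falls into the 21-50 (medium size) bucket, and a 10-line unit into 1-10 (very small).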
Unit Size Overall
  • There are 36 units with 469 lines of code in units (61.1% of code).
    • 0 very long units (0 lines of code)
    • 0 long units (0 lines of code)
    • 6 medium size units (178 lines of code)
    • 13 small units (181 lines of code)
    • 17 very small units (110 lines of code)
Share of unit lines of code per size category:
101+ | 51-100 | 21-50 | 11-20 | 1-10
0% | 0% | 37% | 38% | 23%
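
The percentages shown for the overall distribution can be reproduced from the line counts per category. The sketch below assumes each share is the category's lines of code divided by the total lines of code in units, rounded down. The rounding rule is an assumption inferred from the figures (they sum to 98%), and `category_shares` is a hypothetical helper name.

```python
def category_shares(lines_per_category: dict[str, int]) -> dict[str, int]:
    """Return each category's share of unit lines of code, in whole percent.

    Assumption: shares are truncated (rounded down), which matches the
    figures shown in this report.
    """
    total = sum(lines_per_category.values())
    if total == 0:
        return {label: 0 for label in lines_per_category}
    return {
        label: loc * 100 // total
        for label, loc in lines_per_category.items()
    }


# Lines of code per category, copied from the "Unit Size Overall" list.
OVERALL = {"101+": 0, "51-100": 0, "21-50": 178, "11-20": 181, "1-10": 110}
shares = category_shares(OVERALL)
```

With the overall counts (469 lines in total), this gives 0, 0, 37, 38 and 23 percent for the five categories.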
Unit Size per Extension
Extension | 101+ | 51-100 | 21-50 | 11-20 | 1-10
cs | 0% | 0% | 35% | 39% | 24%
java | 0% | 0% | 81% | 18% | 0%
js | 0% | 0% | 0% | 76% | 23%
py | 0% | 0% | 0% | 30% | 69%
Unit Size per Logical Component
primary logical decomposition
Component | 101+ | 51-100 | 21-50 | 11-20 | 1-10
DuplicateCodeDetector | 0% | 0% | 59% | 13% | 27%
tokenizers/java/src/main/java/javatokenizer | 0% | 0% | 81% | 18% | 0%
DuplicateCodeDetector/Utils | 0% | 0% | 0% | 87% | 12%
tokenizers/CsharpTokenizer/CsharpTokenizer | 0% | 0% | 0% | 69% | 30%
tokenizers/javascript | 0% | 0% | 0% | 76% | 23%
tokenizers/python | 0% | 0% | 0% | 30% | 69%
Longest Units
Top 20 longest units
Unit | File | # lines | McCabe index | # params
public static SerializableTokens TokenizeFile() | tokenizers/java/src/main/java/javatokenizer/Extractor.java | 39 | 17 | 3
public CloneGroups() | DuplicateCodeDetector/CloneGroups.cs | 32 | 5 | 1
public static void ExtractForFolder() | tokenizers/java/src/main/java/javatokenizer/Extractor.java | 30 | 4 | 4
public void BuildIndexForProject() | DuplicateCodeDetector/CloneDetector.cs | 29 | 4 | 1
private IEnumerable FindNearDuplicates() | DuplicateCodeDetector/CloneDetector.cs | 24 | 5 | 4
private int MakeCloneSetTransitive() | DuplicateCodeDetector/CloneGroups.cs | 24 | 4 | 0
public double JaccardSimilarity() | DuplicateCodeDetector/Utils/SparseVector.cs | 19 | 4 | 1
public double KeyJaccardSimilarity() | DuplicateCodeDetector/Utils/SparseVector.cs | 17 | 3 | 1
function extractForFolder() | tokenizers/javascript/parser.js | 17 | 3 | 4
public static void main() | tokenizers/java/src/main/java/javatokenizer/Extractor.java | 16 | 2 | 1
function listAllFilesRecursive() | tokenizers/javascript/parser.js | 16 | 4 | 3
public static void ExtractForProjectFolder() | tokenizers/CsharpTokenizer/CsharpTokenizer/Program.cs | 14 | 2 | 4
private static IEnumerable Count() | DuplicateCodeDetector/CloneDetector.cs | 13 | 3 | 1
private IEnumerable GetAllProjectCombinations() | DuplicateCodeDetector/CloneDetector.cs | 12 | 3 | 0
static void Main() | tokenizers/CsharpTokenizer/CsharpTokenizer/Program.cs | 12 | 2 | 1
private static string GetJsonForFile() | tokenizers/CsharpTokenizer/CsharpTokenizer/Program.cs | 12 | 1 | 3
public int AddOrGet() | DuplicateCodeDetector/Utils/FeatureDictionary.cs | 11 | 2 | 1
public void AddElements() | DuplicateCodeDetector/Utils/SparseVector.cs | 11 | 3 | 1
def tokenize_file() | tokenizers/python/tokenizepythoncorpus.py | 11 | 6 | 2
function getTokens() | tokenizers/javascript/parser.js | 10 | 4 | 2