microsoft / near-duplicate-code-detector
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how the lines of code are distributed across files of different sizes.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
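The classification above amounts to a threshold lookup on a file's line count. The Python sketch below shows one way to express it. The function name `size_category` and the short labels are hypothetical; only the bucket boundaries come from this report. The sketch takes the line count as given and does not define how the tool counts lines of code (for example, whether blank lines or comments are excluded).

```python
def size_category(loc: int) -> str:
    """Map a file's lines-of-code count to one of the report's size categories.

    Hypothetical helper: only the bucket boundaries are taken from the report.
    """
    if loc < 1:
        raise ValueError("expected at least one line of code")
    if loc <= 100:
        return "very small"
    if loc <= 200:
        return "small"
    if loc <= 500:
        return "medium"
    if loc <= 1000:
        return "long"
    return "very long"


if __name__ == "__main__":
    # Line counts copied from the "Longest Files" table in this report.
    for name, loc in [("CloneDetector.cs", 137), ("Extractor.java", 115), ("CloneGroups.cs", 84)]:
        print(f"{name}: {size_category(loc)}")
```

Run as a script, it labels CloneDetector.cs and Extractor.java as "small" and CloneGroups.cs as "very small", consistent with the two small files counted in the overall figures.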
File Size Overall
  • There are 12 files with a total of 767 lines of code.
    • 0 very long files (0 lines of code)
    • 0 long files (0 lines of code)
    • 0 medium size files (0 lines of code)
    • 2 small files (252 lines of code)
    • 10 very small files (515 lines of code)
Size category | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
Share of lines of code | 0% | 0% | 0% | 32% | 67%
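The percentages above are shares of lines of code, not of files: the 2 small files hold 252 of the 767 lines. The Python sketch below recomputes the distribution from the per-file line counts listed in the "Longest Files" table. It assumes the tool truncates percentages to whole numbers, which reproduces the figures shown here (252/767 is about 32.9% and appears as 32%, where rounding would give 33%). The names `BUCKETS`, `bucket_of`, and `distribution` are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Size buckets as (label, lower bound, upper bound); None means unbounded.
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def bucket_of(loc: int) -> str:
    """Return the label of the bucket that contains `loc` lines."""
    for label, low, high in BUCKETS:
        if loc >= low and (high is None or loc <= high):
            return label
    raise ValueError(f"no bucket for {loc} lines")


def distribution(line_counts: list[int]) -> dict[str, tuple[int, int, int]]:
    """Map each bucket to (file count, lines of code, % of all lines).

    Percentages are truncated with integer division (an assumption that
    matches the figures in this report).
    """
    if not line_counts:
        raise ValueError("need at least one file")
    total = sum(line_counts)
    files = Counter()
    lines = Counter()
    for loc in line_counts:
        label = bucket_of(loc)
        files[label] += 1
        lines[label] += loc
    return {
        label: (files[label], lines[label], lines[label] * 100 // total)
        for label, _, _ in BUCKETS
    }


# Per-file line counts from the "Longest Files (Top 12)" table.
LINE_COUNTS = [137, 115, 84, 77, 76, 60, 58, 57, 35, 28, 23, 17]

if __name__ == "__main__":
    for label, (n_files, n_lines, pct) in distribution(LINE_COUNTS).items():
        print(f"{label:>9}: {n_files:2d} files, {n_lines:3d} lines, {pct}%")
```

With these counts the sketch reports 2 files, 252 lines, 32% for 101-200 and 10 files, 515 lines, 67% for 1-100, matching the overall figures. Passing only the six C# counts (137, 84, 76, 58, 57, 28) gives 31% and 68%, matching the cs row of the per-extension breakdown.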


File Size per Extension
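Share of lines of code in each size category, by file extension:
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
cs | 0% | 0% | 0% | 31% | 68%
java | 0% | 0% | 0% | 100% | 0%
js | 0% | 0% | 0% | 0% | 100%
fs | 0% | 0% | 0% | 0% | 100%
py | 0% | 0% | 0% | 0% | 100%
fsproj | 0% | 0% | 0% | 0% | 100%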
File Size per Logical Decomposition
Logical decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
DuplicateCodeDetector | 0% | 0% | 0% | 49% | 50%
tokenizers/java/src/main/java/javatokenizer | 0% | 0% | 0% | 100% | 0%
DuplicateCodeDetector/Utils | 0% | 0% | 0% | 0% | 100%
tokenizers/FSharpTokenizer/FSharpTokenizer | 0% | 0% | 0% | 0% | 100%
tokenizers/javascript | 0% | 0% | 0% | 0% | 100%
tokenizers/CsharpTokenizer/CsharpTokenizer | 0% | 0% | 0% | 0% | 100%
tokenizers/python | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 12)
File | Directory | # lines | # units
CloneDetector.cs | DuplicateCodeDetector | 137 | 9
Extractor.java | tokenizers/java/src/main/java/javatokenizer | 115 | 3
CloneGroups.cs | DuplicateCodeDetector | 84 | 4
parser.js | tokenizers/javascript | 77 | 3
Program.cs | tokenizers/CsharpTokenizer/CsharpTokenizer | 76 | 5
Program.fs | tokenizers/FSharpTokenizer/FSharpTokenizer | 60 | -
CloneDetectorCli.cs | DuplicateCodeDetector | 58 | 1
SparseVector.cs | DuplicateCodeDetector/Utils | 57 | 3
tokenizepythoncorpus.py | tokenizers/python | 35 | 3
FeatureDictionary.cs | DuplicateCodeDetector/Utils | 28 | 3
baronetokenizer.py | tokenizers/python | 23 | 2
FSharpTokenizer.fsproj | tokenizers/FSharpTokenizer/FSharpTokenizer | 17 | -
Files With Most Units (Top 10)
File | Directory | # lines | # units
CloneDetector.cs | DuplicateCodeDetector | 137 | 9
Program.cs | tokenizers/CsharpTokenizer/CsharpTokenizer | 76 | 5
CloneGroups.cs | DuplicateCodeDetector | 84 | 4
FeatureDictionary.cs | DuplicateCodeDetector/Utils | 28 | 3
SparseVector.cs | DuplicateCodeDetector/Utils | 57 | 3
Extractor.java | tokenizers/java/src/main/java/javatokenizer | 115 | 3
parser.js | tokenizers/javascript | 77 | 3
tokenizepythoncorpus.py | tokenizers/python | 35 | 3
baronetokenizer.py | tokenizers/python | 23 | 2
CloneDetectorCli.cs | DuplicateCodeDetector | 58 | 1
Files With Long Lines (Top 5)

There are 5 files with lines longer than 120 characters. In total, there are 16 long lines.

File | Directory | # lines | # units | # long lines
CloneDetector.cs | DuplicateCodeDetector | 137 | 9 | 9
CloneDetectorCli.cs | DuplicateCodeDetector | 58 | 1 | 3
Extractor.java | tokenizers/java/src/main/java/javatokenizer | 115 | 3 | 2
Program.cs | tokenizers/CsharpTokenizer/CsharpTokenizer | 76 | 5 | 1
tokenizepythoncorpus.py | tokenizers/python | 35 | 3 | 1
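The long-line check above uses a 120-character threshold. A minimal Python sketch of such a check follows. It assumes a line's length is its character count without the line terminator; how the tool treats tabs or trailing whitespace is not stated in the report, so counts from this sketch may differ from the figures above. The names `count_long_lines` and `long_line_report` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable

LONG_LINE_THRESHOLD = 120  # characters, the limit stated in the report


def count_long_lines(text: str, threshold: int = LONG_LINE_THRESHOLD) -> int:
    """Count lines longer than `threshold` characters (terminators excluded)."""
    return sum(1 for line in text.splitlines() if len(line) > threshold)


def long_line_report(paths: Iterable[Path]) -> list[tuple[str, int]]:
    """Return (path, long-line count) for files with long lines, most first."""
    results = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        count = count_long_lines(text)
        if count:
            results.append((str(path), count))
    # Sort by count, descending, like the "Files With Long Lines" table.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    # Example: scan C# files under the current directory.
    for name, count in long_line_report(Path(".").rglob("*.cs")):
        print(f"{name}: {count} long lines")
```

Using `str.splitlines` means `\n`, `\r\n`, and `\r` endings are all handled, so files checked out with different line-ending conventions give the same counts.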