facebookresearch / CodeGen
File Size

The distribution of size of files (measured in lines of code).

Intro
  • File size measurements show how the lines of code are distributed across files of different sizes.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
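The size categories above can be expressed as a small lookup. The following is a minimal sketch, not the report tool's own implementation; the names `UPPER_BOUNDS`, `LABELS` and `size_category` are made up for illustration, and the sample line counts are taken from the longest-files table in this report.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four categories;
# anything above the last bound counts as "very long".
UPPER_BOUNDS = [100, 200, 500, 1000]
LABELS = ["very small", "small", "medium size", "long", "very long"]


def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("a file needs at least one line of code to be classified")
    # bisect_left keeps a value equal to a bound in the lower category,
    # so 100 is "very small" and 101 is "small".
    return LABELS[bisect_left(UPPER_BOUNDS, lines_of_code)]


# Line counts copied from the longest-files table of this report.
for name, loc in [("trainer.py", 1538), ("evaluator.py", 982), ("dataset.py", 428)]:
    print(f"{name}: {size_category(loc)}")
```

Using inclusive upper bounds keeps the boundary cases (100, 200, 500, 1000) in the lower category, which matches the ranges as listed.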
File Size Overall
  • There are 2,111 files with 94,368 lines of code.
    • 1 very long file (1,538 lines of code)
    • 8 long files (5,465 lines of code)
    • 14 medium size files (4,644 lines of code)
    • 23 small files (3,331 lines of code)
    • 2,065 very small files (79,390 lines of code)
1% | 5% | 4% | 3% | 84%
Legend (segments from left to right): 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
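The percentages in the bar can be recomputed from the line counts listed above. The sketch below assumes each share is the category's lines of code divided by the total and then truncated to a whole percent. That truncation is an inference from the numbers, not something the report states, and the names used are illustrative only.

```python
# Lines of code per size category, copied from the overview above.
loc_by_category = {
    "1001+": 1538,
    "501-1000": 5465,
    "201-500": 4644,
    "101-200": 3331,
    "1-100": 79390,
}

total = sum(loc_by_category.values())


def share_truncated(loc: int, total_loc: int) -> int:
    """Share of total lines of code as a whole percent, truncated (assumption)."""
    return loc * 100 // total_loc


shares = {name: share_truncated(loc, total) for name, loc in loc_by_category.items()}
print(total)  # 94368
print(" | ".join(f"{pct}%" for pct in shares.values()))  # 1% | 5% | 4% | 3% | 84%
```

Because each value is truncated, the displayed shares add up to 97% rather than 100%.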


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 4% | 14% | 13% | 7% | 59%
html | 0% | 100% | 0% | 0% | 0%
java | 0% | 0% | 0% | 1% | 98%
perl | 0% | 0% | 0% | 100% | 0%
cpp | 0% | 0% | 0% | 0% | 100%
jsonl | 0% | 0% | 0% | 0% | 100%
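A per-extension breakdown like the one above can be produced by grouping files by extension before bucketing them. This is a rough sketch under two assumptions: that shares are computed over lines of code (as in the overall figures) and that they are truncated to whole percents. The function names are hypothetical, and the sample is only a handful of files from this report, so its output will not match the full table.

```python
from collections import defaultdict
from pathlib import PurePosixPath

BUCKETS = ["1001+", "501-1000", "201-500", "101-200", "1-100"]


def bucket(loc: int) -> str:
    """Map a line count to the size range it falls into."""
    if loc > 1000:
        return "1001+"
    if loc > 500:
        return "501-1000"
    if loc > 200:
        return "201-500"
    if loc > 100:
        return "101-200"
    return "1-100"


def loc_per_extension(files):
    """Sum lines of code per (extension, size bucket) for (path, loc) pairs."""
    totals = defaultdict(lambda: dict.fromkeys(BUCKETS, 0))
    for path, loc in files:
        extension = PurePosixPath(path).suffix.lstrip(".")
        totals[extension][bucket(loc)] += loc
    return totals


# A few files from this report; NOT the full data set behind the table.
sample = [
    ("codegen_sources/model/src/trainer.py", 1538),
    ("CodeXGLUE/index.html", 554),
    ("codegen_sources/model/src/evaluation/multi-bleu.perl", 139),
    ("data/transcoder_evaluation_gfg/java/SORT_ARRAY_APPLYING_GIVEN_EQUATION.java", 157),
    ("data/transcoder_evaluation_gfg/java/STOOGE_SORT.java", 99),
]

for extension, per_bucket in loc_per_extension(sample).items():
    extension_total = sum(per_bucket.values())
    row = " | ".join(f"{per_bucket[b] * 100 // extension_total}%" for b in BUCKETS)
    print(f"{extension} | {row}")
```

The same grouping applies to the logical decomposition if the key is a component path instead of the file extension.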
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
codegen_sources/model/src | 22% | 44% | 23% | 7% | 2%
codegen_sources/model | 0% | 61% | 19% | 15% | 3%
codegen_sources/preprocessing/dataset_modes | 0% | 54% | 0% | 45% | <1%
CodeXGLUE | 0% | 100% | 0% | 0% | 0%
CodeXGLUE/Code-Code/Clone-detection-BigCloneBench | 0% | 85% | 0% | 0% | 14%
codegen_sources/wrappers | 0% | 0% | 99% | 0% | <1%
codegen_sources/preprocessing/obfuscation | 0% | 0% | 72% | 17% | 9%
CodeXGLUE/Text-Code/NL-code-search-Adv | 0% | 0% | 85% | 0% | 14%
CodeXGLUE/Text-Code/text-to-code | 0% | 0% | 52% | 16% | 31%
CodeXGLUE/Code-Text/code-to-text | 0% | 0% | 48% | 51% | 0%
codegen_sources/preprocessing/lang_processors | 0% | 0% | 27% | 59% | 13%
data/transcoder_evaluation_gfg/java | 0% | 0% | 0% | 1% | 98%
codegen_sources/preprocessing | 0% | 0% | 0% | 91% | 8%
data/transcoder_evaluation_gfg/cpp | 0% | 0% | 0% | 0% | 100%
data/transcoder_evaluation_gfg/python | 0% | 0% | 0% | 0% | 100%
codegen_sources/preprocessing/bpe_modes | 0% | 0% | 0% | 0% | 100%
codegen_sources/model/tools | 0% | 0% | 0% | 0% | 100%
codegen_sources | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
trainer.py (in codegen_sources/model/src) | 1538 | 36
evaluator.py (in codegen_sources/model/src/evaluation) | 982 | 16
utils.py (in codegen_sources/model/src) | 876 | 39
train.py (in codegen_sources/model) | 714 | 2
dataset_mode.py (in codegen_sources/preprocessing/dataset_modes) | 640 | 24
transformer.py (in codegen_sources/model/src/model) | 631 | 25
loader.py (in codegen_sources/model/src/data) | 561 | 7
index.html (in CodeXGLUE) | 554 | -
run.py (in CodeXGLUE/Code-Code/Clone-detection-BigCloneBench/code) | 507 | 12
run.py (in CodeXGLUE/Text-Code/NL-code-search-Adv/code) | 487 | 10
run.py (in CodeXGLUE/Text-Code/text-to-code/code) | 462 | 7
run.py (in CodeXGLUE/Code-Text/code-to-text/code) | 435 | 6
dataset.py (in codegen_sources/model/src/data) | 428 | 26
comp_acc_computation.py (in codegen_sources/model/src/evaluation) | 405 | 14
__init__.py (in codegen_sources/model/src/model) | 368 | 9
bobskater_obfuscator.py (in codegen_sources/preprocessing/obfuscation) | 332 | 15
models.py (in codegen_sources/wrappers) | 298 | 26
tokenizer.py (in codegen_sources/wrappers) | 296 | 13
javalang_obfuscator.py (in codegen_sources/preprocessing/obfuscation) | 251 | 7
translate.py (in codegen_sources/model) | 228 | 3
python_processor.py (in codegen_sources/preprocessing/lang_processors) | 227 | 6
optim.py (in codegen_sources/model/src) | 220 | 10
dictionary.py (in codegen_sources/model/src/data) | 207 | 11
utils.py (in codegen_sources/preprocessing) | 199 | 16
preprocess.py (in codegen_sources/preprocessing) | 189 | 1
cpp_processor.py (in codegen_sources/preprocessing/lang_processors) | 186 | 6
deobfuscate.py (in codegen_sources/model) | 182 | 3
tree_sitter_processor.py (in codegen_sources/preprocessing/lang_processors) | 176 | 9
SORT_ARRAY_APPLYING_GIVEN_EQUATION.java (in data/transcoder_evaluation_gfg/java) | 157 | 2
evaluator.py (in CodeXGLUE/Code-Text/code-to-text/evaluator) | 155 | 9
bleu.py (in CodeXGLUE/Code-Text/code-to-text/code) | 155 | 9
obfuscation_functions_mode.py (in codegen_sources/preprocessing/dataset_modes) | 155 | 7
model.py (in CodeXGLUE/Code-Text/code-to-text/code) | 153 | 12
beam.py (in CodeXGLUE/Text-Code/text-to-code/code) | 147 | 12
java_processor.py (in codegen_sources/preprocessing/lang_processors) | 141 | 6
multi-bleu.perl (in codegen_sources/model/src/evaluation) | 139 | 2
bobskater_frameUtils.py (in codegen_sources/preprocessing/obfuscation) | 139 | 17
cache.py (in codegen_sources/model/src) | 136 | 19
obfuscation_mode.py (in codegen_sources/preprocessing/dataset_modes) | 130 | 7
monolingual_mode.py (in codegen_sources/preprocessing/dataset_modes) | 126 | 5
monolingual_functions_mode.py (in codegen_sources/preprocessing/dataset_modes) | 122 | 5
subtoken_score.py (in codegen_sources/model/src/evaluation) | 117 | 6
FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K_1.java (in data/transcoder_evaluation_gfg/java) | 115 | 2
FIND_MAXIMUM_SUM_POSSIBLE_EQUAL_SUM_THREE_STACKS.java (in data/transcoder_evaluation_gfg/java) | 104 | 2
FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K.java (in data/transcoder_evaluation_gfg/java) | 104 | 2
slurm.py (in codegen_sources/model/src) | 104 | 4
STOOGE_SORT.java (in data/transcoder_evaluation_gfg/java) | 99 | 2
dataset.py (in CodeXGLUE/Text-Code/text-to-code/code) | 97 | 4
SORT_EVEN_PLACED_ELEMENTS_INCREASING_ODD_PLACED_DECREASING_ORDER.java (in data/transcoder_evaluation_gfg/java) | 96 | 2
SPLIT_ARRAY_ADD_FIRST_PART_END.java (in data/transcoder_evaluation_gfg/java) | 96 | 2
Files With Most Units (Top 20)
File | # lines | # units
utils.py (in codegen_sources/model/src) | 876 | 39
trainer.py (in codegen_sources/model/src) | 1538 | 36
models.py (in codegen_sources/wrappers) | 298 | 26
dataset.py (in codegen_sources/model/src/data) | 428 | 26
transformer.py (in codegen_sources/model/src/model) | 631 | 25
dataset_mode.py (in codegen_sources/preprocessing/dataset_modes) | 640 | 24
cache.py (in codegen_sources/model/src) | 136 | 19
bobskater_frameUtils.py (in codegen_sources/preprocessing/obfuscation) | 139 | 17
evaluator.py (in codegen_sources/model/src/evaluation) | 982 | 16
utils.py (in codegen_sources/preprocessing) | 199 | 16
bobskater_obfuscator.py (in codegen_sources/preprocessing/obfuscation) | 332 | 15
comp_acc_computation.py (in codegen_sources/model/src/evaluation) | 405 | 14
tokenizer.py (in codegen_sources/wrappers) | 296 | 13
beam.py (in CodeXGLUE/Text-Code/text-to-code/code) | 147 | 12
model.py (in CodeXGLUE/Code-Text/code-to-text/code) | 153 | 12
run.py (in CodeXGLUE/Code-Code/Clone-detection-BigCloneBench/code) | 507 | 12
dictionary.py (in codegen_sources/model/src/data) | 207 | 11
run.py (in CodeXGLUE/Text-Code/NL-code-search-Adv/code) | 487 | 10
optim.py (in codegen_sources/model/src) | 220 | 10
evaluator.py (in CodeXGLUE/Code-Text/code-to-text/evaluator) | 155 | 9
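The report does not define what counts as a "unit". Assuming a unit corresponds to a function or method definition, the count for a Python file could be approximated with the standard `ast` module. This is only a sketch under that assumption; the analysis tool may count differently, and `count_units` is a made-up name.

```python
import ast


def count_units(source: str) -> int:
    """Count function and method definitions (assumed meaning of "unit")."""
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


# Small illustrative module: two methods and one top-level function.
example = '''
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


def main():
    print(Greeter("world").greet())
'''

print(count_units(example))  # 3
```

Note that `ast.walk` also visits nested functions, so inner helper functions would be counted as separate units in this sketch.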
Files With Long Lines (Top 20)

There are 977 files with lines longer than 120 characters. In total, there are 3035 long lines.

File | # lines | # units | # long lines
index.html (in CodeXGLUE) | 554 | - | 29
FUNCTION_COPY_STRING_ITERATIVE_RECURSIVE_1.java (in data/transcoder_evaluation_gfg/java) | 92 | 2 | 19
LEXICOGRAPHICALLY_LARGEST_SUBSEQUENCE_EVERY_CHARACTER_OCCURS_LEAST_K_TIMES.py (in data/transcoder_evaluation_gfg/python) | 50 | 1 | 18
SWAP_TWO_NUMBERS_WITHOUT_USING_TEMPORARY_VARIABLE.java (in data/transcoder_evaluation_gfg/java) | 68 | 2 | 17
PROGRAM_WORST_FIT_ALGORITHM_MEMORY_MANAGEMENT.py (in data/transcoder_evaluation_gfg/python) | 52 | 1 | 16
SWAP_TWO_NUMBERS_WITHOUT_USING_TEMPORARY_VARIABLE.py (in data/transcoder_evaluation_gfg/python) | 36 | 1 | 16
REMOVE_CONSECUTIVE_DUPLICATES_STRING.py (in data/transcoder_evaluation_gfg/python) | 44 | 1 | 14
GIVEN_1S_2S_3S_KS_PRINT_ZIG_ZAG_WAY.py (in data/transcoder_evaluation_gfg/python) | 56 | 1 | 14
run.py (in CodeXGLUE/Code-Text/code-to-text/code) | 435 | 6 | 12
SUM_MANHATTAN_DISTANCES_PAIRS_POINTS.py (in data/transcoder_evaluation_gfg/python) | 31 | 1 | 12
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.py (in data/transcoder_evaluation_gfg/python) | 45 | 1 | 12
K_SMALLEST_ELEMENTS_ORDER_USING_O1_EXTRA_SPACE.py (in data/transcoder_evaluation_gfg/python) | 48 | 1 | 12
REPLACE_OCCURRENCES_STRING_AB_C_WITHOUT_USING_EXTRA_SPACE.py (in data/transcoder_evaluation_gfg/python) | 56 | 1 | 12
PROGRAM_BEST_FIT_ALGORITHM_MEMORY_MANAGEMENT.py (in data/transcoder_evaluation_gfg/python) | 52 | 1 | 12
PROGRAM_REVERSE_STRING_ITERATIVE_RECURSIVE.py (in data/transcoder_evaluation_gfg/python) | 38 | 1 | 12
SORT_ARRAY_APPLYING_GIVEN_EQUATION.java (in data/transcoder_evaluation_gfg/java) | 157 | 2 | 12
run.py (in CodeXGLUE/Text-Code/text-to-code/code) | 462 | 7 | 11
SMALLEST_DIFFERENCE_PAIR_VALUES_TWO_UNSORTED_ARRAYS.py (in data/transcoder_evaluation_gfg/python) | 42 | 1 | 11
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.java (in data/transcoder_evaluation_gfg/java) | 82 | 2 | 11
LONGEST_COMMON_SUBSTRING.java (in data/transcoder_evaluation_gfg/java) | 80 | 2 | 11
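The long-line count for a single file can be sketched as follows, using the 120-character threshold stated above. The function name and the sample text are illustrative, and whether the report's tool counts trailing whitespace or tabs differently is not known.

```python
MAX_LINE_LENGTH = 120  # threshold stated in this section


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count the lines whose length exceeds the given limit."""
    return sum(len(line) > limit for line in text.splitlines())


# Three lines, of which only the middle one is longer than 120 characters.
sample = "\n".join(["x = 1", "y = '" + "a" * 130 + "'", "z = 2"])
print(count_long_lines(sample))  # 1
```

Applying this to every file and keeping those with a non-zero count would give the kind of per-file ranking shown in the table.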