facebookresearch / CodeGen

File Size

The distribution of file sizes (measured in lines of code).

Intro

File size measurements show how files are distributed by size.
Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
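
The size buckets above can be sketched in a few lines of Python. This is only an illustration of the thresholds listed in this report, not the analysis tool's actual implementation; the names `BOUNDS`, `LABELS`, and `size_category` are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four buckets, as listed in this report.
# Anything above the last bound is a "very long" file.
BOUNDS = [100, 200, 500, 1000]
LABELS = ["very small", "small", "medium size", "long", "very long"]


def size_category(lines_of_code: int) -> str:
    """Return the size bucket for a file with the given number of lines of code."""
    if lines_of_code < 1:
        raise ValueError("a file must have at least one line of code to be classified")
    # bisect_left keeps a value equal to a bound in the lower bucket (100 -> "very small").
    return LABELS[bisect_left(BOUNDS, lines_of_code)]


if __name__ == "__main__":
    for loc in (100, 101, 500, 982, 1538):
        print(loc, size_category(loc))
```

With these thresholds, a file of 1,538 lines falls in the 1001+ bucket and a file of 982 lines in the 501-1000 bucket.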


File Size Overall

There are 2,111 files with 94,368 lines of code.

1 very long file (1,538 lines of code)
8 long files (5,465 lines of code)
14 medium size files (4,644 lines of code)
23 small files (3,331 lines of code)
2,065 very small files (79,390 lines of code)
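
To put the counts above in proportion, the share of code in each category can be computed directly from the listed figures. The snippet below is a small sketch that uses only the numbers from this list; the names `CATEGORIES` and `line_share` are made up for the example.

```python
# Figures copied from the list above: category -> (number of files, lines of code).
CATEGORIES = {
    "very long (1001+)": (1, 1538),
    "long (501-1000)": (8, 5465),
    "medium size (201-500)": (14, 4644),
    "small (101-200)": (23, 3331),
    "very small (1-100)": (2065, 79390),
}

total_files = sum(files for files, _ in CATEGORIES.values())
total_lines = sum(lines for _, lines in CATEGORIES.values())


def line_share(category: str) -> float:
    """Percentage of all lines of code that falls in the given category."""
    return 100.0 * CATEGORIES[category][1] / total_lines


if __name__ == "__main__":
    for name, (files, lines) in CATEGORIES.items():
        print(f"{name:<24}{files:>6} files {lines:>7} lines {line_share(name):5.1f}%")
    print(f"{'total':<24}{total_files:>6} files {total_lines:>7} lines")
```

The per-category figures add up to the reported totals (2,111 files and 94,368 lines of code), and the very small files account for roughly 84% of all lines.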

Legend: 1001+ | 501-1000 | 201-500 | 101-200 | 1-100


File Size per Extension

[Chart: file size distribution per extension. Legend: 1001+ | 501-1000 | 201-500 | 101-200 | 1-100]

File Size per Logical Decomposition

primary

[Chart: file size distribution per component. Legend: 1001+ | 501-1000 | 201-500 | 101-200 | 1-100]

Longest Files (Top 50)

File	# lines	# units
trainer.py in codegen_sources/model/src	1538	36
evaluator.py in codegen_sources/model/src/evaluation	982	16
utils.py in codegen_sources/model/src	876	39
train.py in codegen_sources/model	714	2
dataset_mode.py in codegen_sources/preprocessing/dataset_modes	640	24
transformer.py in codegen_sources/model/src/model	631	25
loader.py in codegen_sources/model/src/data	561	7
index.html in CodeXGLUE	554	-
run.py in CodeXGLUE/Code-Code/Clone-detection-BigCloneBench/code	507	12
run.py in CodeXGLUE/Text-Code/NL-code-search-Adv/code	487	10
run.py in CodeXGLUE/Text-Code/text-to-code/code	462	7
run.py in CodeXGLUE/Code-Text/code-to-text/code	435	6
dataset.py in codegen_sources/model/src/data	428	26
comp_acc_computation.py in codegen_sources/model/src/evaluation	405	14
__init__.py in codegen_sources/model/src/model	368	9
bobskater_obfuscator.py in codegen_sources/preprocessing/obfuscation	332	15
models.py in codegen_sources/wrappers	298	26
tokenizer.py in codegen_sources/wrappers	296	13
javalang_obfuscator.py in codegen_sources/preprocessing/obfuscation	251	7
translate.py in codegen_sources/model	228	3
python_processor.py in codegen_sources/preprocessing/lang_processors	227	6
optim.py in codegen_sources/model/src	220	10
dictionary.py in codegen_sources/model/src/data	207	11
utils.py in codegen_sources/preprocessing	199	16
preprocess.py in codegen_sources/preprocessing	189	1
cpp_processor.py in codegen_sources/preprocessing/lang_processors	186	6
deobfuscate.py in codegen_sources/model	182	3
tree_sitter_processor.py in codegen_sources/preprocessing/lang_processors	176	9
SORT_ARRAY_APPLYING_GIVEN_EQUATION.java in data/transcoder_evaluation_gfg/java	157	2
evaluator.py in CodeXGLUE/Code-Text/code-to-text/evaluator	155	9
bleu.py in CodeXGLUE/Code-Text/code-to-text/code	155	9
obfuscation_functions_mode.py in codegen_sources/preprocessing/dataset_modes	155	7
model.py in CodeXGLUE/Code-Text/code-to-text/code	153	12
beam.py in CodeXGLUE/Text-Code/text-to-code/code	147	12
java_processor.py in codegen_sources/preprocessing/lang_processors	141	6
multi-bleu.perl in codegen_sources/model/src/evaluation	139	2
bobskater_frameUtils.py in codegen_sources/preprocessing/obfuscation	139	17
cache.py in codegen_sources/model/src	136	19
obfuscation_mode.py in codegen_sources/preprocessing/dataset_modes	130	7
monolingual_mode.py in codegen_sources/preprocessing/dataset_modes	126	5
monolingual_functions_mode.py in codegen_sources/preprocessing/dataset_modes	122	5
subtoken_score.py in codegen_sources/model/src/evaluation	117	6
FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K_1.java in data/transcoder_evaluation_gfg/java	115	2
FIND_MAXIMUM_SUM_POSSIBLE_EQUAL_SUM_THREE_STACKS.java in data/transcoder_evaluation_gfg/java	104	2
FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K.java in data/transcoder_evaluation_gfg/java	104	2
slurm.py in codegen_sources/model/src	104	4
STOOGE_SORT.java in data/transcoder_evaluation_gfg/java	99	2
dataset.py in CodeXGLUE/Text-Code/text-to-code/code	97	4
SORT_EVEN_PLACED_ELEMENTS_INCREASING_ODD_PLACED_DECREASING_ORDER.java in data/transcoder_evaluation_gfg/java	96	2
SPLIT_ARRAY_ADD_FIRST_PART_END.java in data/transcoder_evaluation_gfg/java	96	2

Files With Most Units (Top 20)

File	# lines	# units
utils.py in codegen_sources/model/src	876	39
trainer.py in codegen_sources/model/src	1538	36
models.py in codegen_sources/wrappers	298	26
dataset.py in codegen_sources/model/src/data	428	26
transformer.py in codegen_sources/model/src/model	631	25
dataset_mode.py in codegen_sources/preprocessing/dataset_modes	640	24
cache.py in codegen_sources/model/src	136	19
bobskater_frameUtils.py in codegen_sources/preprocessing/obfuscation	139	17
evaluator.py in codegen_sources/model/src/evaluation	982	16
utils.py in codegen_sources/preprocessing	199	16
bobskater_obfuscator.py in codegen_sources/preprocessing/obfuscation	332	15
comp_acc_computation.py in codegen_sources/model/src/evaluation	405	14
tokenizer.py in codegen_sources/wrappers	296	13
beam.py in CodeXGLUE/Text-Code/text-to-code/code	147	12
model.py in CodeXGLUE/Code-Text/code-to-text/code	153	12
run.py in CodeXGLUE/Code-Code/Clone-detection-BigCloneBench/code	507	12
dictionary.py in codegen_sources/model/src/data	207	11
run.py in CodeXGLUE/Text-Code/NL-code-search-Adv/code	487	10
optim.py in codegen_sources/model/src	220	10
evaluator.py in CodeXGLUE/Code-Text/code-to-text/evaluator	155	9

Files With Long Lines (Top 20)

There are 977 files with lines longer than 120 characters. In total, there are 3,035 long lines.

File	# lines	# units	# long lines
index.html in CodeXGLUE	554	-	29
FUNCTION_COPY_STRING_ITERATIVE_RECURSIVE_1.java in data/transcoder_evaluation_gfg/java	92	2	19
LEXICOGRAPHICALLY_LARGEST_SUBSEQUENCE_EVERY_CHARACTER_OCCURS_LEAST_K_TIMES.py in data/transcoder_evaluation_gfg/python	50	1	18
SWAP_TWO_NUMBERS_WITHOUT_USING_TEMPORARY_VARIABLE.java in data/transcoder_evaluation_gfg/java	68	2	17
PROGRAM_WORST_FIT_ALGORITHM_MEMORY_MANAGEMENT.py in data/transcoder_evaluation_gfg/python	52	1	16
SWAP_TWO_NUMBERS_WITHOUT_USING_TEMPORARY_VARIABLE.py in data/transcoder_evaluation_gfg/python	36	1	16
REMOVE_CONSECUTIVE_DUPLICATES_STRING.py in data/transcoder_evaluation_gfg/python	44	1	14
GIVEN_1S_2S_3S_KS_PRINT_ZIG_ZAG_WAY.py in data/transcoder_evaluation_gfg/python	56	1	14
run.py in CodeXGLUE/Code-Text/code-to-text/code	435	6	12
SUM_MANHATTAN_DISTANCES_PAIRS_POINTS.py in data/transcoder_evaluation_gfg/python	31	1	12
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.py in data/transcoder_evaluation_gfg/python	45	1	12
K_SMALLEST_ELEMENTS_ORDER_USING_O1_EXTRA_SPACE.py in data/transcoder_evaluation_gfg/python	48	1	12
REPLACE_OCCURRENCES_STRING_AB_C_WITHOUT_USING_EXTRA_SPACE.py in data/transcoder_evaluation_gfg/python	56	1	12
PROGRAM_BEST_FIT_ALGORITHM_MEMORY_MANAGEMENT.py in data/transcoder_evaluation_gfg/python	52	1	12
PROGRAM_REVERSE_STRING_ITERATIVE_RECURSIVE.py in data/transcoder_evaluation_gfg/python	38	1	12
SORT_ARRAY_APPLYING_GIVEN_EQUATION.java in data/transcoder_evaluation_gfg/java	157	2	12
run.py in CodeXGLUE/Text-Code/text-to-code/code	462	7	11
SMALLEST_DIFFERENCE_PAIR_VALUES_TWO_UNSORTED_ARRAYS.py in data/transcoder_evaluation_gfg/python	42	1	11
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.java in data/transcoder_evaluation_gfg/java	82	2	11
LONGEST_COMMON_SUBSTRING.java in data/transcoder_evaluation_gfg/java	80	2	11
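
A count like the one in the "# long lines" column can be approximated with a short helper. This is a minimal sketch assuming a line is "long" when it has more than 120 characters, excluding the line ending; how the report itself treats tabs or trailing whitespace is not stated here, and `count_long_lines` is a hypothetical name.

```python
def count_long_lines(text: str, limit: int = 120) -> int:
    """Count the lines in `text` that are longer than `limit` characters.

    Line endings are not counted toward the length. Whether the report
    counts them, or expands tabs, is an assumption not confirmed by the source.
    """
    return sum(1 for line in text.splitlines() if len(line) > limit)


if __name__ == "__main__":
    sample = "x" * 120 + "\n" + "y" * 121 + "\nshort line\n"
    # Only the 121-character line exceeds the 120-character limit.
    print(count_long_lines(sample))
```

To apply it to a file, read the file's contents as text and pass them to `count_long_lines`.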