facebookresearch / TransCoder
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how the lines of code are distributed across files of different sizes.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
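The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the report tool's implementation: the names `BUCKETS`, `size_category`, and `distribution` are hypothetical, and the sketch assumes the lines-of-code count for each file has already been computed (how the tool treats blank and comment lines is not stated here).

```python
from collections import Counter

# Upper bound (inclusive, in lines of code) and label for each category
# listed above. Anything larger than the last bound is "very long".
BUCKETS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium"),
    (1000, "long"),
]


def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    for upper_bound, label in BUCKETS:
        if lines_of_code <= upper_bound:
            return label
    return "very long"  # 1001+ lines


def distribution(line_counts):
    """Return each category's share (0..1) of the total lines of code."""
    totals = Counter()
    for n in line_counts:
        totals[size_category(n)] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: lines / grand_total for label, lines in totals.items()}
```

Note that the shares are weighted by lines of code, not by number of files, so a handful of long files can account for a visible slice of the distribution.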
File Size Overall
  • There are 2,062 files with 84,245 lines of code.
    • 0 very long files (0 lines of code)
    • 3 long files (1,927 lines of code)
    • 8 medium size files (2,576 lines of code)
    • 8 small files (1,132 lines of code)
    • 2,043 very small files (78,610 lines of code)
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 2% | 3% | 1% | 93%
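The percentages above follow from the per-category line counts in the list. The following sketch recomputes them as a check; the helper `share_percent` is hypothetical, and truncating to a whole percent is an assumption about how the report rounds.

```python
# Lines of code per size category, copied from the overall figures above.
lines_per_category = {
    "1001+": 0,
    "501-1000": 1927,
    "201-500": 2576,
    "101-200": 1132,
    "1-100": 78610,
}

total_lines = sum(lines_per_category.values())


def share_percent(lines: int, total: int) -> int:
    """Whole-number percentage of the total, truncated (assumed rounding)."""
    return int(100 * lines / total)


shares = {
    category: share_percent(lines, total_lines)
    for category, lines in lines_per_category.items()
}

print(total_lines)  # 84245
print(" | ".join(f"{p}%" for p in shares.values()))  # 0% | 2% | 3% | 1% | 93%
```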


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 7% | 10% | 2% | 79%
java | 0% | 0% | 0% | 1% | 98%
cpp | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
XLM/src | 0% | 79% | 0% | 11% | 8%
preprocessing/src | 0% | 42% | 54% | 0% | 2%
XLM/src/data | 0% | 0% | 75% | 23% | <1%
XLM/src/model | 0% | 0% | 60% | 22% | 17%
XLM/src/evaluation | 0% | 0% | 99% | 0% | <1%
XLM | 0% | 0% | 89% | 0% | 10%
data/evaluation/geeks_for_geeks_successful_test_scripts/java | 0% | 0% | 0% | 1% | 98%
ROOT | 0% | 0% | 0% | 100% | 0%
data/evaluation/geeks_for_geeks_successful_test_scripts/cpp | 0% | 0% | 0% | 0% | 100%
data/evaluation/geeks_for_geeks_successful_test_scripts/python | 0% | 0% | 0% | 0% | 100%
preprocessing | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
code_tokenizer.py | preprocessing/src | 691 | 28
trainer.py | XLM/src | 681 | 29
utils.py | XLM/src | 555 | 27
transformer.py | XLM/src/model | 478 | 22
javalang_tokenizer.py | preprocessing/src | 455 | 31
evaluator.py | XLM/src/evaluation | 447 | 13
dataset.py | XLM/src/data | 284 | 22
loader.py | XLM/src/data | 246 | 7
train.py | XLM | 235 | 2
dataset.py | preprocessing/src | 228 | 13
utils.py | preprocessing/src | 203 | 21
optim.py | XLM/src | 184 | 10
__init__.py | XLM/src/model | 177 | 3
dictionary.py | XLM/src/data | 167 | 11
SORT_ARRAY_APPLYING_GIVEN_EQUATION.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 157 | 2
translate.py | root | 124 | 3
FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K_1.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 115 | 2
FIND_MAXIMUM_SUM_POSSIBLE_EQUAL_SUM_THREE_STACKS.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 104 | 2
FIND_THREE_ELEMENT_FROM_DIFFERENT_THREE_ARRAYS_SUCH_THAT_THAT_A_B_C_K.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 104 | 2
STOOGE_SORT.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 99 | 2
slurm.py | XLM/src | 98 | 4
SORT_EVEN_PLACED_ELEMENTS_INCREASING_ODD_PLACED_DECREASING_ORDER.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 96 | 2
SPLIT_ARRAY_ADD_FIRST_PART_END.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 96 | 2
HIGHWAY_BILLBOARD_PROBLEM.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 93 | 2
SCHEDULE_JOBS_SERVER_GETS_EQUAL_LOAD.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 92 | 2
FUNCTION_COPY_STRING_ITERATIVE_RECURSIVE_1.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 92 | 2
DYNAMIC_PROGRAMMING_SET_37_BOOLEAN_PARENTHESIZATION_PROBLEM.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 90 | 2
CHECK_TWO_GIVEN_CIRCLES_TOUCH_INTERSECT.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 90 | 2
preprocess.py | preprocessing | 89 | 2
FIND_DISTINCT_SUBSET_SUBSEQUENCE_SUMS_ARRAY.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 89 | 2
CHANGE_ARRAY_PERMUTATION_NUMBERS_1_N.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 87 | 2
MINIMUM_COST_CUT_BOARD_SQUARES.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 87 | 2
DYNAMIC_PROGRAMMING_SET_37_BOOLEAN_PARENTHESIZATION_PROBLEM.cpp | data/evaluation/geeks_for_geeks_successful_test_scripts/cpp | 87 | 2
COUNT_PAIRS_TWO_SORTED_ARRAYS_WHOSE_SUM_EQUAL_GIVEN_VALUE_X_2.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 86 | 2
FIND_THE_MINIMUM_DISTANCE_BETWEEN_TWO_NUMBERS_1.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 86 | 2
K_TH_MISSING_ELEMENT_INCREASING_SEQUENCE_NOT_PRESENT_GIVEN_SEQUENCE.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 86 | 2
REARRANGE_POSITIVE_AND_NEGATIVE_NUMBERS_PUBLISH.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 84 | 2
COUNT_PAIRS_TWO_SORTED_ARRAYS_WHOSE_SUM_EQUAL_GIVEN_VALUE_X_1.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 83 | 2
K_TH_ELEMENT_TWO_SORTED_ARRAYS.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 83 | 2
MINIMUM_COST_TO_FILL_GIVEN_WEIGHT_IN_A_BAG.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 83 | 2
LONGEST_COMMON_INCREASING_SUBSEQUENCE_LCS_LIS.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 83 | 2
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 82 | 2
STACK_PERMUTATIONS_CHECK_IF_AN_ARRAY_IS_STACK_PERMUTATION_OF_OTHER.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 82 | 2
MAKE_LARGEST_PALINDROME_CHANGING_K_DIGITS.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 81 | 2
COUNT_PAIRS_TWO_SORTED_ARRAYS_WHOSE_SUM_EQUAL_GIVEN_VALUE_X.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 81 | 2
MEDIAN_OF_TWO_SORTED_ARRAYS.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 81 | 2
REMOVE_MINIMUM_NUMBER_ELEMENTS_NO_COMMON_ELEMENT_EXIST_ARRAY.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 81 | 2
PRODUCT_MAXIMUM_FIRST_ARRAY_MINIMUM_SECOND_1.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 81 | 2
embedder.py | XLM/src/model | 80 | 7
LONGEST_COMMON_SUBSTRING.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 80 | 2
Files With Most Units (Top 20)
File | Location | # lines | # units
javalang_tokenizer.py | preprocessing/src | 455 | 31
trainer.py | XLM/src | 681 | 29
code_tokenizer.py | preprocessing/src | 691 | 28
utils.py | XLM/src | 555 | 27
transformer.py | XLM/src/model | 478 | 22
dataset.py | XLM/src/data | 284 | 22
utils.py | preprocessing/src | 203 | 21
evaluator.py | XLM/src/evaluation | 447 | 13
dataset.py | preprocessing/src | 228 | 13
dictionary.py | XLM/src/data | 167 | 11
optim.py | XLM/src | 184 | 10
embedder.py | XLM/src/model | 80 | 7
loader.py | XLM/src/data | 246 | 7
slurm.py | XLM/src | 98 | 4
pretrain.py | XLM/src/model | 59 | 4
translate.py | root | 124 | 3
logger.py | XLM/src | 38 | 3
__init__.py | XLM/src/model | 177 | 3
train.py | XLM | 235 | 2
preprocess.py | preprocessing | 89 | 2
Files With Long Lines (Top 20)

There are 982 files with lines longer than 120 characters. In total, there are 3,030 long lines.
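As a rough illustration of this measurement, the sketch below counts the lines in a piece of source text that exceed 120 characters. The function name `count_long_lines` is hypothetical, and whether the report tool counts trailing whitespace or tabs the same way is an assumption.

```python
def count_long_lines(text: str, limit: int = 120) -> int:
    """Count lines whose length exceeds `limit` characters.

    Line terminators are not counted toward the length.
    """
    return sum(1 for line in text.splitlines() if len(line) > limit)


# Usage: one line of exactly 120 characters (not long), one of 121 (long).
sample = "a" * 120 + "\n" + "b" * 121 + "\nshort line\n"
print(count_long_lines(sample))  # 1
```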

File | Location | # lines | # units | # long lines
FUNCTION_COPY_STRING_ITERATIVE_RECURSIVE_1.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 92 | 2 | 19
LEXICOGRAPHICALLY_LARGEST_SUBSEQUENCE_EVERY_CHARACTER_OCCURS_LEAST_K_TIMES.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 50 | 1 | 18
SWAP_TWO_NUMBERS_WITHOUT_USING_TEMPORARY_VARIABLE.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 68 | 2 | 17
PROGRAM_WORST_FIT_ALGORITHM_MEMORY_MANAGEMENT.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 52 | 1 | 16
SWAP_TWO_NUMBERS_WITHOUT_USING_TEMPORARY_VARIABLE.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 36 | 1 | 16
REPLACE_OCCURRENCES_STRING_AB_C_WITHOUT_USING_EXTRA_SPACE.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 40 | 1 | 14
REMOVE_CONSECUTIVE_DUPLICATES_STRING.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 44 | 1 | 14
GIVEN_1S_2S_3S_KS_PRINT_ZIG_ZAG_WAY.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 56 | 1 | 14
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 45 | 1 | 12
K_SMALLEST_ELEMENTS_ORDER_USING_O1_EXTRA_SPACE.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 48 | 1 | 12
PROGRAM_BEST_FIT_ALGORITHM_MEMORY_MANAGEMENT.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 52 | 1 | 12
PROGRAM_REVERSE_STRING_ITERATIVE_RECURSIVE.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 38 | 1 | 12
SORT_ARRAY_APPLYING_GIVEN_EQUATION.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 157 | 2 | 12
REARRANGE_ARRAY_MAXIMUM_MINIMUM_FORM_SET_2_O1_EXTRA_SPACE.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 82 | 2 | 11
LONGEST_COMMON_SUBSTRING.java | data/evaluation/geeks_for_geeks_successful_test_scripts/java | 80 | 2 | 11
FIND_TRIPLETS_ARRAY_WHOSE_SUM_EQUAL_ZERO_2.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 51 | 1 | 10
SORT_ARRAY_APPLYING_GIVEN_EQUATION.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 65 | 1 | 10
THIRD_LARGEST_ELEMENT_ARRAY_DISTINCT_ELEMENTS.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 50 | 1 | 10
ODD_EVEN_SORT_BRICK_SORT.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 46 | 1 | 10
RECURSIVE_INSERTION_SORT.py | data/evaluation/geeks_for_geeks_successful_test_scripts/python | 42 | 1 | 10