facebookresearch / libri-light
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
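The size categories above can be expressed as a small lookup. The following is a minimal sketch, not the analysis tool's actual implementation: the name `size_category` and the bounds table are illustrative assumptions, with the boundaries taken from the category list above.

```python
# Hypothetical helper: maps a line count to one of the report's size categories.
# Inclusive upper bounds come from the category list in this report.
CATEGORY_UPPER_BOUNDS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given lines of code."""
    if lines_of_code < 1:
        # Assumption: the smallest bucket starts at 1 line, as in the report.
        raise ValueError("lines_of_code must be at least 1")
    for upper_bound, label in CATEGORY_UPPER_BOUNDS:
        if lines_of_code <= upper_bound:
            return label
    return "very long"


if __name__ == "__main__":
    # Line counts taken from the longest-files listing in this report.
    for name, loc in [
        ("abx_iterators.py", 299),
        ("CPC_loader.py", 180),
        ("text_cleaner.py", 99),
    ]:
        print(f"{name}: {size_category(loc)}")
```

Keeping the bounds in a single table makes it easy to adjust the thresholds if a project prefers different cut-offs.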
File Size Overall
  • There are 66 files with 4,779 lines of code.
    • 0 very long files (0 lines of code)
    • 0 long files (0 lines of code)
    • 6 medium size files (1,445 lines of code)
    • 9 small files (1,242 lines of code)
    • 51 very small files (2,092 lines of code)
Share of lines of code per size category:
1001+: 0% | 501-1000: 0% | 201-500: 30% | 101-200: 25% | 1-100: 43%
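As a check on the arithmetic, the category shares can be recomputed from the line counts listed above. This sketch assumes the report truncates percentages rather than rounding them, which is consistent with the shares summing to 98% instead of 100%. The variable names are illustrative.

```python
# Lines of code per size category, as listed in this report.
lines_per_category = {
    "1001+": 0,
    "501-1000": 0,
    "201-500": 1445,
    "101-200": 1242,
    "1-100": 2092,
}

total_lines = sum(lines_per_category.values())

# Assumption: shares are truncated (integer division), not rounded.
shares = {
    category: lines * 100 // total_lines
    for category, lines in lines_per_category.items()
}

if __name__ == "__main__":
    print(total_lines)
    print(" | ".join(f"{share}%" for share in shares.values()))
```

With rounding instead of truncation, the last two shares would come out as 26% and 44%.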


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 0% | 33% | 29% | 37%
cfg | 0% | 0% | 0% | 0% | 100%
pyx | 0% | 0% | 0% | 0% | 100%
yml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
eval | 0% | 0% | 56% | 43% | 0%
eval/ABX_src | 0% | 0% | 62% | 23% | 14%
data_preparation/metadata_completion | 0% | 0% | 37% | 21% | 40%
data_preparation | 0% | 0% | 31% | 14% | 54%
data_preparation/split_librilight | 0% | 0% | 42% | 27% | 30%
eval/PER_src | 0% | 0% | 0% | 88% | 11%
data_preparation/text_retrieval | 0% | 0% | 0% | 32% | 67%
baselines/TDS | 0% | 0% | 0% | 0% | 100%
data_preparation/rebuild_limited_train | 0% | 0% | 0% | 0% | 100%
eval/WER_src | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
abx_iterators.py | eval/ABX_src | 299 | 30
utilities.py | data_preparation/metadata_completion | 246 | 24
download_librivox.py | data_preparation | 241 | 11
eval_WER.py | eval | 236 | 11
puts_json.py | data_preparation/split_librilight | 213 | 9
eval_PER.py | eval | 210 | 7
CPC_loader.py | eval | 180 | 20
simplePhonemLearner.py | eval/PER_src | 171 | 12
eval_ABX.py | eval | 160 | 7
DuplicateSearch.py | data_preparation/metadata_completion | 140 | 7
split.py | data_preparation/split_librilight | 137 | 7
bartleby.py | data_preparation/text_retrieval | 116 | 12
seq_alignment.py | eval/PER_src | 116 | 6
calculate_snr.py | data_preparation | 111 | 6
abx_group_computation.py | eval/ABX_src | 111 | 10
text_cleaner.py | data_preparation/metadata_completion | 99 | 3
utils.py | data_preparation/rebuild_limited_train | 97 | 8
hathitrust.py | data_preparation/text_retrieval | 96 | 9
simple_dataset.py | eval/WER_src | 90 | 6
make_vad_inputs.py | data_preparation | 88 | 5
ReaderScapper.py | data_preparation/metadata_completion | 82 | 7
complete_metadata.py | data_preparation | 81 | 2
main_lesson.py | data_preparation/text_retrieval | 78 | 12
cut_by_vad.py | data_preparation | 77 | 5
unzip_and_convert.py | data_preparation | 70 | 3
prepare_vads.py | data_preparation/split_librilight | 67 | 4
GenreScrapper.py | data_preparation/metadata_completion | 66 | 7
wl_decoder.py | eval/WER_src | 62 | 3
dtw.pyx | eval/ABX_src | 59 | 1
plot.py | data_preparation | 57 | 4
materialize_split.py | data_preparation/split_librilight | 53 | 3
sample_10h.py | data_preparation/rebuild_limited_train | 53 | 2
build_all_stats.py | data_preparation | 48 | -
select_1h.py | data_preparation/rebuild_limited_train | 46 | 2
archive_org.py | data_preparation/text_retrieval | 45 | 4
letter_ctc.py | eval/WER_src | 41 | 5
split_1h_in10min.py | data_preparation/rebuild_limited_train | 36 | 2
1h+pseudo-label_letter_37M_TDS.cfg | baselines/TDS/experiments/config/training | 35 | -
10h+pseudo-label_letter_37M_TDS.cfg | baselines/TDS/experiments/config/training | 35 | -
10h+pseudo-label_phone_37M_TDS.cfg | baselines/TDS/experiments/config/training | 34 | -
1h+pseudo-label_phone_37M_TDS.cfg | baselines/TDS/experiments/config/training | 34 | -
get_stats.py | data_preparation/rebuild_limited_train | 33 | 1
extract_test_speakers.py | data_preparation/split_librilight | 32 | 2
per_operator.pyx | eval/PER_src | 31 | 1
10h_phone_20M_TDS.cfg | baselines/TDS/experiments/config/training | 31 | -
1h_phone_20M_TDS.cfg | baselines/TDS/experiments/config/training | 31 | -
10h_letter_20M_TDS.cfg | baselines/TDS/experiments/config/training | 31 | -
1h_letter_20M_TDS.cfg | baselines/TDS/experiments/config/training | 31 | -
clean_texts.py | data_preparation/rebuild_limited_train | 23 | 1
environment.yml | root | 22 | -
Files With Most Units (Top 20)
File | Location | # lines | # units
abx_iterators.py | eval/ABX_src | 299 | 30
utilities.py | data_preparation/metadata_completion | 246 | 24
CPC_loader.py | eval | 180 | 20
bartleby.py | data_preparation/text_retrieval | 116 | 12
main_lesson.py | data_preparation/text_retrieval | 78 | 12
simplePhonemLearner.py | eval/PER_src | 171 | 12
download_librivox.py | data_preparation | 241 | 11
eval_WER.py | eval | 236 | 11
abx_group_computation.py | eval/ABX_src | 111 | 10
puts_json.py | data_preparation/split_librilight | 213 | 9
hathitrust.py | data_preparation/text_retrieval | 96 | 9
utils.py | data_preparation/rebuild_limited_train | 97 | 8
split.py | data_preparation/split_librilight | 137 | 7
ReaderScapper.py | data_preparation/metadata_completion | 82 | 7
GenreScrapper.py | data_preparation/metadata_completion | 66 | 7
DuplicateSearch.py | data_preparation/metadata_completion | 140 | 7
eval_ABX.py | eval | 160 | 7
eval_PER.py | eval | 210 | 7
calculate_snr.py | data_preparation | 111 | 6
seq_alignment.py | eval/PER_src | 116 | 6
Files With Long Lines (Top 8)

There are 8 files with lines longer than 120 characters. In total, there are 17 long lines.
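A count like this can be reproduced roughly as follows. This is a sketch under assumptions: the 120-character limit comes from the sentence above, but the helper names and the choice to measure length without the line terminator are illustrative and may not match the tool's exact rules (for example, how tabs are counted).

```python
from pathlib import Path

LONG_LINE_LIMIT = 120  # threshold stated in this report


def count_long_lines(text: str, limit: int = LONG_LINE_LIMIT) -> int:
    """Count lines whose length exceeds `limit` characters."""
    # splitlines() drops line terminators, so they do not count toward length.
    return sum(1 for line in text.splitlines() if len(line) > limit)


def count_long_lines_in_file(path: Path, limit: int = LONG_LINE_LIMIT) -> int:
    """Read a source file and count its long lines (hypothetical helper)."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return count_long_lines(text, limit)


if __name__ == "__main__":
    # One line of 121 characters (long) and one of exactly 120 (not long).
    sample = "short line\n" + "x" * 121 + "\n" + "y" * 120 + "\n"
    print(count_long_lines(sample))
```

A line of exactly 120 characters is not counted, since the report speaks of lines longer than 120 characters.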

File | Location | # lines | # units | # long lines
genre_folding.py | data_preparation/metadata_completion | 18 | - | 8
split.py | data_preparation/split_librilight | 137 | 7 | 2
sample_10h.py | data_preparation/rebuild_limited_train | 53 | 2 | 2
puts_json.py | data_preparation/split_librilight | 213 | 9 | 1
get_stats.py | data_preparation/rebuild_limited_train | 33 | 1 | 1
simplePhonemLearner.py | eval/PER_src | 171 | 12 | 1
seq_alignment.py | eval/PER_src | 116 | 6 | 1
eval_WER.py | eval | 236 | 11 | 1