tensorflow / datasets
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how the lines of code in a codebase are distributed across files of different sizes.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
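The five size categories listed above can be expressed as a small lookup. The following is a minimal sketch, not the report tool's own implementation; the names BUCKETS, classify and count_by_category are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Inclusive upper bounds for each category, as listed in the intro.
# Anything above the last bound is a "very long" file.
BUCKETS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium"),
    (1000, "long"),
]


def classify(loc: int) -> str:
    """Return the size category for a file with `loc` lines of code."""
    if loc < 1:
        raise ValueError("expected at least one line of code")
    for upper, label in BUCKETS:
        if loc <= upper:
            return label
    return "very long"


def count_by_category(line_counts):
    """Count how many files fall into each size category."""
    return Counter(classify(loc) for loc in line_counts)
```

For example, a file with 1535 lines is classified as "very long", and a file with exactly 100 lines is still "very small" because the bounds are inclusive.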
File Size Overall
  • There are 680 files with 52,582 lines of code.
    • 3 very long files (4,178 lines of code)
    • 4 long files (2,502 lines of code)
    • 40 medium size files (11,435 lines of code)
    • 119 small files (16,154 lines of code)
    • 514 very small files (18,313 lines of code)
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
7% | 4% | 21% | 30% | 34%
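The percentages in the overall distribution are shares of lines of code, not of file counts: the three very long files are well under 1% of the 680 files but hold about 7.9% of the 52,582 lines. The sketch below recomputes the shares from the totals listed above. It assumes the report truncates each share to a whole percent rather than rounding, which is an inference from the published figures; the function name floor_percentages is hypothetical.

```python
def floor_percentages(loc_by_category):
    """Return each category's share of total lines of code, truncated to a whole percent."""
    total = sum(loc_by_category.values())
    # Integer arithmetic: 100 * loc // total truncates toward zero for non-negative counts.
    return {label: 100 * loc // total for label, loc in loc_by_category.items()}


# Lines of code per category, taken from the overall totals in this report.
overall = {
    "1001+": 4178,
    "501-1000": 2502,
    "201-500": 11435,
    "101-200": 16154,
    "1-100": 18313,
}

shares = floor_percentages(overall)
print(" | ".join(f"{pct}%" for pct in shares.values()))
# prints: 7% | 4% | 21% | 30% | 34%
```

Because every share is truncated, the printed values sum to 96% rather than 100%.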


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 8% | 4% | 21% | 30% | 34%
proto | 0% | 0% | 63% | 19% | 17%
jsonl | 0% | 0% | 0% | 0% | 100%
ru | 0% | 0% | 0% | 0% | 100%
pt | 0% | 0% | 0% | 0% | 100%
es | 0% | 0% | 0% | 0% | 100%
cs | 0% | 0% | 0% | 0% | 100%
nl | 0% | 0% | 0% | 0% | 100%
toml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
tensorflow_datasets/text | 18% | 0% | 20% | 23% | 37%
tensorflow_datasets/structured | 36% | 0% | 18% | 18% | 26%
tensorflow_datasets/image_classification | 14% | 6% | 15% | 23% | 39%
tensorflow_datasets/core | 0% | 10% | 31% | 36% | 21%
tensorflow_datasets/translate | 0% | 48% | 15% | 0% | 35%
tensorflow_datasets/testing | 0% | 0% | 28% | 13% | 58%
tensorflow_datasets/object_detection | 0% | 0% | 42% | 52% | 5%
tensorflow_datasets/d4rl | 0% | 0% | 54% | 15% | 30%
tensorflow_datasets/proto | 0% | 0% | 98% | 0% | 1%
tensorflow_datasets/scripts | 0% | 0% | 23% | 24% | 52%
tensorflow_datasets/rl_unplugged | 0% | 0% | 33% | 19% | 46%
tensorflow_datasets/audio | 0% | 0% | 11% | 51% | 36%
tensorflow_datasets/vision_language | 0% | 0% | 30% | 64% | 5%
tensorflow_datasets/question_answering | 0% | 0% | 17% | 18% | 63%
tensorflow_datasets/text_simplification | 0% | 0% | 66% | 31% | 1%
tensorflow_datasets/image | 0% | 0% | 0% | 48% | 51%
tensorflow_datasets/summarization | 0% | 0% | 0% | 49% | 50%
tensorflow_datasets/video | 0% | 0% | 0% | 59% | 40%
tensorflow_datasets/rlds | 0% | 0% | 0% | 80% | 19%
tensorflow_datasets/graphs | 0% | 0% | 0% | 73% | 26%
tensorflow_datasets/ranking | 0% | 0% | 0% | 54% | 45%
tensorflow_datasets/robomimic | 0% | 0% | 0% | 84% | 15%
ROOT | 0% | 0% | 0% | 100% | 0%
tensorflow_datasets | 0% | 0% | 0% | 0% | 100%
tensorflow_datasets/time_series | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | # lines | # units
gem.py
in tensorflow_datasets/text/gem
1535 4
covid19.py
in tensorflow_datasets/structured/covid19
1451 3
siscore_labels.py
in tensorflow_datasets/image_classification/siscore
1192 -
wmt.py
in tensorflow_datasets/translate
769 19
dataset_builder.py
in tensorflow_datasets/core
633 43
cbis_ddsm.py
in tensorflow_datasets/image_classification
567 23
dataset_info.py
in tensorflow_datasets/core
533 53
dataset_builder.py
in tensorflow_datasets/d4rl
419 7
dataset_builder_testing.py
in tensorflow_datasets/testing
412 26
download_manager.py
in tensorflow_datasets/core/download
397 35
splits.py
in tensorflow_datasets/core
388 41
c4.py
in tensorflow_datasets/text
369 5
c4_utils.py
in tensorflow_datasets/text
369 20
glue.py
in tensorflow_datasets/text
367 6
corruptions.py
in tensorflow_datasets/image_classification
350 24
waymo_dataset.proto
in tensorflow_datasets/proto
346 -
text_encoder.py
in tensorflow_datasets/core/deprecated/text
331 47
tfrecords_writer.py
in tensorflow_datasets/core
315 20
super_glue.py
in tensorflow_datasets/text
313 9
cars196.py
in tensorflow_datasets/image_classification
312 5
feature.py
in tensorflow_datasets/core/features
302 45
py_utils.py
in tensorflow_datasets/core/utils
299 36
build.py
in tensorflow_datasets/scripts/cli
297 10
rlu_rwrl.py
in tensorflow_datasets/rl_unplugged/rlu_rwrl
289 12
naming.py
in tensorflow_datasets/core
282 32
open_images.py
in tensorflow_datasets/object_detection
281 8
subword_text_encoder.py
in tensorflow_datasets/core/deprecated/text
280 22
star_cfq.py
in tensorflow_datasets/text/star_cfq
277 11
movielens.py
in tensorflow_datasets/structured
276 9
coco.py
in tensorflow_datasets/object_detection
274 12
image_feature.py
in tensorflow_datasets/core/features
266 27
kddcup99.py
in tensorflow_datasets/structured/kddcup99
248 3
nsynth.py
in tensorflow_datasets/audio
247 4
feature_test_case.py
in tensorflow_datasets/testing
244 12
opus.py
in tensorflow_datasets/translate
241 7
mnist.py
in tensorflow_datasets/image_classification
241 10
wit_kaggle.py
in tensorflow_datasets/vision_language/wit_kaggle
237 4
tfrecords_reader.py
in tensorflow_datasets/core
235 12
trivia_qa.py
in tensorflow_datasets/question_answering
228 6
wiki_auto.py
in tensorflow_datasets/text_simplification/wiki_auto
226 3
split_builder.py
in tensorflow_datasets/core
221 12
movielens_parsing.py
in tensorflow_datasets/structured
219 6
sun.py
in tensorflow_datasets/image_classification
213 15
caltech_birds.py
in tensorflow_datasets/image_classification
212 9
load.py
in tensorflow_datasets/core
208 12
mocking.py
in tensorflow_datasets/testing
203 12
kitti.py
in tensorflow_datasets/object_detection
201 6
robosuite_panda_pick_place_can.py
in tensorflow_datasets/rlds/robosuite_panda_pick_place_can
199 3
math_dataset.py
in tensorflow_datasets/text
199 5
waymo_open_dataset.py
in tensorflow_datasets/object_detection
195 7
Files With Most Units (Top 20)
File | # lines | # units
dataset_info.py
in tensorflow_datasets/core
533 53
text_encoder.py
in tensorflow_datasets/core/deprecated/text
331 47
type_utils.py
in tensorflow_datasets/core/utils
169 45
feature.py
in tensorflow_datasets/core/features
302 45
dataset_builder.py
in tensorflow_datasets/core
633 43
splits.py
in tensorflow_datasets/core
388 41
py_utils.py
in tensorflow_datasets/core/utils
299 36
download_manager.py
in tensorflow_datasets/core/download
397 35
github_path.py
in tensorflow_datasets/core/github_api
172 33
naming.py
in tensorflow_datasets/core
282 32
lazy_imports_lib.py
in tensorflow_datasets/core
153 31
image_feature.py
in tensorflow_datasets/core/features
266 27
dataset_builder_testing.py
in tensorflow_datasets/testing
412 26
corruptions.py
in tensorflow_datasets/image_classification
350 24
cbis_ddsm.py
in tensorflow_datasets/image_classification
567 23
subword_text_encoder.py
in tensorflow_datasets/core/deprecated/text
280 22
sequence_feature.py
in tensorflow_datasets/core/features
126 22
gpath.py
in tensorflow_datasets/core/utils
120 21
features_dict.py
in tensorflow_datasets/core/features
137 21
shuffle.py
in tensorflow_datasets/core
136 20
Files With Long Lines (Top 20)

There are 69 files with lines longer than 120 characters. In total, there are 101 long lines.
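A long-line count of this kind can be approximated with a short helper. This is a minimal sketch under stated assumptions: the 120-character threshold comes from the sentence above, but how the report tool treats tabs, trailing whitespace or line endings is not stated here, so this version simply measures each line after splitting on line boundaries. The function name count_long_lines is hypothetical.

```python
def count_long_lines(text: str, limit: int = 120) -> int:
    """Count the lines in `text` that are strictly longer than `limit` characters."""
    # splitlines() drops the line terminators, so they do not count toward the length.
    return sum(1 for line in text.splitlines() if len(line) > limit)


# A 121-character line is long; a line of exactly 120 characters is not.
sample = "a" * 121 + "\n" + "b" * 120 + "\n"
print(count_long_lines(sample))
# prints: 1
```

Applied per file, a helper like this would produce a "# long lines" column like the one in the table below, and files with a count of zero would be left out of the list.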

File | # lines | # units | # long lines
c4.py
in tensorflow_datasets/text
369 5 5
cbis_ddsm.py
in tensorflow_datasets/image_classification
567 23 5
robosuite_panda_pick_place_can.py
in tensorflow_datasets/rlds/robosuite_panda_pick_place_can
199 3 3
es_train.jsonl
in tensorflow_datasets/text/gem/dummy_data/mlsum
3 - 3
de_train.jsonl
in tensorflow_datasets/text/gem/dummy_data/mlsum
3 - 3
train.tgt.ru
in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/russian
3 - 3
train.tgt.es
in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/spanish
3 - 3
train.tgt.cs
in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/czech
3 - 3
train.tgt.nl
in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/dutch
3 - 3
wiki_auto.py
in tensorflow_datasets/text_simplification/wiki_auto
226 3 2
tydi_qa.py
in tensorflow_datasets/question_answering
81 3 2
xquad.py
in tensorflow_datasets/question_answering
68 4 2
tfrecords_reader.py
in tensorflow_datasets/core
235 12 2
gem.py
in tensorflow_datasets/text/gem
1535 4 2
train.tgt.pt
in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/portuguese
3 - 2
star_cfq.py
in tensorflow_datasets/text/star_cfq
277 11 2
train.jsonl
in tensorflow_datasets/text/bool_q/dummy_data
2 - 2
dev.jsonl
in tensorflow_datasets/text/bool_q/dummy_data
2 - 2
wmt.py
in tensorflow_datasets/translate
769 19 2
xsum.py
in tensorflow_datasets/summarization
85 3 1