tensorflow / datasets

File Size

The distribution of size of files (measured in lines of code).

Intro

File size measurements show how files are distributed by size.
Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
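
The size categories above can be expressed as a small lookup. This is a minimal sketch, not the analysis tool's own code: the names `BOUNDS`, `LABELS`, and `classify` are hypothetical, and it assumes the bucket boundaries are inclusive upper limits as listed.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four buckets (assumed from the list above);
# anything above the last bound counts as "very long".
BOUNDS = [100, 200, 500, 1000]
LABELS = ["very small", "small", "medium", "long", "very long"]


def classify(lines_of_code: int) -> str:
    """Return the size category for a file with the given number of lines."""
    if lines_of_code < 1:
        raise ValueError("a counted file has at least one line of code")
    # bisect_left returns the index of the first bound >= lines_of_code,
    # which is exactly the index of the matching bucket.
    return LABELS[bisect_left(BOUNDS, lines_of_code)]
```

For example, `classify(769)` puts a 769-line file in the "long" bucket, and `classify(100)` still counts as "very small" because the upper bounds are inclusive.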


File Size Overall

There are 680 files with 52,582 lines of code.

3 very long files (4,178 lines of code)
4 long files (2,502 lines of code)
40 medium size files (11,435 lines of code)
119 small files (16,154 lines of code)
514 very small files (18,313 lines of code)
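
To see how much of the code base each category holds, the counts above can be turned into percentages. The sketch below only re-uses the figures from this summary; the names `CATEGORIES`, `line_share`, and `file_share` are illustrative and not part of any tool.

```python
# Figures copied from the summary above: (number of files, lines of code).
CATEGORIES = {
    "very long (1001+)": (3, 4178),
    "long (501-1000)": (4, 2502),
    "medium (201-500)": (40, 11435),
    "small (101-200)": (119, 16154),
    "very small (1-100)": (514, 18313),
}

total_files = sum(files for files, _ in CATEGORIES.values())
total_lines = sum(lines for _, lines in CATEGORIES.values())


def line_share(category: str) -> float:
    """Percentage of all lines of code that fall in the given category."""
    _, lines = CATEGORIES[category]
    return 100.0 * lines / total_lines


def file_share(category: str) -> float:
    """Percentage of all files that fall in the given category."""
    files, _ = CATEGORIES[category]
    return 100.0 * files / total_files
```

The totals reproduce the 680 files and 52,582 lines reported above. Read this way, the seven files longer than 500 lines make up about 1% of the files but hold roughly 13% of the code.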


File Size per Extension

(Chart not shown: file size distribution per file extension, using the categories 1001+, 501-1000, 201-500, 101-200, and 1-100.)

File Size per Logical Decomposition

primary

(Chart not shown: file size distribution per component, using the categories 1001+, 501-1000, 201-500, 101-200, and 1-100.)

Longest Files (Top 50)

File	# lines	# units
gem.py in tensorflow_datasets/text/gem	1535	4
covid19.py in tensorflow_datasets/structured/covid19	1451	3
siscore_labels.py in tensorflow_datasets/image_classification/siscore	1192	-
wmt.py in tensorflow_datasets/translate	769	19
dataset_builder.py in tensorflow_datasets/core	633	43
cbis_ddsm.py in tensorflow_datasets/image_classification	567	23
dataset_info.py in tensorflow_datasets/core	533	53
dataset_builder.py in tensorflow_datasets/d4rl	419	7
dataset_builder_testing.py in tensorflow_datasets/testing	412	26
download_manager.py in tensorflow_datasets/core/download	397	35
splits.py in tensorflow_datasets/core	388	41
c4.py in tensorflow_datasets/text	369	5
c4_utils.py in tensorflow_datasets/text	369	20
glue.py in tensorflow_datasets/text	367	6
corruptions.py in tensorflow_datasets/image_classification	350	24
waymo_dataset.proto in tensorflow_datasets/proto	346	-
text_encoder.py in tensorflow_datasets/core/deprecated/text	331	47
tfrecords_writer.py in tensorflow_datasets/core	315	20
super_glue.py in tensorflow_datasets/text	313	9
cars196.py in tensorflow_datasets/image_classification	312	5
feature.py in tensorflow_datasets/core/features	302	45
py_utils.py in tensorflow_datasets/core/utils	299	36
build.py in tensorflow_datasets/scripts/cli	297	10
rlu_rwrl.py in tensorflow_datasets/rl_unplugged/rlu_rwrl	289	12
naming.py in tensorflow_datasets/core	282	32
open_images.py in tensorflow_datasets/object_detection	281	8
subword_text_encoder.py in tensorflow_datasets/core/deprecated/text	280	22
star_cfq.py in tensorflow_datasets/text/star_cfq	277	11
movielens.py in tensorflow_datasets/structured	276	9
coco.py in tensorflow_datasets/object_detection	274	12
image_feature.py in tensorflow_datasets/core/features	266	27
kddcup99.py in tensorflow_datasets/structured/kddcup99	248	3
nsynth.py in tensorflow_datasets/audio	247	4
feature_test_case.py in tensorflow_datasets/testing	244	12
opus.py in tensorflow_datasets/translate	241	7
mnist.py in tensorflow_datasets/image_classification	241	10
wit_kaggle.py in tensorflow_datasets/vision_language/wit_kaggle	237	4
tfrecords_reader.py in tensorflow_datasets/core	235	12
trivia_qa.py in tensorflow_datasets/question_answering	228	6
wiki_auto.py in tensorflow_datasets/text_simplification/wiki_auto	226	3
split_builder.py in tensorflow_datasets/core	221	12
movielens_parsing.py in tensorflow_datasets/structured	219	6
sun.py in tensorflow_datasets/image_classification	213	15
caltech_birds.py in tensorflow_datasets/image_classification	212	9
load.py in tensorflow_datasets/core	208	12
mocking.py in tensorflow_datasets/testing	203	12
kitti.py in tensorflow_datasets/object_detection	201	6
robosuite_panda_pick_place_can.py in tensorflow_datasets/rlds/robosuite_panda_pick_place_can	199	3
math_dataset.py in tensorflow_datasets/text	199	5
waymo_open_dataset.py in tensorflow_datasets/object_detection	195	7

Files With Most Units (Top 20)

File	# lines	# units
dataset_info.py in tensorflow_datasets/core	533	53
text_encoder.py in tensorflow_datasets/core/deprecated/text	331	47
type_utils.py in tensorflow_datasets/core/utils	169	45
feature.py in tensorflow_datasets/core/features	302	45
dataset_builder.py in tensorflow_datasets/core	633	43
splits.py in tensorflow_datasets/core	388	41
py_utils.py in tensorflow_datasets/core/utils	299	36
download_manager.py in tensorflow_datasets/core/download	397	35
github_path.py in tensorflow_datasets/core/github_api	172	33
naming.py in tensorflow_datasets/core	282	32
lazy_imports_lib.py in tensorflow_datasets/core	153	31
image_feature.py in tensorflow_datasets/core/features	266	27
dataset_builder_testing.py in tensorflow_datasets/testing	412	26
corruptions.py in tensorflow_datasets/image_classification	350	24
cbis_ddsm.py in tensorflow_datasets/image_classification	567	23
subword_text_encoder.py in tensorflow_datasets/core/deprecated/text	280	22
sequence_feature.py in tensorflow_datasets/core/features	126	22
gpath.py in tensorflow_datasets/core/utils	120	21
features_dict.py in tensorflow_datasets/core/features	137	21
shuffle.py in tensorflow_datasets/core	136	20

Files With Long Lines (Top 20)

There are 69 files with lines longer than 120 characters. In total, there are 101 long lines.

File	# lines	# units	# long lines
c4.py in tensorflow_datasets/text	369	5	5
cbis_ddsm.py in tensorflow_datasets/image_classification	567	23	5
robosuite_panda_pick_place_can.py in tensorflow_datasets/rlds/robosuite_panda_pick_place_can	199	3	3
es_train.jsonl in tensorflow_datasets/text/gem/dummy_data/mlsum	3	-	3
de_train.jsonl in tensorflow_datasets/text/gem/dummy_data/mlsum	3	-	3
train.tgt.ru in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/russian	3	-	3
train.tgt.es in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/spanish	3	-	3
train.tgt.cs in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/czech	3	-	3
train.tgt.nl in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/dutch	3	-	3
wiki_auto.py in tensorflow_datasets/text_simplification/wiki_auto	226	3	2
tydi_qa.py in tensorflow_datasets/question_answering	81	3	2
xquad.py in tensorflow_datasets/question_answering	68	4	2
tfrecords_reader.py in tensorflow_datasets/core	235	12	2
gem.py in tensorflow_datasets/text/gem	1535	4	2
train.tgt.pt in tensorflow_datasets/text/gem/dummy_data/wiki_lingua/portuguese	3	-	2
star_cfq.py in tensorflow_datasets/text/star_cfq	277	11	2
train.jsonl in tensorflow_datasets/text/bool_q/dummy_data	2	-	2
dev.jsonl in tensorflow_datasets/text/bool_q/dummy_data	2	-	2
wmt.py in tensorflow_datasets/translate	769	19	2
xsum.py in tensorflow_datasets/summarization	85	3	1
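
A long-line count like the one in this table can be approximated with a few lines of Python. This is a sketch under assumptions: the 120-character threshold comes from the report, but whether the original tool counts trailing whitespace or measures characters rather than bytes is not stated, and the function names are hypothetical.

```python
from pathlib import Path

LIMIT = 120  # threshold stated in the report


def count_long_lines(text: str, limit: int = LIMIT) -> int:
    """Count the lines in `text` that are longer than `limit` characters."""
    # splitlines() drops the line terminators, so they do not add to the length.
    return sum(1 for line in text.splitlines() if len(line) > limit)


def count_long_lines_in_file(path: Path, limit: int = LIMIT) -> int:
    """Same count for a file on disk; undecodable bytes are replaced."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return count_long_lines(text, limit)
```

A line of exactly 120 characters is not counted, matching the wording "longer than 120 characters". Data files such as the `.jsonl` fixtures listed above tend to score high on this metric because each record sits on a single line.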