pytorch / text
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how lines of code are distributed across files.
  • Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files); a sketch of this binning follows this list.
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
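To make the classification concrete, the sketch below reproduces the binning described above over a local checkout. It is a minimal illustration, not the report's own tooling: the directory path, the suffix filter, and the choice to count only non-blank lines are assumptions.

from collections import Counter
from pathlib import Path

# Size categories from the intro above: (upper bound in lines of code, label).
BINS = [(100, "very small"), (200, "small"), (500, "medium size"), (1000, "long")]

def categorize(loc: int) -> str:
    # Map a file's line count to its size category; anything above 1000 is "very long".
    for upper, label in BINS:
        if loc <= upper:
            return label
    return "very long"

def size_distribution(root: str, suffixes=(".py", ".cpp", ".h", ".bash", ".yaml", ".yml", ".cfg")):
    # Count files and lines of code per category under `root`.
    # Counting non-blank lines is an assumption; the report's exact LOC rule may differ.
    files, lines = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            loc = sum(1 for line in path.read_text(errors="ignore").splitlines() if line.strip())
            label = categorize(loc)
            files[label] += 1
            lines[label] += loc
    return files, lines

files, lines = size_distribution("text")  # hypothetical path to a pytorch/text checkout
for label in ("very long", "long", "medium size", "small", "very small"):
    print(f"{label}: {files[label]} files, {lines[label]} lines of code")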
File Size Overall
  • There are 94 files with 6,762 lines of code.
    • 0 very long files (0 lines of code)
    • 0 long files (0 lines of code)
    • 8 medium size files (2,211 lines of code)
    • 14 small files (2,050 lines of code)
    • 72 very small files (2,501 lines of code)
Share of lines of code by size category:
1001+: 0% | 501-1000: 0% | 201-500: 32% | 101-200: 30% | 1-100: 36%


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 0% | 28% | 32% | 38%
cpp | 0% | 0% | 63% | 23% | 13%
bash | 0% | 0% | 0% | 100% | 0%
h | 0% | 0% | 0% | 0% | 100%
yaml | 0% | 0% | 0% | 0% | 100%
yml | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
torchtext/csrc | 0% | 0% | 51% | 18% | 29%
torchtext/experimental | 0% | 0% | 49% | 25% | 25%
torchtext/models | 0% | 0% | 45% | 33% | 20%
torchtext/data | 0% | 0% | 43% | 46% | 9%
torchtext/datasets | 0% | 0% | 17% | 14% | 68%
torchtext/vocab | 0% | 0% | 61% | 0% | 38%
torchtext | 0% | 0% | 0% | 65% | 34%
benchmark | 0% | 0% | 0% | 56% | 43%
build_tools/setup_helpers | 0% | 0% | 0% | 99% | <1%
packaging | 0% | 0% | 0% | 100% | 0%
torchtext/nn | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
packaging/torchtext | 0% | 0% | 0% | 0% | 100%
packaging/vs2019 | 0% | 0% | 0% | 0% | 100%
build_tools/conda | 0% | 0% | 0% | 0% | 100%
torchtext/_internal | 0% | 0% | 0% | 0% | 100%
build_tools | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
vectors.py | torchtext/experimental | 490 | 12
vocab.cpp | torchtext/csrc | 282 | 18
vectors.cpp | torchtext/csrc | 267 | 9
modules.py | torchtext/models/roberta | 267 | 12
datasets_utils.py | torchtext/data | 260 | 25
gpt2_bpe_tokenizer.cpp | torchtext/csrc | 241 | 20
iwslt2016.py | torchtext/datasets | 203 | 3
vectors.py | torchtext/vocab | 201 | 10
bundler.py | torchtext/models/roberta | 199 | 4
transforms.py | torchtext | 174 | 20
iwslt2017.py | torchtext/datasets | 169 | 3
register_pybindings.cpp | torchtext/csrc | 163 | 1
utils.py | torchtext/data | 158 | 14
extension.py | build_tools/setup_helpers | 151 | 12
transforms.py | torchtext/experimental | 151 | 25
benchmark_pytext_vocab.py | benchmark | 147 | 3
pkg_helpers.bash | packaging | 143 | -
utils.py | torchtext | 141 | 9
register_torchbindings.cpp | torchtext/csrc | 124 | 1
functional.py | torchtext/data | 123 | 9
wmt14.py | torchtext/experimental/datasets/raw | 104 | 3
benchmark_vocab.py | benchmark | 103 | 3
multiheadattention.py | torchtext/nn/modules | 100 | 7
model.py | torchtext/models/roberta | 99 | 6
mha_block.py | benchmark | 88 | 1
setup.py | root | 81 | 4
vocab.h | torchtext/csrc | 78 | 4
sst2.py | torchtext/experimental/datasets | 74 | 4
functional.py | torchtext | 74 | 3
vocab.py | torchtext/vocab | 72 | 16
sentencepiece.cpp | torchtext/csrc | 67 | 8
multi30k.py | torchtext/datasets | 57 | 1
gpt2_bpe_tokenizer.h | torchtext/csrc | 56 | -
__init__.py | torchtext/datasets | 53 | -
wmtnewscrawl.py | torchtext/experimental/datasets/raw | 51 | 1
_download_hooks.py | torchtext | 48 | 3
imdb.py | torchtext/datasets | 48 | 1
conll2000chunking.py | torchtext/datasets | 48 | 1
meta.yaml | packaging/torchtext | 47 | -
regex_tokenizer.cpp | torchtext/csrc | 46 | 4
yelpreviewpolarity.py | torchtext/datasets | 45 | 1
yahooanswers.py | torchtext/datasets | 45 | 1
get_checksums_fast_text.py | torchtext/experimental/asset | 44 | 3
udpos.py | torchtext/datasets | 43 | 1
penntreebank.py | torchtext/datasets | 43 | 1
amazonreviewfull.py | torchtext/datasets | 42 | 1
yelpreviewfull.py | torchtext/datasets | 41 | 1
enwik9.py | torchtext/datasets | 41 | 1
sogounews.py | torchtext/datasets | 41 | 1
wikitext103.py | torchtext/datasets | 41 | 1
Files With Most Units (Top 20)
File | Location | # lines | # units
transforms.py | torchtext/experimental | 151 | 25
datasets_utils.py | torchtext/data | 260 | 25
gpt2_bpe_tokenizer.cpp | torchtext/csrc | 241 | 20
transforms.py | torchtext | 174 | 20
vocab.cpp | torchtext/csrc | 282 | 18
vocab.py | torchtext/vocab | 72 | 16
utils.py | torchtext/data | 158 | 14
extension.py | build_tools/setup_helpers | 151 | 12
vectors.py | torchtext/experimental | 490 | 12
modules.py | torchtext/models/roberta | 267 | 12
vectors.py | torchtext/vocab | 201 | 10
vectors.cpp | torchtext/csrc | 267 | 9
utils.py | torchtext | 141 | 9
functional.py | torchtext/data | 123 | 9
sentencepiece.cpp | torchtext/csrc | 67 | 8
multiheadattention.py | torchtext/nn/modules | 100 | 7
model.py | torchtext/models/roberta | 99 | 6
setup.py | root | 81 | 4
regex_tokenizer.cpp | torchtext/csrc | 46 | 4
vocab.h | torchtext/csrc | 78 | 4
Files With Long Lines (Top 20)

There are 31 files with lines longer than 120 characters. In total, there are 66 long lines.
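A count like this can be approximated with a few lines of Python. The sketch below is a minimal illustration, assuming a local checkout, a plain character-count rule (tabs not expanded), and a Python-only suffix filter; only the 120-character threshold comes from the report.

from pathlib import Path

LONG_LINE = 120  # threshold used by the report

def count_long_lines(path: Path) -> int:
    # Number of lines in `path` longer than the threshold.
    return sum(1 for line in path.read_text(errors="ignore").splitlines() if len(line) > LONG_LINE)

# Hypothetical usage over a pytorch/text checkout; the path and suffix filter are assumptions.
counts = {p: count_long_lines(p) for p in Path("text").rglob("*.py") if p.is_file()}
offenders = {p: n for p, n in counts.items() if n > 0}
print(f"{len(offenders)} files with long lines, {sum(offenders.values())} long lines in total")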

File | Location | # lines | # units | # long lines
iwslt2016.py | torchtext/datasets | 203 | 3 | 8
iwslt2017.py | torchtext/datasets | 169 | 3 | 7
modules.py | torchtext/models/roberta | 267 | 12 | 6
bundler.py | torchtext/models/roberta | 199 | 4 | 4
pkg_helpers.bash | packaging | 143 | - | 3
wikitext103.py | torchtext/datasets | 41 | 1 | 3
amazonreviewpolarity.py | torchtext/datasets | 38 | 1 | 3
wikitext2.py | torchtext/datasets | 41 | 1 | 3
multi30k.py | torchtext/datasets | 57 | 1 | 3
vectors.py | torchtext/experimental | 490 | 12 | 2
transforms.py | torchtext/experimental | 151 | 25 | 2
yelpreviewfull.py | torchtext/datasets | 41 | 1 | 2
utils.py | torchtext/data | 158 | 14 | 2
benchmark_sentencepiece.py | benchmark | 27 | 1 | 1
wmt14.py | torchtext/experimental/datasets/raw | 104 | 3 | 1
vocab_factory.py | torchtext/experimental | 26 | 2 | 1
dbpedia.py | torchtext/datasets | 38 | 1 | 1
udpos.py | torchtext/datasets | 43 | 1 | 1
squad1.py | torchtext/datasets | 38 | 1 | 1
sogounews.py | torchtext/datasets | 41 | 1 | 1