pytorch / text
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how lines of code are distributed across files.
  • Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files); a sketch of this binning follows this list.
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
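To make the classification concrete, the sketch below reproduces the binning described above over a local checkout. It is a minimal illustration, not the report's own tooling: the directory path, the suffix filter, and the choice to count only non-blank lines are assumptions.

from collections import Counter
from pathlib import Path

# Size categories from the intro above: (upper bound in lines of code, label).
BINS = [(100, "very small"), (200, "small"), (500, "medium size"), (1000, "long")]

def categorize(loc: int) -> str:
    # Map a file's line count to its size category; anything above 1000 is "very long".
    for upper, label in BINS:
        if loc <= upper:
            return label
    return "very long"

def size_distribution(root: str, suffixes=(".py", ".cpp", ".h", ".bash", ".yaml", ".yml", ".cfg")):
    # Count files and lines of code per category under `root`.
    # Counting non-blank lines is an assumption; the report's exact LOC rule may differ.
    files, lines = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            loc = sum(1 for line in path.read_text(errors="ignore").splitlines() if line.strip())
            label = categorize(loc)
            files[label] += 1
            lines[label] += loc
    return files, lines

files, lines = size_distribution("text")  # hypothetical path to a pytorch/text checkout
for label in ("very long", "long", "medium size", "small", "very small"):
    print(f"{label}: {files[label]} files, {lines[label]} lines of code")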
File Size Overall
  • There are 94 files with 6,762 lines of code.
    • 0 very long files (0 lines of code)
    • 0 long files (0 lines of code)
    • 8 medium size files (2,211 lines of code)
    • 14 small files (2,050 lines of code)
    • 72 very small files (2,501 lines of code)
Share of lines of code by size category:
1001+: 0% | 501-1000: 0% | 201-500: 32% | 101-200: 30% | 1-100: 36%


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 0% | 28% | 32% | 38%
cpp | 0% | 0% | 63% | 23% | 13%
bash | 0% | 0% | 0% | 100% | 0%
h | 0% | 0% | 0% | 0% | 100%
yaml | 0% | 0% | 0% | 0% | 100%
yml | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
torchtext/csrc | 0% | 0% | 51% | 18% | 29%
torchtext/experimental | 0% | 0% | 49% | 25% | 25%
torchtext/models | 0% | 0% | 45% | 33% | 20%
torchtext/data | 0% | 0% | 43% | 46% | 9%
torchtext/datasets | 0% | 0% | 17% | 14% | 68%
torchtext/vocab | 0% | 0% | 61% | 0% | 38%
torchtext | 0% | 0% | 0% | 65% | 34%
benchmark | 0% | 0% | 0% | 56% | 43%
build_tools/setup_helpers | 0% | 0% | 0% | 99% | <1%
packaging | 0% | 0% | 0% | 100% | 0%
torchtext/nn | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
packaging/torchtext | 0% | 0% | 0% | 0% | 100%
packaging/vs2019 | 0% | 0% | 0% | 0% | 100%
build_tools/conda | 0% | 0% | 0% | 0% | 100%
torchtext/_internal | 0% | 0% | 0% | 0% | 100%
build_tools | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
vectors.py | torchtext/experimental | 490 | 12
vocab.cpp | torchtext/csrc | 282 | 18
vectors.cpp | torchtext/csrc | 267 | 9
modules.py | torchtext/models/roberta | 267 | 12
datasets_utils.py | torchtext/data | 260 | 25
gpt2_bpe_tokenizer.cpp | torchtext/csrc | 241 | 20
iwslt2016.py | torchtext/datasets | 203 | 3
vectors.py | torchtext/vocab | 201 | 10
bundler.py | torchtext/models/roberta | 199 | 4
transforms.py | torchtext | 174 | 20
iwslt2017.py | torchtext/datasets | 169 | 3
register_pybindings.cpp | torchtext/csrc | 163 | 1
utils.py | torchtext/data | 158 | 14
extension.py | build_tools/setup_helpers | 151 | 12
transforms.py | torchtext/experimental | 151 | 25
benchmark_pytext_vocab.py | benchmark | 147 | 3
pkg_helpers.bash | packaging | 143 | -
utils.py | torchtext | 141 | 9
register_torchbindings.cpp | torchtext/csrc | 124 | 1
functional.py | torchtext/data | 123 | 9
wmt14.py | torchtext/experimental/datasets/raw | 104 | 3
benchmark_vocab.py | benchmark | 103 | 3
multiheadattention.py | torchtext/nn/modules | 100 | 7
model.py | torchtext/models/roberta | 99 | 6
mha_block.py | benchmark | 88 | 1
setup.py | root | 81 | 4
vocab.h | torchtext/csrc | 78 | 4
sst2.py | torchtext/experimental/datasets | 74 | 4
functional.py | torchtext | 74 | 3
vocab.py | torchtext/vocab | 72 | 16
sentencepiece.cpp | torchtext/csrc | 67 | 8
multi30k.py | torchtext/datasets | 57 | 1
gpt2_bpe_tokenizer.h | torchtext/csrc | 56 | -
__init__.py | torchtext/datasets | 53 | -
wmtnewscrawl.py | torchtext/experimental/datasets/raw | 51 | 1
_download_hooks.py | torchtext | 48 | 3
imdb.py | torchtext/datasets | 48 | 1
conll2000chunking.py | torchtext/datasets | 48 | 1
meta.yaml | packaging/torchtext | 47 | -
regex_tokenizer.cpp | torchtext/csrc | 46 | 4
yelpreviewpolarity.py | torchtext/datasets | 45 | 1
yahooanswers.py | torchtext/datasets | 45 | 1
get_checksums_fast_text.py | torchtext/experimental/asset | 44 | 3
udpos.py | torchtext/datasets | 43 | 1
penntreebank.py | torchtext/datasets | 43 | 1
amazonreviewfull.py | torchtext/datasets | 42 | 1
yelpreviewfull.py | torchtext/datasets | 41 | 1
enwik9.py | torchtext/datasets | 41 | 1
sogounews.py | torchtext/datasets | 41 | 1
wikitext103.py | torchtext/datasets | 41 | 1
Files With Most Units (Top 20)
File | Location | # lines | # units
transforms.py | torchtext/experimental | 151 | 25
datasets_utils.py | torchtext/data | 260 | 25
gpt2_bpe_tokenizer.cpp | torchtext/csrc | 241 | 20
transforms.py | torchtext | 174 | 20
vocab.cpp | torchtext/csrc | 282 | 18
vocab.py | torchtext/vocab | 72 | 16
utils.py | torchtext/data | 158 | 14
extension.py | build_tools/setup_helpers | 151 | 12
vectors.py | torchtext/experimental | 490 | 12
modules.py | torchtext/models/roberta | 267 | 12
vectors.py | torchtext/vocab | 201 | 10
vectors.cpp | torchtext/csrc | 267 | 9
utils.py | torchtext | 141 | 9
functional.py | torchtext/data | 123 | 9
sentencepiece.cpp | torchtext/csrc | 67 | 8
multiheadattention.py | torchtext/nn/modules | 100 | 7
model.py | torchtext/models/roberta | 99 | 6
setup.py | root | 81 | 4
regex_tokenizer.cpp | torchtext/csrc | 46 | 4
vocab.h | torchtext/csrc | 78 | 4
Files With Long Lines (Top 20)

There are 31 files with lines longer than 120 characters. In total, there are 66 long lines.
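A count like this can be approximated with a few lines of Python. The sketch below is a minimal illustration, assuming a local checkout, a plain character-count rule (tabs not expanded), and a Python-only suffix filter; only the 120-character threshold comes from the report.

from pathlib import Path

LONG_LINE = 120  # threshold used by the report

def count_long_lines(path: Path) -> int:
    # Number of lines in `path` longer than the threshold.
    return sum(1 for line in path.read_text(errors="ignore").splitlines() if len(line) > LONG_LINE)

# Hypothetical usage over a pytorch/text checkout; the path and suffix filter are assumptions.
counts = {p: count_long_lines(p) for p in Path("text").rglob("*.py") if p.is_file()}
offenders = {p: n for p, n in counts.items() if n > 0}
print(f"{len(offenders)} files with long lines, {sum(offenders.values())} long lines in total")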

File | Location | # lines | # units | # long lines
iwslt2016.py | torchtext/datasets | 203 | 3 | 8
iwslt2017.py | torchtext/datasets | 169 | 3 | 7
modules.py | torchtext/models/roberta | 267 | 12 | 6
bundler.py | torchtext/models/roberta | 199 | 4 | 4
pkg_helpers.bash | packaging | 143 | - | 3
wikitext103.py | torchtext/datasets | 41 | 1 | 3
amazonreviewpolarity.py | torchtext/datasets | 38 | 1 | 3
wikitext2.py | torchtext/datasets | 41 | 1 | 3
multi30k.py | torchtext/datasets | 57 | 1 | 3
vectors.py | torchtext/experimental | 490 | 12 | 2
transforms.py | torchtext/experimental | 151 | 25 | 2
yelpreviewfull.py | torchtext/datasets | 41 | 1 | 2
utils.py | torchtext/data | 158 | 14 | 2
benchmark_sentencepiece.py | benchmark | 27 | 1 | 1
wmt14.py | torchtext/experimental/datasets/raw | 104 | 3 | 1
vocab_factory.py | torchtext/experimental | 26 | 2 | 1
dbpedia.py | torchtext/datasets | 38 | 1 | 1
udpos.py | torchtext/datasets | 43 | 1 | 1
squad1.py | torchtext/datasets | 38 | 1 | 1
sogounews.py | torchtext/datasets | 41 | 1 | 1