awslabs / pptod
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how the code is distributed across files of different sizes.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown so large that it is hard to work with.
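The classification rule above can be sketched in Python, the language of the analysed repository. This is a minimal illustration, not the analysis tool's own code; the function name `size_category` and the exact label strings are assumptions chosen for this example.

```python
import bisect

# Inclusive upper bounds of the first four categories; anything above
# 1000 lines falls into the last category.
UPPER_BOUNDS = [100, 200, 500, 1000]
CATEGORIES = [
    "1-100 (very small)",
    "101-200 (small)",
    "201-500 (medium size)",
    "501-1000 (long)",
    "1001+ (very long)",
]


def size_category(lines_of_code: int) -> str:
    """Return the size category of a file with the given line count (hypothetical helper)."""
    if lines_of_code < 1:
        raise ValueError("a file must have at least one line of code")
    # bisect_left returns the index of the first bound >= lines_of_code,
    # so a count equal to a bound (e.g. 100) stays in the lower category.
    return CATEGORIES[bisect.bisect_left(UPPER_BOUNDS, lines_of_code)]


print(size_category(614))  # eval.py in E2E_TOD
print(size_category(99))   # ontology.py in E2E_TOD
```

Using `bisect` keeps the boundaries in one list, so changing a threshold does not require editing a chain of `if` statements.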
File Size Overall
  • There are 47 files with 8,147 lines of code.
    • 0 very long files (0 lines of code)
    • 2 long files (1,135 lines of code)
    • 12 medium size files (3,381 lines of code)
    • 14 small files (2,329 lines of code)
    • 19 very small files (1,302 lines of code)
Share of lines of code per category:
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 13% | 41% | 28% | 15%
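The percentages add up to 97% rather than 100%. They appear consistent with truncating each share to a whole percent rather than rounding it (rounding would give 14% for 501-1000 and 16% for 1-100). That is an inference from the numbers, not documented behaviour of the tool. The sketch below reproduces the row from the category totals listed above; the name `category_shares` is hypothetical.

```python
# Lines of code per size category, copied from the overview above.
LINES_PER_CATEGORY = {
    "1001+": 0,
    "501-1000": 1_135,
    "201-500": 3_381,
    "101-200": 2_329,
    "1-100": 1_302,
}


def category_shares(lines_per_category: dict[str, int]) -> dict[str, int]:
    """Share of total lines of code per category, truncated to a whole percent."""
    total = sum(lines_per_category.values())  # 8,147 for this repository
    # Integer division truncates: 13.93% becomes 13%, 15.98% becomes 15%.
    return {name: lines * 100 // total for name, lines in lines_per_category.items()}


shares = category_shares(LINES_PER_CATEGORY)
print(" | ".join(f"{share}%" for share in shares.values()))
```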

File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 13% | 41% | 28% | 15%
File Size per Logical Decomposition
Decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
E2E_TOD | 0% | 38% | 36% | 17% | 7%
DST | 0% | 0% | 56% | 25% | 17%
data/multiwoz | 0% | 0% | 46% | 41% | 11%
data/pre-training_corpora | 0% | 0% | 34% | 34% | 31%
Pretraining | 0% | 0% | 55% | 44% | 0%
IC | 0% | 0% | 45% | 54% | 0%
E2E_TOD/modelling | 0% | 0% | 0% | 0% | 100%
DST/modelling | 0% | 0% | 0% | 0% | 100%
IC/modelling | 0% | 0% | 0% | 0% | 100%
Pretraining/modelling | 0% | 0% | 0% | 0% | 100%
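The per-component rows can be recomputed by grouping files by component, summing lines per size category, and truncating each share as in the overall row. The sketch below is a hypothetical illustration: the component label for each file is supplied as input rather than derived from its path (note that files under data/multiwoz/utlis are reported under data/multiwoz), and the line counts are those of the E2E_TOD and IC files in the "Longest Files" table.

```python
import bisect
from collections import defaultdict

UPPER_BOUNDS = [100, 200, 500, 1000]  # inclusive category boundaries
LABELS = ["1001+", "501-1000", "201-500", "101-200", "1-100"]  # report column order

# (component, lines of code) for each file; values taken from "Longest Files".
FILES = [
    ("E2E_TOD", 614), ("E2E_TOD", 521), ("E2E_TOD", 361), ("E2E_TOD", 291),
    ("E2E_TOD", 234), ("E2E_TOD", 205), ("E2E_TOD", 198), ("E2E_TOD", 164),
    ("E2E_TOD", 145), ("E2E_TOD", 99), ("E2E_TOD", 90), ("E2E_TOD", 32),
    ("IC", 225), ("IC", 146), ("IC", 119),
]


def distribution_by_component(files):
    """Map each component to its truncated percentage of lines per size category."""
    totals = defaultdict(lambda: [0] * 5)  # lines per category, smallest category first
    for component, lines in files:
        totals[component][bisect.bisect_left(UPPER_BOUNDS, lines)] += lines
    result = {}
    for component, per_category in totals.items():
        total = sum(per_category)
        # Reverse so the largest category comes first, matching the report's columns.
        shares = [lines * 100 // total for lines in reversed(per_category)]
        result[component] = dict(zip(LABELS, shares))
    return result


for component, shares in distribution_by_component(FILES).items():
    print(component, " | ".join(f"{s}%" for s in shares.values()))
```

With these inputs the E2E_TOD and IC rows match the table above.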
Longest Files (Top 47)
File | Directory | # lines | # units
eval.py | E2E_TOD | 614 | 20
dataclass.py | E2E_TOD | 521 | 20
dataclass.py | DST | 417 | 18
reader.py | data/multiwoz/utlis | 376 | 18
reader.py | E2E_TOD | 361 | 18
clean_dataset.py | E2E_TOD | 291 | 4
learn.py | DST | 290 | 3
process_schema_guided_dataset.py | data/pre-training_corpora/utlis | 259 | 11
dataclass.py | Pretraining | 246 | 13
process_mse2e_dataset.py | data/pre-training_corpora/utlis | 243 | 11
learn.py | E2E_TOD | 234 | 2
db_ops.py | data/multiwoz/utlis | 234 | 9
learn.py | IC | 225 | 1
db_ops.py | E2E_TOD | 205 | 9
pretrain.py | Pretraining | 199 | 1
utils.py | data/multiwoz/utlis | 199 | 22
utils.py | E2E_TOD | 198 | 22
postprocessing_dataset.py | data/multiwoz/utlis | 198 | 9
dst.py | DST | 192 | 4
process_frame_dataset.py | data/pre-training_corpora/utlis | 187 | 6
process_taskmaster_dataset.py | data/pre-training_corpora/utlis | 180 | 7
inference_utlis.py | E2E_TOD | 164 | 2
processing_funcs.py | data/multiwoz/utlis | 150 | 9
dataclass.py | IC | 146 | 10
inference_pptod.py | E2E_TOD | 145 | 2
inference.py | DST | 127 | 3
process_kvret_dataset.py | data/pre-training_corpora/utlis | 125 | 6
inference.py | IC | 119 | 2
ontology.py | E2E_TOD | 99 | -
ontology.py | data/multiwoz/utlis | 99 | -
ontology.py | DST | 99 | -
process_camres676_dataset.py | data/pre-training_corpora/utlis | 92 | 5
e2e_inference_utlis.py | E2E_TOD | 90 | 2
T5Model.py | E2E_TOD/modelling | 89 | 5
process_woz_dataset.py | data/pre-training_corpora/utlis | 89 | 4
T5Model.py | DST/modelling | 89 | 5
process_metalwoz_dataset.py | data/pre-training_corpora/utlis | 88 | 3
tokenize_all_datasets.py | data/pre-training_corpora/utlis | 83 | 4
compute_joint_acc.py | DST | 61 | 1
inference_utlis.py | DST | 59 | 3
T5Model.py | IC/modelling | 58 | 5
config.py | data/multiwoz/utlis | 50 | 2
tokenize_intent_classification_dataset.py | data/pre-training_corpora/utlis | 40 | 2
process_intent_classification_dataset.py | data/pre-training_corpora/utlis | 39 | -
config.py | E2E_TOD | 32 | 2
T5Model.py | Pretraining/modelling | 28 | 3
build_tokenizer.py | data/pre-training_corpora/utlis | 18 | -
Files With Most Units (Top 20)
File | Directory | # lines | # units
utils.py | E2E_TOD | 198 | 22
utils.py | data/multiwoz/utlis | 199 | 22
eval.py | E2E_TOD | 614 | 20
dataclass.py | E2E_TOD | 521 | 20
reader.py | E2E_TOD | 361 | 18
reader.py | data/multiwoz/utlis | 376 | 18
dataclass.py | DST | 417 | 18
dataclass.py | Pretraining | 246 | 13
process_mse2e_dataset.py | data/pre-training_corpora/utlis | 243 | 11
process_schema_guided_dataset.py | data/pre-training_corpora/utlis | 259 | 11
dataclass.py | IC | 146 | 10
db_ops.py | E2E_TOD | 205 | 9
postprocessing_dataset.py | data/multiwoz/utlis | 198 | 9
db_ops.py | data/multiwoz/utlis | 234 | 9
processing_funcs.py | data/multiwoz/utlis | 150 | 9
process_taskmaster_dataset.py | data/pre-training_corpora/utlis | 180 | 7
process_frame_dataset.py | data/pre-training_corpora/utlis | 187 | 6
process_kvret_dataset.py | data/pre-training_corpora/utlis | 125 | 6
T5Model.py | E2E_TOD/modelling | 89 | 5
process_camres676_dataset.py | data/pre-training_corpora/utlis | 92 | 5
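This page does not define a "unit". Assuming, as in many code-analysis tools, that a unit is a function or method, a per-file unit count can be approximated with Python's standard `ast` module. Under that reading, files listed with "-" (such as ontology.py) would contain no function definitions. The tool's exact rules (for example, for nested functions or lambdas) may differ, so the helper `count_units` below is a hypothetical approximation, not the report's own metric.

```python
import ast


def count_units(source: str) -> int:
    """Count function and method definitions, including nested ones (assumed meaning of "unit")."""
    tree = ast.parse(source)
    # ast.walk visits every node; lambdas are ast.Lambda and are not counted.
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


example = '''
class Reader:
    def __init__(self, path):
        self.path = path

    def load(self):
        with open(self.path) as f:
            return f.read()

def tokenize(text):
    return text.split()
'''
print(count_units(example))  # 3
```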
Files With Long Lines (Top 20)

There are 26 files with lines longer than 120 characters. In total, there are 95 long lines.

File | Directory | # lines | # units | # long lines
eval.py | E2E_TOD | 614 | 20 | 14
learn.py | DST | 290 | 3 | 10
dst.py | DST | 192 | 4 | 7
dataclass.py | E2E_TOD | 521 | 20 | 6
learn.py | IC | 225 | 1 | 6
T5Model.py | E2E_TOD/modelling | 89 | 5 | 5
pretrain.py | Pretraining | 199 | 1 | 5
dataclass.py | DST | 417 | 18 | 5
T5Model.py | DST/modelling | 89 | 5 | 5
inference_pptod.py | E2E_TOD | 145 | 2 | 4
learn.py | E2E_TOD | 234 | 2 | 4
T5Model.py | IC/modelling | 58 | 5 | 4
inference_utlis.py | E2E_TOD | 164 | 2 | 3
dataclass.py | Pretraining | 246 | 13 | 3
reader.py | data/multiwoz/utlis | 376 | 18 | 2
inference.py | DST | 127 | 3 | 2
db_ops.py | E2E_TOD | 205 | 9 | 1
ontology.py | E2E_TOD | 99 | - | 1
clean_dataset.py | E2E_TOD | 291 | 4 | 1
utils.py | E2E_TOD | 198 | 22 | 1
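Counting lines longer than 120 characters can be sketched with the standard library alone. The helper names and the directory walk below are assumptions for illustration; the report's tool may measure length differently (for example, how it treats tabs, trailing whitespace, or comment lines), so results on a local checkout need not match the 26 files and 95 long lines above exactly.

```python
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold used in this report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count lines whose length exceeds `limit` characters."""
    return sum(len(line) > limit for line in text.splitlines())


def long_lines_per_file(root: str, limit: int = MAX_LINE_LENGTH) -> dict[str, int]:
    """Map each .py file under `root` that has long lines to its long-line count."""
    counts = {}
    for path in sorted(Path(root).rglob("*.py")):
        n = count_long_lines(path.read_text(encoding="utf-8", errors="replace"), limit)
        if n:
            counts[str(path)] = n
    return counts
```

For example, on a local checkout at a hypothetical path, `counts = long_lines_per_file("path/to/pptod")` gives the per-file counts, `len(counts)` the number of files with long lines, and `sum(counts.values())` the total number of long lines.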