microsoft / AzureML-BERT
File Size

The distribution of file sizes (measured in lines of code).

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown so large that it is hard to work with.
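The size categories above can be sketched as a small Python helper. This is a minimal illustration under the stated boundaries, not the report tool's actual implementation; the names `SIZE_BUCKETS`, `classify_file_size`, and `size_distribution` are hypothetical.

```python
# Illustrative sketch only: the bucket boundaries come from the list above,
# while the function and variable names are hypothetical.
SIZE_BUCKETS = [
    (100, "very small"),
    (200, "small"),
    (500, "medium size"),
    (1000, "long"),
]


def classify_file_size(lines_of_code):
    """Return the size category for a file with the given lines of code."""
    if lines_of_code < 1:
        raise ValueError("lines_of_code must be at least 1")
    for upper_bound, label in SIZE_BUCKETS:
        if lines_of_code <= upper_bound:
            return label
    # Anything above the last upper bound falls into the 1001+ category.
    return "very long"


def size_distribution(file_sizes):
    """Map each category to (number of files, total lines of code)."""
    totals = {}
    for size in file_sizes:
        label = classify_file_size(size)
        count, lines = totals.get(label, (0, 0))
        totals[label] = (count + 1, lines + size)
    return totals
```

For example, `classify_file_size(814)` returns `"long"` and `classify_file_size(100)` returns `"very small"`.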
File Size Overall
  • There are 24 files with 4,367 lines of code.
    • 0 very long files (0 lines of code)
    • 3 long files (2,216 lines of code)
    • 3 medium size files (1,168 lines of code)
    • 4 small files (545 lines of code)
    • 14 very small files (438 lines of code)
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 50% | 26% | 12% | 10%
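The percentage breakdown can be reproduced from the line counts in the overall summary. The sketch below assumes each share is rounded down to a whole percent, which matches the figures shown here but is not confirmed as the tool's method; `percent_shares` is a hypothetical helper name.

```python
# Lines of code per size category, copied from the overall summary.
lines_per_category = {
    "1001+": 0,
    "501-1000": 2216,
    "201-500": 1168,
    "101-200": 545,
    "1-100": 438,
}


def percent_shares(lines_by_category):
    """Whole-number share of total lines per category, rounded down (assumption)."""
    total = sum(lines_by_category.values())
    if total == 0:
        return {category: 0 for category in lines_by_category}
    return {
        category: lines * 100 // total
        for category, lines in lines_by_category.items()
    }


shares = percent_shares(lines_per_category)
print(sum(lines_per_category.values()))              # 4367
print(" | ".join(f"{p}%" for p in shares.values()))  # 0% | 50% | 26% | 12% | 10%
```

Because each share is rounded down, the displayed percentages sum to 98% rather than 100%.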


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 50% | 26% | 12% | 10%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
finetune | 0% | 95% | 0% | 0% | 4%
finetune/TensorFlow | 0% | 92% | 0% | 0% | 7%
pretrain/PyTorch | 0% | 0% | 53% | 32% | 13%
finetune/PyTorch | 0% | 0% | 81% | 18% | 0%
pretrain/PyTorch/dataprep | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 24)
File | Location | # lines | # units
run_classifier_azureml.py | finetune | 814 | 61
run_squad_azureml.py | finetune | 768 | 16
run_classifier.py | finetune/TensorFlow | 634 | 39
run_classifier_azureml.py | finetune/PyTorch | 448 | 24
train.py | pretrain/PyTorch | 393 | 6
distributed_apex.py | pretrain/PyTorch | 327 | 20
sources.py | pretrain/PyTorch | 178 | 12
dataset.py | pretrain/PyTorch | 147 | 11
models.py | pretrain/PyTorch | 115 | 16
azureml_bert_util.py | finetune/PyTorch | 105 | 8
evaluate_squad.py | finetune | 74 | 5
create_pretraining.py | pretrain/PyTorch/dataprep | 59 | 1
download_model_and_dataset.py | finetune/TensorFlow | 55 | -
checkpoint.py | pretrain/PyTorch | 34 | 3
configuration.py | pretrain/PyTorch | 34 | 15
benchmark.py | pretrain/PyTorch | 33 | 2
optimization.py | pretrain/PyTorch | 29 | 4
azureml_adapter.py | pretrain/PyTorch | 27 | 5
single_line_doc_file_creation.py | pretrain/PyTorch/dataprep | 24 | -
split_data_into_files.py | pretrain/PyTorch/dataprep | 23 | -
sentence_segmentation.py | pretrain/PyTorch/dataprep | 16 | -
logger.py | pretrain/PyTorch | 16 | 3
utils.py | pretrain/PyTorch | 8 | 1
text.py | pretrain/PyTorch | 6 | 2
Files With Most Units (Top 20)
File | Location | # lines | # units
run_classifier_azureml.py | finetune | 814 | 61
run_classifier.py | finetune/TensorFlow | 634 | 39
run_classifier_azureml.py | finetune/PyTorch | 448 | 24
distributed_apex.py | pretrain/PyTorch | 327 | 20
run_squad_azureml.py | finetune | 768 | 16
models.py | pretrain/PyTorch | 115 | 16
configuration.py | pretrain/PyTorch | 34 | 15
sources.py | pretrain/PyTorch | 178 | 12
dataset.py | pretrain/PyTorch | 147 | 11
azureml_bert_util.py | finetune/PyTorch | 105 | 8
train.py | pretrain/PyTorch | 393 | 6
evaluate_squad.py | finetune | 74 | 5
azureml_adapter.py | pretrain/PyTorch | 27 | 5
optimization.py | pretrain/PyTorch | 29 | 4
checkpoint.py | pretrain/PyTorch | 34 | 3
logger.py | pretrain/PyTorch | 16 | 3
benchmark.py | pretrain/PyTorch | 33 | 2
text.py | pretrain/PyTorch | 6 | 2
create_pretraining.py | pretrain/PyTorch/dataprep | 59 | 1
utils.py | pretrain/PyTorch | 8 | 1
Files With Long Lines (Top 10)

There are 10 files with lines longer than 120 characters. In total, there are 33 long lines.

File | Location | # lines | # units | # long lines
train.py | pretrain/PyTorch | 393 | 6 | 9
download_model_and_dataset.py | finetune/TensorFlow | 55 | - | 4
models.py | pretrain/PyTorch | 115 | 16 | 4
run_classifier_azureml.py | finetune | 814 | 61 | 3
run_squad_azureml.py | finetune | 768 | 16 | 3
sources.py | pretrain/PyTorch | 178 | 12 | 3
run_classifier_azureml.py | finetune/PyTorch | 448 | 24 | 2
dataset.py | pretrain/PyTorch | 147 | 11 | 2
distributed_apex.py | pretrain/PyTorch | 327 | 20 | 2
optimization.py | pretrain/PyTorch | 29 | 4 | 1
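
A long-line count like the one reported in this section could be approximated as follows. This is a rough sketch that treats any line with more than 120 characters as long; how the report tool handles tabs, trailing whitespace, or comment lines is not stated, and `count_long_lines` is a hypothetical name.

```python
def count_long_lines(text, max_length=120):
    """Count the lines in `text` that are longer than `max_length` characters."""
    # splitlines() drops the line terminators, so they do not count toward length.
    return sum(1 for line in text.splitlines() if len(line) > max_length)


# Small self-contained demonstration with synthetic content.
sample = "a" * 120 + "\n" + "b" * 121 + "\nshort\n"
print(count_long_lines(sample))  # 1
```

Only the 121-character line is counted in the sample, since a line of exactly 120 characters is not longer than the threshold.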