awslabs / unsupervised-qa
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how the lines of code are distributed across files of different sizes.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
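The size categories above can be sketched as a small lookup. This is a minimal illustration, not the report tool's own implementation; the function name `size_category` and the constants are hypothetical, and only the thresholds come from the list above.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four categories, as listed above.
UPPER_BOUNDS = [100, 200, 500, 1000]
# One label per category; the last one covers 1001+ lines of code.
LABELS = ["very small", "small", "medium size", "long", "very long"]


def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given lines of code."""
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    # bisect_left returns the index of the first bound >= lines_of_code,
    # so 100 maps to index 0 and 101 maps to index 1.
    return LABELS[bisect_left(UPPER_BOUNDS, lines_of_code)]
```

For example, `size_category(274)` returns `"medium size"`, which matches how the report classifies whxx_ngram_table.toml.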
File Size Overall
  • There are 22 files with 1,689 lines of code.
    • 0 very long files (0 lines of code)
    • 0 long files (0 lines of code)
    • 2 medium size files (541 lines of code)
    • 3 small files (375 lines of code)
    • 17 very small files (773 lines of code)
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 0% | 32% | 22% | 45%
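The overall percentages can be reproduced from the line counts per category. The sketch below assumes the shares are rounded down to whole percents; that is an inference from the numbers (773 of 1,689 is about 45.8%, shown as 45%), not something the report states. The variable names are illustrative.

```python
# Lines of code per size category, taken from the overall summary.
loc_per_category = {
    "1001+": 0,
    "501-1000": 0,
    "201-500": 541,
    "101-200": 375,
    "1-100": 773,
}

total = sum(loc_per_category.values())

# Assumption: shares are truncated (integer division), not rounded.
shares = {
    category: 100 * loc // total
    for category, loc in loc_per_category.items()
}

for category, share in shares.items():
    print(f"{category}: {share}%")
```

Because each share is rounded down, the displayed values add up to 99% rather than 100%.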


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
toml | 0% | 0% | 99% | 0% | <1%
py | 0% | 0% | 18% | 26% | 54%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
resources | 0% | 0% | 99% | 0% | <1%
distant_supervision | 0% | 0% | 24% | 24% | 51%
spark_scripts | 0% | 0% | 0% | 33% | 66%
ROOT | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 22)
File | Location | # lines | # units
whxx_ngram_table.toml | resources | 274 | -
synthetic_data_creator.py | distant_supervision | 267 | 13
ds_es_client.py | distant_supervision | 173 | 7
stat_for_ner_category_to_wh_words.py | spark_scripts | 101 | 6
data_models.py | distant_supervision | 101 | 20
create_ds_synthetic_dataset.py | spark_scripts | 90 | 2
text_preprocessor.py | distant_supervision | 80 | 11
stat_computation.py | distant_supervision | 78 | 10
entity_to_queries_mapper.py | distant_supervision | 76 | 6
utils.py | distant_supervision | 74 | 11
ner_entity_gatherer.py | distant_supervision | 68 | 8
input_parser.py | distant_supervision | 60 | 6
question_generator.py | distant_supervision | 48 | 7
create_squad_ner_dataset.py | spark_scripts | 44 | 2
squad_ner_creator.py | distant_supervision | 43 | 4
write_sentence_level_es_index.py | spark_scripts | 35 | 1
tokenize_and_ner_inputs.py | spark_scripts | 34 | 1
whxx_ngram_table.py | distant_supervision | 28 | 5
constants.py | distant_supervision | 9 | -
setup.cfg | root | 3 | -
exceptions.py | distant_supervision | 2 | -
default_config.toml | resources | 1 | -
Files With Most Units (Top 17)
File | Location | # lines | # units
data_models.py | distant_supervision | 101 | 20
synthetic_data_creator.py | distant_supervision | 267 | 13
text_preprocessor.py | distant_supervision | 80 | 11
utils.py | distant_supervision | 74 | 11
stat_computation.py | distant_supervision | 78 | 10
ner_entity_gatherer.py | distant_supervision | 68 | 8
question_generator.py | distant_supervision | 48 | 7
ds_es_client.py | distant_supervision | 173 | 7
stat_for_ner_category_to_wh_words.py | spark_scripts | 101 | 6
input_parser.py | distant_supervision | 60 | 6
entity_to_queries_mapper.py | distant_supervision | 76 | 6
whxx_ngram_table.py | distant_supervision | 28 | 5
squad_ner_creator.py | distant_supervision | 43 | 4
create_ds_synthetic_dataset.py | spark_scripts | 90 | 2
create_squad_ner_dataset.py | spark_scripts | 44 | 2
tokenize_and_ner_inputs.py | spark_scripts | 34 | 1
write_sentence_level_es_index.py | spark_scripts | 35 | 1
Files With Long Lines (Top 3)

There are 3 files with lines longer than 120 characters. In total, there are 3 long lines.

File | Location | # lines | # units | # long lines
synthetic_data_creator.py | distant_supervision | 267 | 13 | 1
entity_to_queries_mapper.py | distant_supervision | 76 | 6 | 1
ner_entity_gatherer.py | distant_supervision | 68 | 8 | 1
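A long-line count like the one above can be approximated with a few lines of Python. This is a rough sketch under stated assumptions: the 120-character limit comes from the report, but the function name `count_long_lines` is hypothetical, and the report tool may measure line length differently (for example, in how it treats tabs or trailing whitespace).

```python
MAX_LINE_LENGTH = 120  # threshold used in this report


def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count the lines in `text` that are longer than `limit` characters."""
    # splitlines() drops the line terminators, so they do not count
    # toward the length of a line.
    return sum(1 for line in text.splitlines() if len(line) > limit)
```

To check a file on disk, read it first, for instance with `pathlib.Path(path).read_text(encoding="utf-8")`, and pass the result to `count_long_lines`.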