Distribution of file sizes (measured in lines of code).

Intro

File size measurements show how the files are distributed across size categories.
Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium-size files), 501-1000 (long files), and 1001+ (very long files).
It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
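A minimal Python sketch of the classification above, assuming each file's size is already available as an integer line count. The names `size_category` and `size_distribution` are hypothetical helpers for illustration, not part of the tool that produced this report.

```python
from collections import Counter

# Inclusive upper bounds (in lines of code) for the first four categories;
# anything above the last bound falls into "1001+".
SIZE_CATEGORIES = [
    (100, "1-100"),
    (200, "101-200"),
    (500, "201-500"),
    (1000, "501-1000"),
]


def size_category(loc: int) -> str:
    """Return the size category label for a file with `loc` lines of code."""
    for upper, label in SIZE_CATEGORIES:
        if loc <= upper:
            return label
    return "1001+"


def size_distribution(line_counts: dict) -> Counter:
    """Count how many files fall into each size category.

    `line_counts` maps a file name to its number of lines of code.
    """
    return Counter(size_category(loc) for loc in line_counts.values())
```

For example, `size_distribution({"synthetic_data_creator.py": 267, "ds_es_client.py": 173, "utils.py": 74})` places one file in each of the 201-500, 101-200, and 1-100 categories.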

File Size Overall

[Chart: overall file size distribution; legend (lines of code): 1001+, 501-1000, 201-500, 101-200, 1-100]

File Size per Extension

[Chart: file size distribution per file extension; legend (lines of code): 1001+, 501-1000, 201-500, 101-200, 1-100]

File Size per Logical Decomposition

[Chart: file size distribution for the "primary" logical decomposition; legend (lines of code): 1001+, 501-1000, 201-500, 101-200, 1-100]

Longest Files (Top 22)

File	# lines	# units
whxx_ngram_table.toml in resources	274	-
synthetic_data_creator.py in distant_supervision	267	13
ds_es_client.py in distant_supervision	173	7
stat_for_ner_category_to_wh_words.py in spark_scripts	101	6
data_models.py in distant_supervision	101	20
create_ds_synthetic_dataset.py in spark_scripts	90	2
text_preprocessor.py in distant_supervision	80	11
stat_computation.py in distant_supervision	78	10
entity_to_queries_mapper.py in distant_supervision	76	6
utils.py in distant_supervision	74	11
ner_entity_gatherer.py in distant_supervision	68	8
input_parser.py in distant_supervision	60	6
question_generator.py in distant_supervision	48	7
create_squad_ner_dataset.py in spark_scripts	44	2
squad_ner_creator.py in distant_supervision	43	4
write_sentence_level_es_index.py in spark_scripts	35	1
tokenize_and_ner_inputs.py in spark_scripts	34	1
whxx_ngram_table.py in distant_supervision	28	5
constants.py in distant_supervision	9	-
setup.cfg in root	3	-
exceptions.py in distant_supervision	2	-
default_config.toml in resources	1	-
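The table ranks files by lines of code. The reporting tool's exact counting rules are not stated here, so the following is only an approximation, assuming a line of code is any line that is neither blank nor a comment-only line starting with `#` (this suits Python, TOML, and most .cfg files; docstrings are still counted as code). `count_loc` and `longest_files` are hypothetical names.

```python
def count_loc(source: str) -> int:
    """Approximate lines of code: skip blank lines and '#' comment-only lines.

    Assumption: docstrings and trailing comments are counted as code.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def longest_files(line_counts: dict, top: int = 22) -> list:
    """Return up to `top` (file, loc) pairs, largest first.

    sorted() is stable, so files with equal sizes keep their input order.
    """
    ranked = sorted(line_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```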

Files With Most Units (Top 17)

File	# lines	# units
data_models.py in distant_supervision	101	20
synthetic_data_creator.py in distant_supervision	267	13
text_preprocessor.py in distant_supervision	80	11
utils.py in distant_supervision	74	11
stat_computation.py in distant_supervision	78	10
ner_entity_gatherer.py in distant_supervision	68	8
question_generator.py in distant_supervision	48	7
ds_es_client.py in distant_supervision	173	7
stat_for_ner_category_to_wh_words.py in spark_scripts	101	6
input_parser.py in distant_supervision	60	6
entity_to_queries_mapper.py in distant_supervision	76	6
whxx_ngram_table.py in distant_supervision	28	5
squad_ner_creator.py in distant_supervision	43	4
create_ds_synthetic_dataset.py in spark_scripts	90	2
create_squad_ner_dataset.py in spark_scripts	44	2
tokenize_and_ner_inputs.py in spark_scripts	34	1
write_sentence_level_es_index.py in spark_scripts	35	1
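The report does not define a "unit". For Python files, one reasonable assumption is that a unit is a function or method definition. Under that assumption, a sketch using the standard `ast` module could count units as follows (`count_units` is a hypothetical helper, and the reporting tool may treat nested functions or lambdas differently).

```python
import ast


def count_units(source: str) -> int:
    """Count function and method definitions in Python source.

    Assumption: a unit is any `def` or `async def`, nested ones included;
    lambdas are not counted.
    """
    tree = ast.parse(source)
    return sum(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )
```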

Files With Long Lines (Top 3)

There are 3 files with lines longer than 120 characters. In total, there are 3 long lines.

File	# lines	# units	# long lines
synthetic_data_creator.py in distant_supervision	267	13	1
entity_to_queries_mapper.py in distant_supervision	76	6	1
ner_entity_gatherer.py in distant_supervision	68	8	1
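Finding long lines is straightforward to reproduce. A minimal sketch, assuming length is measured in characters of each line without its newline and using the 120-character threshold stated above (`long_lines` is a hypothetical helper):

```python
MAX_LINE_LENGTH = 120  # threshold used in this report


def long_lines(source: str, limit: int = MAX_LINE_LENGTH) -> list:
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]
```

Applied to a file's contents, for example `long_lines(pathlib.Path(path).read_text(encoding="utf-8"))`, it returns the line numbers to review; a line of exactly 120 characters is not flagged.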