The distribution of file sizes (measured in lines of code).

Intro

File size measurements show how files are distributed by size.
Files are classified into five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
It is good practice to keep files small. Long files may become "bloaters": code that has grown to such proportions that it is hard to work with.
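
The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the report tool's actual implementation: the names `BUCKETS`, `size_category`, and `size_distribution` are hypothetical, and counting files per category (rather than weighting by lines of code) is an assumption.

```python
from collections import Counter

# Inclusive upper bounds for each category, as listed in the text above.
# The names here are illustrative, not taken from the report tool.
BUCKETS = [
    (100, "1-100"),
    (200, "101-200"),
    (500, "201-500"),
    (1000, "501-1000"),
]


def size_category(lines_of_code: int) -> str:
    """Return the size category label for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("expected at least one line of code")
    for upper_bound, label in BUCKETS:
        if lines_of_code <= upper_bound:
            return label
    return "1001+"


def size_distribution(line_counts) -> Counter:
    """Count how many files fall into each category (assumed unweighted)."""
    return Counter(size_category(n) for n in line_counts)
```

For example, `size_category(354)` places the longest file in this report, train.py in scripts/reader, in the 201-500 category.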

File Size Overall

[Chart: overall file size distribution. Legend: 1001+, 501-1000, 201-500, 101-200, 1-100]

File Size per Extension

[Chart: file size distribution per extension. Categories: 1001+, 501-1000, 201-500, 101-200, 1-100]

File Size per Logical Decomposition

primary

[Chart: file size distribution per component. Categories: 1001+, 501-1000, 201-500, 101-200, 1-100]

Longest Files (Top 40)

File	# lines	# units
train.py in scripts/reader	354	9
model.py in drqa/reader	299	17
drqa.py in drqa/pipeline	219	8
generate.py in scripts/distant	211	7
utils.py in drqa/reader	174	21
layers.py in drqa/reader	146	12
build_tfidf.py in scripts/retriever	126	7
eval.py in scripts/retriever	117	4
preprocess.py in scripts/reader	107	5
predict.py in scripts/pipeline	106	-
config.py in drqa/reader	89	4
predictor.py in drqa/reader	88	7
vector.py in drqa/reader	86	2
interactive.py in scripts/pipeline	84	2
predict.py in scripts/reader	83	-
corenlp_tokenizer.py in drqa/tokenizers	81	4
rnn_reader.py in drqa/reader	79	2
data.py in drqa/reader	79	16
build_db.py in scripts/retriever	78	5
tokenizer.py in drqa/tokenizers	76	14
regexp_tokenizer.py in drqa/tokenizers	71	2
utils.py in drqa/retriever	67	7
tfidf_doc_ranker.py in drqa/retriever	66	7
elastic_doc_ranker.py in drqa/retriever	58	9
interactive.py in scripts/reader	48	2
eval.py in scripts/pipeline	39	1
spacy_tokenizer.py in drqa/tokenizers	36	2
__init__.py in drqa/tokenizers	35	4
simple_tokenizer.py in drqa/tokenizers	33	2
doc_db.py in drqa/retriever	30	7
interactive.py in scripts/retriever	28	2
__init__.py in drqa/retriever	24	2
check_data.py in scripts/distant	21	-
webquestions.py in scripts/convert	19	-
setup.py in root	18	-
prep_wikipedia.py in scripts/retriever	17	1
squad.py in scripts/convert	16	-
__init__.py in drqa/reader	16	1
__init__.py in drqa/pipeline	15	1
__init__.py in drqa	13	-

Files With Most Units (Top 20)

File	# lines	# units
utils.py in drqa/reader	174	21
model.py in drqa/reader	299	17
data.py in drqa/reader	79	16
tokenizer.py in drqa/tokenizers	76	14
layers.py in drqa/reader	146	12
train.py in scripts/reader	354	9
elastic_doc_ranker.py in drqa/retriever	58	9
drqa.py in drqa/pipeline	219	8
build_tfidf.py in scripts/retriever	126	7
generate.py in scripts/distant	211	7
predictor.py in drqa/reader	88	7
doc_db.py in drqa/retriever	30	7
utils.py in drqa/retriever	67	7
tfidf_doc_ranker.py in drqa/retriever	66	7
preprocess.py in scripts/reader	107	5
build_db.py in scripts/retriever	78	5
eval.py in scripts/retriever	117	4
config.py in drqa/reader	89	4
__init__.py in drqa/tokenizers	35	4
corenlp_tokenizer.py in drqa/tokenizers	81	4

Files With Long Lines (Top 1)

There is only one file with lines longer than 120 characters. In total, there are 2 long lines.

File	# lines	# units	# long lines
elastic_doc_ranker.py in drqa/retriever	58	9	2
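
A long-line count like the one above can be approximated with a short Python sketch. The function name `count_long_lines` is hypothetical, and how the report tool treats tabs or trailing whitespace when measuring line length is not stated, so the plain character count below is an assumption.

```python
def count_long_lines(text: str, max_length: int = 120) -> int:
    """Count the lines in text that are longer than max_length characters.

    Line terminators are not counted toward the length. Each character,
    including a tab, counts as one (an assumption for this sketch).
    """
    return sum(1 for line in text.splitlines() if len(line) > max_length)
```

To check a file, read its contents and pass them in, for example `count_long_lines(open(path, encoding="utf-8").read())`, where `path` is whichever source file you want to inspect.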