microsoft / BlingFire
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how file sizes are distributed across the codebase.
  • Files are classified into five categories based on their size in lines of code: 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), and 1001+ (very long files).
  • Keeping files small is good practice. Long files may become "bloaters": code that has grown so large that it is hard to work with.
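The report contains no code, so the classification above is illustrated with a minimal Python sketch. Only the category boundaries come from the report; the helper names `size_category` and `tally` and the sample input format are hypothetical. Each file is assigned a category by its line count, and file counts and line totals are accumulated per category, which is the shape of the "File Size Overall" figures.

```python
from collections import Counter

def size_category(loc: int) -> str:
    """Return the report's size category for a file with `loc` lines of code."""
    if loc <= 100:
        return "1-100 (very small)"
    if loc <= 200:
        return "101-200 (small)"
    if loc <= 500:
        return "201-500 (medium)"
    if loc <= 1000:
        return "501-1000 (long)"
    return "1001+ (very long)"

def tally(files):
    """Count files and sum lines of code per category.

    `files` is an iterable of (name, lines_of_code) pairs (hypothetical input format).
    """
    counts, lines = Counter(), Counter()
    for _name, loc in files:
        category = size_category(loc)
        counts[category] += 1
        lines[category] += loc
    return counts, lines

# A few files and line counts taken from the tables in this report.
sample = [("FAUtils.cpp", 529), ("FARegexpTree.cpp", 209),
          ("FAMapIOTools.h", 42), ("lex.htm", 817)]
counts, lines = tally(sample)
print(counts)
print(lines)
```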
File Size Overall
  • There are 694 files with 159,678 lines of code.
    • 10 very long files (73,438 lines of code)
    • 17 long files (10,404 lines of code)
    • 119 medium size files (36,416 lines of code)
    • 159 small files (22,879 lines of code)
    • 389 very small files (16,541 lines of code)
Share of lines of code by file size:
1001+: 45% | 501-1000: 6% | 201-500: 22% | 101-200: 14% | 1-100: 10%
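The percentages above are shares of the total lines of code, not of the file count: 73,438 of 159,678 lines is about 46%, whereas 10 of 694 files is under 2%. The sketch below recomputes them from the per-category line totals. Truncating each share to a whole percent, rather than rounding, reproduces exactly the 45% | 6% | 22% | 14% | 10% shown. That the tool truncates is an inference from these numbers, not something the report states. The `format_share` helper and its `<1%` rule are assumptions modeled on the per-extension table.

```python
# Lines of code per size category, copied from the "File Size Overall" figures.
LINES_PER_BUCKET = {
    "1001+": 73_438,
    "501-1000": 10_404,
    "201-500": 36_416,
    "101-200": 22_879,
    "1-100": 16_541,
}

def format_share(part: int, total: int) -> str:
    """Percentage of `total` held by `part`, truncated to a whole percent.

    A non-zero share below 1% is shown as "<1%" (assumed rule).
    """
    if part > 0 and part * 100 < total:
        return "<1%"
    return f"{part * 100 // total}%"

total = sum(LINES_PER_BUCKET.values())
print(total)
print(" | ".join(format_share(v, total) for v in LINES_PER_BUCKET.values()))
```

Truncation also accounts for rows in the tables below summing to less than 100% (here the five shares add up to 97%).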


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
cxx | 99% | 0% | <1% | 0% | 0%
cpp | 6% | 13% | 44% | 27% | 7%
h | 0% | 7% | 37% | 15% | 39%
htm | 0% | 100% | 0% | 0% | 0%
gnu | 0% | 100% | 0% | 0% | 0%
cmd | 0% | 0% | 25% | 54% | 19%
py | 0% | 0% | 26% | 21% | 51%
cs | 0% | 0% | 68% | 31% | 0%
y | 0% | 0% | 100% | 0% | 0%
TXT | 0% | 0% | 0% | 0% | 100%
js | 0% | 0% | 0% | 0% | 100%
html | 0% | 0% | 0% | 0% | 100%
yml | 0% | 0% | 0% | 0% | 100%
inc | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
Decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
blingfiretools | 68% | 5% | 19% | 6% | <1%
ldbsrc | 93% | 1% | <1% | <1% | 3%
blingfireclient.library | 49% | 1% | 25% | 12% | 10%
blingfirecompile.library | 0% | 13% | 38% | 25% | 22%
doc | 0% | 100% | 0% | 0% | 0%
scripts | 0% | 0% | 27% | 53% | 19%
nuget | 0% | 0% | 68% | 31% | 0%
dist-pypi | 0% | 0% | 0% | 87% | 12%
wasm | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
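The two breakdowns above can be sketched as the same computation with a different grouping key. This is a hypothetical Python sketch. Grouping by file extension (case preserved, so README.TXT counts as TXT) follows the per-extension table. Treating the "primary" decomposition as the first directory of each path, with top-level files grouped under ROOT, is an assumption inferred from the component names. Shares are line-weighted and truncated, as in the overall chart.

```python
from collections import defaultdict
from pathlib import PurePosixPath

BUCKETS = ["1001+", "501-1000", "201-500", "101-200", "1-100"]

def size_bucket(loc: int) -> str:
    """Map a line count to the report's size category."""
    if loc <= 100:
        return "1-100"
    if loc <= 200:
        return "101-200"
    if loc <= 500:
        return "201-500"
    if loc <= 1000:
        return "501-1000"
    return "1001+"

def by_extension(path: str) -> str:
    # Text after the last dot, case preserved ("README.TXT" -> "TXT").
    return PurePosixPath(path).suffix.lstrip(".") or "(none)"

def by_top_level_dir(path: str) -> str:
    # First path segment; files at the repository root go to "ROOT" (assumed rule).
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "ROOT"

def breakdown(files, key):
    """Per group, the truncated percentage of its lines falling in each size category."""
    lines = defaultdict(lambda: dict.fromkeys(BUCKETS, 0))
    for path, loc in files:
        if loc <= 0:
            continue  # empty files carry no weight
        lines[key(path)][size_bucket(loc)] += loc
    result = {}
    for group, per_bucket in lines.items():
        group_total = sum(per_bucket.values())
        result[group] = [per_bucket[b] * 100 // group_total for b in BUCKETS]
    return result

# Paths and line counts taken from the file lists in this report.
sample = [
    ("doc/lex.htm", 817),
    ("ldbsrc/Makefile.gnu", 553),
    ("scripts/fa_preproc.cmd", 472),
    ("scripts/fa_build_dict.cmd", 196),
    ("nuget/lib/BlingFireUtils.cs", 172),
    ("nuget/lib/BlingFireUtils2.cs", 382),
]
for group, shares in breakdown(sample, by_top_level_dir).items():
    print(group, " | ".join(f"{s}%" for s in shares))
```

With the two nuget/lib files listed elsewhere in this report (172 and 382 lines), the sketch gives 0% | 0% | 68% | 31% | 0% for both the nuget component and the cs extension, the same figures as the report's rows. That agreement is consistent with line-weighted, truncated shares if those are the only files in the group, but it does not confirm how the tool works.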
Longest Files (Top 50)
File | Directory | # lines | # units
BlingFireTokLibSbdData.cxx | blingfiretools/blingfiretokdll | 21006 | -
BlingFireTokLibSbdData.cxx | ldbsrc/sbd | 21006 | -
BlingFireTokLibWbdData.cxx | blingfiretools/blingfiretokdll | 7776 | -
BlingFireTokLibWbdData.cxx | ldbsrc/wbd | 7776 | -
FANormalizeDiacriticsMapPreserve.cxx | blingfireclient.library/src | 4103 | -
FANormalizeDiacriticsMapProd.cxx | blingfireclient.library/src | 4103 | -
FANormalizeDiacriticsMapRemove.cxx | blingfireclient.library/src | 4103 | -
FAUtf32ToLower.cpp | blingfireclient.library/src | 1239 | -
FAUtf32ToUpper.cpp | blingfireclient.library/src | 1239 | -
blingfiretokdll.cpp | blingfiretools/blingfiretokdll | 1087 | 27
FADfaPack_triv.cpp | blingfirecompile.library/src | 952 | 37
lex.htm | doc | 817 | -
FAPrmInterpreter_t.h | blingfirecompile.library/inc | 706 | 11
fa_line2chain_unicode.cpp | blingfiretools/fa_line2chain_unicode | 662 | 9
fa_ts2stat.cpp | blingfiretools/fa_ts2stat | 653 | 12
FAParser2WRE.cpp | blingfirecompile.library/src | 623 | 24
FADfa2MinDfa_hg_t.h | blingfirecompile.library/inc | 609 | 9
FARegexpParser_msyacc.cpp | blingfirecompile.library/src | 576 | 19
fa_fsm2fsm_pack.cpp | blingfiretools/fa_fsm2fsm_pack | 560 | 5
Makefile.gnu | ldbsrc | 553 | -
FAPosNfaPack_triv.cpp | blingfirecompile.library/src | 547 | 27
FATaggedTextStat.cpp | blingfirecompile.library/src | 544 | 19
FAUtils.cpp | blingfirecompile.library/src | 529 | 21
FAPrintUtils.cpp | blingfirecompile.library/src | 523 | 13
FAWRECompiler.cpp | blingfirecompile.library/src | 519 | 34
fa_line_format.cpp | blingfiretools/fa_line_format | 519 | 11
FAMorphLDB_t_packaged.h | blingfireclient.library/inc | 512 | 3
FASuffixInterpretTools_t.h | blingfireclient.library/inc | 484 | -
FARegexpTree2Funcs.cpp | blingfirecompile.library/src | 483 | 19
FAAutIOTools.cpp | blingfirecompile.library/src | 482 | 10
FAUtf8Utils.cpp | blingfireclient.library/src | 477 | 11
fa_sortbytes.cpp | blingfiretools/fa_sortbytes | 475 | 17
fa_preproc.cmd | scripts | 472 | -
FAWRETokens2Dicts.cpp | blingfirecompile.library/src | 463 | 21
FAWRETokens2Digitizers.cpp | blingfirecompile.library/src | 463 | 17
fa_fsm_renum.cpp | blingfiretools/fa_fsm_renum | 461 | 11
FASuffixInterpretToolsConst_t.h | blingfireclient.library/inc | 458 | -
FANfa2Dfa_t.h | blingfirecompile.library/inc | 447 | 14
FAChains2MinDfa_sort.cpp | blingfirecompile.library/src | 441 | 31
FAStemmerConst_t.h | blingfireclient.library/inc | 432 | 8
FAStemmer_t.h | blingfireclient.library/inc | 431 | 8
fa_align.cpp | blingfiretools/fa_align | 426 | 5
FAEncodeUtils.h | blingfireclient.library/inc | 425 | -
fa_re2nfa.cpp | blingfiretools/fa_re2nfa | 419 | 7
FAWreLexTools_t.h | blingfireclient.library/inc | 416 | 4
FARegexpTreeSimplify_disj.cpp | blingfirecompile.library/src | 411 | 9
fa_pats_select.cpp | blingfiretools/fa_pats_select | 400 | 3
FADict2Classifier.cpp | blingfirecompile.library/src | 398 | 14
FAWbdConfKeeper.cpp | blingfireclient.library/src | 390 | 21
FAFsmRenum.cpp | blingfirecompile.library/src | 390 | 14
Files With Most Units (Top 20)
File | Directory | # lines | # units
FADfaPack_triv.cpp | blingfirecompile.library/src | 952 | 37
FAWRECompiler.cpp | blingfirecompile.library/src | 519 | 34
FAChains2MinDfa_sort.cpp | blingfirecompile.library/src | 441 | 31
FAPosNfaPack_triv.cpp | blingfirecompile.library/src | 547 | 27
blingfiretokdll.cpp | blingfiretools/blingfiretokdll | 1087 | 27
FAParser2WRE.cpp | blingfirecompile.library/src | 623 | 24
tokenization.py | ldbsrc/bert_base_tok | 254 | 24
tokenization.py | scripts | 254 | 24
FARegexpTree.cpp | blingfirecompile.library/src | 209 | 22
FAWbdConfKeeper.cpp | blingfireclient.library/src | 390 | 21
FAUtils.cpp | blingfirecompile.library/src | 529 | 21
FAWRETokens2Dicts.cpp | blingfirecompile.library/src | 463 | 21
FARSDfa_ar_judy.cpp | blingfirecompile.library/src | 229 | 20
FARSNfa_ar_judy.cpp | blingfirecompile.library/src | 361 | 20
FAWREConf.cpp | blingfirecompile.library/src | 289 | 20
FARSDfa_renum.cpp | blingfirecompile.library/src | 164 | 19
FARSDfa_renum_iws.cpp | blingfirecompile.library/src | 139 | 19
FARegexpParser_msyacc.cpp | blingfirecompile.library/src | 576 | 19
FARegexpTree2Funcs.cpp | blingfirecompile.library/src | 483 | 19
FATaggedTextStat.cpp | blingfirecompile.library/src | 544 | 19
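Both lists above are top-N rankings over per-file statistics: the longest files by line count, and the files with the most units. A minimal Python sketch, assuming a hypothetical `FileStats` record in which `units` is `None` where the report shows "-"; such files are left out of the units ranking. `heapq.nlargest` is documented as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`, so tied files keep their input order. The report does not say how it breaks ties.

```python
import heapq
from typing import NamedTuple, Optional

class FileStats(NamedTuple):
    name: str
    directory: str
    lines: int
    units: Optional[int]  # None where the report shows "-"

def longest_files(stats, n=50):
    """The n files with the most lines of code."""
    return heapq.nlargest(n, stats, key=lambda f: f.lines)

def most_units(stats, n=20):
    """The n files with the most units; files without a unit count are skipped."""
    with_units = (f for f in stats if f.units is not None)
    return heapq.nlargest(n, with_units, key=lambda f: f.units)

# Entries copied from the lists above.
stats = [
    FileStats("blingfiretokdll.cpp", "blingfiretools/blingfiretokdll", 1087, 27),
    FileStats("FADfaPack_triv.cpp", "blingfirecompile.library/src", 952, 37),
    FileStats("lex.htm", "doc", 817, None),
    FileStats("FAWRECompiler.cpp", "blingfirecompile.library/src", 519, 34),
    FileStats("FARegexpTree.cpp", "blingfirecompile.library/src", 209, 22),
]
for f in longest_files(stats, n=3):
    print(f"{f.name} in {f.directory}: {f.lines} lines")
for f in most_units(stats, n=3):
    print(f"{f.name} in {f.directory}: {f.units} units")
```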
Files With Long Lines (Top 20)

There are 57 files with lines longer than 120 characters. In total, there are 12447 long lines.

File | Directory | # lines | # units | # long lines
FANormalizeDiacriticsMapPreserve.cxx | blingfireclient.library/src | 4103 | - | 4096
FANormalizeDiacriticsMapProd.cxx | blingfireclient.library/src | 4103 | - | 4096
FANormalizeDiacriticsMapRemove.cxx | blingfireclient.library/src | 4103 | - | 4096
blingfiretokdll.cpp | blingfiretools/blingfiretokdll | 1087 | 27 | 15
Makefile.gnu | ldbsrc | 553 | - | 13
BlingFireUtils.cs | nuget/lib | 172 | 5 | 13
__init__.py | dist-pypi/blingfire | 144 | 17 | 10
FARegexpParser_msyacc.cpp | blingfirecompile.library/src | 576 | 19 | 9
BlingFireUtils2.cs | nuget/lib | 382 | 17 | 9
README.TXT | ldbsrc/laser100k | 47 | - | 7
README.TXT | ldbsrc/bpe_example2 | 26 | - | 5
README.TXT | ldbsrc/gpt2 | 39 | - | 5
README.TXT | ldbsrc/roberta | 43 | - | 5
fa_sortbytes.cpp | blingfiretools/fa_sortbytes | 475 | 17 | 4
README.TXT | ldbsrc/uri250k | 28 | - | 4
fa_line2chain_unicode.cpp | blingfiretools/fa_line2chain_unicode | 662 | 9 | 3
fa_build_dict.cmd | scripts | 196 | - | 3
fa_build_suff.cmd | scripts | 207 | - | 3
FAMapIOTools.h | blingfirecompile.library/inc | 42 | - | 2
FALexBreaker.cpp | blingfirecompile.library/src | 381 | 12 | 2
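A sketch of the long-line count, assuming a line is "long" when its length in characters, excluding the line terminator, exceeds the report's 120-character threshold. The report does not say whether the tool counts characters or bytes, or whether it skips blank and comment lines, so `count_long_lines` and `long_line_report` are hypothetical helpers, not a reproduction of the tool.

```python
import sys
from pathlib import Path

MAX_LINE_LENGTH = 120  # threshold stated in the report

def count_long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> int:
    """Count lines whose length (without the line break) exceeds `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)

def long_line_report(paths, limit: int = MAX_LINE_LENGTH):
    """Return (path, count) pairs for files with at least one long line, most first."""
    rows = []
    for path in paths:
        # errors="replace" keeps files that are not valid UTF-8 from aborting the scan
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        count = count_long_lines(text, limit)
        if count:
            rows.append((str(path), count))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

if __name__ == "__main__":
    # Usage: python long_lines.py FILE [FILE ...]
    report = long_line_report(sys.argv[1:])
    for path, count in report:
        print(f"{path}: {count} long lines")
    print(f"{len(report)} files, {sum(c for _, c in report)} long lines in total")
```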