microsoft / LightGBM
File Size

The distribution of file sizes, measured in lines of code.

Intro
  • File size measurements show how the lines of code are spread across files of different sizes.
  • Files are classified in five categories based on their size (lines of code): 1-100 (very small files), 101-200 (small files), 201-500 (medium size files), 501-1000 (long files), 1001+ (very long files).
  • It is good practice to keep files small. Long files may become "bloaters": code that has grown to such gargantuan proportions that it is hard to work with.
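The classification above can be sketched as a small lookup. This is only an illustration of the thresholds listed in the intro, not the report tool's own code; the function name `size_category` is hypothetical.

```python
def size_category(lines_of_code: int) -> str:
    """Return the size category for a file with the given line count."""
    if lines_of_code < 1:
        raise ValueError("a measured file needs at least one line of code")
    # (inclusive upper bound, label) pairs, checked from smallest to largest.
    thresholds = (
        (100, "very small"),
        (200, "small"),
        (500, "medium size"),
        (1000, "long"),
    )
    for upper_bound, label in thresholds:
        if lines_of_code <= upper_bound:
            return label
    # Anything above 1000 lines falls into the last category.
    return "very long"


# basic.py is listed in this report with 2381 lines.
print(size_category(2381))
```

Keeping the bounds in one tuple makes it easy to experiment with different cut-offs without touching the logic.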
File Size Overall
  • There are 166 files with 47,114 lines of code.
    • 7 very long files (10,532 lines of code)
    • 20 long files (14,589 lines of code)
    • 41 medium size files (12,811 lines of code)
    • 42 small files (6,357 lines of code)
    • 56 very small files (2,825 lines of code)
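The per-category totals listed above can be turned into percentage shares of the code base. The sketch below uses only the line counts from this list. Truncating to whole percents with floor division is an assumption about how the report rounds; the variable names are illustrative.

```python
# Lines of code per size category, copied from the list above.
loc_by_category = {
    "1001+": 10_532,
    "501-1000": 14_589,
    "201-500": 12_811,
    "101-200": 6_357,
    "1-100": 2_825,
}

total_loc = sum(loc_by_category.values())

# Whole-percent share of each category, truncated (assumed rounding rule).
share_by_category = {
    category: 100 * loc // total_loc
    for category, loc in loc_by_category.items()
}

for category, share in share_by_category.items():
    print(f"{category}: {share}%")
```

Because each share is truncated, the percentages need not add up to exactly 100.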
Share of lines of code per size category (1001+ | 501-1000 | 201-500 | 101-200 | 1-100): 22% | 30% | 27% | 13% | 5%


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
cpp | 29% | 45% | 17% | 3% | 3%
py | 59% | 11% | 22% | 5% | 1%
hpp | 12% | 24% | 34% | 25% | 3%
h | 15% | 0% | 45% | 27% | 11%
cl | 0% | 100% | 0% | 0% | 0%
R | 0% | 33% | 32% | 20% | 13%
cu | 0% | 83% | 0% | 16% | 0%
vcxproj | 0% | 0% | 100% | 0% | 0%
filters | 0% | 0% | 100% | 0% | 0%
i | 0% | 0% | 42% | 22% | 35%
cmake | 0% | 0% | 50% | 39% | 10%
in | 0% | 0% | 0% | 0% | 100%
yml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
python-package/lightgbm | 67% | 12% | 13% | 6% | 1%
src/io | 29% | 40% | 25% | 5% | 0%
src | 98% | 0% | 0% | 0% | 1%
src/treelearner | 10% | 62% | 11% | 12% | 2%
include/LightGBM | 17% | 0% | 44% | 24% | 13%
src/boosting | 0% | 55% | 10% | 21% | 12%
R-package/R | 0% | 39% | 27% | 18% | 14%
R-package/src | 0% | 62% | 16% | 13% | 7%
src/objective | 0% | 40% | 31% | 23% | 5%
src/metric | 0% | 0% | 70% | 25% | 3%
windows | 0% | 0% | 100% | 0% | 0%
src/network | 0% | 0% | 44% | 42% | 13%
src/application | 0% | 0% | 100% | 0% | 0%
ROOT | 0% | 0% | 100% | 0% | 0%
python-package | 0% | 0% | 87% | 0% | 12%
helpers | 0% | 0% | 90% | 0% | 9%
swig | 0% | 0% | 38% | 20% | 41%
cmake/modules | 0% | 0% | 100% | 0% | 0%
cmake | 0% | 0% | 0% | 79% | 20%
R-package/pkgdown | 0% | 0% | 0% | 0% | 100%
R-package/inst | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 50)
File | Location | # lines | # units
basic.py | python-package/lightgbm | 2381 | 144
c_api.cpp | src | 2288 | 117
dataset.cpp | src/io | 1366 | 26
dataset_loader.cpp | src/io | 1249 | 10
feature_histogram.hpp | src/treelearner | 1109 | 36
dask.py | python-package/lightgbm | 1080 | 35
common.h | include/LightGBM/utils | 1059 | 74
tree.cpp | src/io | 946 | 22
monotone_constraints.hpp | src/treelearner | 936 | 63
gpu_tree_learner.cpp | src/treelearner | 918 | 20
lightgbm_R.cpp | R-package/src | 889 | 59
histogram_16_64_256.cu | src/treelearner/kernels | 832 | -
config_auto.cpp | src/io | 763 | 5
histogram256.cl | src/treelearner/ocl | 755 | -
cuda_tree_learner.cpp | src/treelearner | 732 | 20
histogram16.cl | src/treelearner/ocl | 729 | -
serial_tree_learner.cpp | src/treelearner | 715 | 17
gbdt.cpp | src/boosting | 710 | 25
histogram64.cl | src/treelearner/ocl | 705 | -
sklearn.py | python-package/lightgbm | 645 | 32
bin.cpp | src/io | 645 | 10
regression_objective.hpp | src/objective | 636 | 48
lgb.Booster.R | R-package/R | 635 | -
json11.cpp | src/io | 633 | 55
lgb.Dataset.R | R-package/R | 605 | 3
sparse_bin.hpp | src/io | 591 | 28
gbdt_model_text.cpp | src/boosting | 569 | 7
metadata.cpp | src/io | 482 | 16
dataset.h | include/LightGBM | 479 | 41
tree.h | include/LightGBM | 471 | 52
dense_bin.hpp | src/io | 456 | 25
c_api.h | include/LightGBM | 433 | 1
feature_group.h | include/LightGBM | 432 | 12
voting_parallel_tree_learner.cpp | src/treelearner | 413 | 9
lgb.cv.R | R-package/R | 399 | -
train_share_states.cpp | src/io | 388 | 7
build_r.R | root | 364 | -
plotting.py | python-package/lightgbm | 360 | 9
config.h | include/LightGBM | 351 | 7
config.cpp | src/io | 347 | 15
LightGBM.vcxproj | windows | 344 | -
setup.py | python-package | 339 | 11
LightGBM.vcxproj.filters | windows | 330 | -
engine.py | python-package/lightgbm | 329 | 7
linear_tree_learner.cpp | src/treelearner | 326 | 4
multiclass_metric.hpp | src/metric | 295 | 16
parser.cpp | src/io | 290 | 10
multi_val_sparse_bin.hpp | src/io | 285 | 18
text_reader.h | include/LightGBM/utils | 283 | 13
binary_metric.hpp | src/metric | 281 | 19
Files With Most Units (Top 20)
File | Location | # lines | # units
basic.py | python-package/lightgbm | 2381 | 144
c_api.cpp | src | 2288 | 117
common.h | include/LightGBM/utils | 1059 | 74
monotone_constraints.hpp | src/treelearner | 936 | 63
lightgbm_R.cpp | R-package/src | 889 | 59
json11.cpp | src/io | 633 | 55
tree.h | include/LightGBM | 471 | 52
regression_objective.hpp | src/objective | 636 | 48
dataset.h | include/LightGBM | 479 | 41
feature_histogram.hpp | src/treelearner | 1109 | 36
dask.py | python-package/lightgbm | 1080 | 35
regression_metric.hpp | src/metric | 248 | 33
sklearn.py | python-package/lightgbm | 645 | 32
sparse_bin.hpp | src/io | 591 | 28
dataset.cpp | src/io | 1366 | 26
gbdt.cpp | src/boosting | 710 | 25
dense_bin.hpp | src/io | 456 | 25
multiclass_objective.hpp | src/objective | 225 | 24
tree.cpp | src/io | 946 | 22
cuda_tree_learner.cpp | src/treelearner | 732 | 20
Files With Long Lines (Top 20)

There are 60 files with lines longer than 120 characters. In total, there are 506 long lines.
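As a rough illustration of the measurement described above, the sketch below counts the lines in a piece of text that exceed 120 characters. How the report treats tabs, trailing whitespace, or line endings is not stated here, so the plain character count is an assumption; `count_long_lines` is a hypothetical helper name.

```python
def count_long_lines(text: str, limit: int = 120) -> int:
    """Count the lines in `text` that are longer than `limit` characters."""
    # splitlines() drops the line terminators, so they do not add to the length.
    return sum(1 for line in text.splitlines() if len(line) > limit)


# A line of exactly 120 characters is not long; 121 characters is.
sample = "a" * 120 + "\n" + "b" * 121 + "\nshort line\n"
print(count_long_lines(sample))
```

To apply it to a file, the file's contents can be read into a string first, for example with `pathlib.Path(path).read_text()`.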

File | Location | # lines | # units | # long lines
histogram16.cl | src/treelearner/ocl | 729 | - | 94
gpu_tree_learner.cpp | src/treelearner | 918 | 20 | 44
cuda_tree_learner.cpp | src/treelearner | 732 | 20 | 37
dataset_loader.cpp | src/io | 1249 | 10 | 31
histogram64.cl | src/treelearner/ocl | 705 | - | 25
c_api.cpp | src | 2288 | 117 | 21
tree.cpp | src/io | 946 | 22 | 16
voting_parallel_tree_learner.cpp | src/treelearner | 413 | 9 | 15
dask.py | python-package/lightgbm | 1080 | 35 | 14
basic.py | python-package/lightgbm | 2381 | 144 | 12
LightGBM.vcxproj | windows | 344 | - | 12
config.h | include/LightGBM | 351 | 7 | 10
gbdt.cpp | src/boosting | 710 | 25 | 10
network.h | include/LightGBM | 183 | 6 | 9
bin.cpp | src/io | 645 | 10 | 9
lightgbmlib.i | swig | 243 | - | 9
gbdt_model_text.cpp | src/boosting | 569 | 7 | 8
config.cpp | src/io | 347 | 15 | 8
config_auto.cpp | src/io | 763 | 5 | 8
serial_tree_learner.cpp | src/treelearner | 715 | 17 | 8