microsoft / LightGBM
Unit Size

The distribution of unit sizes (measured in lines of code).

Intro
  • Unit size measurements show how the sizes of code units (methods, functions, ...) are distributed.
  • Units are classified into five categories based on their size in lines of code: 1-10 (very small units), 11-20 (small units), 21-50 (medium size units), 51-100 (long units), and 101+ (very long units).
  • You should aim to keep units small (20 lines or fewer). Long units can turn into "bloaters": code that has grown so large that it is hard to work with.
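The classification above can be expressed as a small lookup. This is a minimal sketch, assuming the category boundaries used in this report's chart legends (upper bounds inclusive); the function name `classify_unit` is hypothetical and not part of any analysis tool's API.

```python
# Upper bounds (inclusive) of each size category, as used in this report's legends.
SIZE_CATEGORIES = [
    (10, "very small (1-10)"),
    (20, "small (11-20)"),
    (50, "medium (21-50)"),
    (100, "long (51-100)"),
]


def classify_unit(lines_of_code: int) -> str:
    """Return the size category for a unit with the given line count (hypothetical helper)."""
    if lines_of_code < 1:
        raise ValueError("a unit has at least one line of code")
    for upper_bound, label in SIZE_CATEGORIES:
        if lines_of_code <= upper_bound:
            return label
    return "very long (101+)"


# Two line counts from the "Longest Units" table, plus two illustrative small units.
for name, loc in [("GPUTreeLearner::AllocateGPUMemory", 272),
                  ("VotingParallelTreeLearner::FindBestSplits", 128),
                  ("illustrative helper", 20),
                  ("illustrative getter", 3)]:
    print(f"{name}: {loc} lines -> {classify_unit(loc)}")
```

Checking the boundaries explicitly (10 vs. 11, 20 vs. 21, and so on) avoids the ambiguity of overlapping ranges such as "1-20" and "20-50".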
Unit Size Overall
  • There are 1,732 units containing 27,602 lines of code (58.6% of all code).
    • 28 very long units, 101+ lines (4,114 lines of code)
    • 77 long units, 51-100 lines (5,404 lines of code)
    • 275 medium size units, 21-50 lines (8,358 lines of code)
    • 347 small units, 11-20 lines (5,166 lines of code)
    • 1,005 very small units, 1-10 lines (4,560 lines of code)
Share of lines of code in units, by unit size:
101+ | 51-100 | 21-50 | 11-20 | 1-10
14% | 19% | 30% | 18% | 16%
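The overall figures can be cross-checked with a little arithmetic. The sketch below sums the counts from the "Unit Size Overall" summary and recomputes each category's share of unit code. The result suggests, as an inference rather than documented tool behavior, that the chart percentages are truncated instead of rounded, which would explain why they add up to 97% rather than 100%.

```python
# (category, number of units, lines of code) from the "Unit Size Overall" summary.
OVERALL = [
    ("101+", 28, 4_114),
    ("51-100", 77, 5_404),
    ("21-50", 275, 8_358),
    ("11-20", 347, 5_166),
    ("1-10", 1_005, 4_560),
]

total_units = sum(units for _, units, _ in OVERALL)
total_lines = sum(loc for _, _, loc in OVERALL)


def share(loc: int, total: int) -> tuple[int, float]:
    """Return (truncated integer percent, exact percent) of loc within total."""
    return loc * 100 // total, 100 * loc / total


shares = {}
for category, _, loc in OVERALL:
    truncated, exact = share(loc, total_lines)
    shares[category] = truncated
    print(f"{category:>6}: {truncated}% (exact {exact:.1f}%)")

print(total_units, total_lines, sum(shares.values()))
```

Rounding instead of truncating would give 15%, 20%, 30%, 19% and 17%, which does not match the chart, so the per-extension and per-component rows below are probably truncated as well.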
Unit Size per Extension
Extension | 101+ | 51-100 | 21-50 | 11-20 | 1-10
cpp | 26% | 22% | 27% | 15% | 8%
hpp | 5% | 19% | 33% | 18% | 22%
py | 3% | 22% | 28% | 23% | 21%
h | 0% | 5% | 34% | 29% | 30%
R | 0% | 26% | 61% | 9% | 3%
Unit Size per Logical Component
Primary logical decomposition:
Component | 101+ | 51-100 | 21-50 | 11-20 | 1-10
src/io | 32% | 19% | 26% | 12% | 9%
src/treelearner | 24% | 25% | 27% | 13% | 9%
src/boosting | 16% | 25% | 29% | 16% | 11%
src | 5% | 26% | 25% | 22% | 19%
python-package/lightgbm | 3% | 24% | 29% | 21% | 21%
src/metric | 0% | 22% | 41% | 14% | 20%
src/network | 0% | 16% | 31% | 31% | 20%
src/application | 0% | 50% | 33% | 3% | 12%
include/LightGBM | 0% | 4% | 33% | 28% | 34%
src/objective | 0% | 6% | 37% | 32% | 22%
R-package/R | 0% | 26% | 61% | 9% | 3%
R-package/src | 0% | 0% | 48% | 25% | 25%
python-package | 0% | 0% | 19% | 61% | 18%
helpers | 0% | 0% | 0% | 50% | 50%
swig | 0% | 0% | 0% | 0% | 100%
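One way to read the per-component table is to add the two "long" columns (51-100 and 101+) and flag components where a large share of unit code sits in units over 50 lines. The sketch below does that with a 40% cut-off; the threshold and names are hypothetical choices for illustration, and because the percentages appear to be truncated, the sums are slight underestimates.

```python
# (component, % of unit code in 101+ units, % in 51-100 units), copied from the
# per-component table; the remaining columns are not needed here.
COMPONENTS = [
    ("src/io", 32, 19),
    ("src/treelearner", 24, 25),
    ("src/boosting", 16, 25),
    ("src", 5, 26),
    ("python-package/lightgbm", 3, 24),
    ("src/metric", 0, 22),
    ("src/network", 0, 16),
    ("src/application", 0, 50),
    ("include/LightGBM", 0, 4),
    ("src/objective", 0, 6),
    ("R-package/R", 0, 26),
    ("R-package/src", 0, 0),
    ("python-package", 0, 0),
    ("helpers", 0, 0),
    ("swig", 0, 0),
]

THRESHOLD = 40  # hypothetical review threshold, in percent


def over_50_lines(row: tuple[str, int, int]) -> int:
    """Share of a component's unit code that lives in units longer than 50 lines."""
    _, very_long, long_ = row
    return very_long + long_


flagged = sorted(
    (row for row in COMPONENTS if over_50_lines(row) >= THRESHOLD),
    key=over_50_lines,
    reverse=True,
)
for row in flagged:
    print(f"{row[0]}: {over_50_lines(row)}% of unit code in units over 50 lines")
```

With this cut-off, src/io, src/application, src/treelearner and src/boosting stand out.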
Longest Units
Top 20 longest units
Unit | File | # lines | McCabe index | # params
void GPUTreeLearner::AllocateGPUMemory() | src/treelearner/gpu_tree_learner.cpp | 272 | 54 | 0
void FindBestThresholdCategoricalInner() | src/treelearner/feature_histogram.hpp | 214 | 37 | 6
void Config::GetMembersFromString() | src/io/config_auto.cpp | 203 | 10 | 1
void LinearTreeLearner::CalculateLinear() | src/treelearner/linear_tree_learner.cpp | 192 | 47 | 5
void FindBestThresholdSequentially() | src/treelearner/feature_histogram.hpp | 191 | 40 | 8
bool GBDT::LoadModelFromString() | src/boosting/gbdt_model_text.cpp | 187 | 41 | 2
void BinMapper::FindBin() | src/io/bin.cpp | 173 | 43 | 11
… | … | 170 | 35 | 5
static std::unordered_map aliases() | src/io/config_auto.cpp | 169 | 1 | 335
void Dataset::AddFeaturesFrom() | src/io/dataset.cpp | 149 | 29 | 1
void MultiValBinWrapper::CopyMultiValBinSubset() | src/io/train_share_states.cpp | 145 | 27 | 5
void DatasetLoader::SetHeader() | src/io/dataset_loader.cpp | 142 | 37 | 1
… | … | 138 | 1 | 0
void TrainingShareStates::CalcBinOffsets() | src/io/train_share_states.cpp | 135 | 31 | 3
std::string GBDT::ModelToIfElse() | src/boosting/gbdt_model_text.cpp | 134 | 12 | 1
static std::unordered_set params() | src/io/config_auto.cpp | 134 | 1 | 133
void CUDATreeLearner::ConstructHistograms() | src/treelearner/cuda_tree_learner.cpp | 131 | 24 | 2
void Metadata::CheckOrPartition() | src/io/metadata.cpp | 130 | 39 | 2
void VotingParallelTreeLearner::FindBestSplits() | src/treelearner/voting_parallel_tree_learner.cpp | 128 | 27 | 1
… | … | 114 | 45 | 2
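Length alone does not say how hard a unit is to work with. Two of the longest units, aliases() and params() in src/io/config_auto.cpp, have a McCabe index of 1, which suggests long, branch-free initializers rather than complex logic. The sketch below divides each unit's McCabe index by its line count to separate such long-but-flat units from long, branch-heavy ones. The density measure and the 0.1 cut-off are hypothetical heuristics, not part of the report, and the three rows without a unit name are left out.

```python
# (unit, lines of code, McCabe index) for the named rows of the "Longest Units" table.
LONGEST = [
    ("GPUTreeLearner::AllocateGPUMemory", 272, 54),
    ("FindBestThresholdCategoricalInner", 214, 37),
    ("Config::GetMembersFromString", 203, 10),
    ("LinearTreeLearner::CalculateLinear", 192, 47),
    ("FindBestThresholdSequentially", 191, 40),
    ("GBDT::LoadModelFromString", 187, 41),
    ("BinMapper::FindBin", 173, 43),
    ("aliases", 169, 1),
    ("Dataset::AddFeaturesFrom", 149, 29),
    ("MultiValBinWrapper::CopyMultiValBinSubset", 145, 27),
    ("DatasetLoader::SetHeader", 142, 37),
    ("TrainingShareStates::CalcBinOffsets", 135, 31),
    ("GBDT::ModelToIfElse", 134, 12),
    ("params", 134, 1),
    ("CUDATreeLearner::ConstructHistograms", 131, 24),
    ("Metadata::CheckOrPartition", 130, 39),
    ("VotingParallelTreeLearner::FindBestSplits", 128, 27),
]

# Hypothetical cut-off: less than one McCabe point per 10 lines counts as "flat".
MIN_BRANCH_DENSITY = 0.1


def branch_density(lines: int, mccabe: int) -> float:
    """McCabe index per line of code, a rough proxy for how branchy a unit is."""
    return mccabe / lines


low_branching = [name for name, lines, mccabe in LONGEST
                 if branch_density(lines, mccabe) < MIN_BRANCH_DENSITY]
branchy = sorted(
    ((name, branch_density(lines, mccabe)) for name, lines, mccabe in LONGEST
     if branch_density(lines, mccabe) >= MIN_BRANCH_DENSITY),
    key=lambda item: item[1],
    reverse=True,
)

print("long but low branching:", low_branching)
for name, density in branchy[:3]:
    print(f"{name}: {density:.2f} McCabe points per line")
```

Under this heuristic, the long, branch-heavy units (for example Metadata::CheckOrPartition and DatasetLoader::SetHeader) are likelier refactoring candidates than the flat configuration tables, although the threshold is a judgment call.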