apache / hadoop
Source Code Overview

Analysis scope, overview of main, test, generated, deployment, build, and other code.

Source Code Analysis Scope
Files included in and excluded from analyses
txt
hbs
properties
proto
bats
cmd
t
xsl
cfg
vcxproj
gitattributes
watchmanconfig
patch
zone
jshintrc
in
tex
props
policy
tla
filters
bib
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes we separate files in scope into several categories: main, test, generated, deployment and build, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing of the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support the build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain a number of files, we are primarily interested in the source code files that are being written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that developers have modified.

main: 1,132,036 LOC (46%), 9,198 files
test: 972,519 LOC (39%), 4,898 files
generated: 11,597 LOC (<1%), 3 files
build and deployment: 28,402 LOC (1%), 173 files
other: 287,151 LOC (11%), 950 files
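The percentages in this summary, and the "vs. main code" ratios quoted for each category later in the report, follow directly from the LOC counts. The sketch below recomputes them; the function names are illustrative and not part of the analysis tool. Note that the summary percentages appear to be truncated rather than rounded (test code is about 39.99% of the total but is shown as 39%).

```python
# LOC per category, copied from the summary above.
loc_by_category = {
    "main": 1_132_036,
    "test": 972_519,
    "generated": 11_597,
    "build and deployment": 28_402,
    "other": 287_151,
}

total_loc = sum(loc_by_category.values())


def share_of_total(category: str) -> float:
    """Return the category's share of all analyzed LOC, in percent."""
    return 100.0 * loc_by_category[category] / total_loc


def ratio_to_main(category: str) -> float:
    """Return the category's LOC relative to main code, in percent."""
    return 100.0 * loc_by_category[category] / loc_by_category["main"]


for name in loc_by_category:
    print(
        f"{name}: {share_of_total(name):.2f}% of total, "
        f"{ratio_to_main(name):.1f}% vs. main"
    )
```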
Main Code
All manually created or maintained source code that defines the logic of the product and runs in a production environment.
proto
hbs
vcxproj
tla
xsl
cmd
t
filters
policy
in
props
cfg
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 9,198 files match defined criteria (1,132,036 LOC, 100.0% vs. main code):
    • 8,005 *.java files (1,001,125 LOC)
    • 115 *.c files (34,407 LOC)
    • 101 *.xml files (23,594 LOC)
    • 131 *.cc files (17,419 LOC)
    • 324 *.js files (14,730 LOC)
    • 207 *.h files (11,458 LOC)
    • 72 *.proto files (10,310 LOC)
    • 91 *.hbs files (5,352 LOC)
    • 43 *.css files (5,236 LOC)
    • 47 *.html files (4,172 LOC)
    • 8 *.scss files (799 LOC)
    • 1 *.yaml file (716 LOC)
    • 3 *.vcxproj files (515 LOC)
    • 8 *.cmake files (473 LOC)
    • 1 *.tla file (405 LOC)
    • 4 *.hh files (268 LOC)
    • 8 *.xsl files (210 LOC)
    • 4 *.cmd files (183 LOC)
    • 3 *.py files (132 LOC)
    • 1 *.cpp file (124 LOC)
    • 11 *.t files (104 LOC)
    • 2 *.ps1 files (84 LOC)
    • 1 *.filters file (72 LOC)
    • 1 *.policy file (57 LOC)
    • 1 *.in file (30 LOC)
    • 3 *.sql files (28 LOC)
    • 1 *.props file (28 LOC)
    • 1 *.cfg file (5 LOC)
  • "*.java" is the biggest, containing 88.44% of LOC.
  • "*.cfg" is the smallest, containing less than 0.01% of LOC.


*.java: 1,001,125 LOC (88%), 8,005 files
*.c: 34,407 LOC (3%), 115 files
*.xml: 23,594 LOC (2%), 101 files
*.cc: 17,419 LOC (1%), 131 files
*.js: 14,730 LOC (1%), 324 files
*.h: 11,458 LOC (1%), 207 files
*.proto: 10,310 LOC (<1%), 72 files
*.hbs: 5,352 LOC (<1%), 91 files
*.css: 5,236 LOC (<1%), 43 files
*.html: 4,172 LOC (<1%), 47 files
*.scss: 799 LOC (<1%), 8 files
*.yaml: 716 LOC (<1%), 1 file
*.vcxproj: 515 LOC (<1%), 3 files
*.cmake: 473 LOC (<1%), 8 files
*.tla: 405 LOC (<1%), 1 file
*.hh: 268 LOC (<1%), 4 files
*.xsl: 210 LOC (<1%), 8 files
*.cmd: 183 LOC (<1%), 4 files
*.py: 132 LOC (<1%), 3 files
*.cpp: 124 LOC (<1%), 1 file
*.t: 104 LOC (<1%), 11 files
*.ps1: 84 LOC (<1%), 2 files
*.filters: 72 LOC (<1%), 1 file
*.policy: 57 LOC (<1%), 1 file
*.in: 30 LOC (<1%), 1 file
*.sql: 28 LOC (<1%), 3 files
*.props: 28 LOC (<1%), 1 file
*.cfg: 5 LOC (<1%), 1 file
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
bats
proto
cmd
zone
cfg
xsl
t
  • The following criteria are used to filter files:
    • files with paths like ".*/[Tt]est/.*".
    • files with paths like ".*/karma[.]conf[.]js".
    • files with paths like ".*/test_.*".
    • files with paths like ".*/[Tt]ests/.*".
    • files with paths like ".*/mock[a-zA-Z0-9_\- ]+/.*".
    • files with paths like ".*[-]tests/.*".
    • files with paths like ".*_test[.].*".
    • files with paths like ".*/testing[.].*".
    • files with paths like ".*_tests[.].*".
    • files with paths like ".*/test[.].*".
    • files with paths like ".*[-]test[-].*".
  • 4,898 files match defined criteria (972,519 LOC, 85.9% vs. main code):
    • 4,353 *.java files (925,658 LOC)
    • 139 *.xml files (21,893 LOC)
    • 96 *.cc files (11,099 LOC)
    • 15 *.c files (5,433 LOC)
    • 171 *.js files (3,848 LOC)
    • 43 *.h files (1,913 LOC)
    • 47 *.bats files (1,757 LOC)
    • 7 *.proto files (417 LOC)
    • 10 *.sh files (209 LOC)
    • 2 *.html files (61 LOC)
    • 2 *.bash files (59 LOC)
    • 2 *.cmd files (37 LOC)
    • 1 *.zone file (35 LOC)
    • 4 *.cfg files (27 LOC)
    • 1 *.xsl file (24 LOC)
    • 1 *.avsc file (21 LOC)
    • 1 *.yaml file (16 LOC)
    • 2 *.t files (11 LOC)
    • 1 *.css file (1 LOC)
  • "*.java" is the biggest, containing 95.18% of LOC.
  • "*.css" is the smallest, containing less than 0.01% of LOC.
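The path criteria listed above are regular expressions. A minimal sketch of how such a filter could be applied follows. It assumes a pattern must match the entire path rather than a substring; the function name and the sample paths are hypothetical and are not taken from the analysis tool or the repository.

```python
import re

# Path patterns copied verbatim from the test-code criteria above.
TEST_PATH_PATTERNS = [
    r".*/[Tt]est/.*",
    r".*/karma[.]conf[.]js",
    r".*/test_.*",
    r".*/[Tt]ests/.*",
    r".*/mock[a-zA-Z0-9_\- ]+/.*",
    r".*[-]tests/.*",
    r".*_test[.].*",
    r".*/testing[.].*",
    r".*_tests[.].*",
    r".*/test[.].*",
    r".*[-]test[-].*",
]
_COMPILED = [re.compile(pattern) for pattern in TEST_PATH_PATTERNS]


def is_test_path(path: str) -> bool:
    """Return True if any test pattern matches the whole path.

    Assumption: whole-path matching (fullmatch), not substring search.
    """
    return any(pattern.fullmatch(path) for pattern in _COMPILED)


# Hypothetical paths, for illustration only.
for sample in [
    "module/src/test/java/org/example/TestFoo.java",
    "module/src/main/java/org/example/Foo.java",
    "ui/karma.conf.js",
]:
    print(sample, is_test_path(sample))
```

Under the whole-path assumption, most of these patterns require a "/" before the test marker, so a path that begins directly with "test/" would not be classified as test code.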


*.java: 925,658 LOC (95%), 4,353 files
*.xml: 21,893 LOC (2%), 139 files
*.cc: 11,099 LOC (1%), 96 files
*.c: 5,433 LOC (<1%), 15 files
*.js: 3,848 LOC (<1%), 171 files
*.h: 1,913 LOC (<1%), 43 files
*.bats: 1,757 LOC (<1%), 47 files
*.proto: 417 LOC (<1%), 7 files
*.sh: 209 LOC (<1%), 10 files
*.html: 61 LOC (<1%), 2 files
*.bash: 59 LOC (<1%), 2 files
*.cmd: 37 LOC (<1%), 2 files
*.zone: 35 LOC (<1%), 1 file
*.cfg: 27 LOC (<1%), 4 files
*.xsl: 24 LOC (<1%), 1 file
*.avsc: 21 LOC (<1%), 1 file
*.yaml: 16 LOC (<1%), 1 file
*.t: 11 LOC (<1%), 2 files
*.css: 1 LOC (<1%), 1 file
Generated Code
Automatically generated files, not manually changed after generation.
  • The following criteria are used to filter files:
    • files with any line of content like "//[ ]*Generated by .*".
  • 3 files match defined criteria (11,597 LOC, 1.0% vs. main code). All matches are in *.java files.
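Unlike the other categories, generated code is detected by file content rather than by path. A minimal sketch of such a check is shown below, assuming the pattern has to match a complete line; the function name and the sample text are hypothetical.

```python
import re

# Content pattern copied from the generated-code criterion above.
GENERATED_LINE = re.compile(r"//[ ]*Generated by .*")


def is_generated(source_text: str) -> bool:
    """Return True if any line of the text matches the generated-code marker.

    Assumption: the pattern must match a whole line (fullmatch), so a marker
    that trails other code on the same line is not counted.
    """
    return any(GENERATED_LINE.fullmatch(line) for line in source_text.splitlines())


# Hypothetical file contents, for illustration only.
print(is_generated("// Generated by some-tool\npackage org.example;"))
print(is_generated("package org.example;\n// written by hand"))
```

The match is case-sensitive, so a comment such as "// generated by ..." would not be picked up by this pattern.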


*.java: 11,597 LOC (100%), 3 files
Build and Deployment Code
Source code used to configure or support the build and deployment process.
  • The following criteria are used to filter files:
    • files with paths like ".*/pom[.]xml".
    • files with paths like ".*[.]sh".
    • files with paths like ".*[.]git[a-z]+".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*/package[.]json".
    • files with paths like ".*/docker[-]compose[.]yaml".
    • files with paths like ".*/[.]gitattributes".
  • 173 files match defined criteria (28,402 LOC, 2.5% vs. main code):
    • 112 *.xml files (25,725 LOC)
    • 60 *.sh files (2,648 LOC)
    • 1 *.yaml file (29 LOC)
  • "*.xml" is the biggest, containing 90.57% of LOC.
  • "*.yaml" is the smallest, containing 0.1% of LOC.


*.xml: 25,725 LOC (90%), 112 files
*.sh: 2,648 LOC (9%), 60 files
*.yaml: 29 LOC (<1%), 1 file
Other Code
txt
properties
tex
xsl
patch
bib
  • The following criteria are used to filter files:
    • files with paths like ".*[.]properties".
    • files with paths like ".*[.]txt".
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]svg".
    • files with paths like ".*[.]json".
    • files with paths like ".*/[Ee]xamples/.*".
    • files with paths like ".*[.]patch".
    • files with paths like ".*/LICENSE[.][a-z0-9]+".
    • files with paths like ".*/checkstyle[.]xml".
    • files with paths like ".*/checkstyle.*".
  • 950 files match defined criteria (287,151 LOC, 25.4% vs. main code):
    • 98 *.json files (144,764 LOC)
    • 514 *.md files (122,880 LOC)
    • 57 *.java files (8,661 LOC)
    • 170 *.txt files (6,677 LOC)
    • 84 *.properties files (2,407 LOC)
    • 8 *.cc files (500 LOC)
    • 1 *.tex file (318 LOC)
    • 5 *.html files (288 LOC)
    • 7 *.xml files (240 LOC)
    • 2 *.c files (146 LOC)
    • 1 *.xsl file (140 LOC)
    • 1 *.patch file (92 LOC)
    • 1 *.bib file (30 LOC)
    • 1 *.svg file (8 LOC)
  • "*.json" is the biggest, containing 50.41% of LOC.
  • "*.svg" is the smallest, containing less than 0.01% of LOC.


*.json: 144,764 LOC (50%), 98 files
*.md: 122,880 LOC (42%), 514 files
*.java: 8,661 LOC (3%), 57 files
*.txt: 6,677 LOC (2%), 170 files
*.properties: 2,407 LOC (<1%), 84 files
*.cc: 500 LOC (<1%), 8 files
*.tex: 318 LOC (<1%), 1 file
*.html: 288 LOC (<1%), 5 files
*.xml: 240 LOC (<1%), 7 files
*.c: 146 LOC (<1%), 2 files
*.xsl: 140 LOC (<1%), 1 file
*.patch: 92 LOC (<1%), 1 file
*.bib: 30 LOC (<1%), 1 file
*.svg: 8 LOC (<1%), 1 file
Analyzers
Information about the analyzers used to examine the source code.


2025-05-07 15:48