apache / spark
Source Code Overview

Analysis scope, overview of main, test, generated, deployment, build, and other code.

Source Code Analysis Scope
Files included in and excluded from analyses
q, txt, properties, proto, pyi, cmd, xsd, orc, cfg, ini, in, rmd, gitattributes
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes we separate files in scope into several categories: main, test, generated, deployment and build, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support the build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain many kinds of files, we are primarily interested in the source code files that are written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that have been changed by developers.

main: 637,115 LOC (36%), 4,067 files
test: 802,204 LOC (46%), 5,497 files
generated: 6,220 LOC (<1%), 2 files
build and deployment: 11,932 LOC (<1%), 88 files
other: 282,810 LOC (16%), 2,484 files
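
The category percentages above can be reproduced from the LOC counts alone. The following is a minimal sketch, assuming each percentage is the category's share of the total analyzed LOC; the names `loc_by_category` and `share_of_total` are illustrative and not part of the report. The report's figures appear to be truncated rather than rounded (36.6% is shown as 36%), so the sketch prints one decimal place for comparison.

```python
# LOC per category, copied from the summary above.
loc_by_category = {
    "main": 637_115,
    "test": 802_204,
    "generated": 6_220,
    "build and deployment": 11_932,
    "other": 282_810,
}

# Total LOC across all analyzed categories.
total_loc = sum(loc_by_category.values())


def share_of_total(category: str) -> float:
    """Return the category's share of all analyzed LOC, in percent."""
    return 100.0 * loc_by_category[category] / total_loc


for name in loc_by_category:
    print(f"{name}: {share_of_total(name):.1f}% of {total_loc:,} LOC")
```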
Main Code
All manually created or maintained source code that defines the logic of the product and runs in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 4,067 files match the defined criteria (637,115 LOC, 100.0% vs. main code):
    • 2,885 *.scala files (452,619 LOC)
    • 339 *.py files (95,631 LOC)
    • 757 *.java files (59,800 LOC)
    • 20 *.pyi files (15,193 LOC)
    • 15 *.proto files (4,545 LOC)
    • 20 *.js files (4,305 LOC)
    • 2 *.g4 files (2,729 LOC)
    • 8 *.css files (909 LOC)
    • 4 *.html files (694 LOC)
    • 2 *.xml files (350 LOC)
    • 1 *.bash file (148 LOC)
    • 6 *.yaml files (46 LOC)
    • 1 *.ps1 file (41 LOC)
    • 1 *.toml file (32 LOC)
    • 1 *.in file (28 LOC)
    • 1 *.c file (19 LOC)
    • 1 *.sbt file (14 LOC)
    • 3 *.cfg files (12 LOC)
  • "*.scala" is the biggest, containing 71.04% of LOC.
  • "*.cfg" is the smallest, containing less than 0.01% of LOC.


*.scala: 452,619 LOC (71%), 2,885 files
*.py: 95,631 LOC (15%), 339 files
*.java: 59,800 LOC (9%), 757 files
*.pyi: 15,193 LOC (2%), 20 files
*.proto: 4,545 LOC (<1%), 15 files
*.js: 4,305 LOC (<1%), 20 files
*.g4: 2,729 LOC (<1%), 2 files
*.css: 909 LOC (<1%), 8 files
*.html: 694 LOC (<1%), 4 files
*.xml: 350 LOC (<1%), 2 files
*.bash: 148 LOC (<1%), 1 file
*.yaml: 46 LOC (<1%), 6 files
*.ps1: 41 LOC (<1%), 1 file
*.toml: 32 LOC (<1%), 1 file
*.in: 28 LOC (<1%), 1 file
*.c: 19 LOC (<1%), 1 file
*.sbt: 14 LOC (<1%), 1 file
*.cfg: 12 LOC (<1%), 3 files
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*[-]tests/.*".
    • files with paths like ".*_test[.].*".
    • files with paths like ".*/[Tt]est/.*".
    • files with paths like ".*[-]tests[-].*".
    • files with paths like ".*/test[-]data/.*".
    • files with paths like ".*/test_.*".
    • files with paths like ".*[-]test[-].*".
    • files with paths like ".*_tests[.].*".
    • files with paths like ".*/[Tt]ests/.*".
    • files with paths like ".*/testing[.].*".
    • files with paths like ".*/test[.].*".
    • files with paths like ".*/mock[a-zA-Z0-9_\- ]+/.*".
    • files with paths like ".*/[Mm]ocks/.*".
    • files with paths like ".*[.][Tt]est[.].*".
    • files with paths like ".*[.]test[.].*".
  • 5,497 files match the defined criteria (802,204 LOC, 125.9% vs. main code):
    • 2,240 *.scala files (585,617 LOC)
    • 788 *.py files (94,141 LOC)
    • 1,560 *.q files (41,435 LOC)
    • 303 *.java files (38,771 LOC)
    • 501 *.sql files (34,879 LOC)
    • 70 *.xml files (5,729 LOC)
    • 9 *.proto files (610 LOC)
    • 7 *.orc files (406 LOC)
    • 9 *.sh files (337 LOC)
    • 3 *.js files (106 LOC)
    • 4 *.avsc files (89 LOC)
    • 2 *.yaml files (49 LOC)
    • 1 *.thrift file (35 LOC)
  • "*.scala" is the biggest, containing 73% of LOC.
  • "*.thrift" is the smallest, containing less than 0.01% of LOC.
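
The path criteria listed above are regular expressions. The following is a minimal sketch of how they could be applied, assuming a path counts as test code when any one pattern matches the whole path (full-match semantics are an assumption; the report does not say how its tool evaluates the patterns). The function name `is_test_path` and the example paths are hypothetical.

```python
import re

# Test-code path patterns, copied from the criteria list above.
TEST_PATTERNS = [
    r".*[-]tests/.*",
    r".*_test[.].*",
    r".*/[Tt]est/.*",
    r".*[-]tests[-].*",
    r".*/test[-]data/.*",
    r".*/test_.*",
    r".*[-]test[-].*",
    r".*_tests[.].*",
    r".*/[Tt]ests/.*",
    r".*/testing[.].*",
    r".*/test[.].*",
    r".*/mock[a-zA-Z0-9_\- ]+/.*",
    r".*/[Mm]ocks/.*",
    r".*[.][Tt]est[.].*",
    r".*[.]test[.].*",
]
COMPILED_TEST_PATTERNS = [re.compile(p) for p in TEST_PATTERNS]


def is_test_path(path: str) -> bool:
    """Return True if any test pattern matches the entire path (assumed semantics)."""
    return any(p.fullmatch(path) for p in COMPILED_TEST_PATTERNS)


# Hypothetical paths, for illustration only.
for example in [
    "core/src/test/scala/FooSuite.scala",
    "python/pkg/tests/test_foo.py",
    "core/src/main/scala/Foo.scala",
]:
    print(example, "->", is_test_path(example))
```

Under these assumptions, a path needs a directory or file-name marker such as `/test/`, `/tests/`, or `_test.` to be counted as test code; a file directly under a top-level `test/` folder with no leading slash would not match `.*/[Tt]est/.*`.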


*.scala: 585,617 LOC (73%), 2,240 files
*.py: 94,141 LOC (11%), 788 files
*.q: 41,435 LOC (5%), 1,560 files
*.java: 38,771 LOC (4%), 303 files
*.sql: 34,879 LOC (4%), 501 files
*.xml: 5,729 LOC (<1%), 70 files
*.proto: 610 LOC (<1%), 9 files
*.orc: 406 LOC (<1%), 7 files
*.sh: 337 LOC (<1%), 9 files
*.js: 106 LOC (<1%), 3 files
*.avsc: 89 LOC (<1%), 4 files
*.yaml: 49 LOC (<1%), 2 files
*.thrift: 35 LOC (<1%), 1 file
Generated Code
Automatically generated files, not manually changed after generation.
  • The following criteria are used to filter files:
    • files with paths like ".*/package[-]lock[.]json".
  • 2 files match defined criteria (6,220 LOC, 1.0% vs. main code). All matches are in *.json files.


*.json: 6,220 LOC (100%), 2 files
Build and Deployment Code
Source code used to configure or support the build and deployment process.
  • The following criteria are used to filter files:
    • files with paths like ".*/pom[.]xml".
    • files with paths like ".*[.]sh".
    • files with paths like ".*[.]git[a-z]+".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]bat".
    • files with paths like ".*/package[-]lock[.]json".
    • files with paths like ".*/package[.]json".
    • files with paths like ".*/[.]gitattributes".
  • 88 files match defined criteria (11,932 LOC, 1.9% vs. main code):
    • 45 *.xml files (9,738 LOC)
    • 42 *.sh files (2,162 LOC)
    • 1 *.bat file (32 LOC)
  • "*.xml" is the biggest, containing 81.61% of LOC.
  • "*.bat" is the smallest, containing 0.27% of LOC.
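
The "vs. main code" figures and the per-extension shares quoted in the category sections appear to be plain ratios of LOC counts. The following is a minimal sketch under that assumption; the helper names `percent_vs_main` and `extension_share` are illustrative and do not come from the report.

```python
MAIN_LOC = 637_115  # main-code LOC, from the report


def percent_vs_main(category_loc: int) -> float:
    """Category LOC as a percentage of main-code LOC, to one decimal place."""
    return round(100.0 * category_loc / MAIN_LOC, 1)


def extension_share(extension_loc: int, category_loc: int) -> float:
    """Extension LOC as a percentage of its category's LOC, to two decimal places."""
    return round(100.0 * extension_loc / category_loc, 2)


# Build and deployment code: 11,932 LOC in total, 9,738 of them in *.xml files.
print(percent_vs_main(11_932))
print(extension_share(9_738, 11_932))

# Test code is larger than main code, so its ratio exceeds 100%.
print(percent_vs_main(802_204))
```

Because the denominator is main code rather than total code, the ratios across categories do not sum to 100%.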


*.xml: 9,738 LOC (81%), 45 files
*.sh: 2,162 LOC (18%), 42 files
*.bat: 32 LOC (<1%), 1 file
Other Code
  • The following criteria are used to filter files:
    • files with paths like ".*/[Ee]xamples/.*".
    • files with paths like ".*[.]txt".
    • files with paths like ".*[.]json".
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*[.]properties".
    • files with paths like ".*/resources/.*[.]xsd".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.](rst|rest|resttxt|rsttxt)".
    • files with paths like ".*[.]ini".
    • files with paths like ".*/checkstyle[.]xml".
    • files with paths like ".*/checkstyle.*".
  • 2,484 files match the defined criteria (282,810 LOC, 44.4% vs. main code):
    • 1,097 *.txt files (153,941 LOC)
    • 794 *.json files (105,081 LOC)
    • 215 *.scala files (9,283 LOC)
    • 138 *.java files (6,514 LOC)
    • 149 *.py files (3,754 LOC)
    • 31 *.md files (2,163 LOC)
    • 37 *.properties files (1,283 LOC)
    • 4 *.xml files (330 LOC)
    • 11 *.xsd files (199 LOC)
    • 2 *.ini files (194 LOC)
    • 3 *.sh files (37 LOC)
    • 1 *.orc file (22 LOC)
    • 2 *.avsc files (9 LOC)
  • "*.txt" is the biggest, containing 54.43% of LOC.
  • "*.avsc" is the smallest, containing less than 0.01% of LOC.
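
Some criteria appear in more than one category: `.*/package[-]lock[.]json` is listed under both generated and build and deployment code, and `.*/[.]gitignore` under both build and deployment and other code. The report does not state how such overlaps are resolved, so the following sketch only lists every category whose criteria match a path and does not pick a winner. It assumes full-match semantics; the name `matching_categories` and the example paths are hypothetical.

```python
import re

# Path criteria copied from the generated, build and deployment, and other sections.
CRITERIA = {
    "generated": [r".*/package[-]lock[.]json"],
    "build and deployment": [
        r".*/pom[.]xml", r".*[.]sh", r".*[.]git[a-z]+", r".*/[.]gitignore",
        r".*[.]bat", r".*/package[-]lock[.]json", r".*/package[.]json",
        r".*/[.]gitattributes",
    ],
    "other": [
        r".*/[Ee]xamples/.*", r".*[.]txt", r".*[.]json", r".*[.]md",
        r".*/README[.][a-z0-9]+", r".*[.]properties", r".*/resources/.*[.]xsd",
        r".*/[.]gitignore", r".*[.](rst|rest|resttxt|rsttxt)", r".*[.]ini",
        r".*/checkstyle[.]xml", r".*/checkstyle.*",
    ],
}


def matching_categories(path: str) -> list[str]:
    """Return every category with at least one pattern matching the whole path."""
    return [
        category
        for category, patterns in CRITERIA.items()
        if any(re.fullmatch(p, path) for p in patterns)
    ]


# Hypothetical paths, for illustration only.
print(matching_categories("ui/package-lock.json"))
print(matching_categories("docs/.gitignore"))
print(matching_categories("bin/run.sh"))
```

A path such as `ui/package-lock.json` satisfies criteria in all three categories here, so the tool behind the report presumably applies some precedence that is not visible in this overview.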


*.txt: 153,941 LOC (54%), 1,097 files
*.json: 105,081 LOC (37%), 794 files
*.scala: 9,283 LOC (3%), 215 files
*.java: 6,514 LOC (2%), 138 files
*.py: 3,754 LOC (1%), 149 files
*.md: 2,163 LOC (<1%), 31 files
*.properties: 1,283 LOC (<1%), 37 files
*.xml: 330 LOC (<1%), 4 files
*.xsd: 199 LOC (<1%), 11 files
*.ini: 194 LOC (<1%), 2 files
*.sh: 37 LOC (<1%), 3 files
*.orc: 22 LOC (<1%), 1 file
*.avsc: 9 LOC (<1%), 2 files
Analyzers
Info about analyzers used for source code examinations.


2025-05-07 15:43