apache / hive
Source Code Overview

Analysis scope, overview of main, test, generated, deployment, build, and other code.

Source Code Analysis Scope
Files included in and excluded from analysis
q
txt
properties
pig
proto
orc
m4
spec
mysql
gitattributes
xsl
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes we separate files in scope into several categories: main, test, generated, deployment and build, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing of the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support the build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain a number of files, we are primarily interested in the source code files that are being written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that have been changed by developers.

main: 833,341 LOC (43%), 6,268 files
test: 555,485 LOC (28%), 7,129 files
generated: 441,620 LOC (22%), 1,784 files
build and deployment: 23,211 LOC (1%), 107 files
other: 80,108 LOC (4%), 659 files
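The two kinds of percentages in this report can be reproduced from the LOC counts above. The following is a minimal sketch, not the analysis tool's own code. The function names are hypothetical. The use of truncation for the whole-percent shares is an assumption inferred from the figures (for example, test code is 28.7% of the total but is listed as 28%).

```python
# LOC per category, copied from the overview figures above.
LOC = {
    "main": 833_341,
    "test": 555_485,
    "generated": 441_620,
    "build and deployment": 23_211,
    "other": 80_108,
}


def share_of_total(category: str) -> int:
    """Whole-percent share of all analyzed LOC.

    Assumption: the report truncates instead of rounding.
    """
    return 100 * LOC[category] // sum(LOC.values())


def ratio_vs_main(category: str) -> float:
    """Size of a category relative to main code, as a percentage with one decimal."""
    return round(100 * LOC[category] / LOC["main"], 1)


if __name__ == "__main__":
    for name in LOC:
        print(f"{name}: {share_of_total(name)}% of total, "
              f"{ratio_vs_main(name)}% vs. main code")
```

The second figure is the one quoted as "vs. main code" in each category section, which is why main code itself is reported as 100.0%.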
Main Code
All manually created or maintained source code that defines the logic of the product and runs in a production environment.
proto
spec
orc
xsl
mysql
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 6,268 files match defined criteria (833,341 LOC, 100.0% vs. main code):
    • 5,935 *.java files (744,677 LOC)
    • 179 *.sql files (55,881 LOC)
    • 14 *.g files (5,992 LOC)
    • 40 *.py files (5,192 LOC)
    • 40 *.xml files (4,609 LOC)
    • 15 *.php files (4,067 LOC)
    • 4 *.thrift files (3,397 LOC)
    • 5 *.proto files (3,350 LOC)
    • 2 *.g4 files (1,926 LOC)
    • 2 *.cpp files (1,163 LOC)
    • 6 *.js files (1,042 LOC)
    • 1 *.c files (856 LOC)
    • 3 *.jsp files (373 LOC)
    • 3 *.html files (344 LOC)
    • 1 *.spec files (151 LOC)
    • 5 *.css files (127 LOC)
    • 5 *.avsc files (92 LOC)
    • 4 *.orc files (59 LOC)
    • 1 *.xsl files (24 LOC)
    • 1 *.mysql files (12 LOC)
    • 2 *.h files (7 LOC)
  • "*.java" is the largest, containing 89.36% of LOC.
  • "*.h" is the smallest, containing less than 0.01% of LOC.


*.java: 744,677 LOC (89%), 5,935 files
*.sql: 55,881 LOC (6%), 179 files
*.g: 5,992 LOC (<1%), 14 files
*.py: 5,192 LOC (<1%), 40 files
*.xml: 4,609 LOC (<1%), 40 files
*.php: 4,067 LOC (<1%), 15 files
*.thrift: 3,397 LOC (<1%), 4 files
*.proto: 3,350 LOC (<1%), 5 files
*.g4: 1,926 LOC (<1%), 2 files
*.cpp: 1,163 LOC (<1%), 2 files
*.js: 1,042 LOC (<1%), 6 files
*.c: 856 LOC (<1%), 1 file
*.jsp: 373 LOC (<1%), 3 files
*.html: 344 LOC (<1%), 3 files
*.spec: 151 LOC (<1%), 1 file
*.css: 127 LOC (<1%), 5 files
*.avsc: 92 LOC (<1%), 5 files
*.orc: 59 LOC (<1%), 4 files
*.xsl: 24 LOC (<1%), 1 file
*.mysql: 12 LOC (<1%), 1 file
*.h: 7 LOC (<1%), 2 files
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
q
pig
orc
proto
  • The following criteria are used to filter files:
    • files with paths like ".*/[Tt]est/.*".
    • files with paths like ".*/test_.*".
    • files with paths like ".*_test[.].*".
    • files with paths like ".*/e2e/.*".
    • files with paths like ".*/test[.].*".
    • files with paths like ".*_tests[.].*".
  • 7,129 files match defined criteria (555,485 LOC, 66.7% vs. main code):
    • 1,744 *.java files (351,115 LOC)
    • 5,161 *.q files (195,504 LOC)
    • 8 *.pm files (3,192 LOC)
    • 141 *.sql files (2,040 LOC)
    • 31 *.xml files (1,844 LOC)
    • 3 *.pl files (752 LOC)
    • 11 *.sh files (299 LOC)
    • 13 *.pig files (250 LOC)
    • 1 *.orc files (206 LOC)
    • 2 *.proto files (84 LOC)
    • 3 *.thrift files (71 LOC)
    • 3 *.avsc files (56 LOC)
    • 4 *.py files (30 LOC)
    • 1 *.bat files (16 LOC)
    • 2 *.yaml files (14 LOC)
    • 1 *.php files (12 LOC)
  • "*.java" is the largest, containing 63.21% of LOC.
  • "*.php" is the smallest, containing less than 0.01% of LOC.
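The path criteria for test code listed above can be tried out with a few lines of Python. This is a minimal sketch, assuming each pattern must match the whole file path (full-match semantics); the report does not state how the tool applies them. The function name and the sample paths in the usage note are hypothetical.

```python
import re

# Path patterns copied verbatim from the test-code criteria above.
TEST_PATH_PATTERNS = [
    r".*/[Tt]est/.*",
    r".*/test_.*",
    r".*_test[.].*",
    r".*/e2e/.*",
    r".*/test[.].*",
    r".*_tests[.].*",
]

_COMPILED = [re.compile(pattern) for pattern in TEST_PATH_PATTERNS]


def is_test_path(path: str) -> bool:
    """Return True if any test pattern matches the entire path.

    Assumption: patterns are applied as full matches against the path.
    """
    return any(pattern.fullmatch(path) for pattern in _COMPILED)
```

Under this assumption, a hypothetical path such as `ql/src/test/queries/join1.q` is classified as test code, while `src/java/Driver.java` is not. A path that begins with `test/` and has no leading slash would also not match, because the first pattern requires a `/` before the directory name.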


*.java: 351,115 LOC (63%), 1,744 files
*.q: 195,504 LOC (35%), 5,161 files
*.pm: 3,192 LOC (<1%), 8 files
*.sql: 2,040 LOC (<1%), 141 files
*.xml: 1,844 LOC (<1%), 31 files
*.pl: 752 LOC (<1%), 3 files
*.sh: 299 LOC (<1%), 11 files
*.pig: 250 LOC (<1%), 13 files
*.orc: 206 LOC (<1%), 1 file
*.proto: 84 LOC (<1%), 2 files
*.thrift: 71 LOC (<1%), 3 files
*.avsc: 56 LOC (<1%), 3 files
*.py: 30 LOC (<1%), 4 files
*.bat: 16 LOC (<1%), 1 file
*.yaml: 14 LOC (<1%), 2 files
*.php: 12 LOC (<1%), 1 file
Generated Code
Automatically generated files, not manually changed after generation.
txt
  • The following criteria are used to filter files:
    • files with paths like ".*/src/gen/.*".
    • files with any line of content like "//[ ]*Generated by .*".
    • files with paths like ".*[.](py|java|h|cc|cpp|m|rb|php)" AND any line of content like ".*Generated by the protocol buffer compiler[.][ ]+DO NOT EDIT[!].*".
  • 1,784 files match defined criteria (441,620 LOC, 53.0% vs. main code):
    • 483 *.java files (246,576 LOC)
    • 1,096 *.php files (130,475 LOC)
    • 146 *.txt files (24,231 LOC)
    • 25 *.py files (12,429 LOC)
    • 10 *.cpp files (12,342 LOC)
    • 14 *.rb files (8,320 LOC)
    • 10 *.h files (7,247 LOC)
  • "*.java" is the largest, containing 55.83% of LOC.
  • "*.h" is the smallest, containing 1.64% of LOC.
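The generated-code criteria above combine path patterns with content patterns, and the third rule requires both to hold. The sketch below illustrates that logic. It assumes that path patterns must match the whole path and that content patterns must match a whole line; neither is stated in the report. The function name and sample paths are hypothetical.

```python
import re
from typing import Iterable

# Patterns copied verbatim from the generated-code criteria above.
GEN_PATH = re.compile(r".*/src/gen/.*")
GEN_COMMENT = re.compile(r"//[ ]*Generated by .*")
PROTO_PATH = re.compile(r".*[.](py|java|h|cc|cpp|m|rb|php)")
PROTO_BANNER = re.compile(
    r".*Generated by the protocol buffer compiler[.][ ]+DO NOT EDIT[!].*"
)


def is_generated(path: str, lines: Iterable[str]) -> bool:
    """Apply the three criteria in order; any one of them is sufficient.

    Assumption: patterns are applied as full matches (whole path, whole line).
    """
    # Strip line endings so that '.' based patterns can match whole lines.
    stripped = [line.rstrip("\r\n") for line in lines]
    if GEN_PATH.fullmatch(path):
        return True
    if any(GEN_COMMENT.fullmatch(line) for line in stripped):
        return True
    # Third rule: the path AND the content condition must both hold.
    return bool(PROTO_PATH.fullmatch(path)) and any(
        PROTO_BANNER.fullmatch(line) for line in stripped
    )
```

For example, a hypothetical `a/foo_pb2.py` whose first line is `# Generated by the protocol buffer compiler.  DO NOT EDIT!` is classified as generated by the third rule, whereas the same line inside `a/notes.txt` is not, because `.txt` is not among the listed extensions.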


*.java: 246,576 LOC (55%), 483 files
*.php: 130,475 LOC (29%), 1,096 files
*.txt: 24,231 LOC (5%), 146 files
*.py: 12,429 LOC (2%), 25 files
*.cpp: 12,342 LOC (2%), 10 files
*.rb: 8,320 LOC (1%), 14 files
*.h: 7,247 LOC (1%), 10 files
Build and Deployment Code
Source code used to configure or support the build and deployment process.
  • The following criteria are used to filter files:
    • files with paths like ".*/pom[.]xml".
    • files with paths like ".*[.]sh".
    • files with paths like ".*[.]git[a-z]+".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*/build[.]xml".
    • files with paths like ".*/[.]gitattributes".
    • files with paths like ".*/assembly/src[.]xml".
    • files with paths like ".*[.]bat".
  • 107 files match defined criteria (23,211 LOC, 2.8% vs. main code):
    • 76 *.xml files (20,679 LOC)
    • 31 *.sh files (2,532 LOC)
  • "*.xml" is the largest, containing 89.09% of LOC.
  • "*.sh" is the smallest, containing 10.91% of LOC.
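Several of the build and deployment patterns above overlap: a `.gitignore` file, for instance, satisfies both the generic `.*[.]git[a-z]+` rule and the dedicated `.gitignore` rule. The sketch below lists which patterns match a given path. It assumes full-match semantics against the whole path, which the report does not confirm; the function name and sample paths are hypothetical.

```python
import re
from typing import List

# Path patterns copied verbatim from the build and deployment criteria above.
BUILD_PATTERNS = [
    r".*/pom[.]xml",
    r".*[.]sh",
    r".*[.]git[a-z]+",
    r".*/[.]gitignore",
    r".*/build[.]xml",
    r".*/[.]gitattributes",
    r".*/assembly/src[.]xml",
    r".*[.]bat",
]


def matching_build_patterns(path: str) -> List[str]:
    """Return every build pattern that matches the entire path, in list order.

    Assumption: patterns are applied as full matches against the path.
    """
    return [p for p in BUILD_PATTERNS if re.fullmatch(p, path)]


if __name__ == "__main__":
    for sample in ["ql/pom.xml", "bin/hive.sh", "a/.gitignore", "src/Main.java"]:
        print(sample, "->", matching_build_patterns(sample))
```

A path that returns an empty list is not build and deployment code under these criteria. How the tool resolves a path that also matches another category's criteria is not described in the report.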


*.xml: 20,679 LOC (89%), 76 files
*.sh: 2,532 LOC (10%), 31 files
Other Code
txt
properties
  • The following criteria are used to filter files:
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*[.]txt".
    • files with paths like ".*[.]properties".
    • files with paths like ".*[.]json".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*/checkstyle[.]xml".
    • files with paths like ".*/checkstyle.*".
  • 659 files match defined criteria (80,108 LOC, 9.6% vs. main code):
    • 566 *.txt files (75,588 LOC)
    • 32 *.properties files (2,230 LOC)
    • 6 *.md files (960 LOC)
    • 10 *.xml files (929 LOC)
    • 45 *.json files (401 LOC)
  • "*.txt" is the largest, containing 94.36% of LOC.
  • "*.json" is the smallest, containing 0.5% of LOC.


*.txt: 75,588 LOC (94%), 566 files
*.properties: 2,230 LOC (2%), 32 files
*.md: 960 LOC (1%), 6 files
*.xml: 929 LOC (1%), 10 files
*.json: 401 LOC (<1%), 45 files
Analyzers
Info about analyzers used for source code examinations.


2025-05-07 15:24