aws / amazon-sagemaker-examples
Source Code Overview

Analysis scope, overview of main, test, generated, deployment, build, and other code.

Source Code Analysis Scope
Files included in and excluded from the analysis
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes we separate files in scope into several categories: main, test, generated, deployment and build, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing of the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support the build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain a number of files, we are primarily interested in the source code files that are being written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that have been changed by developers.

main: 654,831 LOC (95%), 1,404 files
test: 5,917 LOC (<1%), 47 files
generated: 28 LOC (<1%), 1 file
build and deployment: 2,616 LOC (<1%), 117 files
other: 22,809 LOC (3%), 322 files
Main Code
All manually created or maintained source code that defines the logic of the product and runs in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 1,404 files match the defined criteria (654,831 LOC, 100.0% vs. main code):
    • 561 *.ipynb files (575,834 LOC)
    • 772 *.py files (72,158 LOC)
    • 3 *.jsonl files (2,458 LOC)
    • 28 *.yaml files (1,633 LOC)
    • 2 *.org files (1,217 LOC)
    • 1 *.js file (512 LOC)
    • 14 *.java files (466 LOC)
    • 5 *.html files (214 LOC)
    • 6 *.cfg files (177 LOC)
    • 1 *.toml file (80 LOC)
    • 2 *.css files (41 LOC)
    • 6 *.in files (15 LOC)
    • 1 *.proto file (11 LOC)
    • 1 *.c file (8 LOC)
    • 1 *.jq file (7 LOC)
  • "*.ipynb" is the biggest, containing 87.94% of LOC.
  • "*.jq" is the smallest, containing less than 0.01% of LOC.
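The shares quoted above can be reproduced from the per-extension counts. The following is a minimal sketch, not the analyzer's own code; the dictionary and the `share` helper are illustrative names, and the LOC values are copied from the list above.

```python
# LOC per extension in the main category, copied from the list above.
main_loc = {
    "ipynb": 575_834, "py": 72_158, "jsonl": 2_458, "yaml": 1_633,
    "org": 1_217, "js": 512, "java": 466, "html": 214, "cfg": 177,
    "toml": 80, "css": 41, "in": 15, "proto": 11, "c": 8, "jq": 7,
}

total = sum(main_loc.values())


def share(ext: str) -> float:
    """Percentage of main-code LOC held by one extension (hypothetical helper)."""
    return 100 * main_loc[ext] / total


print(total)                       # 654831
print(f"{share('ipynb'):.2f}%")    # 87.94%
print(f"{share('jq'):.4f}%")       # 0.0011%
```

The last line shows why the smallest extension rounds to zero when only two decimals are displayed.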


*.ipynb: 575,834 LOC (87%), 561 files
*.py: 72,158 LOC (11%), 772 files
*.jsonl: 2,458 LOC (<1%), 3 files
*.yaml: 1,633 LOC (<1%), 28 files
*.org: 1,217 LOC (<1%), 2 files
*.js: 512 LOC (<1%), 1 file
*.java: 466 LOC (<1%), 14 files
*.html: 214 LOC (<1%), 5 files
*.cfg: 177 LOC (<1%), 6 files
*.toml: 80 LOC (<1%), 1 file
*.css: 41 LOC (<1%), 2 files
*.in: 15 LOC (<1%), 6 files
*.proto: 11 LOC (<1%), 1 file
*.c: 8 LOC (<1%), 1 file
*.jq: 7 LOC (<1%), 1 file
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*/test_.*".
    • files with paths like ".*[-]test[-].*".
    • files with paths like ".*/[Tt]est/.*".
    • files with paths like ".*/[Ss]pecs/.*".
    • files with paths like ".*/[Tt]ests/.*".
    • files with paths like ".*_test[.].*".
    • files with paths like ".*/test[.].*".
  • 47 files match the defined criteria (5,917 LOC, 0.9% vs. main code):
    • 6 *.jsonl files (3,000 LOC)
    • 32 *.py files (1,510 LOC)
    • 1 *.ipynb file (770 LOC)
    • 7 *.yaml files (635 LOC)
    • 1 *.sh file (2 LOC)
  • "*.jsonl" is the biggest, containing 50.7% of LOC.
  • "*.sh" is the smallest, containing 0.03% of LOC.
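To see how the path criteria above behave, here is a minimal sketch that applies them to a few file paths. It assumes each pattern must match the whole path (the report does not state the matching semantics); `is_test_path` and the sample paths are hypothetical.

```python
import re

# Path patterns copied verbatim from the test-code criteria above.
TEST_PATTERNS = [
    r".*/test_.*",
    r".*[-]test[-].*",
    r".*/[Tt]est/.*",
    r".*/[Ss]pecs/.*",
    r".*/[Tt]ests/.*",
    r".*_test[.].*",
    r".*/test[.].*",
]
_COMPILED = [re.compile(p) for p in TEST_PATTERNS]


def is_test_path(path: str) -> bool:
    """True if any pattern matches the full path (assumed semantics)."""
    return any(rx.fullmatch(path) for rx in _COMPILED)


# Hypothetical paths, not taken from the repository.
for p in ["src/tests/helpers.py", "src/model_test.py",
          "src/latest/run.py", "src/train.py"]:
    print(p, is_test_path(p))
```

Under this assumption a directory such as `latest/` is not mistaken for a test folder, because the patterns require `/test/` or `/tests/` as a complete path segment.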


*.jsonl: 3,000 LOC (50%), 6 files
*.py: 1,510 LOC (25%), 32 files
*.ipynb: 770 LOC (13%), 1 file
*.yaml: 635 LOC (10%), 7 files
*.sh: 2 LOC (<1%), 1 file
Generated Code
Automatically generated files, not manually changed after generation.
  • The following criteria are used to filter files:
    • files with paths like ".*[.](py|java|h|cc|cpp|m|rb|php)" AND any line of content like ".*Generated by the protocol buffer compiler[.][ ]+DO NOT EDIT[!].*".
    • files with any line of content like "\<\!\-\-[ ]*Generated by .*".
  • 1 file matches defined criteria (28 LOC, 0.0% vs. main code). All matches are in *.py files.
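The first criterion above combines a path pattern with a content pattern, while the second checks content only. A minimal sketch of that logic follows, assuming full-match semantics for both paths and lines; `is_generated` and the sample inputs are hypothetical and do not represent the analyzer's implementation.

```python
import re

# Patterns copied from the generated-code criteria above.
PROTO_PATH = re.compile(r".*[.](py|java|h|cc|cpp|m|rb|php)")
PROTO_LINE = re.compile(
    r".*Generated by the protocol buffer compiler[.][ ]+DO NOT EDIT[!].*"
)
MARKUP_LINE = re.compile(r"\<\!\-\-[ ]*Generated by .*")


def is_generated(path: str, lines: list[str]) -> bool:
    """Apply both criteria: (path AND content) OR (content only)."""
    stripped = [line.rstrip("\n") for line in lines]
    by_protoc = bool(PROTO_PATH.fullmatch(path)) and any(
        PROTO_LINE.fullmatch(line) for line in stripped
    )
    by_comment = any(MARKUP_LINE.fullmatch(line) for line in stripped)
    return by_protoc or by_comment


# Hypothetical file content resembling a protoc-generated Python module.
header = "# Generated by the protocol buffer compiler.  DO NOT EDIT!\n"
print(is_generated("pkg/message_pb2.py", [header, "import sys\n"]))  # True
print(is_generated("pkg/notes.txt", [header]))                       # False
```

The second call returns False because the extension is not in the path pattern, even though the header line is present.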


*.py: 28 LOC (100%), 1 file
Build and Deployment Code
Source code used to configure or support the build and deployment process.
  • The following criteria are used to filter files:
    • files with paths like ".*[.]git[a-z]+".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]sh".
    • files with paths like ".*/pom[.]xml".
    • files with paths like ".*/build[.]xml".
    • files with paths like ".*[.]bat".
  • 117 files match the defined criteria (2,616 LOC, 0.4% vs. main code):
    • 114 *.sh files (2,470 LOC)
    • 2 *.xml files (119 LOC)
    • 1 *.bat file (27 LOC)
  • "*.sh" is the biggest, containing 94.42% of LOC.
  • "*.bat" is the smallest, containing 1.03% of LOC.


*.sh: 2,470 LOC (94%), 114 files
*.xml: 119 LOC (4%), 2 files
*.bat: 27 LOC (1%), 1 file
Other Code
  • The following criteria are used to filter files:
    • files with paths like ".*[.]json".
    • files with paths like ".*[.]txt".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*[.]properties".
    • files with paths like ".*[.](rst|rest|resttxt|rsttxt)".
    • files with paths like ".*/[Dd]emos?/.*".
    • files with paths like ".*[.]ini".
    • files with paths like ".*/[.]dockerignore".
    • files with paths like ".*/LICENSE[.][a-z0-9]+".
  • 322 files match the defined criteria (22,809 LOC, 3.5% vs. main code):
    • 133 *.txt files (16,412 LOC)
    • 48 *.json files (2,927 LOC)
    • 115 *.md files (2,632 LOC)
    • 15 *.rst files (496 LOC)
    • 3 *.py files (237 LOC)
    • 5 *.properties files (58 LOC)
    • 3 *.ini files (47 LOC)
  • "*.txt" is the biggest, containing 71.95% of LOC.
  • "*.ini" is the smallest, containing 0.21% of LOC.
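Some paths satisfy the criteria of more than one category; for example, a `.gitignore` file matches patterns in both the build and deployment list and the list above. The report does not say how such overlaps are resolved, so the sketch below only lists every category whose criteria a path matches. It assumes full-path matching; `matching_categories` and the sample paths are hypothetical.

```python
import re

# Patterns copied from the "Build and Deployment Code" and "Other Code" criteria.
CRITERIA = {
    "build and deployment": [
        r".*[.]git[a-z]+", r".*/[.]gitignore", r".*[.]sh",
        r".*/pom[.]xml", r".*/build[.]xml", r".*[.]bat",
    ],
    "other": [
        r".*[.]json", r".*[.]txt", r".*/[.]gitignore", r".*[.]md",
        r".*/README[.][a-z0-9]+", r".*[.]properties",
        r".*[.](rst|rest|resttxt|rsttxt)", r".*/[Dd]emos?/.*",
        r".*[.]ini", r".*/[.]dockerignore", r".*/LICENSE[.][a-z0-9]+",
    ],
}


def matching_categories(path: str) -> list[str]:
    """Return every category with at least one pattern matching the full path."""
    return [
        name
        for name, patterns in CRITERIA.items()
        if any(re.fullmatch(p, path) for p in patterns)
    ]


# Hypothetical paths, not taken from the repository.
print(matching_categories("repo/.gitignore"))   # ['build and deployment', 'other']
print(matching_categories("repo/demo/run.sh"))  # ['build and deployment', 'other']
print(matching_categories("repo/README.md"))    # ['other']
print(matching_categories("repo/train.py"))     # []
```

The `.*/[Dd]emos?/.*` pattern matches on directory name alone, which may be why a few *.py files appear in the list above.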


*.txt: 16,412 LOC (71%), 133 files
*.json: 2,927 LOC (12%), 48 files
*.md: 2,632 LOC (11%), 115 files
*.rst: 496 LOC (2%), 15 files
*.py: 237 LOC (1%), 3 files
*.properties: 58 LOC (<1%), 5 files
*.ini: 47 LOC (<1%), 3 files
Analyzers
Info about analyzers used for source code examinations.


2025-05-03 23:52