awslabs / dynamic-training-with-apache-mxnet-on-aws
Source Code Overview

Analysis scope and overview of main, test, generated, build and deployment, and other code.

Source Code Analysis Scope
Files included in and excluded from analysis
  • 58 extensions are included in analyses: py, md, h, cc, scala, sh, cu, pm, clj, R, jl, java, st, gitignore, t, xml, mk, cmake, yml, ipynb, cpp, pl, cuh, js, Rmd, txt, hpp, properties, json, m, html, in, svg, PL, ps1, bat, plist, i, pyx, proto, cfg, css, clang-tidy, gitattributes, gradle, pyi, htaccess, dockerignore, vcxproj, editorconfig, yaml, g4, groovy, gitmodules, rst, perl, ini, gv
  • 11 criteria are used to exclude files from analysis:
    • exclude files with path like ".*/bin/.*" (Binaries for distribution) (3 files).
    • exclude files with path like ".*/[.][a-zA-Z0-9_]+.*" (Hidden files and folders) (54 files).
    • exclude files with path like ".*/gradle/wrapper/.*" (Gradle) (1 file).
    • exclude files with path like ".*/(3rd|[Tt]hird)[-_]?[Pp]arty/.*" (Dependencies) (182 files).
    • exclude files with path like ".*/docs/.*" (Documentation) (246 files).
    • exclude files with path like ".*/docs?/_?(build|themes?|templates?|static)/.*" (Sphinx) (0 files).
    • exclude files with path like ".*/git[-]history[.]txt" (Git history) (1 file).
    • exclude files with path like ".*/git[-][a-zA-Z0-9_]+[.]txt" (Git data exports for sokrates analyses) (0 files).
    • exclude files with path like ".*/vignettes/.*" (R packages) (11 files).
    • exclude files with path like ".*[.]txt" (Text files) (5 files).
    • exclude files with path like ".*/sokrates_conventions[.]json" (Sokrates scoping conventions) (1 file).
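The exclusion criteria are regular expressions over file paths. Below is a minimal Python sketch of how such criteria could be applied. It makes several assumptions that this report does not confirm: each pattern must match the whole path, paths start with "/", and the first matching pattern determines the reason. Only five of the eleven patterns are included, and the example paths are hypothetical.

```python
import re
from typing import Optional

# A subset of the exclusion patterns listed above, keyed by their label.
# Assumptions: a pattern must match the whole path, paths start with "/",
# and the first matching pattern (in dict order) is reported as the reason.
EXCLUDE_PATTERNS = {
    "Binaries for distribution": r".*/bin/.*",
    "Hidden files and folders": r".*/[.][a-zA-Z0-9_]+.*",
    "Dependencies": r".*/(3rd|[Tt]hird)[-_]?[Pp]arty/.*",
    "Documentation": r".*/docs/.*",
    "Text files": r".*[.]txt",
}

def exclusion_reason(path: str) -> Optional[str]:
    """Return the label of the first pattern that matches the whole path, or None."""
    for label, pattern in EXCLUDE_PATTERNS.items():
        if re.fullmatch(pattern, path):
            return label
    return None

# Hypothetical paths, for illustration only.
for path in ["/src/engine/engine.cc", "/3rdparty/dmlc-core/io.cc",
             "/docs/index.md", "/.github/workflow.yml"]:
    print(path, "->", exclusion_reason(path))
```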
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes, we separate files in scope into several categories: main, test, generated, build and deployment, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing of the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain a number of files, we are primarily interested in the source code files that are being written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that have been changed by developers.

  • main: 252,286 LOC (75%), 1,584 files
  • test: 55,422 LOC (16%), 298 files
  • generated: 0 LOC (0%), 0 files
  • build and deployment: 3,831 LOC (1%), 109 files
  • other: 22,992 LOC (6%), 347 files
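The percentages above are shares of all 334,531 in-scope lines of code. The Python sketch below recomputes them from the listed LOC counts. The displayed whole-number values appear to be rounded down: test code, for example, is 16.57% of the total but is shown as 16%. This is inferred from the numbers, not documented behavior.

```python
import math

# LOC per category, copied from the summary above.
CATEGORY_LOC = {
    "main": 252_286,
    "test": 55_422,
    "generated": 0,
    "build and deployment": 3_831,
    "other": 22_992,
}

def category_shares(loc_by_category):
    """Return {category: percentage of all in-scope LOC}."""
    total = sum(loc_by_category.values())
    return {name: 100 * loc / total for name, loc in loc_by_category.items()}

shares = category_shares(CATEGORY_LOC)
for name, share in shares.items():
    # Assumption: the report truncates to a whole percent.
    print(f"{name:<22}{share:6.2f}%  -> shown as {math.floor(share)}%")
```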
Main Code
All manually created or maintained source code that defines logic of the product that is run in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 1584 files match defined criteria (252,286 lines of code, 100.0% vs. main code):
    • 263 *.h files (64,321 lines of code)
    • 553 *.py files (61,794 lines of code)
    • 215 *.cc files (46,972 lines of code)
    • 88 *.pm files (21,541 lines of code)
    • 111 *.cu files (12,663 lines of code)
    • 89 *.scala files (10,337 lines of code)
    • 13 *.cuh files (5,815 lines of code)
    • 46 *.r files (4,759 lines of code)
    • 29 *.jl files (4,183 lines of code)
    • 4 *.i files (3,877 lines of code)
    • 2 *.proto files (3,008 lines of code)
    • 24 *.clj files (2,305 lines of code)
    • 13 *.cpp files (2,217 lines of code)
    • 38 *.java files (1,842 lines of code)
    • 8 *.hpp files (1,609 lines of code)
    • 11 *.cmake files (1,248 lines of code)
    • 41 *.st files (978 lines of code)
    • 1 *.gv file (839 lines of code)
    • 4 *.m files (827 lines of code)
    • 3 *.pyx files (258 lines of code)
    • 5 *.pl files (202 lines of code)
    • 8 *.yml files (189 lines of code)
    • 3 *.cfg files (130 lines of code)
    • 1 *.groovy file (117 lines of code)
    • 1 *.pyi file (85 lines of code)
    • 1 *.yaml file (56 lines of code)
    • 1 *.g4 file (52 lines of code)
    • 2 *.in files (37 lines of code)
    • 5 *.t files (20 lines of code)
    • 1 *.perl file (5 lines of code)
  • "*.h" is the biggest, containing 25.5% of code.
  • "*.perl" is the smallest, containing less than 0.01% of code.
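Each extension's share is its LOC divided by the 252,286 lines of main code. The Python sketch below uses a subset of the counts listed above, and its dictionary and function names are illustrative. It shows why "*.perl" rounds to zero: its 5 lines are about 0.002% of main code.

```python
MAIN_LOC_TOTAL = 252_286  # total main LOC, from the report

# A subset of the per-extension counts listed above.
MAIN_LOC_BY_EXT = {"*.h": 64_321, "*.py": 61_794, "*.cc": 46_972, "*.perl": 5}

def share_of_main(ext: str) -> float:
    """Percentage of all main-code LOC contributed by one extension."""
    return 100 * MAIN_LOC_BY_EXT[ext] / MAIN_LOC_TOTAL

biggest = max(MAIN_LOC_BY_EXT, key=MAIN_LOC_BY_EXT.get)
smallest = min(MAIN_LOC_BY_EXT, key=MAIN_LOC_BY_EXT.get)
print(f"{biggest}: {share_of_main(biggest):.1f}%")     # *.h: 25.5%
print(f"{smallest}: {share_of_main(smallest):.4f}%")  # *.perl: 0.0020%
```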
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*/[Tt]est/.*".
    • files with paths like ".*/test_.*".
    • files with paths like ".*[-]test[-].*".
    • files with paths like ".*_test[.].*".
    • files with paths like ".*/[Tt]ests/.*".
    • files with paths like ".*_tests[.].*".
    • files with any line of content like ".*/simpletest/.*".
  • 298 files match defined criteria (55,422 lines of code, 22.0% vs. main code):
    • 130 *.py files (31,708 lines of code)
    • 30 *.t files (7,204 lines of code)
    • 17 *.cc files (4,808 lines of code)
    • 27 *.scala files (2,990 lines of code)
    • 9 *.h files (2,835 lines of code)
    • 17 *.jl files (2,434 lines of code)
    • 22 *.clj files (1,080 lines of code)
    • 9 *.r files (967 lines of code)
    • 24 *.sh files (833 lines of code)
    • 4 *.java files (150 lines of code)
    • 2 *.m files (137 lines of code)
    • 4 *.ps1 files (108 lines of code)
    • 2 *.cpp files (107 lines of code)
    • 1 *.mk file (61 lines of code)
  • "*.py" is the biggest, containing 57.21% of code.
  • "*.mk" is the smallest, containing 0.11% of code.
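The Python sketch below applies the path-based test criteria above. It assumes each pattern must match the full path. The content-based "simpletest" criterion is omitted, and the example paths are hypothetical.

```python
import re

# Path-based test criteria copied from the list above.
# Assumption: a pattern must match the entire path (re.fullmatch).
TEST_PATH_PATTERNS = [
    r".*/[Tt]est/.*",
    r".*/test_.*",
    r".*[-]test[-].*",
    r".*_test[.].*",
    r".*/[Tt]ests/.*",
    r".*_tests[.].*",
]

def is_test_path(path: str) -> bool:
    """True if any test pattern matches the whole path."""
    return any(re.fullmatch(pattern, path) for pattern in TEST_PATH_PATTERNS)

# Hypothetical paths, for illustration only.
for path in ["/tests/python/unittest/test_ndarray.py",
             "/src/engine/engine_test.cc",
             "/src/operator/tensor/matrix_op.cc"]:
    print(path, "->", "test" if is_test_path(path) else "not test")
```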
Build and Deployment Code
Source code used to configure or support build and deployment process.
  • The following criteria are used to filter files:
    • files with paths like ".*[.]mk".
    • files with paths like ".*/pom[.]xml".
    • files with paths like ".*/assembly[.]xml".
    • files with paths like ".*[.]sh".
    • files with paths like ".*[.]git[a-z]+".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]bat".
    • files with paths like ".*[.]gradle".
    • files with paths like ".*/[.]gitattributes".
    • files with paths like ".*/[.]gitmodules".
  • 109 files match defined criteria (3,831 lines of code, 1.5% vs. main code):
    • 92 *.sh files (2,780 lines of code)
    • 11 *.mk files (573 lines of code)
    • 4 *.bat files (351 lines of code)
    • 2 *.gradle files (127 lines of code)
  • "*.sh" is the biggest, containing 72.57% of code.
  • "*.gradle" is the smallest, containing 3.32% of code.
Other Code
  • The following criteria are used to filter files:
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*/[Ee]xamples/.*".
    • files with paths like ".*[.]properties".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]json".
    • files with paths like ".*[.]txt".
    • files with paths like ".*[.](rst|rest|resttxt|rsttxt)".
    • files with paths like ".*[.]plist".
    • files with paths like ".*[.]editorconfig".
    • files with paths like ".*[.]svg".
    • files with paths like ".*/[.]htaccess".
    • files with paths like ".*[.]ini".
    • files with paths like ".*/[.]dockerignore".
    • files with paths like ".*/[Dd]emos?/.*".
    • files with paths like ".*/LICENSE[.][a-z0-9]+".
    • files with paths like ".*[.](xml|xsd|robot|sql|pgsql|dashboard|profile|ipynb|raml|avsc|al)".
  • 347 files match defined criteria (22,992 lines of code, 9.1% vs. main code):
    • 135 *.md files (6,747 lines of code)
    • 61 *.scala files (5,821 lines of code)
    • 26 *.xml files (2,451 lines of code)
    • 7 *.ipynb files (1,971 lines of code)
    • 14 *.pl files (1,765 lines of code)
    • 33 *.clj files (1,575 lines of code)
    • 6 *.json files (1,260 lines of code)
    • 14 *.jl files (605 lines of code)
    • 35 *.sh files (442 lines of code)
    • 7 *.properties files (155 lines of code)
    • 7 *.r files (153 lines of code)
    • 1 *.rst file (31 lines of code)
    • 1 *.py file (16 lines of code)
  • "*.md" is the biggest, containing 29.34% of code.
  • "*.py" is the smallest, containing 0.07% of code.
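The "vs. main code" figures for the test, build and deployment, and other categories are ratios to main code, not shares of all in-scope code. For example, test code is 22.0% the size of main code but only 16% of all analyzed code. The Python sketch below computes the ratio from the LOC counts reported above.

```python
MAIN_LOC = 252_286  # main code LOC, from the report

# LOC of the non-main categories, from the report.
OTHER_CATEGORY_LOC = {"test": 55_422, "build and deployment": 3_831, "other": 22_992}

def percent_vs_main(loc: int) -> float:
    """Size of a category relative to main code (not a share of the total)."""
    return 100 * loc / MAIN_LOC

for name, loc in OTHER_CATEGORY_LOC.items():
    print(f"{name}: {percent_vs_main(loc):.1f}% vs. main code")
```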
Analyzers
Info about analyzers used for source code examinations.
  • *.h files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.py files are analyzed with PythonAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.cc files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.pm files are analyzed with PerlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.cu files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.scala files are analyzed with ScalaAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.cuh files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.r files are analyzed with RAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • No dependency analysis
  • *.jl files are analyzed with JuliaAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • No dependency analysis
  • *.i files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.proto files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.clj files are analyzed with ClojureLangAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.cpp files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.java files are analyzed with JavaAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis (based on package names)
  • *.hpp files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.cmake files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.st files are analyzed with HtmlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.gv files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.m files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.pyx files are analyzed with PythonAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.pl files are analyzed with PerlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.yml files are analyzed with YamlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.cfg files are analyzed with CfgAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.groovy files are analyzed with GroovyAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.pyi files are analyzed with PythonAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.yaml files are analyzed with YamlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.g4 files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.in files are analyzed with RustAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • No dependency analysis
  • *.t files are analyzed with PerlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.perl files are analyzed with PerlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
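A simplified line counter can illustrate the difference between basic and advanced code cleaning. This is not the analyzers' actual implementation. It assumes Python-style full-line "#" comments only and ignores docstrings, trailing comments, and the extra cleaning used for duplication calculations. The function name and sample are hypothetical.

```python
def count_loc(source: str, strip_comments: bool) -> int:
    """Count lines of code.

    strip_comments=False mimics "basic" cleaning (only empty lines dropped);
    True also drops full-line '#' comments, a simplified stand-in for
    "advanced" cleaning.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # empty or whitespace-only line
        if strip_comments and stripped.startswith("#"):
            continue  # full-line comment
        count += 1
    return count

SAMPLE = """\
# Compute the mean of a list.

def mean(xs):
    # guard against empty input
    if not xs:
        return 0.0
    return sum(xs) / len(xs)
"""
print(count_loc(SAMPLE, strip_comments=False))  # 6
print(count_loc(SAMPLE, strip_comments=True))   # 4
```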


2022-01-31 18:28