microsoft / ADBench
Source Code Overview

Analysis scope, overview of main, test, generated, deployment, build, and other code.

Source Code Analysis Scope
Files included in and excluded from analyses
  • 33 extensions are included in analyses: m, txt, h, py, cpp, mat, jl, cs, html, md, cxx, c, ipynb, json, csproj, fs, yml, gitignore, cmake, ps1, bat, cshtml, sh, tex, toml, dockerignore, gitattributes, TXT, xml, cls, gitmodules, bib, fsproj
  • 6 criteria are used to exclude files from analysis:
    • exclude files with path like ".*/[.][a-zA-Z0-9_]+.*" (Hidden files and folders) (11 files).
    • exclude files with path like ".*/docs/.*" (Documentation) (18 files).
    • exclude files with path like ".*/git[-]history[.]txt" (Git history) (1 file).
    • exclude files with path like ".*/git[-][a-zA-Z0-9_]+[.]txt" (Git data exports for sokrates analyses) (0 files).
    • exclude files with path like ".*/sokrates_conventions[.]json" (Sokrates scoping conventions) (1 file).
    • exclude files with path like ".*[.]txt" (Text files) (207 files).
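The exclusion criteria above are regular expressions matched against file paths. As a minimal sketch of how such a filter could work, assuming each pattern must match the whole path and that paths use forward slashes with a leading "/" (both are assumptions, as are the sample paths and the `exclusion_reason` helper, none of which come from the report):

```python
import re

# Patterns and notes copied from the exclusion criteria listed above.
EXCLUDE_PATTERNS = [
    (r".*/[.][a-zA-Z0-9_]+.*", "Hidden files and folders"),
    (r".*/docs/.*", "Documentation"),
    (r".*/git[-]history[.]txt", "Git history"),
    (r".*/git[-][a-zA-Z0-9_]+[.]txt", "Git data exports for sokrates analyses"),
    (r".*/sokrates_conventions[.]json", "Sokrates scoping conventions"),
    (r".*[.]txt", "Text files"),
]


def exclusion_reason(path):
    """Return the note of the first criterion matching the path, or None if it stays in scope."""
    for pattern, note in EXCLUDE_PATTERNS:
        if re.fullmatch(pattern, path):
            return note
    return None


# Hypothetical paths, used only for illustration.
for sample in ["/src/main.cpp", "/docs/index.md", "/.github/ci.yml", "/notes/readme.txt"]:
    print(sample, "->", exclusion_reason(sample))
```

Because the criteria are checked in order, a path that matches several patterns is attributed to the first one in this sketch; the tool itself may count overlaps differently.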
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes we separate files in scope into several categories: main, test, generated, deployment and build, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing of the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support the build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain a number of files, we are primarily interested in the source code files that are being written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that have been changed by developers.

main: 118,609 LOC (95%), 2,191 files
test: 2,509 LOC (2%), 40 files
generated: 0 LOC (0%), 0 files
build and deployment: 161 LOC (<1%), 8 files
other: 3,262 LOC (2%), 19 files
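The category shares can be recomputed from the LOC counts. The sketch below assumes each percentage is the category's LOC divided by the total LOC across all five categories; the displayed values (95%, 2%, <1%, 2%) are consistent with rounding down, though the report does not state its rounding rule.

```python
# LOC per category, as reported above.
loc = {
    "main": 118609,
    "test": 2509,
    "generated": 0,
    "build and deployment": 161,
    "other": 3262,
}

total = sum(loc.values())


def share_percent(category):
    """Share of the total analyzed LOC held by one category, in percent."""
    return 100.0 * loc[category] / total


for name, count in loc.items():
    print(f"{name}: {count} LOC ({share_percent(name):.2f}% of {total})")
```

Note that the "vs. main code" percentages quoted in the later sections use a different denominator: the main code LOC rather than the total.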
Main Code
All manually created or maintained source code that defines logic of the product that is run in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 2,191 files match defined criteria (118,609 lines of code, 100.0% vs. main code):
    • 1,844 *.m files (82,649 lines of code)
    • 42 *.cpp files (5,974 lines of code)
    • 23 *.html files (5,157 lines of code)
    • 79 *.h files (4,981 lines of code)
    • 67 *.py files (4,874 lines of code)
    • 14 *.c files (4,589 lines of code)
    • 29 *.jl files (2,591 lines of code)
    • 41 *.mat files (1,813 lines of code)
    • 22 *.cs files (1,309 lines of code)
    • 1 *.cls file (1,036 lines of code)
    • 3 *.ps1 files (898 lines of code)
    • 4 *.cxx files (863 lines of code)
    • 4 *.cmake files (515 lines of code)
    • 5 *.fs files (469 lines of code)
    • 1 *.bib file (338 lines of code)
    • 2 *.toml files (222 lines of code)
    • 4 *.yml files (144 lines of code)
    • 3 *.cshtml files (74 lines of code)
    • 1 *.tex file (70 lines of code)
    • 1 *.fsproj file (26 lines of code)
    • 1 *.txt file (17 lines of code)
  • "*.m" is the biggest, containing 69.68% of code.
  • "*.txt" is the smallest, containing 0.01% of code.


*.m: 82,649 LOC (69%), 1,844 files
*.cpp: 5,974 LOC (5%), 42 files
*.html: 5,157 LOC (4%), 23 files
*.h: 4,981 LOC (4%), 79 files
*.py: 4,874 LOC (4%), 67 files
*.c: 4,589 LOC (3%), 14 files
*.jl: 2,591 LOC (2%), 29 files
*.mat: 1,813 LOC (1%), 41 files
*.cs: 1,309 LOC (1%), 22 files
*.cls: 1,036 LOC (<1%), 1 file
*.ps1: 898 LOC (<1%), 3 files
*.cxx: 863 LOC (<1%), 4 files
*.cmake: 515 LOC (<1%), 4 files
*.fs: 469 LOC (<1%), 5 files
*.bib: 338 LOC (<1%), 1 file
*.toml: 222 LOC (<1%), 2 files
*.yml: 144 LOC (<1%), 4 files
*.cshtml: 74 LOC (<1%), 3 files
*.tex: 70 LOC (<1%), 1 file
*.fsproj: 26 LOC (<1%), 1 file
*.txt: 17 LOC (<1%), 1 file
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*/[Tt]est/.*".
    • files with paths like ".*/test_.*".
    • files with paths like ".*_tests[.].*".
    • files with any line of content like ".*/simpletest/.*".
  • 40 files match defined criteria (2,509 lines of code, 2.1% vs. main code):
    • 12 *.cs files (877 lines of code)
    • 5 *.py files (499 lines of code)
    • 5 *.cpp files (411 lines of code)
    • 7 *.jl files (302 lines of code)
    • 4 *.m files (275 lines of code)
    • 4 *.csproj files (88 lines of code)
    • 3 *.h files (57 lines of code)
  • "*.cs" is the biggest, containing 34.95% of code.
  • "*.h" is the smallest, containing 2.27% of code.
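The test criteria above combine path patterns with one content pattern. A minimal sketch of that two-step check follows, assuming whole-string matching on both paths and individual content lines; the `is_test_file` helper and the sample paths are hypothetical and only illustrate the idea.

```python
import re

# Patterns copied from the test-code criteria listed above.
TEST_PATH_PATTERNS = [r".*/[Tt]est/.*", r".*/test_.*", r".*_tests[.].*"]
TEST_CONTENT_PATTERNS = [r".*/simpletest/.*"]


def is_test_file(path, lines=()):
    """Return True if the path, or any single line of content, matches a test criterion."""
    if any(re.fullmatch(p, path) for p in TEST_PATH_PATTERNS):
        return True
    for line in lines:
        text = line.rstrip("\r\n")  # '.' does not match a newline, so strip it first
        if any(re.fullmatch(p, text) for p in TEST_CONTENT_PATTERNS):
            return True
    return False


# Hypothetical paths, used only for illustration.
print(is_test_file("/src/python/test_gmm.py"))
print(is_test_file("/src/cpp/gmm.cpp"))
```

Checking paths first avoids reading file content unless the cheaper path test is inconclusive.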


*.cs: 877 LOC (34%), 12 files
*.py: 499 LOC (19%), 5 files
*.cpp: 411 LOC (16%), 5 files
*.jl: 302 LOC (12%), 7 files
*.m: 275 LOC (10%), 4 files
*.csproj: 88 LOC (3%), 4 files
*.h: 57 LOC (2%), 3 files
Build and Deployment Code
Source code used to configure or support build and deployment process.
  • The following criteria are used to filter files:
    • files with paths like ".*[.]git[a-z]+".
    • files with paths like ".*/[.]gitattributes".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*/[.]gitmodules".
    • files with paths like ".*[.]csproj".
    • files with paths like ".*[.]sh".
    • files with paths like ".*[.]bat".
  • 8 files match defined criteria (161 lines of code, 0.1% vs. main code):
    • 2 *.sh files (117 lines of code)
    • 3 *.csproj files (27 lines of code)
    • 3 *.bat files (17 lines of code)
  • "*.sh" is the biggest, containing 72.67% of code.
  • "*.bat" is the smallest, containing 10.56% of code.


*.sh: 117 LOC (72%), 2 files
*.csproj: 27 LOC (16%), 3 files
*.bat: 17 LOC (10%), 3 files
Other Code
  • The following criteria are used to filter files:
    • files with paths like ".*[.]json".
    • files with paths like ".*/[.]dockerignore".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]txt".
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*[.](xml|xsd|robot|sql|pgsql|dashboard|profile|ipynb|raml|avsc|al)".
  • 19 files match defined criteria (3,262 lines of code, 2.8% vs. main code):
    • 8 *.ipynb files (2,873 lines of code)
    • 1 *.xml file (168 lines of code)
    • 6 *.md files (135 lines of code)
    • 4 *.json files (86 lines of code)
  • "*.ipynb" is the biggest, containing 88.07% of code.
  • "*.json" is the smallest, containing 2.64% of code.


*.ipynb: 2,873 LOC (88%), 8 files
*.xml: 168 LOC (5%), 1 file
*.md: 135 LOC (4%), 6 files
*.json: 86 LOC (2%), 4 files
Analyzers
Info about analyzers used for source code examinations.
  • *.m files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.cpp files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.html files are analyzed with HtmlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.h files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.py files are analyzed with PythonAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Basic heuristic dependency analysis
  • *.c files are analyzed with CStyleAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • No dependency analysis
  • *.jl files are analyzed with JuliaAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • No dependency analysis
  • *.mat files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.cs files are analyzed with CSharpAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis (based on namespace heuristics)
  • *.cls files are analyzed with VisualBasicAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • No dependency analysis
  • *.ps1 files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.cxx files are analyzed with CppAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.cmake files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.fs files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.bib files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.toml files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.yml files are analyzed with YamlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.cshtml files are analyzed with HtmlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • Unit size analysis
    • Conditional complexity analysis
    • Advanced heuristic dependency analysis
  • *.tex files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.fsproj files are analyzed with XmlAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Advanced code cleaning (empty lines and comments removed for LOC calculations, additional cleaning for duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
  • *.txt files are analyzed with DefaultLanguageAnalyzer:
    • All basic standard analyses supported (source code overview, duplication, file size, concerns, findings, metrics, controls)
    • Basic code cleaning (empty lines removed for LOC calculations and duplication calculations)
    • No unit size analysis
    • No conditional complexity analysis
    • No dependency analysis
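The analyzer descriptions distinguish basic cleaning (empty lines removed) from advanced cleaning (empty lines and comments removed) when counting lines of code. The sketch below illustrates the difference in a simplified form. It assumes that only whole-line comments starting with a single prefix are dropped; the actual analyzers are language-specific and presumably also handle block comments and other cases, so this is not their implementation. The function names and sample text are hypothetical.

```python
def basic_loc(text):
    """Basic cleaning: count lines that are not empty or whitespace-only."""
    return sum(1 for line in text.splitlines() if line.strip())


def advanced_loc(text, line_comment="#"):
    """Simplified advanced cleaning: also skip whole-line comments."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(line_comment):
            count += 1
    return count


# Hypothetical sample: two comment lines, two blank lines, two code lines.
sample = "# load data\n\nx = 1  # trailing comment\n\n# compute\ny = x + 1\n"

print("basic:", basic_loc(sample))
print("advanced:", advanced_loc(sample))
```

With the sample above, basic cleaning keeps the comment lines while the simplified advanced cleaning drops them, which is why the same file can yield different LOC figures depending on the analyzer assigned to its extension.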


2022-01-30 10:25