huggingface / tokenizers
Source Code Overview

Analysis scope, overview of main, test, generated, deployment, build, and other code.

Source Code Analysis Scope
Files included in and excluded from the analyses
mdx
pyi
txt
gitattributes
editorconfig
in
cfg
tpl
prettierignore
Overview of Analyzed Files
Basic stats on analyzed files
Intro
For analysis purposes we separate files in scope into several categories: main, test, generated, deployment and build, and other.

  • The main category contains all manually created source code files that are used in production.
  • Files in the main category are used as input for other analyses: logical decomposition, concerns, duplication, file size, unit size, and conditional complexity.
  • Test source code files are used only for testing the product. These files are normally not deployed to production.
  • Build and deployment source code files are used to configure or support the build and deployment process.
  • Generated source code files are automatically generated files that have not been manually changed after generation.
  • While a source code folder may contain a number of files, we are primarily interested in the source code files that are written and maintained by developers.
  • Files containing binaries, documentation, or third-party libraries, for instance, are excluded from analysis. The exception is third-party libraries that have been changed by developers.

main: 29,051 LOC (56%), 146 files
test: 4,259 LOC (8%), 36 files
generated: 8,789 LOC (17%), 2 files
build and deployment: 0 LOC (0%), 0 files
other: 9,369 LOC (18%), 64 files
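The category percentages follow directly from the LOC counts. A minimal Python sketch that reproduces them is given here; the dictionary and the function name `category_shares` are illustrative and not part of the report tooling.

```python
# LOC counts per category, copied from the summary above.
LOC_BY_CATEGORY = {
    "main": 29051,
    "test": 4259,
    "generated": 8789,
    "build and deployment": 0,
    "other": 9369,
}


def category_shares(loc_by_category):
    """Return each category's share of the total LOC as a percentage."""
    total = sum(loc_by_category.values())
    return {name: 100.0 * loc / total for name, loc in loc_by_category.items()}


shares = category_shares(LOC_BY_CATEGORY)
for name, share in shares.items():
    # One decimal place; the report itself shows whole percentages.
    print(f"{name}: {share:.1f}%")
```

The shares are computed against the total of all analyzed files, whereas the "vs. main code" figures in the sections below divide by the main-code LOC only.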
Main Code
All manually created or maintained source code that defines the logic of the product and runs in a production environment.
pyi
cfg
in
  • The following criteria are used to filter files:
    • files with paths like ".*".
  • 146 files match defined criteria (29,051 LOC, 100.0% vs. main code):
    • 108 *.rs files (25,848 LOC)
    • 21 *.py files (1,808 LOC)
    • 7 *.pyi files (629 LOC)
    • 1 *.js file (376 LOC)
    • 5 *.toml files (188 LOC)
    • 1 *.css file (138 LOC)
    • 1 *.cfg file (51 LOC)
    • 1 *.in file (7 LOC)
    • 1 *.ts file (6 LOC)
  • "*.rs" is the largest, containing 88.97% of LOC.
  • "*.ts" is the smallest, containing 0.02% of LOC.


*.rs: 25,848 LOC (88%), 108 files
*.py: 1,808 LOC (6%), 21 files
*.pyi: 629 LOC (2%), 7 files
*.js: 376 LOC (1%), 1 file
*.toml: 188 LOC (<1%), 5 files
*.css: 138 LOC (<1%), 1 file
*.cfg: 51 LOC (<1%), 1 file
*.in: 7 LOC (<1%), 1 file
*.ts: 6 LOC (<1%), 1 file
Test Code
Used only for testing of the product. Normally not deployed in a production environment.
  • The following criteria are used to filter files:
    • files with paths like ".*/[Tt]ests/.*".
    • files with paths like ".*/test[.].*".
    • files with paths like ".*/test_.*".
    • files with paths like ".*[.][Tt]est[.].*".
    • files with paths like ".*[.]test[.].*".
    • files with paths like ".*/jest[.][a-zA-Z0-9\.]+".
    • files with paths like ".*/__mock[a-zA-Z0-9_\- ]+/.*".
  • 36 files match defined criteria (4,259 LOC, 14.7% vs. main code):
    • 18 *.py files (2,187 LOC)
    • 9 *.rs files (1,168 LOC)
    • 8 *.ts files (898 LOC)
    • 1 *.js file (6 LOC)
  • "*.py" is the largest, containing 51.35% of LOC.
  • "*.js" is the smallest, containing 0.14% of LOC.


*.py: 2,187 LOC (51%), 18 files
*.rs: 1,168 LOC (27%), 9 files
*.ts: 898 LOC (21%), 8 files
*.js: 6 LOC (<1%), 1 file
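The test-code criteria are regular expressions applied to file paths. The sketch below shows how such a filter could be evaluated in Python. It assumes that a pattern must match the whole path, which the report does not state; the helper `is_test_path` and the sample paths are hypothetical.

```python
import re

# Path patterns copied from the test-code criteria listed above.
TEST_PATTERNS = [
    r".*/[Tt]ests/.*",
    r".*/test[.].*",
    r".*/test_.*",
    r".*[.][Tt]est[.].*",
    r".*[.]test[.].*",
    r".*/jest[.][a-zA-Z0-9\.]+",
    r".*/__mock[a-zA-Z0-9_\- ]+/.*",
]
COMPILED = [re.compile(pattern) for pattern in TEST_PATTERNS]


def is_test_path(path):
    """True if any test pattern matches the whole path (assumed semantics)."""
    return any(pattern.fullmatch(path) for pattern in COMPILED)


# Hypothetical sample paths, not taken from the report.
samples = [
    "bindings/python/tests/bindings/test_models.py",
    "bindings/node/lib/bindings/encoding.test.ts",
    "bindings/node/jest.config.js",
    "tokenizers/src/models/bpe/model.rs",
]
for path in samples:
    print(path, is_test_path(path))
```

Under the whole-path assumption, most patterns require a leading `/`, so a top-level `tests/` directory would match only if paths are given with a leading slash.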
Generated Code
Automatically generated files, not manually changed after generation.
  • The following criteria are used to filter files:
    • files with paths like ".*/package[-]lock[.]json".
    • files with any line of content like "//[ ]*Generated by .*".
  • 2 files match defined criteria (8,789 LOC, 30.3% vs. main code):
    • 1 *.json file (6,701 LOC)
    • 1 *.rs file (2,088 LOC)
  • "*.json" is the largest, containing 76.24% of LOC.
  • "*.rs" is the smallest, containing 23.76% of LOC.


*.json: 6,701 LOC (76%), 1 file
*.rs: 2,088 LOC (23%), 1 file
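The generated-code criteria combine a path pattern with a content pattern. A rough Python sketch of that two-part check follows. It assumes that each pattern must match the whole path or the whole line, which the report does not confirm; `is_generated` and the sample inputs are hypothetical.

```python
import re

# Patterns copied from the generated-code criteria listed above.
PATH_PATTERN = re.compile(r".*/package[-]lock[.]json")
LINE_PATTERN = re.compile(r"//[ ]*Generated by .*")


def is_generated(path, content):
    """Classify a file as generated by its path or by any line of its content.

    Assumes whole-path and whole-line matching; the actual analyzer may differ.
    """
    if PATH_PATTERN.fullmatch(path):
        return True
    return any(LINE_PATTERN.fullmatch(line) for line in content.splitlines())


# Hypothetical inputs for illustration only.
print(is_generated("bindings/node/package-lock.json", "{}"))
print(is_generated("src/schema.rs", "// Generated by some tool\npub struct A;"))
print(is_generated("src/lib.rs", "pub fn encode() {}"))
```

Checking the path first avoids reading file content for lock files, which account for most of the generated LOC here.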
Other Code
txt
tpl
  • The following criteria are used to filter files:
    • files with paths like ".*/[Ee]xamples/.*".
    • files with paths like ".*/[.]gitignore".
    • files with paths like ".*[.]md".
    • files with paths like ".*/README[.][a-z0-9]+".
    • files with paths like ".*[.]json".
    • files with paths like ".*[.]txt".
    • files with paths like ".*/[Dd]ocumentation/.*".
    • files with paths like ".*[.]editorconfig".
    • files with paths like ".*/[.]prettierignore".
    • files with paths like ".*[.](rst|rest|resttxt|rsttxt)".
    • files with paths like ".*[.]mdx".
  • 64 files match defined criteria (9,369 LOC, 32.3% vs. main code):
    • 18 *.json files (7,290 LOC)
    • 22 *.md files (1,161 LOC)
    • 9 *.py files (539 LOC)
    • 2 *.ts files (188 LOC)
    • 5 *.rs files (74 LOC)
    • 3 *.txt files (46 LOC)
    • 1 *.toml file (28 LOC)
    • 1 *.tpl file (16 LOC)
    • 2 *.js files (16 LOC)
    • 1 *.html file (11 LOC)
  • "*.json" is the largest, containing 77.81% of LOC.
  • "*.html" is the smallest, containing 0.12% of LOC.


*.json: 7,290 LOC (77%), 18 files
*.md: 1,161 LOC (12%), 22 files
*.py: 539 LOC (5%), 9 files
*.ts: 188 LOC (2%), 2 files
*.rs: 74 LOC (<1%), 5 files
*.txt: 46 LOC (<1%), 3 files
*.toml: 28 LOC (<1%), 1 file
*.tpl: 16 LOC (<1%), 1 file
*.js: 16 LOC (<1%), 2 files
*.html: 11 LOC (<1%), 1 file
Analyzers
Info about analyzers used for source code examinations.


2025-06-30 08:57