azure / gpt-rag-ingestion
File Size

The distribution of size of files (measured in lines of code).

File Size Overall
Size bucket (lines) | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
All files | 28% | 0% | 27% | 29% | 14%
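The bucket percentages above can be approximated from the per-file line counts listed under Longest Files. The sketch below is a guess at the calculation, not the report tool's actual code: it assumes each percentage is the share of total lines of code that falls in files of that size, truncated to a whole number. The names `BUCKETS`, `bucket_for` and `size_distribution` are hypothetical.

```python
# Size buckets as (label, lowest line count, highest line count or None).
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]

# Line counts of the 31 files listed under "Longest Files".
LINE_COUNTS = [
    1187, 375, 305, 263, 216, 195, 172, 160, 158, 149, 141, 135, 101,
    98, 70, 57, 55, 54, 52, 47, 44, 36, 24, 19, 7, 7, 5, 4, 3, 2, 2,
]


def bucket_for(line_count):
    """Return the bucket label for a file, or None for an empty file."""
    for label, low, high in BUCKETS:
        if line_count >= low and (high is None or line_count <= high):
            return label
    return None


def size_distribution(line_counts):
    """Share of all lines of code per bucket, as truncated percentages.

    Assumption: shares are weighted by lines of code (not by file count)
    and truncated rather than rounded.
    """
    lines_per_bucket = {label: 0 for label, _, _ in BUCKETS}
    for count in line_counts:
        label = bucket_for(count)
        if label is not None:
            lines_per_bucket[label] += count
    total = sum(lines_per_bucket.values())
    if total == 0:
        return {label: 0 for label in lines_per_bucket}
    return {label: lines * 100 // total
            for label, lines in lines_per_bucket.items()}


if __name__ == "__main__":
    for label, share in size_distribution(LINE_COUNTS).items():
        print(f"{label}: {share}%")
```

Because each share is truncated, the five percentages do not have to add up to 100.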


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 28% | 0% | 28% | 29% | 13%
yaml | 0% | 0% | 0% | 0% | 100%
ps1 | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
Decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
ROOT | 83% | 0% | 0% | 12% | 4%
connectors | 0% | 0% | 69% | 23% | 6%
tools | 0% | 0% | 37% | 56% | 6%
chunking | 0% | 0% | 25% | 37% | 36%
utils | 0% | 0% | 0% | 0% | 100%
scripts | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 31)
File | Folder | # lines | # units
setup.py | root | 1187 | 6
sharepoint_data_reader.py | connectors/sharepoint | 375 | 20
doc_intelligence.py | tools | 305 | 6
multimodal_chunker.py | chunking/chunkers | 263 | 13
sharepoint_deleted_files_purger.py | connectors/sharepoint | 216 | 1
sharepoint_files_indexer.py | connectors/sharepoint | 195 | 1
function_app.py | root | 172 | 3
blob.py | tools | 160 | 9
aoai.py | tools | 158 | 5
spreadsheet_chunker.py | chunking/chunkers | 149 | 5
aisearch.py | tools | 141 | 1
doc_analysis_chunker.py | chunking/chunkers | 135 | 12
base_chunker.py | chunking/chunkers | 101 | 7
json_chunker.py | chunking/chunkers | 98 | 3
langchain_chunker.py | chunking/chunkers | 70 | 3
transcription_chunker.py | chunking/chunkers | 57 | 4
images_deleted_files_purger.py | connectors | 55 | 1
chunker_factory.py | chunking | 54 | 3
document_chunking.py | chunking | 52 | 5
run_sharepoint.py | root | 47 | 1
keyvault.py | tools | 44 | 1
nl2sql_chunker.py | chunking/chunkers | 36 | 2
azure.yaml | root | 24 | -
file_utils.py | utils | 19 | 4
__init__.py | tools | 7 | -
preprovision.ps1 | scripts | 7 | -
postdeploy.ps1 | scripts | 5 | -
__init__.py | connectors | 4 | -
__init__.py | utils | 3 | -
__init__.py | chunking | 2 | -
exceptions.py | chunking | 2 | -
Files With Most Units (Top 23)
File | Folder | # lines | # units
sharepoint_data_reader.py | connectors/sharepoint | 375 | 20
multimodal_chunker.py | chunking/chunkers | 263 | 13
doc_analysis_chunker.py | chunking/chunkers | 135 | 12
blob.py | tools | 160 | 9
base_chunker.py | chunking/chunkers | 101 | 7
doc_intelligence.py | tools | 305 | 6
setup.py | root | 1187 | 6
aoai.py | tools | 158 | 5
document_chunking.py | chunking | 52 | 5
spreadsheet_chunker.py | chunking/chunkers | 149 | 5
file_utils.py | utils | 19 | 4
transcription_chunker.py | chunking/chunkers | 57 | 4
function_app.py | root | 172 | 3
langchain_chunker.py | chunking/chunkers | 70 | 3
json_chunker.py | chunking/chunkers | 98 | 3
chunker_factory.py | chunking | 54 | 3
nl2sql_chunker.py | chunking/chunkers | 36 | 2
aisearch.py | tools | 141 | 1
keyvault.py | tools | 44 | 1
images_deleted_files_purger.py | connectors | 55 | 1
sharepoint_deleted_files_purger.py | connectors/sharepoint | 216 | 1
sharepoint_files_indexer.py | connectors/sharepoint | 195 | 1
run_sharepoint.py | root | 47 | 1
Files With Long Lines (Top 22)

There are 22 files with lines longer than 120 characters. In total, there are 142 long lines.

File | Folder | # lines | # units | # long lines
setup.py | root | 1187 | 6 | 44
spreadsheet_chunker.py | chunking/chunkers | 149 | 5 | 16
doc_intelligence.py | tools | 305 | 6 | 12
multimodal_chunker.py | chunking/chunkers | 263 | 13 | 11
aoai.py | tools | 158 | 5 | 10
sharepoint_deleted_files_purger.py | connectors/sharepoint | 216 | 1 | 10
sharepoint_files_indexer.py | connectors/sharepoint | 195 | 1 | 10
function_app.py | root | 172 | 3 | 5
sharepoint_data_reader.py | connectors/sharepoint | 375 | 20 | 3
langchain_chunker.py | chunking/chunkers | 70 | 3 | 3
transcription_chunker.py | chunking/chunkers | 57 | 4 | 3
blob.py | tools | 160 | 9 | 2
aisearch.py | tools | 141 | 1 | 2
keyvault.py | tools | 44 | 1 | 2
doc_analysis_chunker.py | chunking/chunkers | 135 | 12 | 2
postdeploy.ps1 | scripts | 5 | - | 1
(name missing) | (unknown) | 7 | - | 1
document_chunking.py | chunking | 52 | 5 | 1
nl2sql_chunker.py | chunking/chunkers | 36 | 2 | 1
base_chunker.py | chunking/chunkers | 101 | 7 | 1
json_chunker.py | chunking/chunkers | 98 | 3 | 1
chunker_factory.py | chunking | 54 | 3 | 1
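The long-line count for a file can be reproduced with a few lines of Python. This is a minimal sketch using the 120-character threshold stated above; the function names are hypothetical, and whether the report counts characters before or after stripping trailing whitespace is not stated, so the sketch counts each line as it is, without its line ending.

```python
from pathlib import Path

# Threshold taken from the report text: lines longer than 120 characters.
MAX_LINE_LENGTH = 120


def count_long_lines(text, limit=MAX_LINE_LENGTH):
    """Count the lines in `text` that are longer than `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def count_long_lines_in_file(path, limit=MAX_LINE_LENGTH):
    """Read a source file as UTF-8 and count its long lines."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_long_lines(text, limit)


if __name__ == "__main__":
    sample = "x" * 120 + "\n" + "y" * 121 + "\nshort line\n"
    # Only the 121-character line exceeds the limit.
    print(count_long_lines(sample))
```

A line of exactly 120 characters is not counted, since the report describes lines "longer than" 120 characters.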
Correlations

File Size vs. Commits (all time): 31 points

File | Commits (all time) | Lines of code
chunking/chunkers/base_chunker.py | 38 | 101
chunking/chunkers/multimodal_chunker.py | 10 | 263
utils/file_utils.py | 18 | 19
function_app.py | 69 | 172
setup.py | 117 | 1187
connectors/sharepoint/sharepoint_data_reader.py | 8 | 375
connectors/sharepoint/sharepoint_files_indexer.py | 20 | 195
tools/doc_intelligence.py | 26 | 305
chunking/chunker_factory.py | 45 | 54
chunking/chunkers/json_chunker.py | 3 | 98
chunking/chunkers/langchain_chunker.py | 26 | 70
chunking/document_chunking.py | 21 | 52
connectors/sharepoint/sharepoint_deleted_files_purger.py | 8 | 216
run_sharepoint.py | 8 | 47
utils/__init__.py | 7 | 3
chunking/chunkers/doc_analysis_chunker.py | 23 | 135
chunking/chunkers/nl2sql_chunker.py | 14 | 36
chunking/chunkers/transcription_chunker.py | 23 | 57
connectors/__init__.py | 8 | 4
connectors/images_deleted_files_purger.py | 3 | 55
tools/__init__.py | 12 | 7
tools/aisearch.py | 8 | 141
tools/aoai.py | 24 | 158
tools/blob.py | 24 | 160
azure.yaml | 9 | 24
chunking/chunkers/spreadsheet_chunker.py | 37 | 149
chunking/exceptions.py | 5 | 2
tools/keyvault.py | 6 | 44
scripts/postdeploy.ps1 | 8 | 5
scripts/preprovision.ps1 | 2 | 7
lines of code: min: 2.0 | average: 133.65 | 25th percentile: 19.0 | median: 57.0 | 75th percentile: 160.0 | max: 1187.0
commits (all time): min: 2.0 | average: 20.58 | 25th percentile: 8.0 | median: 12.0 | 75th percentile: 24.0 | max: 117.0
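The lines-of-code summary above can be recomputed from the 31 file sizes in the Longest Files table. The sketch below uses Python's `statistics.quantiles` with its default "exclusive" method, which happens to reproduce the reported quartiles; that the report tool uses this percentile definition is an inference, not something the report states. `summarize` and `LINES_OF_CODE` are hypothetical names.

```python
from statistics import mean, median, quantiles

# Line counts of the 31 files listed under "Longest Files".
LINES_OF_CODE = [
    1187, 375, 305, 263, 216, 195, 172, 160, 158, 149, 141, 135, 101,
    98, 70, 57, 55, 54, 52, 47, 44, 36, 24, 19, 7, 7, 5, 4, 3, 2, 2,
]


def summarize(values):
    """Return min, average, quartiles and max for a list of numbers.

    Assumption: quartiles follow the "exclusive" definition (the default
    of statistics.quantiles), chosen because it matches the report.
    """
    q1, _, q3 = quantiles(values, n=4)
    return {
        "min": float(min(values)),
        "average": round(mean(values), 2),
        "25th percentile": q1,
        "median": float(median(values)),
        "75th percentile": q3,
        "max": float(max(values)),
    }


if __name__ == "__main__":
    for name, value in summarize(LINES_OF_CODE).items():
        print(f"{name}: {value}")
```

Other percentile definitions (for example linear interpolation over n - 1 intervals) give different quartiles for the same data, so the method matters when comparing against the report.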

File Size vs. Contributors (all time): 31 points

File | Contributors (all time) | Lines of code
chunking/chunkers/base_chunker.py | 2 | 101
chunking/chunkers/multimodal_chunker.py | 2 | 263
utils/file_utils.py | 3 | 19
function_app.py | 5 | 172
setup.py | 4 | 1187
connectors/sharepoint/sharepoint_data_reader.py | 2 | 375
connectors/sharepoint/sharepoint_files_indexer.py | 2 | 195
tools/doc_intelligence.py | 2 | 305
chunking/chunker_factory.py | 2 | 54
chunking/chunkers/json_chunker.py | 1 | 98
chunking/chunkers/langchain_chunker.py | 2 | 70
connectors/sharepoint/sharepoint_deleted_files_purger.py | 2 | 216
run_sharepoint.py | 2 | 47
utils/__init__.py | 2 | 3
chunking/chunkers/doc_analysis_chunker.py | 2 | 135
chunking/chunkers/nl2sql_chunker.py | 2 | 36
chunking/chunkers/transcription_chunker.py | 2 | 57
tools/__init__.py | 2 | 7
tools/aisearch.py | 2 | 141
tools/aoai.py | 2 | 158
tools/blob.py | 2 | 160
azure.yaml | 3 | 24
chunking/chunkers/spreadsheet_chunker.py | 2 | 149
tools/keyvault.py | 2 | 44
scripts/postdeploy.ps1 | 3 | 5
lines of code: min: 2.0 | average: 133.65 | 25th percentile: 19.0 | median: 57.0 | 75th percentile: 160.0 | max: 1187.0
contributors (all time): min: 1.0 | average: 2.23 | 25th percentile: 2.0 | median: 2.0 | 75th percentile: 2.0 | max: 5.0

File Size vs. Commits (30 days): 0 points

No data for "commits (30d)" vs. "lines of code".

File Size vs. Contributors (30 days): 0 points

No data for "contributors (30d)" vs. "lines of code".


File Size vs. Commits (90 days): 18 points

File | Commits (90d) | Lines of code
chunking/chunkers/base_chunker.py | 8 | 101
chunking/chunkers/multimodal_chunker.py | 4 | 263
utils/file_utils.py | 8 | 19
function_app.py | 4 | 172
setup.py | 14 | 1187
connectors/sharepoint/sharepoint_data_reader.py | 3 | 375
connectors/sharepoint/sharepoint_files_indexer.py | 8 | 195
tools/doc_intelligence.py | 3 | 305
chunking/chunker_factory.py | 8 | 54
chunking/chunkers/json_chunker.py | 3 | 98
chunking/chunkers/langchain_chunker.py | 5 | 70
chunking/document_chunking.py | 3 | 52
connectors/sharepoint/sharepoint_deleted_files_purger.py | 3 | 216
run_sharepoint.py | 3 | 47
utils/__init__.py | 3 | 3
chunking/chunkers/doc_analysis_chunker.py | 2 | 135
chunking/chunkers/nl2sql_chunker.py | 2 | 36
chunking/chunkers/transcription_chunker.py | 2 | 57
lines of code: min: 3.0 | average: 188.06 | 25th percentile: 50.75 | median: 99.5 | 75th percentile: 227.75 | max: 1187.0
commits (90d): min: 2.0 | average: 4.78 | 25th percentile: 3.0 | median: 3.0 | 75th percentile: 8.0 | max: 14.0
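The report plots file size against commit counts but does not state a correlation coefficient. One way to put a number on the relationship is the Pearson coefficient, sketched below for the 18 points of the 90-day commits listing above. The choice of Pearson (rather than, say, a rank correlation) is an assumption for illustration, and `pearson` and `POINTS_90D` are hypothetical names.

```python
from math import sqrt

# (commits in the last 90 days, lines of code) for the 18 listed files.
POINTS_90D = [
    (8, 101), (4, 263), (8, 19), (4, 172), (14, 1187), (3, 375),
    (8, 195), (3, 305), (8, 54), (3, 98), (5, 70), (3, 52),
    (3, 216), (3, 47), (3, 3), (2, 135), (2, 36), (2, 57),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no spread, so the coefficient is undefined.
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    commits = [c for c, _ in POINTS_90D]
    lines = [n for _, n in POINTS_90D]
    print(f"Pearson r (commits 90d vs. lines of code): {pearson(commits, lines):.2f}")
```

With a single very large file (setup.py) in the data, the coefficient is dominated by that one point, so it should be read with caution. For the 90-day contributors data every file has exactly one contributor, so the coefficient is undefined there and the function raises an error.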

File Size vs. Contributors (90 days): 18 points

File | Contributors (90d) | Lines of code
chunking/chunkers/base_chunker.py | 1 | 101
chunking/chunkers/multimodal_chunker.py | 1 | 263
utils/file_utils.py | 1 | 19
function_app.py | 1 | 172
setup.py | 1 | 1187
connectors/sharepoint/sharepoint_data_reader.py | 1 | 375
connectors/sharepoint/sharepoint_files_indexer.py | 1 | 195
tools/doc_intelligence.py | 1 | 305
chunking/chunker_factory.py | 1 | 54
chunking/chunkers/json_chunker.py | 1 | 98
chunking/chunkers/langchain_chunker.py | 1 | 70
connectors/sharepoint/sharepoint_deleted_files_purger.py | 1 | 216
run_sharepoint.py | 1 | 47
utils/__init__.py | 1 | 3
chunking/chunkers/doc_analysis_chunker.py | 1 | 135
chunking/chunkers/nl2sql_chunker.py | 1 | 36
chunking/chunkers/transcription_chunker.py | 1 | 57
lines of code: min: 3.0 | average: 188.06 | 25th percentile: 50.75 | median: 99.5 | 75th percentile: 227.75 | max: 1187.0
contributors (90d): min: 1.0 | average: 1.0 | 25th percentile: 1.0 | median: 1.0 | 75th percentile: 1.0 | max: 1.0