apache / datafusion-benchmarks
File Size

The distribution of file sizes (measured in lines of code).

File Size Overall
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 5% | 9% | 18% | 66%
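The bucket percentages can be reproduced with a short script. The following is a minimal sketch, assuming (as the figures in this report suggest, but the report does not state) that each percentage is the share of total lines of code held by files in that size bucket, not the share of file count. The names `BUCKETS`, `bucket_for`, and `size_distribution` are illustrative and are not taken from the analysis tool.

```python
# Size buckets from the report, as (label, lower bound, upper bound or None).
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def bucket_for(lines):
    """Return the bucket label for a file with the given line count."""
    for label, low, high in BUCKETS:
        if lines >= low and (high is None or lines <= high):
            return label
    return None  # files with zero lines fall outside every bucket


def size_distribution(line_counts):
    """Return the percentage of all lines of code that sits in each bucket.

    Assumption: percentages are weighted by lines of code, not by file count.
    """
    totals = {label: 0 for label, _, _ in BUCKETS}
    for lines in line_counts:
        label = bucket_for(lines)
        if label is not None:
            totals[label] += lines
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {label: 0.0 for label in totals}
    return {label: 100.0 * n / grand_total for label, n in totals.items()}


if __name__ == "__main__":
    # Hypothetical line counts, for illustration only.
    for label, share in size_distribution([100, 100, 200]).items():
        print(f"{label}: {share:.0f}%")
```

Under this weighting a single large file can dominate a bucket even when nearly all files are small, which is worth keeping in mind when reading the per-extension and per-component rows.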


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
py | 0% | 58% | 0% | 27% | 14%
sql | 0% | 0% | 8% | 18% | 72%
rs | 0% | 0% | 100% | 0% | 0%
toml | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
Logical decomposition: primary
Component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
tpcds | 0% | 5% | 8% | 18% | 67%
runners | 0% | 0% | 62% | 0% | 37%
tpch | 0% | 0% | 0% | 19% | 80%
scripts | 0% | 0% | 0% | 100% | 0%
Longest Files (Top 50)
File | Folder | # lines | # units
tpcdsgen.py | tpcds | 556 | 5
main.rs | runners/datafusion-rust/src | 236 | 1
q66.sql | tpcds/queries | 217 | -
q66.sql | tpcds/queries-spark | 217 | -
q14.sql | tpcds/queries | 207 | -
q14.sql | tpcds/queries-spark | 207 | -
tpchgen.py | tpch | 153 | 5
q49.sql | tpcds/queries | 126 | -
q49.sql | tpcds/queries-spark | 126 | -
q5.sql | tpcds/queries | 125 | -
q5.sql | tpcds/queries-spark | 125 | -
q64.sql | tpcds/queries | 118 | -
q64.sql | tpcds/queries-spark | 118 | -
q4.sql | tpcds/queries | 113 | -
q4.sql | tpcds/queries-spark | 113 | -
generate-comparison.py | scripts | 108 | 6
q8.sql | tpcds/queries | 105 | -
q77.sql | tpcds/queries | 105 | -
q8.sql | tpcds/queries-spark | 105 | -
q77.sql | tpcds/queries-spark | 105 | -
q24.sql | tpcds/queries | 104 | -
q23.sql | tpcds/queries | 104 | -
q24.sql | tpcds/queries-spark | 104 | -
q23.sql | tpcds/queries-spark | 104 | -
q80.sql | tpcds/queries | 93 | -
q80.sql | tpcds/queries-spark | 93 | -
q88.sql | tpcds/queries | 91 | -
q88.sql | tpcds/queries-spark | 91 | -
q85.sql | tpcds/queries | 81 | -
q85.sql | tpcds/queries-spark | 81 | -
q11.sql | tpcds/queries | 78 | -
q11.sql | tpcds/queries-spark | 78 | -
q60.sql | tpcds/queries | 75 | -
q60.sql | tpcds/queries-spark | 75 | -
tpcbench.py | runners/datafusion-comet | 73 | 1
q33.sql | tpcds/queries | 72 | -
q33.sql | tpcds/queries-spark | 72 | -
q75.sql | tpcds/queries | 67 | -
q75.sql | tpcds/queries-spark | 67 | -
q56.sql | tpcds/queries | 66 | -
q56.sql | tpcds/queries-spark | 66 | -
q83.sql | tpcds/queries | 64 | -
q48.sql | tpcds/queries | 64 | -
q83.sql | tpcds/queries-spark | 64 | -
q48.sql | tpcds/queries-spark | 64 | -
q58.sql | tpcds/queries | 62 | -
q58.sql | tpcds/queries-spark | 62 | -
tpcbench.py | runners/datafusion-python | 60 | 1
q74.sql | tpcds/queries | 58 | -
q74.sql | tpcds/queries-spark | 58 | -
Files With Most Units (Top 6)
File | Folder | # lines | # units
generate-comparison.py | scripts | 108 | 6
tpcdsgen.py | tpcds | 556 | 5
tpchgen.py | tpch | 153 | 5
tpcbench.py | runners/datafusion-comet | 73 | 1
tpcbench.py | runners/datafusion-python | 60 | 1
main.rs | runners/datafusion-rust/src | 236 | 1
Files With Long Lines (Top 13)

There are 13 files with lines longer than 120 characters. In total, there are 23 long lines.

File | Folder | # lines | # units | # long lines
q76.sql | tpcds/queries | 21 | - | 3
q76.sql | tpcds/queries-spark | 21 | - | 3
tpchgen.py | tpch | 153 | 5 | 3
q49.sql | tpcds/queries | 126 | - | 2
q14.sql | tpcds/queries | 207 | - | 2
q49.sql | tpcds/queries-spark | 126 | - | 2
q14.sql | tpcds/queries-spark | 207 | - | 2
q11.sql | tpcds/queries | 78 | - | 1
q89.sql | tpcds/queries | 25 | - | 1
tpcdsgen.py | tpcds | 556 | 5 | 1
q11.sql | tpcds/queries-spark | 78 | - | 1
q89.sql | tpcds/queries-spark | 25 | - | 1
generate-comparison.py | scripts | 108 | 6 | 1
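
The long-line counts above can be approximated with a few lines of Python. This is a minimal sketch, assuming a "long line" is any line with more than 120 characters excluding the line terminator. The function names and the `files` mapping are hypothetical, and the analysis tool may treat tabs or trailing whitespace differently.

```python
def count_long_lines(text, limit=120):
    """Count the lines in `text` that are longer than `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def long_line_report(files, limit=120):
    """Map each path to its long-line count, keeping only files that have any.

    `files` is a hypothetical mapping of path -> file contents.
    """
    report = {}
    for path, text in files.items():
        count = count_long_lines(text, limit)
        if count > 0:
            report[path] = count
    return report


if __name__ == "__main__":
    # Illustrative contents only, not taken from the repository.
    sample = {
        "wide.sql": "select " + "x" * 130 + "\nfrom t",
        "narrow.sql": "select 1",
    }
    print(long_line_report(sample))
```

Summing the values of the returned mapping gives the total number of long lines, and its length gives the number of affected files.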
Correlations

File Size vs. Commits (all time): 228 points

[Scatter plot: x = commits (all time), y = lines of code. Among the labeled points, tpch/tpchgen.py has 3 commits; tpcds/queries-spark/q35.sql, runners/datafusion-rust/src/main.rs, runners/datafusion-comet/tpcbench.py, and scripts/generate-comparison.py have 2 commits each; every other labeled point has 1 commit.]

lines of code: min: 9.0 | average: 48.18 | 25th percentile: 23.0 | median: 32.0 | 75th percentile: 55.0 | max: 556.0
commits (all time): min: 1.0 | average: 1.03 | 25th percentile: 1.0 | median: 1.0 | 75th percentile: 1.0 | max: 3.0
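
Summary rows like the ones above (min, average, quartiles, max) can be derived from the raw per-file values. The sketch below uses Python's standard `statistics` module. The report does not say which percentile interpolation the tool uses, so the `inclusive` method chosen here is an assumption and may give slightly different quartiles on real data. The function name `summarize` is illustrative.

```python
import statistics


def summarize(values):
    """Return min, average, quartiles, and max for a list of numbers.

    Needs at least two values, because statistics.quantiles cannot
    interpolate a single point on older Python versions.
    """
    ordered = sorted(values)
    # n=4 yields the three quartile cut points: 25th, 50th, 75th percentile.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": float(ordered[0]),
        "average": round(statistics.mean(ordered), 2),
        "25th percentile": q1,
        "median": median,
        "75th percentile": q3,
        "max": float(ordered[-1]),
    }


if __name__ == "__main__":
    # Hypothetical line counts, for illustration only.
    stats = summarize([1, 2, 3, 4, 5])
    print(" | ".join(f"{name}: {value}" for name, value in stats.items()))
```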

File Size vs. Contributors (all time): 228 points

[Scatter plot: x = contributors (all time), y = lines of code. Among the labeled points, tpcds/queries-spark/q35.sql and runners/datafusion-rust/src/main.rs have 2 contributors each; every other labeled point has 1 contributor.]

lines of code: min: 9.0 | average: 48.18 | 25th percentile: 23.0 | median: 32.0 | 75th percentile: 55.0 | max: 556.0
contributors (all time): min: 1.0 | average: 1.01 | 25th percentile: 1.0 | median: 1.0 | 75th percentile: 1.0 | max: 2.0

File Size vs. Commits (30 days): 0 points

No data for "commits (30d)" vs. "lines of code".

File Size vs. Contributors (30 days): 0 points

No data for "contributors (30d)" vs. "lines of code".


File Size vs. Commits (90 days): 0 points

No data for "commits (90d)" vs. "lines of code".

File Size vs. Contributors (90 days): 0 points

No data for "contributors (90d)" vs. "lines of code".