apache / datasketches-spark
File Size

The distribution of file sizes (measured in lines of code). Each percentage is the share of all lines of code that sits in files of the given size range.

File Size Overall
1001+ | 501-1000 | 201-500 | 101-200 | 1-100
0% | 0% | 15% | 36% | 48%
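The overall distribution can be reproduced from the per-file line counts listed later in this report. The sketch below is a minimal illustration, not the report generator's own code: the names `FILE_SIZES`, `BUCKETS`, and `bucket_shares` are hypothetical, and truncating (rather than rounding) each percentage is an assumption inferred from the published numbers.

```python
# Lines of code per file, copied from the "Longest Files" table of this report.
FILE_SIZES = [237, 122, 113, 107, 104, 101, 100, 77, 69, 68, 57, 57, 42, 36,
              36, 30, 29, 28, 19, 17, 15, 15, 10, 10, 8, 4, 4, 1]

# (label, inclusive lower bound, inclusive upper bound or None for open-ended)
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def bucket_shares(sizes):
    """Return, per bucket, the truncated percentage of all lines of code
    that belongs to files whose size falls in that bucket."""
    total = sum(sizes)
    shares = {}
    for label, low, high in BUCKETS:
        lines_in_bucket = sum(
            s for s in sizes if s >= low and (high is None or s <= high)
        )
        # Assumption: percentages are truncated, which is why they sum to 99.
        shares[label] = lines_in_bucket * 100 // total
    return shares


if __name__ == "__main__":
    print(bucket_shares(FILE_SIZES))
```

Weighting by lines of code rather than by file count explains why a single 237-line file accounts for 15% of the total.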


File Size per Extension
extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
scala | 0% | 0% | 19% | 45% | 35%
py | 0% | 0% | 0% | 0% | 100%
sbt | 0% | 0% | 0% | 0% | 100%
toml | 0% | 0% | 0% | 0% | 100%
in | 0% | 0% | 0% | 0% | 100%
cfg | 0% | 0% | 0% | 0% | 100%
File Size per Logical Decomposition
primary
component | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
src | 0% | 0% | 20% | 47% | 32%
python | 0% | 0% | 0% | 0% | 100%
ROOT | 0% | 0% | 0% | 0% | 100%
project | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 28)
File | Location | # lines | # units
KllDoublesSketchExpressions.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/expressions | 237 | 17
ThetaSketchAggBuild.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate | 122 | 6
KllDoublesSketchAggMerge.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate | 113 | 10
KllDoublesSketchAggBuild.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate | 107 | 10
ThetaSketchAggUnion.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate | 104 | 6
ThetaExpressions.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/expressions | 101 | 8
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/kll | 100 | 30
build.sbt | root | 77 | -
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/theta | 69 | 20
kll.py | python/src/datasketches_spark | 68 | 12
common.py | python/src/datasketches_spark | 57 | 10
BuildUtils.scala | project | 57 | 3
pyproject.toml | python | 42 | -
CastAsBinary.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 36 | 4
KllDoublesSketchType.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/types | 36 | 3
setup.py | python | 30 | 1
DatasketchesFunctionRegistry.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 29 | 3
ThetaSketchWrapper.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/types | 28 | 4
MANIFEST.in | python | 19 | -
KllFunctionRegistry.scala | src/main/scala/org/apache/spark/sql/datasketches/kll | 17 | -
ThetaSketchType.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/types | 15 | 2
ThetaFunctionRegistry.scala | src/main/scala/org/apache/spark/sql/datasketches/theta | 15 | -
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 10 | 2
DatasketchesScalaFunctionsBase.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 10 | 1
_version.py | python/src/datasketches_spark | 8 | 1
ThetaSketchConstants.scala | src/main/scala/org/apache/spark/sql/datasketches/theta | 4 | -
__init__.py | python/src/datasketches_spark | 4 | -
version.cfg | root | 1 | -
Files With Most Units (Top 20)
File | Location | # lines | # units
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/kll | 100 | 30
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/theta | 69 | 20
KllDoublesSketchExpressions.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/expressions | 237 | 17
kll.py | python/src/datasketches_spark | 68 | 12
KllDoublesSketchAggBuild.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate | 107 | 10
KllDoublesSketchAggMerge.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate | 113 | 10
common.py | python/src/datasketches_spark | 57 | 10
ThetaExpressions.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/expressions | 101 | 8
ThetaSketchAggUnion.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate | 104 | 6
ThetaSketchAggBuild.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate | 122 | 6
CastAsBinary.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 36 | 4
ThetaSketchWrapper.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/types | 28 | 4
DatasketchesFunctionRegistry.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 29 | 3
KllDoublesSketchType.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/types | 36 | 3
BuildUtils.scala | project | 57 | 3
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 10 | 2
ThetaSketchType.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/types | 15 | 2
DatasketchesScalaFunctionsBase.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 10 | 1
_version.py | python/src/datasketches_spark | 8 | 1
setup.py | python | 30 | 1
Files With Long Lines (Top 13)

There are 13 files with lines longer than 120 characters. In total, there are 38 long lines.
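A long-line count of this kind can be sketched as follows. This is a minimal illustration and not the report tool's implementation: the function names are hypothetical, and measuring length in characters with `len` (tabs counted as one character, line terminators excluded) is an assumption.

```python
def count_long_lines(text, limit=120):
    """Count the lines in `text` that are longer than `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


def summarize_long_lines(files, limit=120):
    """`files` maps a file name to its text.

    Returns (number of files with at least one long line,
             total number of long lines across all files).
    """
    counts = {name: count_long_lines(text, limit) for name, text in files.items()}
    flagged = {name: n for name, n in counts.items() if n > 0}
    return len(flagged), sum(flagged.values())


if __name__ == "__main__":
    # Hypothetical in-memory sample; real use would read files from disk.
    sample = {
        "Wide.scala": "val x = 1\n" + "a" * 121 + "\n" + "b" * 120 + "\n",
        "narrow.py": "print('ok')\n",
    }
    print(summarize_long_lines(sample))
```

A line of exactly 120 characters is not counted, which matches the wording "longer than 120 characters".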

File | Location | # lines | # units | # long lines
KllDoublesSketchExpressions.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/expressions | 237 | 17 | 15
ThetaSketchAggBuild.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate | 122 | 6 | 7
KllDoublesSketchAggBuild.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate | 107 | 10 | 2
KllDoublesSketchAggMerge.scala | src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate | 113 | 10 | 2
ThetaExpressions.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/expressions | 101 | 8 | 2
ThetaSketchAggUnion.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate | 104 | 6 | 2
kll.py | python/src/datasketches_spark | 68 | 12 | 2
DatasketchesFunctionRegistry.scala | src/main/scala/org/apache/spark/sql/datasketches/common | 29 | 3 | 1
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/kll | 100 | 30 | 1
KllFunctionRegistry.scala | src/main/scala/org/apache/spark/sql/datasketches/kll | 17 | - | 1
functions.scala | src/main/scala/org/apache/spark/sql/datasketches/theta | 69 | 20 | 1
ThetaSketchWrapper.scala | src/main/scala/org/apache/spark/sql/datasketches/theta/types | 28 | 4 | 1
BuildUtils.scala | project | 57 | 3 | 1
Correlations

File Size vs. Commits (all time): 28 points

File | commits (all time) | lines of code
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesScalaFunctionsBase.scala | 4 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggBuild.scala | 8 | 122
src/main/scala/org/apache/spark/sql/datasketches/theta/expressions/ThetaExpressions.scala | 11 | 101
src/main/scala/org/apache/spark/sql/datasketches/theta/functions.scala | 8 | 69
src/main/scala/org/apache/spark/sql/datasketches/theta/types/ThetaSketchWrapper.scala | 5 | 28
python/src/datasketches_spark/common.py | 10 | 57
src/main/scala/org/apache/spark/sql/datasketches/common/CastAsBinary.scala | 2 | 36
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesFunctionRegistry.scala | 7 | 29
src/main/scala/org/apache/spark/sql/datasketches/common/functions.scala | 5 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaFunctionRegistry.scala | 5 | 15
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaSketchConstants.scala | 2 | 4
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggUnion.scala | 5 | 104
src/main/scala/org/apache/spark/sql/datasketches/kll/expressions/KllDoublesSketchExpressions.scala | 7 | 237
build.sbt | 17 | 77
project/BuildUtils.scala | 5 | 57
python/MANIFEST.in | 4 | 19
python/pyproject.toml | 5 | 42
python/setup.py | 6 | 30
python/src/datasketches_spark/__init__.py | 3 | 4
python/src/datasketches_spark/_version.py | 3 | 8
python/src/datasketches_spark/kll.py | 4 | 68
src/main/scala/org/apache/spark/sql/datasketches/kll/types/KllDoublesSketchType.scala | 4 | 36
version.cfg | 3 | 1
src/main/scala/org/apache/spark/sql/datasketches/kll/KllFunctionRegistry.scala | 2 | 17
src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate/KllDoublesSketchAggBuild.scala | 2 | 107
src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate/KllDoublesSketchAggMerge.scala | 2 | 113
src/main/scala/org/apache/spark/sql/datasketches/kll/functions.scala | 2 | 100
src/main/scala/org/apache/spark/sql/datasketches/theta/types/ThetaSketchType.scala | 3 | 15

lines of code: min: 1.0 | average: 54.14 | 25th percentile: 15.0 | median: 36.0 | 75th percentile: 94.25 | max: 237.0
commits (all time): min: 2.0 | average: 5.14 | 25th percentile: 3.0 | median: 4.5 | 75th percentile: 6.75 | max: 17.0
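The summary statistics for lines of code can be recomputed from the 28 per-file values. The sketch below is a minimal illustration; the report does not say which percentile definition it uses, so the choice of Python's `statistics.quantiles` with its default "exclusive" method is an assumption, picked because it matches the published quartiles. The names `FILE_SIZES` and `summarize` are hypothetical.

```python
from statistics import mean, quantiles

# Lines of code per file, copied from the "Longest Files" table of this report.
FILE_SIZES = [237, 122, 113, 107, 104, 101, 100, 77, 69, 68, 57, 57, 42, 36,
              36, 30, 29, 28, 19, 17, 15, 15, 10, 10, 8, 4, 4, 1]


def summarize(values):
    """Return min, average, quartiles, and max for a list of numbers."""
    # Assumption: quartiles use the "exclusive" method (the default),
    # which interpolates at position p * (n + 1) in the sorted data.
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "min": float(min(values)),
        "average": round(mean(values), 2),
        "25th percentile": q1,
        "median": q2,
        "75th percentile": q3,
        "max": float(max(values)),
    }


if __name__ == "__main__":
    for name, value in summarize(FILE_SIZES).items():
        print(f"{name}: {value}")
```

With 28 values, the 75th percentile falls at position 21.75 in the sorted list, between 77 and 100, which gives 94.25.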

File Size vs. Contributors (all time): 28 points

File | contributors (all time) | lines of code
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesScalaFunctionsBase.scala | 3 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggBuild.scala | 3 | 122
src/main/scala/org/apache/spark/sql/datasketches/theta/expressions/ThetaExpressions.scala | 4 | 101
src/main/scala/org/apache/spark/sql/datasketches/theta/functions.scala | 3 | 69
src/main/scala/org/apache/spark/sql/datasketches/theta/types/ThetaSketchWrapper.scala | 3 | 28
python/src/datasketches_spark/common.py | 2 | 57
src/main/scala/org/apache/spark/sql/datasketches/common/CastAsBinary.scala | 2 | 36
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesFunctionRegistry.scala | 2 | 29
src/main/scala/org/apache/spark/sql/datasketches/common/functions.scala | 2 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaFunctionRegistry.scala | 2 | 15
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaSketchConstants.scala | 2 | 4
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggUnion.scala | 2 | 104
src/main/scala/org/apache/spark/sql/datasketches/kll/expressions/KllDoublesSketchExpressions.scala | 1 | 237
build.sbt | 2 | 77
project/BuildUtils.scala | 1 | 57
python/MANIFEST.in | 1 | 19
python/pyproject.toml | 2 | 42
python/setup.py | 2 | 30
python/src/datasketches_spark/__init__.py | 1 | 4
python/src/datasketches_spark/_version.py | 2 | 8
python/src/datasketches_spark/kll.py | 1 | 68
src/main/scala/org/apache/spark/sql/datasketches/kll/types/KllDoublesSketchType.scala | 1 | 36
version.cfg | 1 | 1
src/main/scala/org/apache/spark/sql/datasketches/kll/KllFunctionRegistry.scala | 1 | 17
src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate/KllDoublesSketchAggBuild.scala | 1 | 107
src/main/scala/org/apache/spark/sql/datasketches/kll/aggregate/KllDoublesSketchAggMerge.scala | 1 | 113
src/main/scala/org/apache/spark/sql/datasketches/kll/functions.scala | 1 | 100
src/main/scala/org/apache/spark/sql/datasketches/theta/types/ThetaSketchType.scala | 1 | 15

lines of code: min: 1.0 | average: 54.14 | 25th percentile: 15.0 | median: 36.0 | 75th percentile: 94.25 | max: 237.0
contributors (all time): min: 1.0 | average: 1.79 | 25th percentile: 1.0 | median: 2.0 | 75th percentile: 2.0 | max: 4.0
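The report plots these points but does not state a correlation coefficient. One way to quantify the relationship is a Pearson coefficient over the (contributors, lines of code) pairs. The sketch below is an illustration only: the names `POINTS` and `pearson` are hypothetical, the choice of Pearson over a rank-based measure is an assumption, and no result is claimed here.

```python
from math import sqrt

# (contributors all time, lines of code) per file, copied from the
# "File Size vs. Contributors (all time)" data in this report.
POINTS = [
    (3, 10), (3, 122), (4, 101), (3, 69), (3, 28), (2, 57), (2, 36),
    (2, 29), (2, 10), (2, 15), (2, 4), (2, 104), (1, 237), (2, 77),
    (1, 57), (1, 19), (2, 42), (2, 30), (1, 4), (2, 8), (1, 68),
    (1, 36), (1, 1), (1, 17), (1, 107), (1, 113), (1, 100), (1, 15),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    contributors = [c for c, _ in POINTS]
    lines_of_code = [loc for _, loc in POINTS]
    print(f"Pearson r: {pearson(contributors, lines_of_code):.2f}")
```

With only 28 points and contributor counts between 1 and 4, any coefficient computed this way should be read as a rough indication rather than a firm finding.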

File Size vs. Commits (30 days): 0 points

No data for "commits (30d)" vs. "lines of code".

File Size vs. Contributors (30 days): 0 points

No data for "contributors (30d)" vs. "lines of code".


File Size vs. Commits (90 days): 23 points

File | commits (90d) | lines of code
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesScalaFunctionsBase.scala | 2 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggBuild.scala | 5 | 122
src/main/scala/org/apache/spark/sql/datasketches/theta/expressions/ThetaExpressions.scala | 8 | 101
src/main/scala/org/apache/spark/sql/datasketches/theta/functions.scala | 5 | 69
src/main/scala/org/apache/spark/sql/datasketches/theta/types/ThetaSketchWrapper.scala | 2 | 28
python/src/datasketches_spark/common.py | 10 | 57
src/main/scala/org/apache/spark/sql/datasketches/common/CastAsBinary.scala | 2 | 36
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesFunctionRegistry.scala | 5 | 29
src/main/scala/org/apache/spark/sql/datasketches/common/functions.scala | 5 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaFunctionRegistry.scala | 2 | 15
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaSketchConstants.scala | 2 | 4
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggUnion.scala | 2 | 104
src/main/scala/org/apache/spark/sql/datasketches/kll/expressions/KllDoublesSketchExpressions.scala | 3 | 237
build.sbt | 3 | 77
project/BuildUtils.scala | 5 | 57
python/MANIFEST.in | 4 | 19
python/pyproject.toml | 5 | 42
python/setup.py | 6 | 30
python/src/datasketches_spark/__init__.py | 3 | 4
python/src/datasketches_spark/_version.py | 3 | 8
python/src/datasketches_spark/kll.py | 4 | 68
version.cfg | 3 | 1

lines of code: min: 1.0 | average: 50.61 | 25th percentile: 10.0 | median: 36.0 | 75th percentile: 69.0 | max: 237.0
commits (90d): min: 2.0 | average: 3.96 | 25th percentile: 2.0 | median: 3.0 | 75th percentile: 5.0 | max: 10.0

File Size vs. Contributors (90 days): 23 points

File | contributors (90d) | lines of code
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesScalaFunctionsBase.scala | 2 | 10
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggBuild.scala | 3 | 122
src/main/scala/org/apache/spark/sql/datasketches/theta/expressions/ThetaExpressions.scala | 4 | 101
src/main/scala/org/apache/spark/sql/datasketches/theta/functions.scala | 3 | 69
src/main/scala/org/apache/spark/sql/datasketches/theta/types/ThetaSketchWrapper.scala | 2 | 28
python/src/datasketches_spark/common.py | 2 | 57
src/main/scala/org/apache/spark/sql/datasketches/common/CastAsBinary.scala | 2 | 36
src/main/scala/org/apache/spark/sql/datasketches/common/DatasketchesFunctionRegistry.scala | 2 | 29
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaFunctionRegistry.scala | 2 | 15
src/main/scala/org/apache/spark/sql/datasketches/theta/ThetaSketchConstants.scala | 2 | 4
src/main/scala/org/apache/spark/sql/datasketches/theta/aggregate/ThetaSketchAggUnion.scala | 2 | 104
src/main/scala/org/apache/spark/sql/datasketches/kll/expressions/KllDoublesSketchExpressions.scala | 1 | 237
build.sbt | 1 | 77
project/BuildUtils.scala | 1 | 57
python/MANIFEST.in | 1 | 19
python/pyproject.toml | 2 | 42
python/setup.py | 2 | 30
python/src/datasketches_spark/__init__.py | 1 | 4
python/src/datasketches_spark/_version.py | 2 | 8
python/src/datasketches_spark/kll.py | 1 | 68
src/main/scala/org/apache/spark/sql/datasketches/kll/types/KllDoublesSketchType.scala | 1 | 36
version.cfg | 1 | 1

lines of code: min: 1.0 | average: 50.61 | 25th percentile: 10.0 | median: 36.0 | 75th percentile: 69.0 | max: 237.0
contributors (90d): min: 1.0 | average: 1.83 | 25th percentile: 1.0 | median: 2.0 | 75th percentile: 2.0 | max: 4.0