GoogleCloudPlatform / data-science-on-gcp
File Size

The distribution of file sizes (measured in lines of code).

File Size Overall
Size bucket (lines of code) | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
Share | 28% | 27% | 9% | 15% | 18%
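The overall percentages appear to be each bucket's share of total lines of code, truncated to a whole percent. The sketch below reproduces them under that assumption. The sizes are copied from the "Longest Files" table in this report; the names `FILE_SIZES`, `BUCKETS`, `bucket_label`, and `size_distribution` are illustrative and do not come from the reporting tool.

```python
# Lines of code per file, copied from the "Longest Files" table in this report.
FILE_SIZES = [
    1515, 817, 632, 252, 249, 163, 148, 144, 130, 122, 113, 94, 84, 78, 75,
    74, 74, 70, 53, 52, 52, 42, 41, 29, 25, 23, 23, 22, 18, 13, 11, 10, 10,
    4, 1,
]

# (label, lower bound, upper bound); None means "no upper bound".
BUCKETS = [
    ("1001+", 1001, None),
    ("501-1000", 501, 1000),
    ("201-500", 201, 500),
    ("101-200", 101, 200),
    ("1-100", 1, 100),
]


def bucket_label(lines):
    """Return the label of the size bucket that a file of `lines` lines falls into."""
    for label, low, high in BUCKETS:
        if lines >= low and (high is None or lines <= high):
            return label
    raise ValueError(f"no bucket for {lines} lines")


def size_distribution(sizes):
    """Share of total lines of code per bucket, as truncated whole percentages.

    Truncation is an assumption made here because it matches the report.
    """
    totals = {label: 0 for label, _, _ in BUCKETS}
    for lines in sizes:
        totals[bucket_label(lines)] += lines
    grand_total = sum(sizes)
    if grand_total == 0:
        return {label: 0 for label in totals}
    return {label: 100 * total // grand_total for label, total in totals.items()}


distribution = size_distribution(FILE_SIZES)
print(" | ".join(f"{label}: {pct}%" for label, pct in distribution.items()))
```

Because each share is truncated rather than rounded, the five percentages need not add up to 100.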


File Size per Extension
Extension | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
ipynb | 51% | 48% | 0% | 0% | 0%
py | 0% | 0% | 23% | 33% | 42%
sql | 0% | 0% | 0% | 61% | 38%
yaml | 0% | 0% | 0% | 0% | 100%
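The per-extension rows can be rebuilt the same way by grouping file sizes by extension before bucketing. This is a sketch under the assumption that each percentage is a truncated share of that extension's lines of code. The sizes come from the "Longest Files" table; `UPPER_BOUNDS`, `SIZES_BY_EXTENSION`, and `shares` are hypothetical names.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four buckets; larger files fall into "1001+".
UPPER_BOUNDS = [100, 200, 500, 1000]

# Lines of code per file, grouped by extension (from the "Longest Files" table).
SIZES_BY_EXTENSION = {
    "ipynb": [1515, 817, 632],
    "py": [252, 249, 163, 148, 144, 130, 122, 94, 84, 78, 75, 74, 74, 70,
           53, 52, 52, 42, 41, 25, 23, 23, 22, 13, 1],
    "sql": [113, 29, 18, 10, 10, 4],
    "yaml": [11],
}


def shares(sizes):
    """Truncated percentage of lines per bucket, largest bucket first."""
    totals = [0] * (len(UPPER_BOUNDS) + 1)
    for lines in sizes:
        # bisect_left maps 1-100 to index 0, 101-200 to 1, ..., 1001+ to 4.
        totals[bisect_left(UPPER_BOUNDS, lines)] += lines
    grand_total = sum(totals)
    if grand_total == 0:
        return [0] * len(totals)
    # The report lists columns from the largest bucket to the smallest.
    return [100 * total // grand_total for total in reversed(totals)]


per_extension = {ext: shares(sizes) for ext, sizes in SIZES_BY_EXTENSION.items()}
for ext, row in per_extension.items():
    print(ext, " | ".join(f"{pct}%" for pct in row))
```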
File Size per Logical Decomposition
primary | 1001+ | 501-1000 | 201-500 | 101-200 | 1-100
08_bqml | 51% | 48% | 0% | 0% | 0%
10_mlops | 0% | 0% | 95% | 0% | 4%
04_streaming | 0% | 0% | 0% | 35% | 64%
11_realtime | 0% | 0% | 0% | 63% | 36%
07_sparkml | 0% | 0% | 0% | 60% | 39%
03_sqlstudio | 0% | 0% | 0% | 61% | 38%
06_dataproc | 0% | 0% | 0% | 0% | 100%
02_ingest | 0% | 0% | 0% | 0% | 100%
Longest Files (Top 35)
File | Folder | # lines | # units
bqml_logistic.ipynb | 08_bqml | 1515 | -
bqml_timetxf.ipynb | 08_bqml | 817 | -
bqml_nonlinear.ipynb | 08_bqml | 632 | -
train_on_vertexai.py | 10_mlops | 252 | 5
model.py | 10_mlops | 249 | 6
df06.py | 04_streaming/transform | 163 | 7
df07.py | 04_streaming/transform | 148 | 8
make_predictions.py | 11_realtime | 144 | 4
experiment.py | 07_sparkml | 130 | 4
flights_transforms.py | 11_realtime/flightstxf | 122 | 10
create_table.sql | 03_sqlstudio | 113 | -
df05.py | 04_streaming/transform | 94 | 6
create_traindata.py | 11_realtime | 84 | 2
simulate.py | 04_streaming/simulate | 78 | 2
logistic.py | 07_sparkml | 75 | 2
avg02.py | 04_streaming/realtime | 74 | 3
avg03.py | 04_streaming/realtime | 74 | 3
df04.py | 04_streaming/transform | 70 | 4
bayes_on_spark.py | 06_dataproc | 53 | 1
avg01.py | 04_streaming/realtime | 52 | 1
df03.py | 04_streaming/transform | 52 | 3
setup.py | 11_realtime | 42 | 4
setup.py | 04_streaming/transform | 41 | 4
create_views.sql | 03_sqlstudio | 29 | -
main.py | 02_ingest/monthlyupdate | 25 | 1
change_ch10_files.py | 11_realtime | 23 | -
call_predict.py | 10_mlops | 23 | -
df02.py | 04_streaming/transform | 22 | 1
contingency4.sql | 03_sqlstudio | 18 | -
df01.py | 04_streaming/transform | 13 | -
autoscale.yaml | 07_sparkml | 11 | -
contingency3.sql | 03_sqlstudio | 10 | -
contingency2.sql | 03_sqlstudio | 10 | -
contingency1.sql | 03_sqlstudio | 4 | -
__init__.py | 11_realtime/flightstxf | 1 | -
Files With Most Units (Top 21)
File | Folder | # lines | # units
flights_transforms.py | 11_realtime/flightstxf | 122 | 10
df07.py | 04_streaming/transform | 148 | 8
df06.py | 04_streaming/transform | 163 | 7
df05.py | 04_streaming/transform | 94 | 6
model.py | 10_mlops | 249 | 6
train_on_vertexai.py | 10_mlops | 252 | 5
df04.py | 04_streaming/transform | 70 | 4
setup.py | 04_streaming/transform | 41 | 4
experiment.py | 07_sparkml | 130 | 4
make_predictions.py | 11_realtime | 144 | 4
setup.py | 11_realtime | 42 | 4
avg02.py | 04_streaming/realtime | 74 | 3
avg03.py | 04_streaming/realtime | 74 | 3
df03.py | 04_streaming/transform | 52 | 3
simulate.py | 04_streaming/simulate | 78 | 2
logistic.py | 07_sparkml | 75 | 2
create_traindata.py | 11_realtime | 84 | 2
avg01.py | 04_streaming/realtime | 52 | 1
df02.py | 04_streaming/transform | 22 | 1
main.py | 02_ingest/monthlyupdate | 25 | 1
bayes_on_spark.py | 06_dataproc | 53 | 1
Files With Long Lines (Top 7)

There are 7 files with lines longer than 120 characters. In total, there are 11 long lines.

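File | Folder | # lines | # units | # long lines
bqml_logistic.ipynb | 08_bqml | 1515 | - | 4
simulate.py | 04_streaming/simulate | 78 | 2 | 2
make_predictions.py | 11_realtime | 144 | 4 | 1
create_traindata.py | 11_realtime | 84 | 2 | 1
bqml_timetxf.ipynb | 08_bqml | 817 | - | 1
bqml_nonlinear.ipynb | 08_bqml | 632 | - | 1
model.py | 10_mlops | 249 | 6 | 1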
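A long-line check of the kind summarized above can be sketched in a few lines. The 120-character threshold is taken from the report; how the tool treats tabs, trailing whitespace, or notebook JSON is not stated, so the function below simply measures raw line length. `count_long_lines` and `LONG_LINES_PER_FILE` are illustrative names.

```python
def count_long_lines(text, limit=120):
    """Count lines in `text` that are strictly longer than `limit` characters."""
    return sum(1 for line in text.splitlines() if len(line) > limit)


# Synthetic sample: exactly 120 characters is not "longer than 120".
sample = "\n".join(["x" * 120, "y" * 121, "short", "z" * 200])
long_lines_in_sample = count_long_lines(sample)

# Per-file counts as listed in the table above.
LONG_LINES_PER_FILE = {
    "08_bqml/bqml_logistic.ipynb": 4,
    "04_streaming/simulate/simulate.py": 2,
    "11_realtime/make_predictions.py": 1,
    "11_realtime/create_traindata.py": 1,
    "08_bqml/bqml_timetxf.ipynb": 1,
    "08_bqml/bqml_nonlinear.ipynb": 1,
    "10_mlops/model.py": 1,
}

files_with_long_lines = len(LONG_LINES_PER_FILE)
total_long_lines = sum(LONG_LINES_PER_FILE.values())
print(files_with_long_lines, "files,", total_long_lines, "long lines")
```

The per-file counts add up to the 7 files and 11 long lines stated in the summary sentence.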
Correlations

File Size vs. Commits (all time): 35 points

File | Commits (all time) | Lines of code
10_mlops/train_on_vertexai.py | 6 | 252
10_mlops/model.py | 5 | 249
02_ingest/monthlyupdate/main.py | 13 | 25
03_sqlstudio/contingency1.sql | 3 | 4
03_sqlstudio/contingency2.sql | 3 | 10
03_sqlstudio/contingency4.sql | 3 | 18
03_sqlstudio/create_table.sql | 5 | 113
03_sqlstudio/create_views.sql | 4 | 29
04_streaming/realtime/avg01.py | 2 | 52
04_streaming/realtime/avg02.py | 2 | 74
04_streaming/simulate/simulate.py | 26 | 78
04_streaming/transform/df01.py | 4 | 13
04_streaming/transform/df02.py | 4 | 22
04_streaming/transform/df03.py | 4 | 52
04_streaming/transform/df04.py | 5 | 70
04_streaming/transform/df05.py | 7 | 94
04_streaming/transform/df06.py | 7 | 163
04_streaming/transform/df07.py | 7 | 148
04_streaming/transform/setup.py | 4 | 41
07_sparkml/autoscale.yaml | 2 | 11
07_sparkml/experiment.py | 16 | 130
07_sparkml/logistic.py | 7 | 75
08_bqml/bqml_logistic.ipynb | 3 | 1515
08_bqml/bqml_nonlinear.ipynb | 4 | 632
08_bqml/bqml_timetxf.ipynb | 4 | 817
10_mlops/call_predict.py | 2 | 23
11_realtime/create_traindata.py | 8 | 84
11_realtime/flightstxf/__init__.py | 2 | 1
11_realtime/flightstxf/flights_transforms.py | 2 | 122
11_realtime/make_predictions.py | 4 | 144
11_realtime/setup.py | 3 | 42
lines of code: min: 1.0 | average: 150.37 | 25th percentile: 23.0 | median: 70.0 | 75th percentile: 130.0 | max: 1515.0
commits (all time): min: 2.0 | average: 5.2 | 25th percentile: 3.0 | median: 4.0 | 75th percentile: 6.0 | max: 26.0
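The lines-of-code summary can be checked against the 35 file sizes in the "Longest Files" table. The report does not say how it computes percentiles. The sketch below assumes the nearest-rank method, which happens to reproduce the reported values, whereas linear interpolation between neighbouring ranks would give 126 rather than 130 for the 75th percentile. `FILE_SIZES` and `nearest_rank` are illustrative names.

```python
import math

# Lines of code per file, copied from the "Longest Files" table in this report.
FILE_SIZES = [
    1515, 817, 632, 252, 249, 163, 148, 144, 130, 122, 113, 94, 84, 78, 75,
    74, 74, 70, 53, 52, 52, 42, 41, 29, 25, 23, 23, 22, 18, 13, 11, 10, 10,
    4, 1,
]


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * n)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


sizes = sorted(FILE_SIZES)
summary = {
    "min": sizes[0],
    "average": round(sum(sizes) / len(sizes), 2),
    "25th percentile": nearest_rank(sizes, 25),
    "median": nearest_rank(sizes, 50),
    "75th percentile": nearest_rank(sizes, 75),
    "max": sizes[-1],
}
print(" | ".join(f"{name}: {value}" for name, value in summary.items()))
```

The same function could be applied to the commit counts, but only 31 of the 35 points are listed in this report, so those figures cannot be fully rechecked from the text alone.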

File Size vs. Contributors (all time): 35 points

File | Contributors (all time) | Lines of code
10_mlops/train_on_vertexai.py | 4 | 252
10_mlops/model.py | 2 | 249
02_ingest/monthlyupdate/main.py | 5 | 25
03_sqlstudio/contingency1.sql | 2 | 4
03_sqlstudio/contingency2.sql | 2 | 10
03_sqlstudio/contingency4.sql | 2 | 18
03_sqlstudio/create_table.sql | 2 | 113
03_sqlstudio/create_views.sql | 2 | 29
04_streaming/realtime/avg01.py | 2 | 52
04_streaming/realtime/avg02.py | 2 | 74
04_streaming/simulate/simulate.py | 8 | 78
04_streaming/transform/df01.py | 3 | 13
04_streaming/transform/df02.py | 3 | 22
04_streaming/transform/df03.py | 3 | 52
04_streaming/transform/df04.py | 3 | 70
04_streaming/transform/df05.py | 3 | 94
04_streaming/transform/df06.py | 3 | 163
04_streaming/transform/df07.py | 4 | 148
04_streaming/transform/setup.py | 3 | 41
07_sparkml/experiment.py | 4 | 130
07_sparkml/logistic.py | 3 | 75
08_bqml/bqml_logistic.ipynb | 2 | 1515
08_bqml/bqml_nonlinear.ipynb | 1 | 632
08_bqml/bqml_timetxf.ipynb | 1 | 817
11_realtime/create_traindata.py | 4 | 84
11_realtime/flightstxf/flights_transforms.py | 2 | 122
11_realtime/make_predictions.py | 3 | 144
11_realtime/setup.py | 3 | 42
lines of code: min: 1.0 | average: 150.37 | 25th percentile: 23.0 | median: 70.0 | 75th percentile: 130.0 | max: 1515.0
contributors (all time): min: 1.0 | average: 2.74 | 25th percentile: 2.0 | median: 2.0 | 75th percentile: 3.0 | max: 8.0

File Size vs. Commits (30 days): 0 points

No data for "commits (30d)" vs. "lines of code".

File Size vs. Contributors (30 days): 0 points

No data for "contributors (30d)" vs. "lines of code".


File Size vs. Commits (90 days): 0 points

No data for "commits (90d)" vs. "lines of code".

File Size vs. Contributors (90 days): 0 points

No data for "contributors (90d)" vs. "lines of code".