huggingface / datablations
File Change Frequency

File change frequency (churn) shows how file updates are distributed across the codebase. One update is counted per file for each day with at least one commit to that file.
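A minimal sketch of this counting rule, assuming the commit history is already available as (file path, commit date) pairs. The function name `change_days` and the sample records are hypothetical and are not taken from the tool that produced this report.

```python
from collections import defaultdict
from datetime import date


def change_days(commits):
    """Return {file path: number of distinct days with at least one commit}."""
    days = defaultdict(set)
    for path, day in commits:
        # A set collapses several commits on the same day into one update.
        days[path].add(day)
    return {path: len(d) for path, d in days.items()}


# Hypothetical commit records, for illustration only.
commits = [
    ("utils/hub_sync.py", date(2023, 5, 23)),
    ("utils/hub_sync.py", date(2023, 5, 23)),  # second commit, same day
    ("utils/hub_sync.py", date(2023, 6, 4)),
    ("training/mup.py", date(2023, 5, 24)),
]
print(change_days(commits))
```

Counting days rather than raw commits keeps a burst of small same-day commits from inflating a file's churn.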

Overview
File Change Frequency Overall
  • There are 22 files with 5,758 lines of code.
    • 0 files changed more than 100 times (0 lines of code)
    • 0 files changed 51-100 times (0 lines of code)
    • 0 files changed 21-50 times (0 lines of code)
    • 0 files changed 6-20 times (0 lines of code)
    • 22 files changed 1-5 times (5,758 lines of code)
Share of lines of code by number of changes (101+ | 51-100 | 21-50 | 6-20 | 1-5): 0% | 0% | 0% | 0% | 100%
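The grouping into update-frequency ranges can be sketched as follows. This is an illustration under assumptions: the names `BUCKETS`, `bucket_for` and `summarize` are hypothetical, and the three sample files reuse the change counts and line counts listed for hub_sync.py, mup.py and filtering.ipynb in this report.

```python
# Range labels with their lower bounds, highest range first.
BUCKETS = [("101+", 101), ("51-100", 51), ("21-50", 21), ("6-20", 6), ("1-5", 1)]


def bucket_for(changes):
    """Return the range label for a file changed on `changes` days."""
    for label, lower in BUCKETS:
        if changes >= lower:
            return label
    raise ValueError("a tracked file has at least one change")


def summarize(files):
    """files: iterable of (changes, lines). Returns {label: (file count, lines)}."""
    summary = {label: [0, 0] for label, _ in BUCKETS}
    for changes, lines in files:
        entry = summary[bucket_for(changes)]
        entry[0] += 1      # one more file in this range
        entry[1] += lines  # its lines of code
    return {label: tuple(v) for label, v in summary.items()}


# (changes, lines) for hub_sync.py, mup.py and filtering.ipynb.
sample = [(5, 153), (2, 278), (1, 800)]
print(summarize(sample)["1-5"])
```

Because no file in this repository changed on more than 5 days, every file falls into the 1-5 range.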

Contributors Count Frequency Overall
  • There are 22 files with 5,758 lines of code.
    • 0 files changed by more than 25 contributors (0 lines of code)
    • 0 files changed by 11-25 contributors (0 lines of code)
    • 0 files changed by 6-10 contributors (0 lines of code)
    • 19 files changed by 2-5 contributors (3,546 lines of code)
    • 3 files changed by 1 contributor (2,212 lines of code)
Share of lines of code by number of contributors (26+ | 11-25 | 6-10 | 2-5 | 1): 0% | 0% | 0% | 61% | 38%
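The 61% and 38% figures are consistent with each group's share of the 5,758 lines of code, truncated to a whole percent, which would explain why they sum to 99%. The check below is a sketch of that arithmetic. Truncation is an inference from the numbers, not documented behavior of the reporting tool, and `share_percent` is a hypothetical helper.

```python
def share_percent(lines, total):
    """Share of `lines` in `total` as a whole percent, truncated (not rounded)."""
    return lines * 100 // total


total = 5758       # lines of code in all 22 files
two_to_five = 3546  # lines in files changed by 2-5 contributors
single = 2212      # lines in files changed by 1 contributor

print(share_percent(two_to_five, total), share_percent(single, total))
# Rounding instead of truncating would report 62 for the first group.
print(round(two_to_five / total * 100))
```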

File Change Frequency per File Extension
txt, sh, ipynb, py, md, gitignore, json
The number of recorded file updates (101+ | 51-100 | 21-50 | 6-20 | 1-5):
ipynb: 0% | 0% | 0% | 0% | 100%
py: 0% | 0% | 0% | 0% | 100%
File Change Frequency per Logical Decomposition
primary (file change frequency)
The number of recorded file updates (101+ | 51-100 | 21-50 | 6-20 | 1-5):
filtering_notebooks: 0% | 0% | 0% | 0% | 100%
plotstables: 0% | 0% | 0% | 0% | 100%
utils: 0% | 0% | 0% | 0% | 100%
filtering: 0% | 0% | 0% | 0% | 100%
training: 0% | 0% | 0% | 0% | 100%
Most Frequently Changed Files (Top 22)

File | Folder | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
hub_sync.py | utils | 153 | 9 | 2023-05-23 | 2023-06-04 | 5 | 3 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
(not shown) | (not shown) | 71 | 4 | 2022-12-19 | 2023-01-24 | 3 | 2 | teven.lescao@gmail.com | ola.piktus@gmail.com
save_dataset.py | filtering/deduplication | 9 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
cleandirs.py | utils | 10 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
hub_auth.py | utils | 12 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
download_oscar.py | filtering/deduplication | 15 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
suffix_dedup.py | filtering/deduplication | 22 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_rust_format.py | filtering/deduplication | 22 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
filter_oscar_jsonl.py | filtering/deduplication | 23 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
dedup_oscar.py | filtering/deduplication | 26 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 35 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_dataset_sample.py | filtering/deduplication | 43 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_roots_sample.py | filtering/deduplication | 46 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 46 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
hf_dataset_to_file.py | filtering/deduplication | 71 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 79 | 3 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
add_dedup_info.py | filtering/deduplication | 123 | 5 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
mup.py | training | 278 | 4 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
repetition.ipynb | plotstables | 1376 | - | 2023-05-24 | 2023-05-31 | 2 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
blindspots.ipynb | filtering_notebooks | 2462 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 36 | 4 | 2023-06-18 | 2023-06-18 | 1 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
filtering.ipynb | plotstables | 800 | - | 2023-05-24 | 2023-05-24 | 1 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
Files With Most Contributors (Top 22)
Based on the number of unique email addresses found in commits.
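A minimal sketch of counting contributors by unique email address, assuming commits are available as (file path, author email) pairs. The function name `contributor_counts` is hypothetical, and the lowercasing and whitespace stripping are assumptions: the report does not say whether the tool normalizes addresses.

```python
from collections import defaultdict


def contributor_counts(commits):
    """Return {file path: number of unique author emails in its commits}."""
    emails = defaultdict(set)
    for path, email in commits:
        # Assumption: treat addresses that differ only in case or
        # surrounding whitespace as the same contributor.
        emails[path].add(email.strip().lower())
    return {path: len(e) for path, e in emails.items()}


# Hypothetical commit records, for illustration only.
commits = [
    ("training/mup.py", "n.muennighoff@gmail.com"),
    ("training/mup.py", "teven.lescao@gmail.com"),
    ("plotstables/filtering.ipynb", "n.muennighoff@gmail.com"),
    ("plotstables/filtering.ipynb", "N.Muennighoff@gmail.com "),
]
print(contributor_counts(commits))
```

A count based on email addresses treats one person committing from two different addresses as two contributors, so the figures are an upper bound on the number of distinct people.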

File | Folder | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
hub_sync.py | utils | 153 | 9 | 2023-05-23 | 2023-06-04 | 5 | 3 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
(not shown) | (not shown) | 71 | 4 | 2022-12-19 | 2023-01-24 | 3 | 2 | teven.lescao@gmail.com | ola.piktus@gmail.com
mup.py | training | 278 | 4 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
hub_auth.py | utils | 12 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 35 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 79 | 3 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
cleandirs.py | utils | 10 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 46 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_rust_format.py | filtering/deduplication | 22 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
add_dedup_info.py | filtering/deduplication | 123 | 5 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_roots_sample.py | filtering/deduplication | 46 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
hf_dataset_to_file.py | filtering/deduplication | 71 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
suffix_dedup.py | filtering/deduplication | 22 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_dataset.py | filtering/deduplication | 9 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
download_oscar.py | filtering/deduplication | 15 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
dedup_oscar.py | filtering/deduplication | 26 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
filter_oscar_jsonl.py | filtering/deduplication | 23 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_dataset_sample.py | filtering/deduplication | 43 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
blindspots.ipynb | filtering_notebooks | 2462 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
repetition.ipynb | plotstables | 1376 | - | 2023-05-24 | 2023-05-31 | 2 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
(not shown) | (not shown) | 36 | 4 | 2023-06-18 | 2023-06-18 | 1 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
filtering.ipynb | plotstables | 800 | - | 2023-05-24 | 2023-05-24 | 1 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
Files With Fewest Contributors (Top 22)
Based on the number of unique email addresses found in commits.

File | Folder | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
repetition.ipynb | plotstables | 1376 | - | 2023-05-24 | 2023-05-31 | 2 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
filtering.ipynb | plotstables | 800 | - | 2023-05-24 | 2023-05-24 | 1 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
(not shown) | (not shown) | 36 | 4 | 2023-06-18 | 2023-06-18 | 1 | 1 | n.muennighoff@gmail.com | n.muennighoff@gmail.com
blindspots.ipynb | filtering_notebooks | 2462 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
mup.py | training | 278 | 4 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
add_dedup_info.py | filtering/deduplication | 123 | 5 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 79 | 3 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 71 | 4 | 2022-12-19 | 2023-01-24 | 3 | 2 | teven.lescao@gmail.com | ola.piktus@gmail.com
hf_dataset_to_file.py | filtering/deduplication | 71 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 46 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_roots_sample.py | filtering/deduplication | 46 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_dataset_sample.py | filtering/deduplication | 43 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
(not shown) | (not shown) | 35 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
dedup_oscar.py | filtering/deduplication | 26 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
filter_oscar_jsonl.py | filtering/deduplication | 23 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_rust_format.py | filtering/deduplication | 22 | 1 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
suffix_dedup.py | filtering/deduplication | 22 | 2 | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
download_oscar.py | filtering/deduplication | 15 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
hub_auth.py | utils | 12 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
cleandirs.py | utils | 10 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
save_dataset.py | filtering/deduplication | 9 | - | 2023-05-23 | 2023-05-24 | 2 | 2 | n.muennighoff@gmail.com | teven.lescao@gmail.com
hub_sync.py | utils | 153 | 9 | 2023-05-23 | 2023-06-04 | 5 | 3 | n.muennighoff@gmail.com | n.muennighoff@gmail.com