huggingface / datasets
File Age & Freshness

File age measurements show the distribution of file ages (days since the first commit) and the file freshness (days since the latest commit).
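The two measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not the report generator's actual code: the function names and the reference date `REPORT_DATE` are assumptions (the report does not state the date it was generated on), and treating a 0-day-old file as part of the "1-30" bucket is also an assumption. The bucket boundaries are the ones used in the legends of this report.

```python
from datetime import date

# Age buckets in days, as listed in the report's legends: (lower bound, upper bound, label).
BUCKETS = [
    (366, None, "366+"),
    (181, 365, "181-365"),
    (91, 180, "91-180"),
    (31, 90, "31-90"),
    (1, 30, "1-30"),
]

# Hypothetical reference date; the report does not say when it was generated.
REPORT_DATE = date(2025, 6, 26)


def days_since(commit_date: date, today: date = REPORT_DATE) -> int:
    """Number of whole days between a commit date and the reference date."""
    return (today - commit_date).days


def bucket(days: int) -> str:
    """Map a day count to one of the report's bucket labels."""
    if days < 0:
        raise ValueError("commit date lies after the reference date")
    for low, high, label in BUCKETS:
        if days >= low and (high is None or days <= high):
            return label
    # Assumption: a file committed on the reference date counts as 1-30.
    return "1-30"


# setup.py as listed in this report: created 2020-04-14, last modified 2025-06-19.
age_days = days_since(date(2020, 4, 14))        # file age
freshness_days = days_since(date(2025, 6, 19))  # file freshness
print(bucket(age_days), bucket(freshness_days))
```

With the assumed reference date, `setup.py` falls into the oldest age bucket and the newest freshness bucket, which matches how heavily changed long-lived files show up in the tables of this report.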

Summary
File Change History Overall
File Age Distribution Overall
Days since first update
  • There are 127 files with 20,388 lines of code.
    • 118 files that are 366+ days old (19,965 lines of code)
    • 5 files that are 181-365 days old (255 lines of code)
    • 3 files that are 91-180 days old (155 lines of code)
    • 0 files that are 31-90 days old (0 lines of code)
    • 1 file that is 1-30 days old (13 lines of code)
Age in days | 366+ | 181-365 | 91-180 | 31-90 | 1-30
Share of lines of code | 97% | 1% | <1% | 0% | <1%
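The percentages for the age buckets can be reproduced from the lines-of-code counts listed above. The sketch below assumes the shares are computed over lines of code and truncated to whole percents, with any non-zero share below one percent shown as "<1%". That rounding rule is an inference from the published figures, not something the report states, and `format_share` is a hypothetical helper name.

```python
# Lines of code per age bucket, copied from the list above.
AGE_LINES = {
    "366+": 19965,
    "181-365": 255,
    "91-180": 155,
    "31-90": 0,
    "1-30": 13,
}


def format_share(part: int, total: int) -> str:
    """Format part/total as a truncated whole percentage."""
    if part == 0:
        return "0%"
    pct = (100 * part) // total  # integer division truncates toward zero
    return "<1%" if pct < 1 else f"{pct}%"


total_lines = sum(AGE_LINES.values())  # 20,388 lines of code overall
shares = [format_share(lines, total_lines) for lines in AGE_LINES.values()]
print(" | ".join(shares))
```

Under these assumptions the same helper also reproduces the freshness percentages from the freshness counts, which suggests the report uses one rule for both distributions.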

File Freshness Distribution Overall
Days since last update
  • There are 127 files with 20,388 lines of code.
    • 50 files were last changed 366+ days ago (1,952 lines of code)
    • 9 files were last changed 181-365 days ago (452 lines of code)
    • 30 files were last changed 91-180 days ago (3,616 lines of code)
    • 5 files were last changed 31-90 days ago (1,094 lines of code)
    • 33 files were last changed 1-30 days ago (13,274 lines of code)
Days since last change | 366+ | 181-365 | 91-180 | 31-90 | 1-30
Share of lines of code | 9% | 2% | 17% | 5% | 65%

File Change History per File Extension
py, mdx, yaml, json, md, gitignore, toml, txt, sh
File Age Distribution per Extension
Days since first update
Extension | 366+ | 181-365 | 91-180 | 31-90 | 1-30
py | 97% | 1% | <1% | 0% | <1%
yaml | 100% | 0% | 0% | 0% | 0%
toml | 100% | 0% | 0% | 0% | 0%
File Freshness Distribution per Extension
Days since last update
Extension | 366+ | 181-365 | 91-180 | 31-90 | 1-30
py | 8% | 2% | 17% | 5% | 65%
yaml | 100% | 0% | 0% | 0% | 0%
toml | 100% | 0% | 0% | 0% | 0%
File Change History per Logical Decomposition
primary
primary (file age distribution)
Days since first update
Component | 366+ | 181-365 | 91-180 | 31-90 | 1-30
src | 97% | 1% | <1% | 0% | <1%
benchmarks | 100% | 0% | 0% | 0% | 0%
ROOT | 100% | 0% | 0% | 0% | 0%
utils | 100% | 0% | 0% | 0% | 0%
primary (file freshness distribution)
Days since last update
Component | 366+ | 181-365 | 91-180 | 31-90 | 1-30
src | 7% | 2% | 18% | 5% | 66%
benchmarks | 100% | 0% | 0% | 0% | 0%
utils | 100% | 0% | 0% | 0% | 0%
ROOT | 14% | 0% | 0% | 0% | 85%
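The component names in the "primary" decomposition (src, benchmarks, utils, ROOT) look like top-level folders, with ROOT standing for files in the repository root. The sketch below shows one way such a grouping could be computed. It is an illustration under that assumption, not the report's actual decomposition rule, and `component_of` and `lines_per_component` are hypothetical names. The sample rows are taken from the file tables in this report.

```python
from collections import defaultdict


def component_of(folder: str) -> str:
    """Map a file's folder to a component: its first path segment, or ROOT."""
    # Assumption: the report's "in root" marks files in the repository root.
    if folder in ("", "root"):
        return "ROOT"
    return folder.split("/", 1)[0]


def lines_per_component(rows):
    """Sum lines of code per component for (file, folder, lines) rows."""
    totals = defaultdict(int)
    for _name, folder, lines in rows:
        totals[component_of(folder)] += lines
    return dict(totals)


# A few (file, folder, # lines) rows as they appear in this report's tables.
sample_rows = [
    ("setup.py", "root", 131),
    ("utils.py", "benchmarks", 47),
    ("format.py", "benchmarks", 34),
    ("builder.py", "src/datasets", 1185),
    ("release.py", "utils", 62),
]

print(lines_per_component(sample_rows))
```

Per-component age and freshness shares would then follow by bucketing each file's dates within its component before summing lines.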
Oldest Files (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
setup.py
in root
131 - 2020-04-14 2025-06-19 278 57 thomwolf@users.noreply.gith... 49127578+tytodd@users.norep...
108 8 2020-08-27 2021-06-14 4 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
79 5 2020-08-27 2021-06-14 3 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
utils.py
in benchmarks
47 3 2020-08-27 2022-03-11 5 4 thomwolf@users.noreply.gith... mariosasko777@gmail.com
41 6 2020-08-27 2021-06-14 4 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
format.py
in benchmarks
34 1 2020-08-27 2023-02-14 6 4 thomwolf@users.noreply.gith... mariosasko777@gmail.com
42 3 2020-08-28 2021-06-14 3 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
builder.py
in src/datasets
1185 50 2020-09-10 2025-06-25 144 33 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
dataset_dict.py
in src/datasets
1075 58 2020-09-10 2025-06-25 134 42 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
load.py
in src/datasets
952 26 2020-09-10 2025-06-25 181 46 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
file_utils.py
in src/datasets/utils
807 69 2020-09-10 2025-06-09 105 26 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
arrow_writer.py
in src/datasets
469 22 2020-09-10 2025-04-28 72 26 thomwolf@users.noreply.gith... 35225576+afuetterer@users.n...
py_utils.py
in src/datasets/utils
402 26 2020-09-10 2025-06-09 75 19 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
search.py
in src/datasets
393 33 2020-09-10 2025-06-25 41 25 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
arrow_reader.py
in src/datasets
309 26 2020-09-10 2025-03-28 34 16 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
splits.py
in src/datasets
260 42 2020-09-10 2025-03-05 30 13 thomwolf@users.noreply.gith... cyyever@outlook.com
fingerprint.py
in src/datasets
258 22 2020-09-10 2025-03-05 31 15 thomwolf@users.noreply.gith... cyyever@outlook.com
info.py
in src/datasets
254 17 2020-09-10 2025-06-25 42 16 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
inspect.py
in src/datasets
149 5 2020-09-10 2025-06-09 45 14 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
logging.py
in src/datasets/utils
69 14 2020-09-10 2025-06-09 20 7 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
info_utils.py
in src/datasets/utils
67 4 2020-09-10 2024-06-10 9 5 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
version.py
in src/datasets/utils
52 11 2020-09-10 2022-12-09 11 8 thomwolf@users.noreply.gith... 59462357+stevhliu@users.nor...
naming.py
in src/datasets
47 6 2020-09-10 2024-03-01 11 8 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets
32 - 2020-09-10 2025-06-17 128 13 lhoest.q@gmail.com 42851186+lhoestq@users.nore...
env.py
in src/datasets/commands
31 4 2020-09-10 2023-10-26 7 5 thomwolf@users.noreply.gith... mariosasko777@gmail.com
__init__.py
in src/datasets/utils
11 - 2020-09-10 2024-06-04 21 7 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
__init__.py
in src/datasets/commands
10 2 2020-09-10 2021-05-10 2 2 thomwolf@users.noreply.gith... mariosasko777@gmail.com
filelock.py
in src/datasets/utils
8 - 2020-11-16 2023-11-28 17 8 42851186+lhoestq@users.nore... mariosasko777@gmail.com
csv.py
in src/datasets/packaged_modules/csv
164 6 2021-01-19 2025-03-05 27 10 42851186+lhoestq@users.nore... cyyever@outlook.com
json.py
in src/datasets/packaged_modules/json
141 8 2021-01-19 2024-06-19 31 7 42851186+lhoestq@users.nore... 8515462+albertvillanova@use...
__init__.py
in src/datasets/packaged_modules
85 1 2021-01-19 2025-03-18 28 13 42851186+lhoestq@users.nore... yabran.muvdi@gmail.com
text.py
in src/datasets/packaged_modules/text
83 4 2021-01-19 2024-08-21 19 5 42851186+lhoestq@users.nore... 8515462+albertvillanova@use...
pandas.py
in src/datasets/packaged_modules/pandas
47 5 2021-01-19 2024-06-06 9 5 42851186+lhoestq@users.nore... 8515462+albertvillanova@use...
__init__.py
in src/datasets/packaged_modules/csv
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/pandas
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/text
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/json
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/filesystems
26 2 2021-01-26 2025-03-05 18 12 32632186+philschmid@users.n... cyyever@outlook.com
formatting.py
in src/datasets/formatting
464 74 2021-02-05 2025-06-09 39 14 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/formatting
84 4 2021-02-05 2025-04-28 17 8 42851186+lhoestq@users.nore... 35225576+afuetterer@users.n...
tf_formatter.py
in src/datasets/formatting
83 8 2021-02-05 2025-06-19 14 7 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
torch_formatter.py
in src/datasets/formatting
82 8 2021-02-05 2025-06-19 14 8 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
config.py
in src/datasets
176 - 2021-02-10 2025-06-25 69 20 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
deprecation_utils.py
in src/datasets/utils
70 7 2021-02-24 2022-11-04 7 5 33657802+sbrandeis@users.no... mariosasko777@gmail.com
datasets_cli.py
in src/datasets/commands
25 2 2021-02-26 2025-06-09 8 3 mariosasko777@gmail.com 42851186+lhoestq@users.nore...
csv.py
in src/datasets/io
122 6 2021-03-12 2024-03-12 19 8 8515462+albertvillanova@use... mariosasko777@gmail.com
abc.py
in src/datasets/io
46 4 2021-03-12 2023-04-26 5 5 8515462+albertvillanova@use... maddie.dawson@databricks.com
typing.py
in src/datasets/utils
6 - 2021-03-12 2025-03-05 3 2 8515462+albertvillanova@use... cyyever@outlook.com
__init__.py
in src/datasets/io
1 - 2021-03-12 2021-03-12 1 1 8515462+albertvillanova@use... 8515462+albertvillanova@use...
json.py
in src/datasets/io
148 6 2021-03-18 2024-11-18 25 12 8515462+albertvillanova@use... varadhbhatnagar@rediffmail.com
Files Not Recently Changed (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
__init__.py
in src/datasets/packaged_modules/json
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/text
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/pandas
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/csv
1 - 2021-01-19 2021-01-19 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/io
1 - 2021-03-12 2021-03-12 1 1 8515462+albertvillanova@use... 8515462+albertvillanova@use...
__init__.py
in src/datasets/utils/resources
1 - 2021-04-26 2021-04-26 1 1 theo-m@users.noreply.github... theo-m@users.noreply.github...
__init__.py
in src/datasets/commands
10 2 2020-09-10 2021-05-10 2 2 thomwolf@users.noreply.gith... mariosasko777@gmail.com
readme_structure.yaml
in src/datasets/utils/resources
116 - 2021-05-10 2021-05-14 2 1 chhablani.gunjan@gmail.com chhablani.gunjan@gmail.com
doc_utils.py
in src/datasets/utils
6 1 2021-05-28 2021-05-28 1 1 lewis.c.tunstall@gmail.com lewis.c.tunstall@gmail.com
41 6 2020-08-27 2021-06-14 4 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
42 3 2020-08-28 2021-06-14 3 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
53 9 2021-04-06 2021-06-14 2 2 42851186+lhoestq@users.nore... 8515462+albertvillanova@use...
79 5 2020-08-27 2021-06-14 3 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
108 8 2020-08-27 2021-06-14 4 2 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
__init__.py
in src/datasets/packaged_modules/parquet
1 - 2021-06-30 2021-07-16 2 2 42851186+lhoestq@users.nore... stevhliu@gmail.com
__init__.py
in src/datasets/packaged_modules/imagefolder
1 - 2022-03-01 2022-03-01 1 1 nxr9266@g.rit.edu nxr9266@g.rit.edu
utils.py
in benchmarks
47 3 2020-08-27 2022-03-11 5 4 thomwolf@users.noreply.gith... mariosasko777@gmail.com
__init__.py
in src/datasets/download
9 - 2022-05-25 2022-05-25 1 1 8515462+albertvillanova@use... 8515462+albertvillanova@use...
__init__.py
in src/datasets/packaged_modules/folder_based_builder
1 - 2022-08-22 2022-08-22 1 1 polina@huggingface.co polina@huggingface.co
__init__.py
in src/datasets/packaged_modules/audiofolder
1 - 2022-08-22 2022-08-22 1 1 polina@huggingface.co polina@huggingface.co
__init__.py
in src/datasets/packaged_modules/generator
1 - 2022-09-16 2022-09-16 1 1 mariosasko777@gmail.com mariosasko777@gmail.com
__init__.py
in src/datasets/packaged_modules/sql
1 - 2022-10-03 2022-10-03 1 1 frederic.branchaud.charron@... frederic.branchaud.charron@...
deprecation_utils.py
in src/datasets/utils
70 7 2021-02-24 2022-11-04 7 5 33657802+sbrandeis@users.no... mariosasko777@gmail.com
version.py
in src/datasets/utils
52 11 2020-09-10 2022-12-09 11 8 thomwolf@users.noreply.gith... 59462357+stevhliu@users.nor...
format.py
in benchmarks
34 1 2020-08-27 2023-02-14 6 4 thomwolf@users.noreply.gith... mariosasko777@gmail.com
__init__.py
in src/datasets/packaged_modules/spark
1 - 2023-04-26 2023-04-26 1 1 maddie.dawson@databricks.com maddie.dawson@databricks.com
abc.py
in src/datasets/io
46 4 2021-03-12 2023-04-26 5 5 8515462+albertvillanova@use... maddie.dawson@databricks.com
spark.py
in src/datasets/io
46 2 2023-04-26 2023-05-25 3 1 maddie.dawson@databricks.com maddie.dawson@databricks.com
__init__.py
in src/datasets/packaged_modules/arrow
1 - 2023-06-13 2023-06-13 1 1 mariusz.jachimowicz.83@gmai... mariusz.jachimowicz.83@gmai...
release.py
in utils
62 5 2021-06-14 2023-07-06 3 3 8515462+albertvillanova@use... mariosasko777@gmail.com
env.py
in src/datasets/commands
31 4 2020-09-10 2023-10-26 7 5 thomwolf@users.noreply.gith... mariosasko777@gmail.com
patching.py
in src/datasets/utils
65 6 2021-07-13 2023-11-21 9 5 8515462+albertvillanova@use... mariosasko777@gmail.com
__init__.py
in src/datasets/packaged_modules/webdataset
1 - 2023-11-28 2023-11-28 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
filelock.py
in src/datasets/utils
8 - 2020-11-16 2023-11-28 17 8 42851186+lhoestq@users.nore... mariosasko777@gmail.com
_tenbin.py
in src/datasets/packaged_modules/webdataset
167 21 2023-11-28 2023-11-28 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
experimental.py
in src/datasets/utils
12 1 2023-06-14 2024-03-01 2 2 ying.chen@databricks.com 42851186+lhoestq@users.nore...
tqdm.py
in src/datasets/utils
40 6 2023-11-22 2024-03-01 2 2 mariosasko777@gmail.com 42851186+lhoestq@users.nore...
naming.py
in src/datasets
47 6 2020-09-10 2024-03-01 11 8 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
text.py
in src/datasets/io
53 2 2021-03-18 2024-03-12 6 4 8515462+albertvillanova@use... mariosasko777@gmail.com
sql.py
in src/datasets/io
101 6 2022-10-03 2024-03-12 8 3 frederic.branchaud.charron@... mariosasko777@gmail.com
csv.py
in src/datasets/io
122 6 2021-03-12 2024-03-12 19 8 8515462+albertvillanova@use... mariosasko777@gmail.com
parallel.py
in src/datasets/parallel
65 4 2023-06-14 2024-04-15 2 2 ying.chen@databricks.com 42851186+lhoestq@users.nore...
delete_from_hub.py
in src/datasets/commands
35 4 2024-04-30 2024-04-30 1 1 8515462+albertvillanova@use... 8515462+albertvillanova@use...
compression.py
in src/datasets/filesystems
80 5 2021-08-23 2024-05-13 9 5 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/parallel
1 - 2023-06-14 2024-06-04 2 2 ying.chen@databricks.com 8515462+albertvillanova@use...
__init__.py
in src/datasets/utils
11 - 2020-09-10 2024-06-04 21 7 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
pandas.py
in src/datasets/packaged_modules/pandas
47 5 2021-01-19 2024-06-06 9 5 42851186+lhoestq@users.nore... 8515462+albertvillanova@use...
info_utils.py
in src/datasets/utils
67 4 2020-09-10 2024-06-10 9 5 thomwolf@users.noreply.gith... 8515462+albertvillanova@use...
json.py
in src/datasets/packaged_modules/json
141 8 2021-01-19 2024-06-19 31 7 42851186+lhoestq@users.nore... 8515462+albertvillanova@use...
23 - 2022-10-17 2024-06-28 9 5 mariosasko777@gmail.com 8515462+albertvillanova@use...
Most Recently Created Files (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
_torchcodec.py
in src/datasets/features
13 1 2025-06-19 2025-06-19 1 1 49127578+tytodd@users.norep... 49127578+tytodd@users.norep...
pdf.py
in src/datasets/features
141 8 2025-03-18 2025-06-17 6 4 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
pdffolder.py
in src/datasets/packaged_modules/pdffolder
13 1 2025-03-18 2025-03-18 1 1 yabran.muvdi@gmail.com yabran.muvdi@gmail.com
__init__.py
in src/datasets/packaged_modules/pdffolder
1 - 2025-03-18 2025-03-18 1 1 yabran.muvdi@gmail.com yabran.muvdi@gmail.com
video.py
in src/datasets/features
186 9 2024-10-24 2025-06-19 8 5 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
xml.py
in src/datasets/packaged_modules/xml
46 4 2024-10-24 2024-10-24 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
videofolder.py
in src/datasets/packaged_modules/videofolder
21 1 2024-10-24 2025-03-05 2 2 42851186+lhoestq@users.nore... cyyever@outlook.com
__init__.py
in src/datasets/packaged_modules/videofolder
1 - 2024-10-24 2024-10-24 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/xml
1 - 2024-10-24 2024-10-24 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
hub.py
in src/datasets
100 2 2024-04-30 2025-06-09 4 2 8515462+albertvillanova@use... 42851186+lhoestq@users.nore...
delete_from_hub.py
in src/datasets/commands
35 4 2024-04-30 2024-04-30 1 1 8515462+albertvillanova@use... 8515462+albertvillanova@use...
_dataset_viewer.py
in src/datasets/utils
70 2 2024-04-08 2025-03-05 4 3 sylvain.lesage@huggingface.co cyyever@outlook.com
polars_formatter.py
in src/datasets/formatting
88 11 2024-03-08 2025-01-30 2 2 psmyth1994@gmail.com 42851186+lhoestq@users.nore...
track.py
in src/datasets/utils
45 10 2023-12-19 2025-03-05 3 2 42851186+lhoestq@users.nore... cyyever@outlook.com
_dill.py
in src/datasets/utils
334 17 2023-11-28 2025-06-16 7 4 mariosasko777@gmail.com 45557362+qgallouedec@users....
webdataset.py
in src/datasets/packaged_modules/webdataset
263 12 2023-11-28 2025-03-05 12 4 42851186+lhoestq@users.nore... cyyever@outlook.com
_tenbin.py
in src/datasets/packaged_modules/webdataset
167 21 2023-11-28 2023-11-28 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/webdataset
1 - 2023-11-28 2023-11-28 1 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
_filelock.py
in src/datasets/utils
30 2 2023-11-23 2025-03-05 4 3 mariosasko777@gmail.com cyyever@outlook.com
tqdm.py
in src/datasets/utils
40 6 2023-11-22 2024-03-01 2 2 mariosasko777@gmail.com 42851186+lhoestq@users.nore...
exceptions.py
in src/datasets
61 1 2023-10-10 2025-03-05 7 3 8515462+albertvillanova@use... cyyever@outlook.com
parallel.py
in src/datasets/parallel
65 4 2023-06-14 2024-04-15 2 2 ying.chen@databricks.com 42851186+lhoestq@users.nore...
experimental.py
in src/datasets/utils
12 1 2023-06-14 2024-03-01 2 2 ying.chen@databricks.com 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/parallel
1 - 2023-06-14 2024-06-04 2 2 ying.chen@databricks.com 8515462+albertvillanova@use...
arrow.py
in src/datasets/packaged_modules/arrow
56 5 2023-06-13 2025-01-09 6 5 mariusz.jachimowicz.83@gmai... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules/arrow
1 - 2023-06-13 2023-06-13 1 1 mariusz.jachimowicz.83@gmai... mariusz.jachimowicz.83@gmai...
spark.py
in src/datasets/packaged_modules/spark
289 20 2023-04-26 2025-03-05 9 5 42851186+lhoestq@users.nore... cyyever@outlook.com
spark.py
in src/datasets/io
46 2 2023-04-26 2023-05-25 3 1 maddie.dawson@databricks.com maddie.dawson@databricks.com
__init__.py
in src/datasets/packaged_modules/spark
1 - 2023-04-26 2023-04-26 1 1 maddie.dawson@databricks.com maddie.dawson@databricks.com
distributed.py
in src/datasets
9 1 2023-01-16 2024-10-25 2 1 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
sharding.py
in src/datasets/utils
56 5 2022-11-09 2025-03-05 4 4 teven.lescao@gmail.com cyyever@outlook.com
hub.py
in src/datasets/utils
9 1 2022-11-09 2024-08-20 9 4 8515462+albertvillanova@use... 8515462+albertvillanova@use...
23 - 2022-10-17 2024-06-28 9 5 mariosasko777@gmail.com 8515462+albertvillanova@use...
np_formatter.py
in src/datasets/formatting
79 8 2022-10-10 2025-06-19 8 5 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
sql.py
in src/datasets/io
101 6 2022-10-03 2024-03-12 8 3 frederic.branchaud.charron@... mariosasko777@gmail.com
sql.py
in src/datasets/packaged_modules/sql
92 7 2022-10-03 2025-03-05 5 5 frederic.branchaud.charron@... cyyever@outlook.com
__init__.py
in src/datasets/packaged_modules/sql
1 - 2022-10-03 2022-10-03 1 1 frederic.branchaud.charron@... frederic.branchaud.charron@...
generator.py
in src/datasets/io
52 2 2022-09-16 2024-07-26 8 6 mariosasko777@gmail.com piercus@users.noreply.githu...
generator.py
in src/datasets/packaged_modules/generator
23 4 2022-09-16 2025-03-05 5 5 mariosasko777@gmail.com cyyever@outlook.com
__init__.py
in src/datasets/packaged_modules/generator
1 - 2022-09-16 2022-09-16 1 1 mariosasko777@gmail.com mariosasko777@gmail.com
folder_based_builder.py
in src/datasets/packaged_modules/folder_based_builder
347 8 2022-08-22 2025-06-25 19 8 polina@huggingface.co 42851186+lhoestq@users.nore...
audiofolder.py
in src/datasets/packaged_modules/audiofolder
61 1 2022-08-22 2025-06-19 8 5 polina@huggingface.co 49127578+tytodd@users.norep...
__init__.py
in src/datasets/packaged_modules/audiofolder
1 - 2022-08-22 2022-08-22 1 1 polina@huggingface.co polina@huggingface.co
__init__.py
in src/datasets/packaged_modules/folder_based_builder
1 - 2022-08-22 2022-08-22 1 1 polina@huggingface.co polina@huggingface.co
tf_utils.py
in src/datasets/utils
390 21 2022-06-06 2025-03-05 9 6 rocketknight1@users.noreply... cyyever@outlook.com
download_manager.py
in src/datasets/download
172 14 2022-05-25 2025-03-05 23 7 8515462+albertvillanova@use... cyyever@outlook.com
streaming_download_manager.py
in src/datasets/download
106 11 2022-05-25 2025-03-05 36 9 8515462+albertvillanova@use... cyyever@outlook.com
stratify.py
in src/datasets/utils
46 2 2022-05-25 2025-01-09 3 2 48522685+nandwalritik@users... 42851186+lhoestq@users.nore...
download_config.py
in src/datasets/download
33 2 2022-05-25 2025-04-28 15 9 8515462+albertvillanova@use... 35225576+afuetterer@users.n...
__init__.py
in src/datasets/download
9 - 2022-05-25 2022-05-25 1 1 8515462+albertvillanova@use... 8515462+albertvillanova@use...
Most Recently Changed Files (Top 50)
File | # lines | # units | created | last modified | # changes (days) | # contributors | first contributor | latest contributor
iterable_dataset.py
in src/datasets
2750 234 2021-06-23 2025-06-25 96 37 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
features.py
in src/datasets/features
1354 96 2021-10-13 2025-06-25 91 32 8515462+albertvillanova@use... 42851186+lhoestq@users.nore...
builder.py
in src/datasets
1185 50 2020-09-10 2025-06-25 144 33 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
dataset_dict.py
in src/datasets
1075 58 2020-09-10 2025-06-25 134 42 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
load.py
in src/datasets
952 26 2020-09-10 2025-06-25 181 46 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
table.py
in src/datasets
942 131 2021-03-26 2025-06-25 63 21 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
search.py
in src/datasets
393 33 2020-09-10 2025-06-25 41 25 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
folder_based_builder.py
in src/datasets/packaged_modules/folder_based_builder
347 8 2022-08-22 2025-06-25 19 8 polina@huggingface.co 42851186+lhoestq@users.nore...
info.py
in src/datasets
254 17 2020-09-10 2025-06-25 42 16 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
config.py
in src/datasets
176 - 2021-02-10 2025-06-25 69 20 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
translation.py
in src/datasets/features
52 6 2021-10-13 2025-06-25 14 9 8515462+albertvillanova@use... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/features
24 - 2021-10-13 2025-06-25 10 5 8515462+albertvillanova@use... 42851186+lhoestq@users.nore...
image.py
in src/datasets/features
250 11 2021-12-06 2025-06-19 33 10 mario@huggingface.co 49127578+tytodd@users.norep...
video.py
in src/datasets/features
186 9 2024-10-24 2025-06-19 8 5 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
audio.py
in src/datasets/features
163 7 2021-10-13 2025-06-19 43 12 8515462+albertvillanova@use... 49127578+tytodd@users.norep...
setup.py
in root
131 - 2020-04-14 2025-06-19 278 57 thomwolf@users.noreply.gith... 49127578+tytodd@users.norep...
jax_formatter.py
in src/datasets/formatting
116 9 2021-06-21 2025-06-19 13 8 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
tf_formatter.py
in src/datasets/formatting
83 8 2021-02-05 2025-06-19 14 7 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
torch_formatter.py
in src/datasets/formatting
82 8 2021-02-05 2025-06-19 14 8 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
np_formatter.py
in src/datasets/formatting
79 8 2022-10-10 2025-06-19 8 5 42851186+lhoestq@users.nore... 49127578+tytodd@users.norep...
audiofolder.py
in src/datasets/packaged_modules/audiofolder
61 1 2022-08-22 2025-06-19 8 5 polina@huggingface.co 49127578+tytodd@users.norep...
_torchcodec.py
in src/datasets/features
13 1 2025-06-19 2025-06-19 1 1 49127578+tytodd@users.norep... 49127578+tytodd@users.norep...
pdf.py
in src/datasets/features
141 8 2025-03-18 2025-06-17 6 4 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets
32 - 2020-09-10 2025-06-17 128 13 lhoest.q@gmail.com 42851186+lhoestq@users.nore...
_dill.py
in src/datasets/utils
334 17 2023-11-28 2025-06-16 7 4 mariosasko777@gmail.com 45557362+qgallouedec@users....
file_utils.py
in src/datasets/utils
807 69 2020-09-10 2025-06-09 105 26 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
formatting.py
in src/datasets/formatting
464 74 2021-02-05 2025-06-09 39 14 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
py_utils.py
in src/datasets/utils
402 26 2020-09-10 2025-06-09 75 19 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
inspect.py
in src/datasets
149 5 2020-09-10 2025-06-09 45 14 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
hub.py
in src/datasets
100 2 2024-04-30 2025-06-09 4 2 8515462+albertvillanova@use... 42851186+lhoestq@users.nore...
streaming.py
in src/datasets
83 2 2021-06-23 2025-06-09 37 7 42851186+lhoestq@users.nore... 42851186+lhoestq@users.nore...
logging.py
in src/datasets/utils
69 14 2020-09-10 2025-06-09 20 7 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
datasets_cli.py
in src/datasets/commands
25 2 2021-02-26 2025-06-09 8 3 mariosasko777@gmail.com 42851186+lhoestq@users.nore...
data_files.py
in src/datasets
470 27 2021-10-11 2025-05-12 58 17 42851186+lhoestq@users.nore... matthew@protopia.ai
keyhash.py
in src/datasets
38 5 2021-05-10 2025-05-07 5 5 nikhilbartwal1234@gmail.com 73196164+giraffacarp@users....
arrow_writer.py
in src/datasets
469 22 2020-09-10 2025-04-28 72 26 thomwolf@users.noreply.gith... 35225576+afuetterer@users.n...
__init__.py
in src/datasets/formatting
84 4 2021-02-05 2025-04-28 17 8 42851186+lhoestq@users.nore... 35225576+afuetterer@users.n...
download_config.py
in src/datasets/download
33 2 2022-05-25 2025-04-28 15 9 8515462+albertvillanova@use... 35225576+afuetterer@users.n...
arrow_reader.py
in src/datasets
309 26 2020-09-10 2025-03-28 34 16 thomwolf@users.noreply.gith... 42851186+lhoestq@users.nore...
__init__.py
in src/datasets/packaged_modules
85 1 2021-01-19 2025-03-18 28 13 42851186+lhoestq@users.nore... yabran.muvdi@gmail.com
pdffolder.py
in src/datasets/packaged_modules/pdffolder
13 1 2025-03-18 2025-03-18 1 1 yabran.muvdi@gmail.com yabran.muvdi@gmail.com
__init__.py
in src/datasets/packaged_modules/pdffolder
1 - 2025-03-18 2025-03-18 1 1 yabran.muvdi@gmail.com yabran.muvdi@gmail.com
tf_utils.py
in src/datasets/utils
390 21 2022-06-06 2025-03-05 9 6 rocketknight1@users.noreply... cyyever@outlook.com
spark.py
in src/datasets/packaged_modules/spark
289 20 2023-04-26 2025-03-05 9 5 42851186+lhoestq@users.nore... cyyever@outlook.com
webdataset.py
in src/datasets/packaged_modules/webdataset
263 12 2023-11-28 2025-03-05 12 4 42851186+lhoestq@users.nore... cyyever@outlook.com
splits.py
in src/datasets
260 42 2020-09-10 2025-03-05 30 13 thomwolf@users.noreply.gith... cyyever@outlook.com
fingerprint.py
in src/datasets
258 22 2020-09-10 2025-03-05 31 15 thomwolf@users.noreply.gith... cyyever@outlook.com
extract.py
in src/datasets/utils
253 25 2021-07-08 2025-03-05 19 8 8515462+albertvillanova@use... cyyever@outlook.com
metadata.py
in src/datasets/utils
180 8 2021-04-26 2025-03-05 40 16 42851186+lhoestq@users.nore... cyyever@outlook.com
download_manager.py
in src/datasets/download
172 14 2022-05-25 2025-03-05 23 7 8515462+albertvillanova@use... cyyever@outlook.com