huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
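To make "pipeline processing blocks" concrete, here is a minimal, standard-library-only sketch of the general idea. Each block is a function that takes a stream of documents and yields a possibly filtered or transformed stream, and a pipeline is simply the blocks chained in order. This is not datatrove's API: `Document`, `reader`, `length_filter`, `lowercase` and `run_pipeline` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

@dataclass
class Document:
    # Hypothetical minimal record: the text plus an id and free-form metadata.
    text: str
    id: str
    metadata: dict = field(default_factory=dict)

# Hypothetical block type: consumes a stream of Documents, yields Documents.
Block = Callable[[Iterable[Document]], Iterator[Document]]

def reader(texts: list[str]) -> Block:
    # Source block: ignores its input stream and emits one Document per text.
    def run(_: Iterable[Document]) -> Iterator[Document]:
        for i, t in enumerate(texts):
            yield Document(text=t, id=str(i))
    return run

def length_filter(min_chars: int) -> Block:
    # Filter block: drops documents shorter than min_chars characters.
    def run(docs: Iterable[Document]) -> Iterator[Document]:
        for doc in docs:
            if len(doc.text) >= min_chars:
                yield doc
    return run

def lowercase() -> Block:
    # Transform block: lowercases the text, keeping id and metadata.
    def run(docs: Iterable[Document]) -> Iterator[Document]:
        for doc in docs:
            yield Document(text=doc.text.lower(), id=doc.id, metadata=doc.metadata)
    return run

def run_pipeline(blocks: list[Block]) -> list[Document]:
    # Chain the blocks lazily (generators), then materialize the final stream.
    data: Iterable[Document] = iter(())
    for block in blocks:
        data = block(data)
    return list(data)

if __name__ == "__main__":
    docs = run_pipeline([reader(["Hello World", "hi", "Data Trove"]),
                         length_filter(5),
                         lowercase()])
    print([d.text for d in docs])  # ['hello world', 'data trove']
```

Because every block shares one input/output shape, blocks can be reordered, swapped, or reused without rewriting the surrounding script, which is the kind of modularity the tagline contrasts with ad hoc scripting.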
13K lines of main code in 100 files
2.2K lines of test code in 29 files
1.6K lines of other code in 15 files
Age: 2 years (748 days)
88% of main code touched in the past year (11K LOC)
9% new main code in the past year (1.3K LOC)
Main code by file extension: py 12K, rs 0.7K, toml 0.2K

Generated by sokrates.dev on 2025-06-30.