huggingface / OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Main code: 7.7K lines, 58 files
Test code: 0 lines, 0 files
Other code: 0.07K lines, 2 files
Age: 2y (755 days)
Main code touched, 1 year: 0% (0 LOC)
New main code, 1 year: 0% (0 LOC)
Lines of code by extension: py 7.5K, html 0.2K, yaml 0.09K

[Figure: per-year charts for 2025, 2024, 2023; values 0, 2, 16 and 0, 1, 2]

Generated by sokrates.dev on 2025-06-30