huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Main code: 29K lines, 146 files
Test code: 4.3K lines, 36 files
Other code: 18K lines, 66 files
Age: 6y (2,069 days)
Main code touched, 1 year: 86% (25K LOC)
New main code, 1 year: 2% (793 LOC)
Main code by file extension:
rs: 26K
py: 1.8K
pyi: 0.6K
js: 0.4K
toml: 0.2K
css: 0.1K
cfg: 0.05K
in: 0.01K
ts: 0.01K

Per-year chart data (two series; series labels not preserved):
Year       2025  2024  2023  2022  2021  2020  2019
Series 1     31    92   118   110    79  1213   198
Series 2     20    25    26    22    15    44     4

generated by sokrates.dev on 2025-06-30