tokenizers/src/models/unigram/trainer.rs (2 lines):
  - line 389: // TODO: Temporary hack to avoid Nans.
  - line 537: // TODO Should be able to upgrade to u64 when needed
bindings/python/py_src/tokenizers/tools/visualizer.py (2 lines):
  - line 238: # TODO is this the right name for the data attribute ?
  - line 305: # TODO I think there is an edge case here where an annotation's span might not close
tokenizers/src/models/unigram/lattice.rs (2 lines):
  - line 39: // TODO Maybe use Ordered Floats (https://docs.rs/ordered-float/1.0.2/ordered_float/)
  - line 190: // TODO can we remove this clone ?
tokenizers/src/pre_tokenizers/byte_level.rs (1 line):
  - line 118: // TODO: Give the ability to modify this regex
bindings/python/src/utils/normalization.rs (1 line):
  - line 17: // TODO: Add the compatibility for Fn(char) -> bool
tokenizers/src/pre_tokenizers/delimiter.rs (1 line):
  - line 21: // TODO: Maybe add the option to specify the behavior
tokenizers/src/tokenizer/mod.rs (1 line):
  - line 686: // TODO ArthurZ THIS IS WRONG! We need to measure the length of the `set` because
bindings/python/scripts/convert.py (1 line):
  - line 112: # TODO what parameters should we give ?
bindings/python/src/processors.rs (1 line):
  - line 96: // TODO: update signature to `tk::Result`
tokenizers/src/tokenizer/encoding.rs (1 line):
  - line 394: // TODO this is suboptimal as we're doing this iteratively instead of preallocating
bindings/python/stub.py (1 line):
  - line 103: # TODO it would be interesting to add the setter maybe ?