tokenizers/src/models/unigram/trainer.rs (2 lines):
	- line 389: // TODO: Temporary hack to avoid Nans.
	- line 537: // TODO Should be able to upgrade to u64 when needed


bindings/python/py_src/tokenizers/tools/visualizer.py (2 lines):
	- line 238: # TODO is this the right name for the data attribute ?
	- line 305: # TODO I think there is an edge case here where an annotation's span might not close


tokenizers/src/models/unigram/lattice.rs (2 lines):
	- line 39: // TODO Maybe use Ordered Floats (https://docs.rs/ordered-float/1.0.2/ordered_float/)
	- line 190: // TODO can we remove this clone ?


tokenizers/src/pre_tokenizers/byte_level.rs (1 line):
	- line 118: // TODO: Give the ability to modify this regex


bindings/python/src/utils/normalization.rs (1 line):
	- line 17: // TODO: Add the compatibility for Fn(char) -> bool


tokenizers/src/pre_tokenizers/delimiter.rs (1 line):
	- line 21: // TODO: Maybe add the option to specify the behavior


tokenizers/src/tokenizer/mod.rs (1 line):
	- line 686: // TODO ArthurZ THIS IS WRONG! We need to measure the length of the `set` because


bindings/python/scripts/convert.py (1 line):
	- line 112: # TODO what parameters should we give ?


bindings/python/src/processors.rs (1 line):
	- line 96: // TODO: update signature to `tk::Result<usize>`


tokenizers/src/tokenizer/encoding.rs (1 line):
	- line 394: // TODO this is suboptimal as we're doing this iteratively instead of preallocating


bindings/python/stub.py (1 line):
	- line 103: # TODO it would be interesting to add the setter maybe ?