Duplicated code blocks detected across the repository, one per row. "size" is the number of cleaned lines of code in the block; each block occurs in 2 files, listed with their (start:end) line ranges.

id | size | first occurrence | second occurrence
1 | 58 | tokenizers/src/tokenizer/normalizer.rs (1989:2049) | tokenizers/src/tokenizer/normalizer.rs (2051:2111)
2 | 40 | tokenizers/src/tokenizer/normalizer.rs (1632:1673) | tokenizers/src/tokenizer/normalizer.rs (1675:1716)
3 | 37 | tokenizers/src/pre_tokenizers/byte_level.rs (510:547) | tokenizers/src/processors/sequence.rs (126:163)
4 | 35 | tokenizers/src/processors/bert.rs (56:95) | tokenizers/src/processors/roberta.rs (86:125)
5 | 31 | tokenizers/src/processors/bert.rs (97:130) | tokenizers/src/processors/roberta.rs (127:160)
6 | 28 | bindings/python/src/trainers.rs (152:179) | bindings/python/src/trainers.rs (372:399)
7 | 26 | bindings/python/src/trainers.rs (320:348) | bindings/python/src/trainers.rs (527:555)
8 | 23 | tokenizers/src/processors/bert.rs (172:200) | tokenizers/src/processors/roberta.rs (213:241)
9 | 22 | tokenizers/src/models/unigram/trainer.rs (635:662) | tokenizers/src/models/wordlevel/trainer.rs (104:131)
10 | 22 | tokenizers/src/tokenizer/normalizer.rs (1996:2018) | tokenizers/src/tokenizer/normalizer.rs (2120:2142)
11 | 22 | tokenizers/src/tokenizer/normalizer.rs (2058:2080) | tokenizers/src/tokenizer/normalizer.rs (2120:2142)
12 | 21 | tokenizers/src/processors/bert.rs (254:274) | tokenizers/src/processors/template.rs (930:950)
13 | 20 | bindings/python/src/utils/serde_pyo3.rs (427:453) | bindings/python/src/utils/serde_pyo3.rs (467:493)
14 | 20 | tokenizers/src/pre_tokenizers/bert.rs (28:47) | tokenizers/src/pre_tokenizers/sequence.rs (61:80)
15 | 18 | tokenizers/src/tokenizer/normalizer.rs (2062:2080) | tokenizers/src/tokenizer/normalizer.rs (2192:2210)
16 | 18 | tokenizers/src/tokenizer/normalizer.rs (1639:1657) | tokenizers/src/tokenizer/normalizer.rs (1725:1744)
17 | 18 | tokenizers/src/tokenizer/normalizer.rs (2000:2018) | tokenizers/src/tokenizer/normalizer.rs (2192:2210)
18 | 18 | bindings/python/src/trainers.rs (237:257) | bindings/python/src/trainers.rs (789:809)
19 | 18 | bindings/python/src/trainers.rs (238:258) | bindings/python/src/trainers.rs (452:472)
20 | 18 | tokenizers/src/tokenizer/normalizer.rs (2124:2142) | tokenizers/src/tokenizer/normalizer.rs (2192:2210)
21 | 18 | bindings/node/src/decoders.rs (96:115) | bindings/node/src/pre_tokenizers.rs (161:180)
22 | 18 | tokenizers/src/tokenizer/normalizer.rs (1682:1700) | tokenizers/src/tokenizer/normalizer.rs (1725:1744)
23 | 18 | tokenizers/src/processors/template.rs (1048:1065) | tokenizers/src/processors/template.rs (1099:1116)
24 | 18 | bindings/python/scripts/convert.py (119:139) | bindings/python/scripts/convert.py (254:274)
25 | 17 | bindings/python/src/trainers.rs (643:660) | bindings/python/src/trainers.rs (789:806)
26 | 17 | bindings/python/src/trainers.rs (452:471) | bindings/python/src/trainers.rs (790:809)
27 | 17 | bindings/python/src/utils/serde_pyo3.rs (399:421) | bindings/python/src/utils/serde_pyo3.rs (431:453)
28 | 17 | bindings/python/src/utils/serde_pyo3.rs (399:421) | bindings/python/src/utils/serde_pyo3.rs (471:493)
29 | 17 | bindings/python/src/trainers.rs (237:254) | bindings/python/src/trainers.rs (643:660)
30 | 17 | tokenizers/src/pre_tokenizers/byte_level.rs (488:504) | tokenizers/src/processors/sequence.rs (101:117)
31 | 17 | tokenizers/src/tokenizer/mod.rs (1390:1408) | tokenizers/src/tokenizer/mod.rs (1443:1461)
32 | 17 | bindings/python/src/tokenizer.rs (1038:1055) | bindings/python/src/tokenizer.rs (1097:1114)
33 | 17 | tokenizers/src/processors/bert.rs (230:246) | tokenizers/src/processors/template.rs (908:924)
34 | 16 | bindings/python/src/trainers.rs (329:346) | bindings/python/src/trainers.rs (683:700)
35 | 16 | tokenizers/src/processors/bert.rs (237:252) | tokenizers/src/processors/roberta.rs (280:295)
36 | 16 | tokenizers/src/processors/template.rs (903:918) | tokenizers/src/processors/template.rs (981:996)
37 | 16 | bindings/python/src/trainers.rs (452:468) | bindings/python/src/trainers.rs (644:660)
38 | 16 | bindings/python/src/trainers.rs (536:553) | bindings/python/src/trainers.rs (683:700)
39 | 15 | tokenizers/src/models/bpe/trainer.rs (623:642) | tokenizers/src/models/wordlevel/trainer.rs (111:130)
40 | 15 | tokenizers/src/models/bpe/trainer.rs (623:642) | tokenizers/src/models/unigram/trainer.rs (642:661)
41 | 15 | tokenizers/src/processors/bert.rs (135:152) | tokenizers/src/processors/bert.rs (162:179)
42 | 15 | tokenizers/src/processors/template.rs (1069:1083) | tokenizers/src/processors/template.rs (1101:1115)
43 | 15 | tokenizers/src/tokenizer/normalizer.rs (1523:1539) | tokenizers/src/tokenizer/normalizer.rs (1642:1657)
44 | 15 | bindings/python/src/trainers.rs (329:345) | bindings/python/src/trainers.rs (856:872)
45 | 15 | tokenizers/src/tokenizer/normalizer.rs (1523:1539) | tokenizers/src/tokenizer/normalizer.rs (1685:1700)
46 | 15 | tokenizers/src/processors/bert.rs (213:228) | tokenizers/src/processors/template.rs (891:906)
47 | 15 | tokenizers/src/tokenizer/normalizer.rs (2120:2134) | tokenizers/src/tokenizer/normalizer.rs (2239:2253)
48 | 15 | tokenizers/src/tokenizer/normalizer.rs (1523:1539) | tokenizers/src/tokenizer/normalizer.rs (1728:1744)
49 | 15 | tokenizers/src/tokenizer/normalizer.rs (2058:2072) | tokenizers/src/tokenizer/normalizer.rs (2239:2253)
50 | 15 | bindings/python/src/trainers.rs (683:699) | bindings/python/src/trainers.rs (856:872)
51 | 15 | tokenizers/src/tokenizer/normalizer.rs (1996:2010) | tokenizers/src/tokenizer/normalizer.rs (2239:2253)
52 | 15 | bindings/python/src/trainers.rs (536:552) | bindings/python/src/trainers.rs (856:872)
53 | 15 | tokenizers/src/processors/template.rs (1050:1064) | tokenizers/src/processors/template.rs (1069:1083)
54 | 15 | tokenizers/src/tokenizer/normalizer.rs (364:383) | tokenizers/src/tokenizer/normalizer.rs (621:640)
55 | 14 | bindings/python/py_src/tokenizers/models/__init__.pyi (372:461) | bindings/python/py_src/tokenizers/models/__init__.pyi (501:591)
56 | 14 | tokenizers/src/tokenizer/normalizer.rs (1870:1884) | tokenizers/src/tokenizer/normalizer.rs (2196:2210)
57 | 14 | tokenizers/src/tokenizer/encoding.rs (610:623) | tokenizers/src/tokenizer/encoding.rs (653:666)
58 | 14 | tokenizers/src/tokenizer/normalizer.rs (1659:1673) | tokenizers/src/tokenizer/normalizer.rs (1746:1761)
59 | 14 | tokenizers/src/tokenizer/encoding.rs (610:623) | tokenizers/src/tokenizer/encoding.rs (750:763)
60 | 14 | tokenizers/src/tokenizer/encoding.rs (653:666) | tokenizers/src/tokenizer/encoding.rs (750:763)
61 | 14 | tokenizers/src/tokenizer/normalizer.rs (1966:1979) | tokenizers/src/tokenizer/normalizer.rs (2238:2251)
62 | 14 | tokenizers/src/tokenizer/normalizer.rs (1702:1716) | tokenizers/src/tokenizer/normalizer.rs (1746:1761)
63 | 14 | bindings/python/src/decoders.rs (353:374) | bindings/python/src/pre_tokenizers.rs (693:714)
64 | 14 | tokenizers/src/tokenizer/normalizer.rs (2085:2098) | tokenizers/src/tokenizer/normalizer.rs (2147:2160)
65 | 14 | tokenizers/src/tokenizer/normalizer.rs (1870:1884) | tokenizers/src/tokenizer/normalizer.rs (2004:2018)
66 | 14 | tokenizers/src/pre_tokenizers/byte_level.rs (474:487) | tokenizers/src/processors/sequence.rs (84:97)
67 | 14 | tokenizers/src/tokenizer/normalizer.rs (1870:1884) | tokenizers/src/tokenizer/normalizer.rs (2066:2080)
68 | 14 | bindings/python/src/pre_tokenizers.rs (462:476) | bindings/python/src/pre_tokenizers.rs (564:578)
69 | 14 | tokenizers/src/tokenizer/normalizer.rs (1928:1942) | tokenizers/src/tokenizer/normalizer.rs (2252:2266)
70 | 14 | tokenizers/src/tokenizer/normalizer.rs (1928:1942) | tokenizers/src/tokenizer/normalizer.rs (2200:2214)
71 | 14 | tokenizers/src/tokenizer/normalizer.rs (2200:2214) | tokenizers/src/tokenizer/normalizer.rs (2252:2266)
72 | 14 | tokenizers/src/tokenizer/normalizer.rs (2023:2036) | tokenizers/src/tokenizer/normalizer.rs (2147:2160)
73 | 14 | tokenizers/src/tokenizer/normalizer.rs (1870:1884) | tokenizers/src/tokenizer/normalizer.rs (2128:2142)
74 | 13 | tokenizers/src/tokenizer/normalizer.rs (1730:1744) | tokenizers/src/tokenizer/normalizer.rs (1781:1794)
75 | 13 | tokenizers/benches/bpe_benchmark.rs (100:114) | tokenizers/benches/unigram_benchmark.rs (46:60)
76 | 13 | tokenizers/src/models/bpe/word.rs (291:311) | tokenizers/src/models/bpe/word.rs (333:346)
77 | 13 | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (89:101) | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (121:133)
78 | 13 | bindings/python/src/trainers.rs (350:364) | bindings/python/src/trainers.rs (556:570)
79 | 13 | tokenizers/src/tokenizer/normalizer.rs (1525:1539) | tokenizers/src/tokenizer/normalizer.rs (1781:1794)
80 | 13 | tokenizers/src/tokenizer/normalizer.rs (1967:1979) | tokenizers/src/tokenizer/normalizer.rs (2058:2070)
81 | 13 | tokenizers/src/tokenizer/normalizer.rs (1687:1700) | tokenizers/src/tokenizer/normalizer.rs (1781:1794)
82 | 13 | tokenizers/src/tokenizer/normalizer.rs (1967:1979) | tokenizers/src/tokenizer/normalizer.rs (2120:2132)
83 | 13 | tokenizers/src/tokenizer/normalizer.rs (1967:1979) | tokenizers/src/tokenizer/normalizer.rs (1996:2008)
84 | 13 | tokenizers/src/tokenizer/normalizer.rs (1644:1657) | tokenizers/src/tokenizer/normalizer.rs (1781:1794)
85 | 13 | tokenizers/src/processors/roberta.rs (264:276) | tokenizers/src/processors/template.rs (894:906)
86 | 13 | tokenizers/src/processors/bert.rs (216:228) | tokenizers/src/processors/roberta.rs (264:276)
87 | 13 | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (117:130) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (72:85)
88 | 12 | tokenizers/src/pre_tokenizers/mod.rs (31:42) | tokenizers/src/pre_tokenizers/mod.rs (102:113)
89 | 12 | tokenizers/src/tokenizer/normalizer.rs (1967:1978) | tokenizers/src/tokenizer/normalizer.rs (2263:2274)
90 | 12 | tokenizers/src/tokenizer/normalizer.rs (2239:2250) | tokenizers/src/tokenizer/normalizer.rs (2263:2274)
91 | 12 | tokenizers/benches/bert_benchmark.rs (76:87) | tokenizers/benches/bert_benchmark.rs (104:115)
92 | 12 | tokenizers/src/normalizers/byte_level.rs (33:46) | tokenizers/src/pre_tokenizers/byte_level.rs (133:145)
93 | 12 | tokenizers/src/tokenizer/normalizer.rs (1787:1799) | tokenizers/src/tokenizer/normalizer.rs (1832:1844)
94 | 12 | bindings/node/src/tasks/models.rs (16:30) | bindings/node/src/tasks/models.rs (63:77)
95 | 12 | bindings/node/src/tasks/models.rs (16:30) | bindings/node/src/tasks/models.rs (40:54)
96 | 12 | tokenizers/src/processors/bert.rs (145:156) | tokenizers/src/processors/roberta.rs (179:190)
97 | 12 | tokenizers/src/processors/bert.rs (283:294) | tokenizers/src/processors/roberta.rs (328:339)
98 | 12 | tokenizers/src/models/bpe/model.rs (531:544) | tokenizers/src/models/wordlevel/mod.rs (196:209)
99 | 12 | tokenizers/src/tokenizer/normalizer.rs (2058:2069) | tokenizers/src/tokenizer/normalizer.rs (2263:2274)
100 | 12 | tokenizers/src/tokenizer/normalizer.rs (2120:2131) | tokenizers/src/tokenizer/normalizer.rs (2263:2274)
101 | 12 | tokenizers/src/models/wordlevel/serialization.rs (66:85) | tokenizers/src/models/wordpiece/serialization.rs (91:110)
102 | 12 | bindings/node/src/tasks/models.rs (40:54) | bindings/node/src/tasks/models.rs (63:77)
103 | 12 | tokenizers/src/pre_tokenizers/fixed_length.rs (68:82) | tokenizers/src/pre_tokenizers/whitespace.rs (68:82)
104 | 12 | tokenizers/src/tokenizer/normalizer.rs (1608:1620) | tokenizers/src/tokenizer/normalizer.rs (1834:1846)
105 | 12 | tokenizers/src/tokenizer/normalizer.rs (1996:2007) | tokenizers/src/tokenizer/normalizer.rs (2263:2274)
106 | 12 | bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (52:86) | bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (101:138)
107 | 11 | tokenizers/src/tokenizer/added_vocabulary.rs (778:788) | tokenizers/src/tokenizer/added_vocabulary.rs (1015:1025)
108 | 11 | tokenizers/src/tokenizer/normalizer.rs (1682:1692) | tokenizers/src/tokenizer/normalizer.rs (1840:1850)
109 | 11 | tokenizers/src/tokenizer/normalizer.rs (1658:1668) | tokenizers/src/tokenizer/normalizer.rs (1820:1830)
110 | 11 | tokenizers/src/pre_tokenizers/byte_level.rs (363:374) | tokenizers/src/pre_tokenizers/byte_level.rs (385:396)
111 | 11 | tokenizers/src/processors/mod.rs (67:77) | tokenizers/src/processors/roberta.rs (242:252)
112 | 11 | tokenizers/src/tokenizer/normalizer.rs (1819:1829) | tokenizers/src/tokenizer/normalizer.rs (1840:1850)
113 | 11 | tokenizers/src/pre_tokenizers/metaspace.rs (282:293) | tokenizers/src/pre_tokenizers/metaspace.rs (335:345)
114 | 11 | tokenizers/src/models/bpe/trainer.rs (223:233) | tokenizers/src/models/unigram/trainer.rs (86:96)
115 | 11 | tokenizers/src/tokenizer/normalizer.rs (1725:1735) | tokenizers/src/tokenizer/normalizer.rs (1819:1829)
116 | 11 | tokenizers/src/tokenizer/encoding.rs (673:683) | tokenizers/src/tokenizer/encoding.rs (751:761)
117 | 11 | tokenizers/src/tokenizer/normalizer.rs (1701:1711) | tokenizers/src/tokenizer/normalizer.rs (1820:1830)
118 | 11 | tokenizers/src/tokenizer/normalizer.rs (1639:1649) | tokenizers/src/tokenizer/normalizer.rs (1819:1829)
119 | 11 | tokenizers/src/tokenizer/normalizer.rs (1725:1735) | tokenizers/src/tokenizer/normalizer.rs (1840:1850)
120 | 11 | bindings/python/src/trainers.rs (152:162) | bindings/python/src/trainers.rs (579:589)
121 | 11 | tokenizers/src/tokenizer/encoding.rs (611:621) | tokenizers/src/tokenizer/encoding.rs (673:683)
122 | 11 | tokenizers/src/pre_tokenizers/whitespace.rs (68:78) | tokenizers/src/pre_tokenizers/whitespace.rs (92:102)
123 | 11 | tokenizers/src/tokenizer/encoding.rs (654:664) | tokenizers/src/tokenizer/encoding.rs (673:683)
124 | 11 | tokenizers/src/tokenizer/normalizer.rs (1574:1586) | tokenizers/src/tokenizer/normalizer.rs (1786:1797)
125 | 11 | tokenizers/src/tokenizer/normalizer.rs (2192:2202) | tokenizers/src/tokenizer/normalizer.rs (2243:2253)
126 | 11 | tokenizers/src/tokenizer/normalizer.rs (1089:1099) | tokenizers/src/tokenizer/normalizer.rs (1149:1159)
127 | 11 | tokenizers/src/tokenizer/normalizer.rs (1639:1649) | tokenizers/src/tokenizer/normalizer.rs (1840:1850)
128 | 11 | tokenizers/src/tokenizer/normalizer.rs (1682:1692) | tokenizers/src/tokenizer/normalizer.rs (1819:1829)
129 | 11 | tokenizers/src/processors/roberta.rs (173:186) | tokenizers/src/processors/roberta.rs (207:220)
130 | 11 | bindings/python/src/trainers.rs (372:382) | bindings/python/src/trainers.rs (579:589)
131 | 11 | tokenizers/src/processors/bert.rs (230:240) | tokenizers/src/processors/template.rs (986:996)
132 | 11 | tokenizers/src/pre_tokenizers/fixed_length.rs (68:78) | tokenizers/src/pre_tokenizers/whitespace.rs (92:102)
133 | 11 | tokenizers/src/pre_tokenizers/digits.rs (76:86) | tokenizers/src/pre_tokenizers/digits.rs (90:100)
134 | 10 | tokenizers/benches/bert_benchmark.rs (120:131) | tokenizers/benches/bpe_benchmark.rs (103:114)
135 | 10 | tokenizers/src/tokenizer/normalizer.rs (2008:2018) | tokenizers/src/tokenizer/normalizer.rs (2252:2262)
136 | 10 | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (107:118) | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (134:145)
137 | 10 | tokenizers/src/pre_tokenizers/metaspace.rs (321:330) | tokenizers/src/pre_tokenizers/metaspace.rs (335:344)
138 | 10 | tokenizers/src/tokenizer/normalizer.rs (2132:2142) | tokenizers/src/tokenizer/normalizer.rs (2252:2262)
139 | 10 | tokenizers/src/tokenizer/normalizer.rs (1658:1667) | tokenizers/src/tokenizer/normalizer.rs (1841:1850)
140 | 10 | tokenizers/src/tokenizer/normalizer.rs (1701:1710) | tokenizers/src/tokenizer/normalizer.rs (1726:1735)
141 | 10 | tokenizers/benches/bert_benchmark.rs (120:131) | tokenizers/benches/unigram_benchmark.rs (49:60)
142 | 10 | tokenizers/src/pre_tokenizers/metaspace.rs (282:292) | tokenizers/src/pre_tokenizers/metaspace.rs (321:330)
143 | 10 | tokenizers/src/tokenizer/normalizer.rs (1746:1755) | tokenizers/src/tokenizer/normalizer.rs (1821:1830)
144 | 10 | tokenizers/src/tokenizer/normalizer.rs (2070:2080) | tokenizers/src/tokenizer/normalizer.rs (2252:2262)
145 | 10 | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (102:113) | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (135:146)
146 | 10 | tokenizers/src/processors/roberta.rs (280:289) | tokenizers/src/processors/template.rs (915:924)
147 | 10 | tokenizers/src/tokenizer/normalizer.rs (1928:1938) | tokenizers/src/tokenizer/normalizer.rs (2008:2018)
148 | 10 | tokenizers/src/pre_tokenizers/byte_level.rs (489:498) | tokenizers/src/processors/sequence.rs (82:91)
149 | 10 | tokenizers/src/processors/sequence.rs (82:91) | tokenizers/src/processors/sequence.rs (102:111)
150 | 10 | tokenizers/benches/llama3_benchmark.rs (51:60) | tokenizers/benches/unigram_benchmark.rs (48:57)
151 | 10 | tokenizers/src/tokenizer/normalizer.rs (1701:1710) | tokenizers/src/tokenizer/normalizer.rs (1841:1850)
152 | 10 | tokenizers/src/tokenizer/normalizer.rs (1928:1938) | tokenizers/src/tokenizer/normalizer.rs (2132:2142)
153 | 10 | bindings/python/py_src/tokenizers/models/__init__.pyi (11:80) | bindings/python/py_src/tokenizers/models/__init__.pyi (264:333)
154 | 10 | tokenizers/benches/bpe_benchmark.rs (102:111) | tokenizers/benches/llama3_benchmark.rs (51:60)
155 | 10 | tokenizers/src/pre_tokenizers/fixed_length.rs (86:95) | tokenizers/src/pre_tokenizers/fixed_length.rs (106:115)
156 | 10 | tokenizers/src/tokenizer/normalizer.rs (1928:1938) | tokenizers/src/tokenizer/normalizer.rs (2070:2080)
157 | 10 | tokenizers/src/tokenizer/normalizer.rs (1640:1649) | tokenizers/src/tokenizer/normalizer.rs (1658:1667)
158 | 10 | tokenizers/src/tokenizer/normalizer.rs (1874:1884) | tokenizers/src/tokenizer/normalizer.rs (1928:1938)
159 | 10 | tokenizers/src/tokenizer/normalizer.rs (1640:1649) | tokenizers/src/tokenizer/normalizer.rs (1701:1710)
160 | 10 | bindings/python/py_src/tokenizers/tools/visualizer-styles.css (59:69) | bindings/python/py_src/tokenizers/tools/visualizer-styles.css (158:167)
161 | 10 | tokenizers/src/tokenizer/normalizer.rs (1658:1667) | tokenizers/src/tokenizer/normalizer.rs (1726:1735)
162 | 10 | tokenizers/src/tokenizer/normalizer.rs (1575:1586) | tokenizers/src/tokenizer/normalizer.rs (1832:1842)
163 | 10 | tokenizers/src/tokenizer/normalizer.rs (1683:1692) | tokenizers/src/tokenizer/normalizer.rs (1701:1710)
164 | 10 | tokenizers/src/tokenizer/normalizer.rs (1608:1618) | tokenizers/src/tokenizer/normalizer.rs (1789:1799)
165 | 10 | tokenizers/src/tokenizer/normalizer.rs (1658:1667) | tokenizers/src/tokenizer/normalizer.rs (1683:1692)
166 | 10 | tokenizers/src/tokenizer/normalizer.rs (1874:1884) | tokenizers/src/tokenizer/normalizer.rs (2252:2262)
167 | 10 | tokenizers/benches/bert_benchmark.rs (92:103) | tokenizers/benches/bpe_benchmark.rs (81:92)
168 | 9 | tokenizers/src/tokenizer/normalizer.rs (1905:1915) | tokenizers/src/tokenizer/normalizer.rs (2102:2111)
169 | 9 | tokenizers/src/models/unigram/trainer.rs (766:776) | tokenizers/src/models/unigram/trainer.rs (802:812)
170 | 9 | bindings/python/src/decoders.rs (145:158) | bindings/python/src/processors.rs (182:195)
171 | 9 | bindings/python/src/tokenizer.rs (1262:1270) | bindings/python/src/trainers.rs (791:799)
172 | 9 | tokenizers/src/tokenizer/pattern.rs (197:205) | tokenizers/src/tokenizer/pattern.rs (211:219)
173 | 9 | tokenizers/src/pre_tokenizers/byte_level.rs (249:257) | tokenizers/src/pre_tokenizers/byte_level.rs (386:395)
174 | 9 | tokenizers/src/tokenizer/normalizer.rs (1542:1550) | tokenizers/src/tokenizer/normalizer.rs (1638:1646)
175 | 9 | tokenizers/src/tokenizer/normalizer.rs (2184:2192) | tokenizers/src/tokenizer/normalizer.rs (2238:2246)
176 | 9 | tokenizers/src/models/unigram/trainer.rs (742:752) | tokenizers/src/models/unigram/trainer.rs (764:774)
177 | 9 | tokenizers/src/pre_tokenizers/metaspace.rs (213:221) | tokenizers/src/pre_tokenizers/metaspace.rs (236:244)
178 | 9 | tokenizers/src/tokenizer/normalizer.rs (1542:1550) | tokenizers/src/tokenizer/normalizer.rs (1681:1689)
179 | 9 | tokenizers/src/tokenizer/normalizer.rs (716:726) | tokenizers/src/tokenizer/normalizer.rs (734:744)
180 | 9 | tokenizers/src/models/bpe/model.rs (881:889) | tokenizers/src/models/bpe/model.rs (901:909)
181 | 9 | bindings/python/src/utils/serde_pyo3.rs (373:387) | bindings/python/src/utils/serde_pyo3.rs (438:451)
182 | 9 | bindings/python/src/trainers.rs (225:235) | bindings/python/src/trainers.rs (777:787)
183 | 9 | bindings/python/src/encoding.rs (407:415) | bindings/python/src/tokenizer.rs (851:859)
184 | 9 | bindings/python/src/utils/serde_pyo3.rs (373:387) | bindings/python/src/utils/serde_pyo3.rs (406:419)
185 | 9 | tokenizers/src/tokenizer/normalizer.rs (1905:1915) | tokenizers/src/tokenizer/normalizer.rs (2040:2049)
186 | 9 | tokenizers/src/tokenizer/encoding.rs (358:366) | tokenizers/src/tokenizer/encoding.rs (376:384)
187 | 9 | bindings/python/src/utils/serde_pyo3.rs (373:387) | bindings/python/src/utils/serde_pyo3.rs (478:491)
188 | 9 | tokenizers/src/pre_tokenizers/bert.rs (28:36) | tokenizers/src/pre_tokenizers/punctuation.rs (48:56)
189 | 9 | tokenizers/src/tokenizer/normalizer.rs (1971:1979) | tokenizers/src/tokenizer/normalizer.rs (2192:2200)
190 | 9 | tokenizers/src/tokenizer/normalizer.rs (1641:1649) | tokenizers/src/tokenizer/normalizer.rs (1746:1754)
191 | 9 | tokenizers/benches/bert_benchmark.rs (120:128) | tokenizers/benches/llama3_benchmark.rs (52:60)
192 | 9 | tokenizers/src/tokenizer/normalizer.rs (2040:2049) | tokenizers/src/tokenizer/normalizer.rs (2163:2173)
193 | 9 | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (63:73) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (88:98)
194 | 9 | bindings/python/scripts/convert.py (164:172) | bindings/python/scripts/convert.py (242:250)
195 | 9 | bindings/python/src/pre_tokenizers.rs (154:164) | bindings/python/src/trainers.rs (57:67)
196 | 9 | bindings/python/src/tokenizer.rs (1262:1270) | bindings/python/src/trainers.rs (239:247)
197 | 9 | bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (95:104) | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (118:127)
198 | 9 | tokenizers/src/pre_tokenizers/digits.rs (85:93) | tokenizers/src/pre_tokenizers/metaspace.rs (250:258)
199 | 9 | tokenizers/src/tokenizer/normalizer.rs (1966:1974) | tokenizers/src/tokenizer/normalizer.rs (2184:2192)
200 | 9 | bindings/python/src/pre_tokenizers.rs (221:234) | bindings/python/src/processors.rs (182:195)
201 | 9 | bindings/python/src/trainers.rs (225:235) | bindings/python/src/trainers.rs (631:641)
202 | 9 | tokenizers/src/models/bpe/trainer.rs (453:462) | tokenizers/src/models/bpe/trainer.rs (554:563)
203 | 9 | tokenizers/src/pre_tokenizers/bert.rs (53:62) | tokenizers/src/tokenizer/normalizer.rs (1238:1247)
204 | 9 | tokenizers/src/pre_tokenizers/metaspace.rs (238:246) | tokenizers/src/pre_tokenizers/metaspace.rs (300:308)
205 | 9 | tokenizers/src/pre_tokenizers/fixed_length.rs (48:60) | tokenizers/src/pre_tokenizers/whitespace.rs (39:51)
206 | 9 | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (113:122) | bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (95:104)
207 | 9 | bindings/python/src/tokenizer.rs (1262:1270) | bindings/python/src/trainers.rs (645:653)
208 | 9 | bindings/python/src/tokenizer.rs (1262:1270) | bindings/python/src/trainers.rs (453:461)
209 | 9 | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (113:122) | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (118:127)
210 | 9 | tokenizers/src/tokenizer/normalizer.rs (1684:1692) | tokenizers/src/tokenizer/normalizer.rs (1746:1754)
211 | 9 | tokenizers/src/pre_tokenizers/punctuation.rs (48:56) | tokenizers/src/pre_tokenizers/sequence.rs (61:69)
212 | 9 | tokenizers/src/tokenizer/normalizer.rs (1746:1754) | tokenizers/src/tokenizer/normalizer.rs (1842:1850)
213 | 9 | bindings/python/src/trainers.rs (631:641) | bindings/python/src/trainers.rs (777:787)
214 | 9 | tokenizers/src/pre_tokenizers/byte_level.rs (442:450) | tokenizers/src/pre_tokenizers/byte_level.rs (456:464)
215 | 9 | tokenizers/src/tokenizer/normalizer.rs (1727:1735) | tokenizers/src/tokenizer/normalizer.rs (1746:1754)
216 | 9 | bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (95:104) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (73:82)
217 | 9 | tokenizers/src/tokenizer/normalizer.rs (2102:2111) | tokenizers/src/tokenizer/normalizer.rs (2163:2173)
218 | 9 | tokenizers/src/pre_tokenizers/byte_level.rs (249:257) | tokenizers/src/pre_tokenizers/byte_level.rs (364:373)
219 | 9 | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (132:142) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (86:96)
220 | 9 | bindings/python/src/decoders.rs (145:158) | bindings/python/src/pre_tokenizers.rs (221:234)
221 | 9 | bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (113:122) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (73:82)
222 | 9 | tokenizers/src/tokenizer/normalizer.rs (1905:1915) | tokenizers/src/tokenizer/normalizer.rs (2163:2173)
223 | 8 | tokenizers/src/pre_tokenizers/bert.rs (29:36) | tokenizers/src/pre_tokenizers/bert.rs (65:72)
224 | 8 | bindings/node/src/models.rs (221:231) | bindings/node/src/models.rs (278:288)
225 | 8 | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (106:115) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (62:71)
226 | 8 | bindings/python/src/utils/serde_pyo3.rs (360:371) | bindings/python/src/utils/serde_pyo3.rs (395:404)
227 | 8 | tokenizers/src/pre_tokenizers/mod.rs (230:238) | tokenizers/src/pre_tokenizers/mod.rs (250:258)
228 | 8 | bindings/python/src/decoders.rs (162:173) | bindings/python/src/normalizers.rs (231:242)
229 | 8 | tokenizers/src/pre_tokenizers/byte_level.rs (385:393) | tokenizers/src/pre_tokenizers/byte_level.rs (407:415)
230 | 8 | bindings/python/src/decoders.rs (162:173) | bindings/python/src/processors.rs (201:212)
231 | 8 | tokenizers/src/pre_tokenizers/byte_level.rs (363:371) | tokenizers/src/pre_tokenizers/byte_level.rs (407:415)
232 | 8 | bindings/python/src/trainers.rs (226:235) | bindings/python/src/trainers.rs (440:449)
233 | 8 | tokenizers/src/pre_tokenizers/fixed_length.rs (106:113) | tokenizers/src/pre_tokenizers/punctuation.rs (49:56)
234 | 8 | tokenizers/src/tokenizer/mod.rs (1263:1271) | tokenizers/src/tokenizer/mod.rs (1308:1316)
235 | 8 | tokenizers/src/tokenizer/normalizer.rs (1530:1539) | tokenizers/src/tokenizer/normalizer.rs (1574:1583)
236 | 8 | tokenizers/src/tokenizer/mod.rs (1263:1271) | tokenizers/src/tokenizer/mod.rs (1286:1294)
237 | 8 | tokenizers/src/pre_tokenizers/digits.rs (49:56) | tokenizers/src/pre_tokenizers/digits.rs (71:78)
238 | 8 | tokenizers/src/processors/bert.rs (115:122) | tokenizers/src/processors/roberta.rs (112:119)
239 | 8 | bindings/python/src/pre_tokenizers.rs (239:250) | bindings/python/src/processors.rs (201:212)
240 | 8 | bindings/python/src/trainers.rs (440:449) | bindings/python/src/trainers.rs (632:641)
241 | 8 | tokenizers/src/processors/roberta.rs (112:119) | tokenizers/src/processors/roberta.rs (145:152)
242 | 8 | tokenizers/src/pre_tokenizers/bert.rs (29:36) | tokenizers/src/pre_tokenizers/fixed_length.rs (86:93)
243 | 8 | tokenizers/src/pre_tokenizers/digits.rs (72:79) | tokenizers/src/pre_tokenizers/metaspace.rs (300:307)
244 | 8 | tokenizers/src/pre_tokenizers/bert.rs (65:72) | tokenizers/src/pre_tokenizers/punctuation.rs (49:56)
245 | 8 | tokenizers/src/pre_tokenizers/bert.rs (29:36) | tokenizers/src/pre_tokenizers/fixed_length.rs (106:113)
246 | 8 | tokenizers/src/tokenizer/normalizer.rs (1523:1530) | tokenizers/src/tokenizer/normalizer.rs (1660:1667)
247 | 8 | tokenizers/src/processors/bert.rs (66:73) | tokenizers/src/processors/bert.rs (96:103)
248 | 8 | bindings/python/scripts/convert.py (176:183) | bindings/python/scripts/convert.py (228:235)
249 | 8 | tokenizers/src/pre_tokenizers/digits.rs (72:79) | tokenizers/src/pre_tokenizers/metaspace.rs (286:293)
250 | 8 | tokenizers/src/tokenizer/normalizer.rs (2192:2199) | tokenizers/src/tokenizer/normalizer.rs (2267:2274)
251 | 8 | tokenizers/src/tokenizer/normalizer.rs (1574:1583) | tokenizers/src/tokenizer/normalizer.rs (1692:1700)
252 | 8 | tokenizers/src/pre_tokenizers/metaspace.rs (238:245) | tokenizers/src/pre_tokenizers/metaspace.rs (286:293)
253 | 8 | tokenizers/src/models/wordlevel/mod.rs (226:233) | tokenizers/src/models/wordlevel/serialization.rs (100:107)
254 | 8 | tokenizers/src/tokenizer/normalizer.rs (1574:1583) | tokenizers/src/tokenizer/normalizer.rs (1649:1657)
255 | 8 | tokenizers/src/tokenizer/normalizer.rs (1523:1530) | tokenizers/src/tokenizer/normalizer.rs (1703:1710)
256 | 8 | bindings/python/src/decoders.rs (162:173) | bindings/python/src/pre_tokenizers.rs (239:250)
257 | 8 | tokenizers/src/pre_tokenizers/byte_level.rs (184:191) | tokenizers/src/processors/roberta.rs (70:77)
258 | 8 | bindings/python/src/normalizers.rs (231:242) | bindings/python/src/processors.rs (201:212)
259 | 8 | tokenizers/benches/bert_benchmark.rs (137:146) | tokenizers/benches/bpe_benchmark.rs (119:127)
260 | 8 | tokenizers/src/models/bpe/model.rs (334:344) | tokenizers/src/models/wordlevel/mod.rs (128:138)
261 | 8 | tokenizers/src/processors/bert.rs (145:152) | tokenizers/src/processors/roberta.rs (213:220)
262 | 8 | tokenizers/src/pre_tokenizers/digits.rs (72:79) | tokenizers/src/pre_tokenizers/metaspace.rs (338:345)
263 | 8 | tokenizers/src/processors/bert.rs (82:89) | tokenizers/src/processors/roberta.rs (145:152)
264 | 8 | bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (86:95) | bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (108:117)
265 | 8 | tokenizers/src/tokenizer/normalizer.rs (2120:2127) | tokenizers/src/tokenizer/normalizer.rs (2185:2192)
266 | 8 | tokenizers/src/processors/bert.rs (82:89) | tokenizers/src/processors/bert.rs (115:122)
267 | 8 | tokenizers/src/tokenizer/normalizer.rs (1839:1846) | tokenizers/src/utils/truncation.rs (246:253)
268 | 8 | tokenizers/src/pre_tokenizers/bert.rs (65:72) | tokenizers/src/pre_tokenizers/fixed_length.rs (86:93)
269 | 8 | tokenizers/src/tokenizer/normalizer.rs (1574:1583) | tokenizers/src/tokenizer/normalizer.rs (1735:1744)
270 | 8 | bindings/python/src/normalizers.rs (231:242) | bindings/python/src/pre_tokenizers.rs (239:250)
271 | 8 | tokenizers/src/normalizers/prepend.rs (45:52) | tokenizers/src/tokenizer/normalizer.rs (1382:1389)
272 | 8 | tokenizers/src/tokenizer/normalizer.rs (1523:1530) | tokenizers/src/tokenizer/normalizer.rs (1747:1754)
273 | 8 | tokenizers/src/normalizers/byte_level.rs (63:70) | tokenizers/src/normalizers/prepend.rs (38:45)
274 | 8 | tokenizers/src/tokenizer/normalizer.rs (2185:2192) | tokenizers/src/tokenizer/normalizer.rs (2263:2270)
275 | 8 | tokenizers/src/pre_tokenizers/byte_level.rs (248:255) | tokenizers/src/pre_tokenizers/byte_level.rs (274:281)
276 | 8 | tokenizers/src/tokenizer/normalizer.rs (1543:1550) | tokenizers/src/tokenizer/normalizer.rs (1725:1732)
277 | 8 | tokenizers/src/pre_tokenizers/metaspace.rs (300:307) | tokenizers/src/pre_tokenizers/metaspace.rs (338:345)
278 | 8 | tokenizers/src/tokenizer/normalizer.rs (2058:2065) | tokenizers/src/tokenizer/normalizer.rs (2185:2192)
279 | 8 | tokenizers/src/tokenizer/normalizer.rs (1523:1530) | tokenizers/src/tokenizer/normalizer.rs (1822:1829)
280 | 8 | tokenizers/src/tokenizer/normalizer.rs (1613:1620) | tokenizers/src/utils/truncation.rs (246:253)
281 | 8 | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (97:104) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (54:61)
282 | 8 | tokenizers/src/pre_tokenizers/metaspace.rs (238:245) | tokenizers/src/pre_tokenizers/metaspace.rs (338:345)
283 | 8 | tokenizers/src/pre_tokenizers/fixed_length.rs (86:93) | tokenizers/src/pre_tokenizers/punctuation.rs (49:56)
284 | 8 | tokenizers/src/pre_tokenizers/byte_level.rs (474:481) | tokenizers/src/processors/sequence.rs (104:111)
285 | 8 | tokenizers/src/tokenizer/normalizer.rs (1996:2003) | tokenizers/src/tokenizer/normalizer.rs (2185:2192)
286 | 8 | tokenizers/src/pre_tokenizers/digits.rs (72:79) | tokenizers/src/pre_tokenizers/metaspace.rs (238:245)
287 | 8 | tokenizers/src/pre_tokenizers/bert.rs (65:72) | tokenizers/src/pre_tokenizers/fixed_length.rs (106:113)
288 | 8 | tokenizers/src/processors/bert.rs (96:103) | tokenizers/src/processors/roberta.rs (96:103)
289 | 8 | tokenizers/src/tokenizer/encoding.rs (571:578) | tokenizers/src/tokenizer/encoding.rs (885:892)
290 | 8 | tokenizers/src/processors/bert.rs (172:179) | tokenizers/src/processors/roberta.rs (179:186)
291 | 8 | tokenizers/src/tokenizer/mod.rs (1286:1294) | tokenizers/src/tokenizer/mod.rs (1308:1316)
292 | 8 | tokenizers/src/tokenizer/normalizer.rs (1523:1530) | tokenizers/src/tokenizer/normalizer.rs (1843:1850)
293 | 8 | tokenizers/src/pre_tokenizers/bert.rs (65:72) | tokenizers/src/pre_tokenizers/sequence.rs (62:69)
294 | 8 | tokenizers/src/pre_tokenizers/byte_level.rs (474:481) | tokenizers/src/pre_tokenizers/byte_level.rs (491:498)
295 | 8 | tokenizers/src/models/bpe/trainer.rs (604:613) | tokenizers/src/models/unigram/trainer.rs (623:632)
296 | 8 | tokenizers/src/tokenizer/encoding.rs (682:692) | tokenizers/src/tokenizer/encoding.rs (738:748)
297 | 8 | tokenizers/src/tokenizer/normalizer.rs (1577:1586) | tokenizers/src/tokenizer/normalizer.rs (1608:1616)
298 | 8 | tokenizers/src/pre_tokenizers/fixed_length.rs (86:93) | tokenizers/src/pre_tokenizers/sequence.rs (62:69)
299 | 8 | tokenizers/src/tokenizer/normalizer.rs (1543:1550) | tokenizers/src/tokenizer/normalizer.rs (1819:1826)
300 | 8 | tokenizers/src/pre_tokenizers/fixed_length.rs (106:113) | tokenizers/src/pre_tokenizers/sequence.rs (62:69)
301 | 8 | bindings/python/src/trainers.rs (440:449) | bindings/python/src/trainers.rs (778:787)
302 | 8 | tokenizers/src/pre_tokenizers/metaspace.rs (286:293) | tokenizers/src/pre_tokenizers/metaspace.rs (300:307)
303 | 8 | bindings/python/py_src/tokenizers/tools/visualizer.py (269:277) | bindings/python/py_src/tokenizers/tools/visualizer.py (295:303)
304 | 8 | tokenizers/src/tokenizer/normalizer.rs (1543:1550) | tokenizers/src/tokenizer/normalizer.rs (1840:1847)
305 | 8 | tokenizers/src/tokenizer/mod.rs (1494:1504) | tokenizers/src/tokenizer/mod.rs (1509:1519)
306 | 8 | tokenizers/src/tokenizer/normalizer.rs (392:399) | tokenizers/src/tokenizer/normalizer.rs (643:650)
307 | 8 | bindings/python/src/pre_tokenizers.rs (617:627) | bindings/python/src/processors.rs (769:779)
308 | 7 | tokenizers/src/pre_tokenizers/byte_level.rs (314:320) | tokenizers/src/pre_tokenizers/digits.rs (73:79)
309 | 7 | tokenizers/src/pre_tokenizers/byte_level.rs (314:320) | tokenizers/src/pre_tokenizers/metaspace.rs (339:345)
310 | 7 | bindings/python/src/tokenizer.rs (1231:1238) | bindings/python/src/tokenizer.rs (1268:1275)
311 | 7 | tokenizers/src/models/unigram/trainer.rs (766:774) | tokenizers/src/models/unigram/trainer.rs (782:790)
312 | 7 | tokenizers/src/models/unigram/trainer.rs (744:752) | tokenizers/src/models/unigram/trainer.rs (782:790)
313 | 7 | bindings/python/src/decoders.rs (111:119) | bindings/python/src/tokenizer.rs (496:504)
314 | 7 | bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (134:142) | bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (63:71)
315 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (252:258) | tokenizers/src/pre_tokenizers/punctuation.rs (50:56)
316 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (251:257) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (120:126)
317 | 7 | tokenizers/src/pre_tokenizers/byte_level.rs (249:255) | tokenizers/src/pre_tokenizers/byte_level.rs (408:415)
318 | 7 | tokenizers/src/normalizers/byte_level.rs (69:75) | tokenizers/src/tokenizer/normalizer.rs (1613:1619)
319 | 7 | bindings/python/src/encoding.rs (37:45) | bindings/python/src/tokenizer.rs (496:504)
320 | 7 | tokenizers/src/tokenizer/added_vocabulary.rs (960:967) | tokenizers/src/tokenizer/added_vocabulary.rs (982:988)
321 | 7 | tokenizers/src/tokenizer/normalizer.rs (1639:1645) | tokenizers/src/utils/truncation.rs (247:253)
322 | 7 | bindings/python/py_src/tokenizers/models/__init__.pyi (34:80) | bindings/python/py_src/tokenizers/models/__init__.pyi (545:591)
323 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (238:244) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
324 | 7 | tokenizers/src/tokenizer/encoding.rs (642:651) | tokenizers/src/tokenizer/encoding.rs (782:791)
325 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (238:244) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
326 | 7 | tokenizers/src/pre_tokenizers/bert.rs (66:72) | tokenizers/src/pre_tokenizers/byte_level.rs (250:256)
327 | 7 | tokenizers/src/pre_tokenizers/byte_level.rs (314:320) | tokenizers/src/pre_tokenizers/metaspace.rs (301:307)
328 | 7 | tokenizers/src/pre_tokenizers/byte_level.rs (314:320) | tokenizers/src/pre_tokenizers/metaspace.rs (287:293)
329 | 7 | tokenizers/src/tokenizer/normalizer.rs (1693:1700) | tokenizers/src/tokenizer/normalizer.rs (1832:1839)
330 | 7 | tokenizers/src/tokenizer/added_vocabulary.rs (767:773) | tokenizers/src/tokenizer/added_vocabulary.rs (815:821)
331 | 7 | tokenizers/src/processors/bert.rs (67:73) | tokenizers/src/processors/roberta.rs (127:133)
332 | 7 | bindings/python/py_src/tokenizers/models/__init__.pyi (34:80) | bindings/python/py_src/tokenizers/models/__init__.pyi (205:251)
333 | 7 | bindings/python/src/models.rs (118:126) | bindings/python/src/processors.rs (117:125)
334 | 7 | tokenizers/src/pre_tokenizers/punctuation.rs (49:55) | tokenizers/src/pre_tokenizers/whitespace.rs (94:100)
335 | 7 | tokenizers/src/pre_tokenizers/fixed_length.rs (106:112) | tokenizers/src/pre_tokenizers/whitespace.rs (70:76)
336 | 7 | tokenizers/src/pre_tokenizers/bert.rs (30:36) | tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
337 | 7 | bindings/python/py_src/tokenizers/tools/visualizer.py (295:301) | bindings/python/py_src/tokenizers/tools/visualizer.py (306:312)
338 | 7 | tokenizers/src/pre_tokenizers/fixed_length.rs (70:76) | tokenizers/src/pre_tokenizers/punctuation.rs (49:55)
339 | 7 | bindings/python/py_src/tokenizers/models/__init__.pyi (34:80) | bindings/python/py_src/tokenizers/models/__init__.pyi (415:461)
340 | 7 | tokenizers/src/pre_tokenizers/fixed_length.rs (106:112) | tokenizers/src/pre_tokenizers/whitespace.rs (94:100)
341 | 7 | tokenizers/src/pre_tokenizers/digits.rs (87:93) | tokenizers/src/pre_tokenizers/sequence.rs (63:69)
342 | 7 | tokenizers/src/tokenizer/added_vocabulary.rs (924:932) | tokenizers/src/tokenizer/added_vocabulary.rs (986:992)
343 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (251:257) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (97:103)
344 | 7 | bindings/python/src/encoding.rs (37:45) | bindings/python/src/models.rs (118:126)
345 | 7 | tokenizers/src/models/wordlevel/trainer.rs (152:158) | tokenizers/src/models/wordlevel/trainer.rs (168:174)
346 | 7 | tokenizers/src/models/unigram/trainer.rs (744:752) | tokenizers/src/models/unigram/trainer.rs (802:810)
347 | 7 | tokenizers/src/tokenizer/normalizer.rs (1870:1876) | tokenizers/src/tokenizer/normalizer.rs (2247:2253)
348 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (300:306) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
349 | 7 | bindings/python/src/decoders.rs (111:119) | bindings/python/src/processors.rs (117:125)
350 | 7 | tokenizers/src/processors/sequence.rs (85:91) | tokenizers/src/processors/sequence.rs (137:143)
351 | 7 | tokenizers/src/tokenizer/added_vocabulary.rs (658:664) | tokenizers/src/tokenizer/added_vocabulary.rs (699:705)
352 | 7 | tokenizers/src/pre_tokenizers/metaspace.rs (300:306) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
353 | 7 | tokenizers/src/pre_tokenizers/punctuation.rs (49:55) | tokenizers/src/pre_tokenizers/whitespace.rs (70:76)
354 | 7 | bindings/python/src/normalizers.rs (154:162) | bindings/python/src/tokenizer.rs (496:504)
355 | 7 | tokenizers/src/pre_tokenizers/bert.rs (66:72) | tokenizers/src/pre_tokenizers/digits.rs (87:93)
356 | 7 | tokenizers/src/pre_tokenizers/fixed_length.rs (87:93) | tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
357 | 7 | bindings/python/py_src/tokenizers/models/__init__.pyi (205:251) | bindings/python/py_src/tokenizers/models/__init__.pyi (415:461)
358 | 7 | tokenizers/src/pre_tokenizers/digits.rs (72:78) | tokenizers/src/pre_tokenizers/metaspace.rs (324:330)
359 | 7 | tokenizers/src/pre_tokenizers/digits.rs (58:64) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (97:103)
360 | 7 | bindings/python/py_src/tokenizers/models/__init__.pyi (205:251) | bindings/python/py_src/tokenizers/models/__init__.pyi (545:591)
361 | 7 | bindings/python/src/normalizers.rs (154:162) | bindings/python/src/processors.rs (117:125)
362 | 7 | tokenizers/src/models/wordlevel/mod.rs (180:192) | tokenizers/src/tokenizer/added_vocabulary.rs (613:622)
363 | 7 | tokenizers/src/tokenizer/normalizer.rs (1682:1688) | tokenizers/src/utils/truncation.rs (247:253)
364 | 7 | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (97:103) | tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (120:126)
365 | 7 | tokenizers/src/tokenizer/normalizer.rs (1543:1549) | tokenizers/src/tokenizer/normalizer.rs (1614:1620)
366 | 7 | tokenizers/src/pre_tokenizers/bert.rs (29:35) | tokenizers/src/pre_tokenizers/fixed_length.rs (70:76)
367 | 7 | bindings/python/py_src/tokenizers/models/__init__.pyi (205:251) | bindings/python/py_src/tokenizers/models/__init__.pyi (287:333)
368 | 7 | tokenizers/src/tokenizer/normalizer.rs (1833:1840) | tokenizers/src/tokenizer/normalizer.rs (1878:1885)
369 | 7 | bindings/python/src/models.rs (118:126) | bindings/python/src/tokenizer.rs (496:504)
duplicated block id: 371 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (252:258) - tokenizers/src/pre_tokenizers/sequence.rs (63:69)
duplicated block id: 372 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (223:229) - tokenizers/src/pre_tokenizers/metaspace.rs (251:257)
duplicated block id: 373 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:256) - tokenizers/src/pre_tokenizers/sequence.rs (63:69)
duplicated block id: 374 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (475:481) - tokenizers/src/processors/sequence.rs (137:143)
duplicated block id: 375 size: 7 cleaned lines of code in 2 files: - tokenizers/src/processors/bert.rs (46:55) - tokenizers/src/processors/roberta.rs (61:70)
duplicated block id: 376 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/wordlevel/mod.rs (52:64) - tokenizers/src/models/wordpiece/mod.rs (58:70)
duplicated block id: 377 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/metaspace.rs (215:221)
duplicated block id: 378 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:72) - tokenizers/src/pre_tokenizers/byte_level.rs (388:394)
duplicated block id: 379 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (86:92) - tokenizers/src/pre_tokenizers/whitespace.rs (70:76)
duplicated block id: 380 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:72) - tokenizers/src/pre_tokenizers/byte_level.rs (366:372)
duplicated block id: 381 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/trainer.rs (605:613) - tokenizers/src/models/wordlevel/trainer.rs (93:101)
duplicated block id: 382 size: 7 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (41:48) - tokenizers/benches/common/mod.rs (68:75)
duplicated block id: 383 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:394) - tokenizers/src/pre_tokenizers/digits.rs (87:93)
duplicated block id: 384 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 385 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (603:609) - tokenizers/src/models/bpe/model.rs (637:643)
duplicated block id: 386 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/pattern.rs (72:81) - tokenizers/src/utils/fancy.rs (52:61)
duplicated block id: 387 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (324:330) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 388 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:372) - tokenizers/src/pre_tokenizers/digits.rs (87:93)
duplicated block id: 389 size: 7 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (111:119) - bindings/python/src/trainers.rs (57:65)
duplicated block id: 390 size: 7 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (211:222) - bindings/python/src/processors.rs (182:193)
duplicated block id: 391 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:394) - tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
duplicated block id: 392 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (86:92) - tokenizers/src/pre_tokenizers/whitespace.rs (94:100)
duplicated block id: 393 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (324:330) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
duplicated block id: 394 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:36) - tokenizers/src/pre_tokenizers/byte_level.rs (250:256)
duplicated block id: 395 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (215:221) - tokenizers/src/pre_tokenizers/metaspace.rs (300:306)
duplicated block id: 396 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (70:76) - tokenizers/src/pre_tokenizers/fixed_length.rs (86:92)
duplicated block id: 397 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (70:76) - tokenizers/src/pre_tokenizers/fixed_length.rs (106:112)
duplicated block id: 398 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (215:221) - tokenizers/src/pre_tokenizers/metaspace.rs (286:292)
duplicated block id: 399 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (492:498) - tokenizers/src/pre_tokenizers/byte_level.rs (521:527)
duplicated block id: 400 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (29:35) - tokenizers/src/pre_tokenizers/whitespace.rs (70:76)
duplicated block id: 401 size: 7 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (132:138) - tokenizers/src/normalizers/strip.rs (140:146)
duplicated block id: 402 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:372) - tokenizers/src/pre_tokenizers/sequence.rs (63:69)
duplicated block id: 403 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (286:292) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
duplicated block id: 404 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (29:35) - tokenizers/src/pre_tokenizers/whitespace.rs (94:100)
duplicated block id: 405 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (47:55) - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (62:70)
duplicated block id: 406 size: 7 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (111:119) - bindings/python/src/pre_tokenizers.rs (154:162)
duplicated block id: 407 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (58:64) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (120:126)
duplicated block id: 408 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (286:292) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 409 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:394) - tokenizers/src/pre_tokenizers/punctuation.rs (50:56)
duplicated block id: 410 size: 7 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (117:125) - bindings/python/src/tokenizer.rs (496:504)
duplicated block id: 411 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
duplicated block id: 412 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 413 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:36) - tokenizers/src/pre_tokenizers/digits.rs (87:93)
duplicated block id: 414 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (65:71) - tokenizers/src/pre_tokenizers/fixed_length.rs (70:76)
duplicated block id: 415 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (65:71) - tokenizers/src/pre_tokenizers/whitespace.rs (70:76)
duplicated block id: 416 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (107:113) - tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
duplicated block id: 417 size: 7 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (145:156) - bindings/python/src/normalizers.rs (211:222)
duplicated block id: 418 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (86:92) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (97:103)
duplicated block id: 419 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (107:115) - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (88:96)
duplicated block id: 420 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (65:71) - tokenizers/src/pre_tokenizers/whitespace.rs (94:100)
duplicated block id: 421 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (86:92) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (120:126)
duplicated block id: 422 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1114:1120) - tokenizers/src/tokenizer/normalizer.rs (1128:1134)
duplicated block id: 423 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (86:92) - tokenizers/src/pre_tokenizers/metaspace.rs (223:229)
duplicated block id: 424 size: 7 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (349:357) - bindings/python/src/processors.rs (430:438)
duplicated block id: 425 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/metaspace.rs (324:330)
duplicated block id: 426 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1736:1744) - tokenizers/src/tokenizer/normalizer.rs (1832:1839)
duplicated block id: 427 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (215:221) - tokenizers/src/pre_tokenizers/metaspace.rs (324:330)
duplicated block id: 428 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/metaspace.rs (338:344)
duplicated block id: 429 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (300:306) - tokenizers/src/pre_tokenizers/metaspace.rs (324:330)
duplicated block id: 430 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (215:221) - tokenizers/src/pre_tokenizers/metaspace.rs (338:344)
duplicated block id: 431 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (275:281) - tokenizers/src/pre_tokenizers/byte_level.rs (408:415)
duplicated block id: 432 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/metaspace.rs (286:292)
duplicated block id: 433 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (144:150) - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (97:103)
duplicated block id: 434 size: 7 cleaned lines of code in 2 files: - bindings/python/src/encoding.rs (37:45) - bindings/python/src/trainers.rs (57:65)
duplicated block id: 435 size: 7 cleaned lines of code in 2 files: - tokenizers/src/utils/padding.rs (97:103) - tokenizers/src/utils/padding.rs (108:114)
duplicated block id: 436 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (1482:1488) - tokenizers/src/tokenizer/mod.rs (1498:1504)
duplicated block id: 437 size: 7 cleaned lines of code in 2 files: - bindings/python/src/tokenizer.rs (496:504) - bindings/python/src/trainers.rs (57:65)
duplicated block id: 438 size: 7 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (154:162) - bindings/python/src/processors.rs (117:125)
duplicated block id: 439 size: 7 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (111:119) - bindings/python/src/models.rs (118:126)
duplicated block id: 440 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1795) - tokenizers/src/tokenizer/normalizer.rs (1878:1885)
duplicated block id: 441 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (1482:1488) - tokenizers/src/tokenizer/mod.rs (1513:1519)
duplicated block id: 442 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/metaspace.rs (300:306)
duplicated block id: 443 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1614:1620) - tokenizers/src/tokenizer/normalizer.rs (1819:1825)
duplicated block id: 444 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (118:125) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (86:94)
duplicated block id: 445 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1725:1731) - tokenizers/src/utils/truncation.rs (247:253)
duplicated block id: 446 size: 7 cleaned lines of code in 2 files: - bindings/python/src/models.rs (118:126) - bindings/python/src/normalizers.rs (154:162)
duplicated block id: 447 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (238:244) - tokenizers/src/pre_tokenizers/metaspace.rs (324:330)
duplicated block id: 448 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (58:64) - tokenizers/src/pre_tokenizers/digits.rs (86:92)
duplicated block id: 449 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1531:1539) - tokenizers/src/tokenizer/normalizer.rs (1832:1839)
duplicated block id: 450 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (338:344) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
duplicated block id: 451 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (70:76) - tokenizers/src/pre_tokenizers/sequence.rs (62:68)
duplicated block id: 452 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (72:78) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
duplicated block id: 453 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:36) - tokenizers/src/pre_tokenizers/byte_level.rs (388:394)
duplicated block id: 454 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (50:56) - tokenizers/src/pre_tokenizers/metaspace.rs (238:244)
duplicated block id: 455 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:394) - tokenizers/src/pre_tokenizers/sequence.rs (63:69)
duplicated block id: 456 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/models/__init__.pyi (287:333) - bindings/python/py_src/tokenizers/models/__init__.pyi (545:591)
duplicated block id: 457 size: 7 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (69:75) - tokenizers/src/tokenizer/normalizer.rs (1839:1845)
duplicated block id: 458 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (72:78) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 459 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:256) - tokenizers/src/pre_tokenizers/punctuation.rs (50:56)
duplicated block id: 460 size: 7 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (69:75) - tokenizers/src/utils/truncation.rs (246:252)
duplicated block id: 461 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:36) - tokenizers/src/pre_tokenizers/byte_level.rs (366:372)
duplicated block id: 462 size: 7 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (111:119) - bindings/python/src/encoding.rs (37:45)
duplicated block id: 463 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/models/__init__.pyi (287:333) - bindings/python/py_src/tokenizers/models/__init__.pyi (415:461)
duplicated block id: 464 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (113:120) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (86:94)
duplicated block id: 465 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:372) - tokenizers/src/pre_tokenizers/punctuation.rs (50:56)
duplicated block id: 466 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/unigram/trainer.rs (624:632) - tokenizers/src/models/wordlevel/trainer.rs (93:101)
duplicated block id: 467 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1819:1825) - tokenizers/src/utils/truncation.rs (247:253)
duplicated block id: 468 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1772:1778) - tokenizers/src/tokenizer/normalizer.rs (1818:1824)
duplicated block id: 469 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/encoding.rs (13:25) - tokenizers/src/tokenizer/encoding.rs (35:41)
duplicated block id: 470 size: 7 cleaned lines of code in 2 files: - bindings/python/src/models.rs (118:126) - bindings/python/src/pre_tokenizers.rs (154:162)
duplicated block id: 471 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/unigram/lattice.rs (336:342) - tokenizers/src/models/unigram/lattice.rs (386:392)
duplicated block id: 472 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1893:1899) - tokenizers/src/tokenizer/normalizer.rs (2081:2087)
duplicated block id: 473 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (72:78) - tokenizers/src/pre_tokenizers/metaspace.rs (215:221)
duplicated block id: 474 size: 7 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (117:125) - bindings/python/src/trainers.rs (57:65)
duplicated block id: 475 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (338:344) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 476 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:256) - tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
duplicated block id: 477 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1584) - tokenizers/src/tokenizer/normalizer.rs (1878:1885)
duplicated block id: 478 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (492:498) - tokenizers/src/processors/sequence.rs (137:143)
duplicated block id: 479 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1650:1657) - tokenizers/src/tokenizer/normalizer.rs (1832:1839)
duplicated block id: 480 size: 7 cleaned lines of code in 2 files: - tokenizers/src/processors/roberta.rs (97:103) - tokenizers/src/processors/roberta.rs (127:133)
duplicated block id: 481 size: 7 cleaned lines of code in 2 files: - tokenizers/src/processors/sequence.rs (105:111) - tokenizers/src/processors/sequence.rs (137:143)
duplicated block id: 482 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (275:281) - tokenizers/src/pre_tokenizers/byte_level.rs (386:393)
duplicated block id: 483 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (223:229) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (97:103)
duplicated block id: 484 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (87:93) - tokenizers/src/pre_tokenizers/fixed_length.rs (87:93)
duplicated block id: 485 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (87:93) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:113)
duplicated block id: 486 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:372) - tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
duplicated block id: 487 size: 7 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (154:162) - bindings/python/src/tokenizer.rs (496:504)
duplicated block id: 488 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (275:281) - tokenizers/src/pre_tokenizers/byte_level.rs (364:371)
duplicated block id: 489 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/sequence.rs (62:68) - tokenizers/src/pre_tokenizers/whitespace.rs (94:100)
duplicated block id: 490 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (832:839) - tokenizers/src/models/bpe/model.rs (863:870)
duplicated block id: 491 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1893:1899) - tokenizers/src/tokenizer/normalizer.rs (2019:2025)
duplicated block id: 492 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:320) - tokenizers/src/pre_tokenizers/metaspace.rs (239:245)
duplicated block id: 493 size: 7 cleaned lines of code in 2 files: - bindings/python/src/encoding.rs (37:45) - bindings/python/src/pre_tokenizers.rs (154:162)
duplicated block id: 494 size: 7 cleaned lines of code in 2 files: - bindings/python/src/models.rs (118:126) - bindings/python/src/trainers.rs (57:65)
duplicated block id: 495 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1614:1620) - tokenizers/src/tokenizer/normalizer.rs (1682:1688)
duplicated block id: 496 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:394) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:113)
duplicated block id: 497 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:256) - tokenizers/src/pre_tokenizers/digits.rs (87:93)
duplicated block id: 498 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1544:1550) - tokenizers/src/tokenizer/normalizer.rs (1701:1707)
duplicated block id: 499 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:256) - tokenizers/src/pre_tokenizers/fixed_length.rs (87:93)
duplicated block id: 500 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (223:229) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (120:126)
duplicated block id: 501 size: 7 cleaned lines of code in 2 files: - tokenizers/src/models/unigram/trainer.rs (782:790) - tokenizers/src/models/unigram/trainer.rs (802:810)
duplicated block id: 502 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1614:1620) - tokenizers/src/tokenizer/normalizer.rs (1725:1731)
duplicated block id: 503 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:256) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:113)
duplicated block id: 504 size: 7 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (154:162) - bindings/python/src/pre_tokenizers.rs (154:162)
duplicated block id: 505 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (475:481) - tokenizers/src/pre_tokenizers/byte_level.rs (521:527)
duplicated block id: 506 size: 7 cleaned lines of code in 2 files: - tokenizers/src/processors/roberta.rs (162:168) - tokenizers/src/processors/roberta.rs (193:199)
duplicated block id: 507 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:394) - tokenizers/src/pre_tokenizers/fixed_length.rs (87:93)
duplicated block id: 508 size: 7 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (366:374) - bindings/python/src/processors.rs (447:455)
duplicated block id: 509 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (215:221) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (89:95)
duplicated block id: 510 size: 7 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (111:119) - bindings/python/src/normalizers.rs (154:162)
duplicated block id: 511 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:372) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:113)
duplicated block id: 512 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:72) - tokenizers/src/pre_tokenizers/metaspace.rs (252:258)
duplicated block id: 513 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (215:221) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (112:118)
duplicated block id: 514 size: 7 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (154:162) - bindings/python/src/trainers.rs (57:65)
duplicated block id: 515 size: 7 cleaned lines of code in 2 files: - bindings/python/src/utils/serde_pyo3.rs (563:571) - bindings/python/src/utils/serde_pyo3.rs (594:602)
duplicated block id: 516 size: 7 cleaned lines of code in 2 files: - bindings/python/src/encoding.rs (37:45) - bindings/python/src/processors.rs (117:125)
duplicated block id: 517 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1614:1620) - tokenizers/src/tokenizer/normalizer.rs (1639:1645)
duplicated block id: 518 size: 7 cleaned lines of code in 2 files: - bindings/python/src/encoding.rs (37:45) - bindings/python/src/normalizers.rs (154:162)
duplicated block id: 519 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/tools/visualizer.py (269:276) - bindings/python/py_src/tokenizers/tools/visualizer.py (306:312)
duplicated block id: 520 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (87:93) - tokenizers/src/pre_tokenizers/punctuation.rs (50:56)
duplicated block id: 521 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1544:1550) - tokenizers/src/tokenizer/normalizer.rs (1658:1664)
duplicated block id: 522 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:372) - tokenizers/src/pre_tokenizers/fixed_length.rs (87:93)
duplicated block id: 523 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (95:102) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (86:94)
duplicated block id: 524 size: 7 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (70:76) - tokenizers/src/tokenizer/normalizer.rs (1773:1779)
duplicated block id: 525 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (521:527) - tokenizers/src/processors/sequence.rs (85:91)
duplicated block id: 526 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (58:64) - tokenizers/src/pre_tokenizers/metaspace.rs (251:257)
duplicated block id: 527 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (27:33) - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (36:42)
duplicated block id: 528 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (58:64) - tokenizers/src/pre_tokenizers/metaspace.rs (223:229)
duplicated block id: 529 size: 7 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1543:1549) - tokenizers/src/utils/truncation.rs (247:253)
duplicated block id: 530 size: 7 cleaned lines of code in 2 files: - tokenizers/src/processors/mod.rs (94:100) - tokenizers/src/processors/mod.rs (119:125)
duplicated block id: 531 size: 7 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (100:106) - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (126:132)
duplicated block id: 532 size: 7 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (211:222) - bindings/python/src/pre_tokenizers.rs (221:232)
duplicated block id: 533 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/sequence.rs (62:68) - tokenizers/src/pre_tokenizers/whitespace.rs (70:76)
duplicated block id: 534 size: 7 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (521:527) - tokenizers/src/processors/sequence.rs (105:111)
duplicated block id: 535 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) - tokenizers/src/pre_tokenizers/metaspace.rs (252:257)
duplicated block id: 536 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1583) - tokenizers/src/tokenizer/normalizer.rs (2256:2262)
duplicated block id: 537 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:319) - tokenizers/src/pre_tokenizers/metaspace.rs (325:330)
duplicated block id: 538 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) - tokenizers/src/pre_tokenizers/sequence.rs (63:68)
duplicated block id: 539 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (2074:2080)
duplicated block id: 540 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/sequence.rs (63:68)
duplicated block id: 541 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:319) - tokenizers/src/pre_tokenizers/digits.rs (51:56)
duplicated block id: 542 size: 6 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (182:189) - bindings/python/src/tokenizer.rs (648:655)
duplicated block id: 543 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (145:150) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (138:144)
duplicated block id: 544 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126)
duplicated block id: 545 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (1878:1884)
duplicated block id: 546 size: 6 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (145:152) - bindings/python/src/trainers.rs (79:86)
duplicated block id: 547 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/sequence.rs (104:109) - tokenizers/src/processors/sequence.rs (131:136)
duplicated block id: 548 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (237:244) - bindings/python/src/trainers.rs (79:86)
duplicated block id: 549 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1583) - tokenizers/src/tokenizer/normalizer.rs (2204:2210)
duplicated block id: 550 size: 6 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (165:173) - bindings/python/src/models.rs (333:341)
duplicated block id: 551 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (69:74) - tokenizers/src/normalizers/byte_level.rs (137:142)
duplicated block id: 552 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (299:304) - tokenizers/src/tokenizer/mod.rs (312:317)
duplicated block id: 553 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229)
duplicated block id: 554 size: 6 cleaned lines of code in 2 files: - bindings/node/src/models.rs (57:63) - bindings/node/src/models.rs (221:227)
duplicated block id: 555 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1833:1839) - tokenizers/src/tokenizer/normalizer.rs (2256:2262)
duplicated block id: 556 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:35) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229)
duplicated block id: 557 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (131:136) - tokenizers/src/normalizers/prepend.rs (53:58)
duplicated block id: 558 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (1932:1938)
duplicated block id: 559 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (15:20) - tokenizers/src/tokenizer/mod.rs (530:535)
duplicated block id: 560 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (718:725) - tokenizers/src/tokenizer/normalizer.rs (756:763)
duplicated block id: 561 size: 6 cleaned lines of code in 2 files: - bindings/node/src/models.rs (105:110) - bindings/node/src/models.rs (135:140)
duplicated block id: 562 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1662:1667) - tokenizers/src/tokenizer/normalizer.rs (1781:1786)
duplicated block id: 563 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1794) - tokenizers/src/tokenizer/normalizer.rs (2074:2080)
duplicated block id: 564 size: 6 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (242:250) - bindings/python/src/trainers.rs (130:138)
duplicated block id: 565 size: 6 cleaned lines of code in 2 files: - bindings/python/src/utils/serde_pyo3.rs (546:554) - bindings/python/src/utils/serde_pyo3.rs (577:585)
duplicated block id: 566 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1833:1839) - tokenizers/src/tokenizer/normalizer.rs (2204:2210)
duplicated block id: 567 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1922:1927) - tokenizers/src/tokenizer/normalizer.rs (1966:1971)
duplicated block id: 568 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (237:244) - bindings/python/src/normalizers.rs (211:218)
duplicated block id: 569 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (1878:1884)
duplicated block id: 570 size: 6 cleaned lines of code in 2 files: - bindings/node/src/tasks/tokenizer.rs (27:34) - bindings/node/src/tasks/tokenizer.rs (84:91)
duplicated block id: 571 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (299:304) - tokenizers/src/tokenizer/mod.rs (1049:1054)
duplicated block id: 572 size: 6 cleaned lines of code in 2 files: - bindings/python/src/trainers.rs (282:289) - bindings/python/src/trainers.rs (814:821)
duplicated block id: 573 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/serialization.rs (51:56) - tokenizers/src/tokenizer/serialization.rs (95:100)
duplicated block id: 574 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/punctuation.rs (50:55)
duplicated block id: 575 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (98:103) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (138:144)
duplicated block id: 576 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1794) - tokenizers/src/tokenizer/normalizer.rs (2136:2142)
duplicated block id: 577 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/metaspace.rs (239:244)
duplicated block id: 578 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100)
duplicated block id: 579 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100)
duplicated block id: 580 size: 6 cleaned lines of code in 2 files: - bindings/node/src/decoders.rs (1:7) - bindings/node/src/processors.rs (1:6)
duplicated block id: 581 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126)
duplicated block id: 582 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1658:1663) - tokenizers/src/utils/truncation.rs (248:253)
duplicated block id: 583 size: 6 cleaned lines of code in 2 files: - bindings/python/src/utils/serde_pyo3.rs (485:493) - bindings/python/src/utils/serde_pyo3.rs (579:587)
duplicated block id: 584 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:71) - tokenizers/src/pre_tokenizers/byte_level.rs (276:281)
duplicated block id: 585 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76)
duplicated block id: 586 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76)
duplicated block id: 587 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (117:122) - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (98:103)
duplicated block id: 588 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (1932:1938)
duplicated block id: 589 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/sequence.rs (63:68)
duplicated block id: 590 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:319) - tokenizers/src/pre_tokenizers/byte_level.rs (418:423)
duplicated block id: 591 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (137:142) - tokenizers/src/tokenizer/normalizer.rs (1839:1844)
duplicated block id: 592 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (299:304) - tokenizers/src/tokenizer/mod.rs (1088:1093)
duplicated block id: 593 size: 6 cleaned lines of code in 2 files: - tokenizers/src/utils/padding.rs (108:113) - tokenizers/src/utils/truncation.rs (180:185)
duplicated block id: 594 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (1878:1884)
duplicated block id: 595 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103)
duplicated block id: 596 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/prepend.rs (38:43) - tokenizers/src/normalizers/strip.rs (133:138)
duplicated block id: 597 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/digits.rs (87:92)
duplicated block id: 598 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (638:643) - tokenizers/src/models/bpe/model.rs (901:906)
duplicated block id: 599 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (2136:2142)
duplicated block id: 600 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (312:317) - tokenizers/src/tokenizer/mod.rs (1049:1054)
duplicated block id: 601 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103)
duplicated block id: 602 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (523:532) - tokenizers/src/models/wordpiece/mod.rs (268:277)
duplicated block id: 603 size: 6 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (145:152) - bindings/python/src/tokenizer.rs (648:655)
duplicated block id: 604 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1701:1706) - tokenizers/src/utils/truncation.rs (248:253)
duplicated block id: 605 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (638:643) - tokenizers/src/models/bpe/model.rs (881:886)
duplicated block id: 606 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76)
duplicated block id: 607 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126)
duplicated block id: 608 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (530:535) - tokenizers/src/tokenizer/mod.rs (1088:1093)
duplicated block id: 609 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1781:1786) - tokenizers/src/tokenizer/normalizer.rs (1824:1829)
duplicated block id: 610 size: 6 cleaned lines of code in 2 files: - tokenizers/src/decoders/byte_fallback.rs (38:43) - tokenizers/src/decoders/byte_fallback.rs (51:56)
duplicated block id: 611 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (53:91) - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (651:689)
duplicated block id: 612 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100)
duplicated block id: 613 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (53:91) - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (607:645)
duplicated block id: 614 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/metaspace.rs (216:221)
duplicated block id: 615 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (53:91) - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (536:574)
duplicated block id: 616 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1583) - tokenizers/src/tokenizer/normalizer.rs (2136:2142)
duplicated block id: 617 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/digits.rs (59:64)
duplicated block id: 618 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/wordlevel/serialization.rs (52:57) - tokenizers/src/models/wordpiece/serialization.rs (67:73)
duplicated block id: 619 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/wordlevel/mod.rs (83:88) - tokenizers/src/models/wordpiece/mod.rs (103:108)
duplicated block id: 620 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/encoding.rs (575:580) - tokenizers/src/tokenizer/encoding.rs (585:590)
duplicated block id: 621 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223)
duplicated block id: 622 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100)
duplicated block id: 623 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1794:1799) - tokenizers/src/utils/truncation.rs (246:251)
duplicated block id: 624 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (237:244) - bindings/python/src/pre_tokenizers.rs (221:228)
duplicated block id: 625 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1833:1839) - tokenizers/src/tokenizer/normalizer.rs (1932:1938)
duplicated block id: 626 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264)
duplicated block id: 627 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (137:142) - tokenizers/src/utils/truncation.rs (246:251)
duplicated block id: 628 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (87:92) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103)
duplicated block id: 629 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305)
duplicated block id: 630 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (2204:2210)
duplicated block id: 631 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346)
duplicated block id: 632 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:71) - tokenizers/src/pre_tokenizers/digits.rs (59:64)
duplicated block id: 633 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/sequence.rs (63:68)
duplicated block id: 634 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387)
duplicated block id: 635 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (87:92) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126)
duplicated block id: 636 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (2256:2262)
duplicated block id: 637 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (87:92) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229)
duplicated block id: 638 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/word.rs (279:284) - tokenizers/src/models/bpe/word.rs (323:328)
duplicated block id: 639 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (117:122) - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (145:150)
duplicated block id: 640 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1725:1730) - tokenizers/src/tokenizer/normalizer.rs (1773:1778)
duplicated block id: 641 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/strip.rs (107:114) - tokenizers/src/normalizers/strip.rs (119:126)
duplicated block id: 642 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/digits.rs (51:56)
duplicated block id: 643 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387)
duplicated block id: 644 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1781:1786) - tokenizers/src/tokenizer/normalizer.rs (1845:1850)
duplicated block id: 645 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (556:561) - tokenizers/src/models/bpe/serialization.rs (27:32)
duplicated block id: 646 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/punctuation.rs (50:55)
duplicated block id: 647 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:112)
duplicated block id: 648 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/bpe_benchmark.rs (36:42) - tokenizers/benches/llama3_benchmark.rs (22:28)
duplicated block id: 649 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1675:1680) - tokenizers/src/tokenizer/normalizer.rs (1718:1723)
duplicated block id: 650 size: 6 cleaned lines of code in 2 files: - tokenizers/src/utils/padding.rs (95:100) - tokenizers/src/utils/padding.rs (106:111)
duplicated block id: 651 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (57:62) - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (81:86)
duplicated block id: 652 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100)
duplicated block id: 653 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636)
duplicated block id: 654 size: 6 cleaned lines of code in 2 files: - tokenizers/src/utils/padding.rs (95:100) - tokenizers/src/utils/truncation.rs (177:182)
duplicated block id: 655 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/roberta.rs (266:271) - tokenizers/src/processors/template.rs (961:966)
duplicated block id: 656 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1794) - tokenizers/src/tokenizer/normalizer.rs (2204:2210)
duplicated block id: 657 size: 6 cleaned lines of code in 2 files: - bindings/node/src/tasks/tokenizer.rs (18:23) - bindings/node/src/tasks/tokenizer.rs (75:80)
duplicated block id: 658 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (40:45) - tokenizers/src/tokenizer/mod.rs (1254:1259)
duplicated block id: 659 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636)
duplicated block id: 660 size: 6 cleaned lines of code in 2 files: - bindings/node/src/tasks/tokenizer.rs (18:23) - bindings/node/src/tasks/tokenizer.rs (51:56)
duplicated block id: 661 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346)
duplicated block id: 662 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387)
duplicated block id: 663 size: 6 cleaned lines of code in 2 files: - bindings/python/src/encoding.rs (400:405) - bindings/python/src/tokenizer.rs (844:849)
duplicated block id: 664 size: 6 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (457:464) - bindings/python/src/pre_tokenizers.rs (642:649)
duplicated block id: 665 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/fixed_length.rs (87:92)
duplicated block id: 666 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (157:162) - tokenizers/src/models/wordlevel/mod.rs (83:88)
duplicated block id: 667 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305)
duplicated block id: 668 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (2012:2018)
duplicated block id: 669 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223)
duplicated block id: 670 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76)
duplicated block id: 671 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1794) - tokenizers/src/tokenizer/normalizer.rs (2256:2262)
duplicated block id: 672 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264)
duplicated block id: 673 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636)
duplicated block id: 674 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:112)
duplicated block id: 675 size: 6 cleaned lines of code in 2 files: - bindings/node/src/models.rs (57:63) - bindings/node/src/models.rs (278:284)
duplicated block id: 676 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (106:141) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (147:182)
duplicated block id: 677 size: 6 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (211:218) - bindings/python/src/tokenizer.rs (648:655)
duplicated block id: 678 size: 6 cleaned lines of code in 2 files: - bindings/node/src/encoding.rs (247:252) - bindings/node/src/encoding.rs (265:270)
duplicated block id: 679 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229)
duplicated block id: 680 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/punctuation.rs (50:55) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103)
duplicated block id: 681 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (146:151) - bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py (98:103)
duplicated block id: 682 size: 6 cleaned lines of code in 2 files: - bindings/python/src/tokenizer.rs (648:655) - bindings/python/src/trainers.rs (79:86)
duplicated block id: 683 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/punctuation.rs (50:55)
duplicated block id: 684 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (1932:1938)
duplicated block id: 685 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:35) - tokenizers/src/pre_tokenizers/byte_level.rs (276:281)
duplicated block id: 686 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76)
duplicated block id: 687 size: 6 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (385:392) - bindings/python/src/pre_tokenizers.rs (719:726)
duplicated block id: 688 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/punctuation.rs (50:55) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126)
duplicated block id: 689 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/digits.rs (59:64)
duplicated block id: 690 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (604:609) - tokenizers/src/models/bpe/model.rs (901:906)
duplicated block id: 691 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264)
duplicated block id: 692 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/digits.rs (59:64)
duplicated block id: 693 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (604:609) - tokenizers/src/models/bpe/model.rs (881:886)
duplicated block id: 694 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (70:75) - tokenizers/src/tokenizer/normalizer.rs (1543:1548)
duplicated block id: 695 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387)
duplicated block id: 696 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/bert.rs (218:223) - tokenizers/src/processors/template.rs (961:966)
duplicated block id: 697 size: 6 cleaned lines of code in 2 files: - tokenizers/src/utils/padding.rs (97:102) - tokenizers/src/utils/truncation.rs (180:185)
duplicated block id: 698 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346)
duplicated block id: 699 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387)
duplicated block id: 700 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (333:341) - bindings/python/src/processors.rs (204:212)
duplicated block id: 701 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305)
duplicated block id: 702 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (188:223) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346)
duplicated block id: 703 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76)
duplicated block id: 704 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (90:95)
duplicated block id: 705 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (2256:2262)
duplicated block id: 706 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636)
duplicated block id: 707 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/fixed_length.rs (87:92)
duplicated block id: 708 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1615:1620) - tokenizers/src/tokenizer/normalizer.rs (1658:1663)
duplicated block id: 709 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636)
duplicated block id: 710 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636)
duplicated block id: 711 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1639:1644) - tokenizers/src/tokenizer/normalizer.rs (1773:1778)
duplicated block id: 712 size: 6 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (204:212) - bindings/python/src/trainers.rs (130:138)
duplicated block id: 713 size: 6 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (211:218) - bindings/python/src/trainers.rs (79:86)
duplicated block id: 714 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (117:122) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (138:144)
duplicated block id: 715 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/digits.rs (73:78)
duplicated block id: 716 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (607:645) - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (651:689)
duplicated block id: 717 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (311:346)
duplicated block id: 718 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (2204:2210)
duplicated block id: 719 size: 6 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (234:242) - bindings/python/src/trainers.rs (130:138)
duplicated block id: 720 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387)
duplicated block id: 721 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (137:142) - tokenizers/src/tokenizer/normalizer.rs (1794:1799)
duplicated block id: 722 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103)
duplicated block id: 723 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (69:74) - tokenizers/src/tokenizer/normalizer.rs (1794:1799)
duplicated block id: 724 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76)
duplicated block id: 725 size: 6 cleaned lines of code in 2 files: - bindings/python/src/processors.rs (182:189) - bindings/python/src/trainers.rs (79:86)
duplicated block id: 726 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (113:118)
duplicated block id: 727 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (2256:2262)
duplicated block id: 728 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (229:264) - 
bindings/python/py_src/tokenizers/normalizers/__init__.pyi (270:305) duplicated block id: 729 size: 6 cleaned lines of code in 2 files: - bindings/node/src/models.rs (87:92) - bindings/node/src/models.rs (125:130) duplicated block id: 730 size: 6 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (613:620) - bindings/python/src/pre_tokenizers.rs (445:452) duplicated block id: 731 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (15:20) - tokenizers/src/tokenizer/mod.rs (299:304) duplicated block id: 732 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (15:20) - tokenizers/src/tokenizer/mod.rs (312:317) duplicated block id: 733 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:35) - tokenizers/src/pre_tokenizers/digits.rs (59:64) duplicated block id: 734 size: 6 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (165:173) - bindings/python/src/trainers.rs (130:138) duplicated block id: 735 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76) duplicated block id: 736 size: 6 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (221:228) - bindings/python/src/tokenizer.rs (648:655) duplicated block id: 737 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 738 size: 6 cleaned lines of code in 2 files: - bindings/python/src/decoders.rs (145:152) - bindings/python/src/models.rs (237:244) duplicated block id: 739 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1794) - tokenizers/src/tokenizer/normalizer.rs (1932:1938) duplicated block id: 740 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (2136:2142) duplicated block id: 741 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1833:1839) - tokenizers/src/tokenizer/normalizer.rs (2012:2018) duplicated block id: 742 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/template.rs (920:925) - tokenizers/src/processors/template.rs (1008:1013) duplicated block id: 743 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100) duplicated block id: 744 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1615:1620) - tokenizers/src/tokenizer/normalizer.rs (1701:1706) duplicated block id: 745 size: 6 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (307:314) - bindings/python/src/processors.rs (528:535) duplicated block id: 746 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 747 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:71) - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) duplicated block id: 748 size: 6 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (221:228) - bindings/python/src/trainers.rs (79:86) duplicated block id: 749 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/template.rs (931:936) - 
tokenizers/src/processors/template.rs (1020:1025) duplicated block id: 750 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (828:835) - tokenizers/src/tokenizer/mod.rs (871:878) duplicated block id: 751 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (352:387) - bindings/python/py_src/tokenizers/normalizers/__init__.pyi (601:636) duplicated block id: 752 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76) duplicated block id: 753 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (1538:1543) - tokenizers/src/tokenizer/serialization.rs (16:21) duplicated block id: 754 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (63:68) - tokenizers/src/normalizers/strip.rs (133:138) duplicated block id: 755 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1922:1927) - tokenizers/src/tokenizer/normalizer.rs (2184:2189) duplicated block id: 756 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) duplicated block id: 757 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (107:112) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 758 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100) duplicated block id: 759 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (2074:2080) duplicated block id: 760 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1614:1619) - tokenizers/src/tokenizer/normalizer.rs (1773:1778) duplicated block id: 761 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/metaspace.rs (252:257) duplicated block id: 762 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (107:112) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) duplicated block id: 763 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/fixed_length.rs (107:112) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 764 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (2136:2142) duplicated block id: 765 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (87:92) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100) duplicated block id: 766 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1922:1927) - tokenizers/src/tokenizer/normalizer.rs (2238:2243) duplicated block id: 767 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (104:115) - tokenizers/src/models/bpe/trainer.rs (121:132) duplicated block id: 768 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (15:20) - tokenizers/src/tokenizer/mod.rs (1049:1054) duplicated block id: 769 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (299:304) - tokenizers/src/tokenizer/mod.rs (530:535) 
duplicated block id: 770 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1543:1548) - tokenizers/src/tokenizer/normalizer.rs (1773:1778) duplicated block id: 771 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (704:709) - bindings/python/src/models.rs (826:831) duplicated block id: 772 size: 6 cleaned lines of code in 2 files: - bindings/node/src/models.rs (115:120) - bindings/node/src/models.rs (135:140) duplicated block id: 773 size: 6 cleaned lines of code in 2 files: - tokenizers/benches/common/mod.rs (15:20) - tokenizers/src/tokenizer/mod.rs (1088:1093) duplicated block id: 774 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1705:1710) - tokenizers/src/tokenizer/normalizer.rs (1781:1786) duplicated block id: 775 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (333:341) - bindings/python/src/pre_tokenizers.rs (242:250) duplicated block id: 776 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (146:151) - bindings/python/py_src/tokenizers/implementations/byte_level_bpe.py (117:122) duplicated block id: 777 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (1932:1938) duplicated block id: 778 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76) duplicated block id: 779 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (70:75) - tokenizers/src/tokenizer/normalizer.rs (1639:1644) duplicated block id: 780 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (536:574) - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (607:645) duplicated block id: 781 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100) duplicated block id: 782 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/unigram/trainer.rs (737:742) - tokenizers/src/models/unigram/trainer.rs (796:801) duplicated block id: 783 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (536:574) - bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi (651:689) duplicated block id: 784 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (237:244) - bindings/python/src/processors.rs (182:189) duplicated block id: 785 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) duplicated block id: 786 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (2136:2142) duplicated block id: 787 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (146:151) - bindings/python/py_src/tokenizers/implementations/sentencepiece_unigram.py (138:144) duplicated block id: 788 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (312:317) - tokenizers/src/tokenizer/mod.rs (530:535) duplicated block id: 789 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/fixed_length.rs 
(87:92) duplicated block id: 790 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (1878:1884) duplicated block id: 791 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/sequence.rs (63:68) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 792 size: 6 cleaned lines of code in 2 files: - bindings/node/src/models.rs (105:110) - bindings/node/src/models.rs (115:120) duplicated block id: 793 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (70:75) - tokenizers/src/tokenizer/normalizer.rs (1682:1687) duplicated block id: 794 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (276:281) - tokenizers/src/pre_tokenizers/fixed_length.rs (107:112) duplicated block id: 795 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (474:479) - tokenizers/src/processors/sequence.rs (131:136) duplicated block id: 796 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1583) - tokenizers/src/tokenizer/normalizer.rs (1932:1938) duplicated block id: 797 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (252:257) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76) duplicated block id: 798 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1773:1778) - tokenizers/src/utils/truncation.rs (247:252) duplicated block id: 799 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) duplicated block id: 800 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (2074:2080) duplicated block id: 801 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (252:257) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100) duplicated block id: 802 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (70:75) - tokenizers/src/tokenizer/normalizer.rs (1725:1730) duplicated block id: 803 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (786:793) - tokenizers/src/tokenizer/mod.rs (871:878) duplicated block id: 804 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (2204:2210) duplicated block id: 805 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/bpe/model.rs (157:162) - tokenizers/src/models/wordpiece/mod.rs (103:108) duplicated block id: 806 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (786:793) - tokenizers/src/tokenizer/mod.rs (828:835) duplicated block id: 807 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (43:50) - tokenizers/src/pre_tokenizers/punctuation.rs (60:67) duplicated block id: 808 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:35) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 809 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76) duplicated block id: 810 size: 6 cleaned lines of code in 2 files: - 
bindings/python/py_src/tokenizers/implementations/bert_wordpiece.py (146:151) - bindings/python/py_src/tokenizers/implementations/char_level_bpe.py (145:150) duplicated block id: 811 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (530:535) - tokenizers/src/tokenizer/mod.rs (1049:1054) duplicated block id: 812 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1682:1687) - tokenizers/src/tokenizer/normalizer.rs (1773:1778) duplicated block id: 813 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (312:317) - tokenizers/src/tokenizer/mod.rs (1088:1093) duplicated block id: 814 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (44:55) - tokenizers/src/normalizers/precompiled.rs (65:75) duplicated block id: 815 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (59:64) - tokenizers/src/pre_tokenizers/whitespace.rs (95:100) duplicated block id: 816 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:319) - tokenizers/src/pre_tokenizers/metaspace.rs (216:221) duplicated block id: 817 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (491:496) - tokenizers/src/pre_tokenizers/byte_level.rs (515:520) duplicated block id: 818 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (736:743) - tokenizers/src/tokenizer/normalizer.rs (756:763) duplicated block id: 819 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (2012:2018) duplicated block id: 820 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (474:479) - tokenizers/src/pre_tokenizers/byte_level.rs (515:520) duplicated block id: 821 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1583) - tokenizers/src/tokenizer/normalizer.rs (2074:2080) duplicated block id: 822 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:319) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (113:118) duplicated block id: 823 size: 6 cleaned lines of code in 2 files: - bindings/python/src/normalizers.rs (457:464) - bindings/python/src/processors.rs (793:800) duplicated block id: 824 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1833:1839) - tokenizers/src/tokenizer/normalizer.rs (2136:2142) duplicated block id: 825 size: 6 cleaned lines of code in 2 files: - bindings/node/src/decoders.rs (127:134) - bindings/node/src/normalizers.rs (165:172) duplicated block id: 826 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/sequence.rs (63:68) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 827 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1897:1902) - tokenizers/src/tokenizer/normalizer.rs (1951:1956) duplicated block id: 828 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (314:319) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (90:95) duplicated block id: 829 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (2204:2210) duplicated block id: 830 size: 6 cleaned lines of code in 2 files: - 
tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) duplicated block id: 831 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (250:255) - tokenizers/src/pre_tokenizers/digits.rs (59:64) duplicated block id: 832 size: 6 cleaned lines of code in 2 files: - tokenizers/src/utils/padding.rs (106:111) - tokenizers/src/utils/truncation.rs (177:182) duplicated block id: 833 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 834 size: 6 cleaned lines of code in 2 files: - bindings/python/src/utils/serde_pyo3.rs (413:421) - bindings/python/src/utils/serde_pyo3.rs (579:587) duplicated block id: 835 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (237:244) - bindings/python/src/tokenizer.rs (648:655) duplicated block id: 836 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (486:491) - bindings/python/src/models.rs (821:826) duplicated block id: 837 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/template.rs (1131:1136) - tokenizers/src/processors/template.rs (1145:1150) duplicated block id: 838 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (1049:1054) - tokenizers/src/tokenizer/mod.rs (1088:1093) duplicated block id: 839 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1833:1839) - tokenizers/src/tokenizer/normalizer.rs (2074:2080) duplicated block id: 840 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 841 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/models/__init__.pyi (158:185) - bindings/python/py_src/tokenizers/models/__init__.pyi (501:528) duplicated block id: 842 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (333:341) - bindings/python/src/trainers.rs (130:138) duplicated block id: 843 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1651:1657) - tokenizers/src/tokenizer/normalizer.rs (2256:2262) duplicated block id: 844 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (388:393) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) duplicated block id: 845 size: 6 cleaned lines of code in 2 files: - bindings/python/py_src/tokenizers/models/__init__.pyi (158:185) - bindings/python/py_src/tokenizers/models/__init__.pyi (372:399) duplicated block id: 846 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/digits.rs (59:64) duplicated block id: 847 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (87:92) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) duplicated block id: 848 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:35) - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) duplicated block id: 849 size: 6 cleaned lines of code in 2 files: - bindings/python/src/pre_tokenizers.rs (642:649) - bindings/python/src/processors.rs (793:800) duplicated block id: 850 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (30:35) - 
tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 851 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/template.rs (896:901) - tokenizers/src/processors/template.rs (961:966) duplicated block id: 852 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (59:64) - tokenizers/src/tokenizer/normalizer.rs (919:924) duplicated block id: 853 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1788:1794) - tokenizers/src/tokenizer/normalizer.rs (2012:2018) duplicated block id: 854 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/digits.rs (87:92) - tokenizers/src/pre_tokenizers/whitespace.rs (71:76) duplicated block id: 855 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1773:1778) - tokenizers/src/tokenizer/normalizer.rs (1840:1845) duplicated block id: 856 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1060:1065) - tokenizers/src/tokenizer/normalizer.rs (1122:1127) duplicated block id: 857 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) duplicated block id: 858 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:71) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) duplicated block id: 859 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 860 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/digits.rs (87:92) duplicated block id: 861 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 862 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1749:1754) - tokenizers/src/tokenizer/normalizer.rs (1781:1786) duplicated block id: 863 size: 6 cleaned lines of code in 2 files: - bindings/node/src/tasks/tokenizer.rs (51:56) - bindings/node/src/tasks/tokenizer.rs (75:80) duplicated block id: 864 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (515:520) - tokenizers/src/processors/sequence.rs (104:109) duplicated block id: 865 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:71) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (121:126) duplicated block id: 866 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/fixed_length.rs (71:76) duplicated block id: 867 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/mod.rs (447:452) - tokenizers/src/tokenizer/mod.rs (1525:1532) duplicated block id: 868 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1608:1614) - tokenizers/src/tokenizer/normalizer.rs (1879:1885) duplicated block id: 869 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (515:520) - tokenizers/src/processors/sequence.rs (84:89) duplicated block id: 870 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1632:1637) - tokenizers/src/tokenizer/normalizer.rs 
(1718:1723) duplicated block id: 871 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1532:1539) - tokenizers/src/tokenizer/normalizer.rs (2012:2018) duplicated block id: 872 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/template.rs (737:742) - tokenizers/src/processors/template.rs (751:756) duplicated block id: 873 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/bert.rs (66:71) - tokenizers/src/pre_tokenizers/unicode_scripts/pre_tokenizer.rs (98:103) duplicated block id: 874 size: 6 cleaned lines of code in 2 files: - tokenizers/src/models/wordpiece/mod.rs (211:216) - tokenizers/src/models/wordpiece/mod.rs (256:261) duplicated block id: 875 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (491:496) - tokenizers/src/processors/sequence.rs (131:136) duplicated block id: 876 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/metaspace.rs (287:292) duplicated block id: 877 size: 6 cleaned lines of code in 2 files: - bindings/python/src/utils/serde_pyo3.rs (445:453) - bindings/python/src/utils/serde_pyo3.rs (579:587) duplicated block id: 878 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1694:1700) - tokenizers/src/tokenizer/normalizer.rs (2012:2018) duplicated block id: 879 size: 6 cleaned lines of code in 2 files: - bindings/python/src/models.rs (333:341) - bindings/python/src/normalizers.rs (234:242) duplicated block id: 880 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1545:1550) - tokenizers/src/tokenizer/normalizer.rs (1746:1751) duplicated block id: 881 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) - tokenizers/src/pre_tokenizers/punctuation.rs (50:55) duplicated block id: 882 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (70:75) - tokenizers/src/tokenizer/normalizer.rs (1819:1824) duplicated block id: 883 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/metaspace.rs (301:306) duplicated block id: 884 size: 6 cleaned lines of code in 2 files: - tokenizers/src/normalizers/byte_level.rs (137:142) - tokenizers/src/tokenizer/normalizer.rs (1613:1618) duplicated block id: 885 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/metaspace.rs (325:330) duplicated block id: 886 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (366:371) - tokenizers/src/pre_tokenizers/metaspace.rs (224:229) duplicated block id: 887 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/bert.rs (255:260) - tokenizers/src/processors/template.rs (1020:1025) duplicated block id: 888 size: 6 cleaned lines of code in 2 files: - tokenizers/src/processors/sequence.rs (84:89) - tokenizers/src/processors/sequence.rs (131:136) duplicated block id: 889 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1576:1583) - tokenizers/src/tokenizer/normalizer.rs (2012:2018) duplicated block id: 890 size: 6 cleaned lines of code in 2 files: - tokenizers/src/tokenizer/normalizer.rs (1057:1062) - tokenizers/src/tokenizer/normalizer.rs (1548:1553) duplicated block id: 891 size: 6 cleaned lines of code in 2 files: - 
tokenizers/src/tokenizer/normalizer.rs (1737:1744) - tokenizers/src/tokenizer/normalizer.rs (2074:2080) duplicated block id: 892 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (418:423) - tokenizers/src/pre_tokenizers/metaspace.rs (339:344) duplicated block id: 893 size: 6 cleaned lines of code in 2 files: - tokenizers/src/pre_tokenizers/byte_level.rs (410:415) - tokenizers/src/pre_tokenizers/metaspace.rs (252:257)