duplicated block id: 1 size: 115 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (662:808)
  - src/nanotron/nn/ring_attention.py (446:592)
duplicated block id: 2 size: 94 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (504:650)
  - src/nanotron/nn/ring_attention.py (288:432)
duplicated block id: 3 size: 77 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1097:1224)
  - src/nanotron/models/qwen.py (963:1091)
duplicated block id: 4 size: 61 cleaned lines of code in 2 files:
  - src/nanotron/generation/decode.py (355:433)
  - src/nanotron/generation/decode.py (673:751)
duplicated block id: 5 size: 53 cleaned lines of code in 2 files:
  - src/nanotron/generation/decode.py (469:532)
  - src/nanotron/generation/decode.py (751:814)
duplicated block id: 6 size: 33 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (783:825)
  - src/nanotron/models/starcoder2.py (1024:1066)
duplicated block id: 7 size: 27 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/functional.py (38:79)
  - src/nanotron/parallel/tensor_parallel/functional.py (118:159)
duplicated block id: 8 size: 27 cleaned lines of code in 2 files:
  - src/nanotron/generation/decode.py (316:351)
  - src/nanotron/generation/decode.py (624:661)
duplicated block id: 9 size: 24 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1584:1607)
  - src/nanotron/nn/ring_attention_lucidrain.py (1623:1646)
duplicated block id: 10 size: 23 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1190:1212)
  - src/nanotron/nn/ring_attention_lucidrain.py (1624:1646)
duplicated block id: 11 size: 23 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1190:1212)
  - src/nanotron/nn/ring_attention_lucidrain.py (1585:1607)
duplicated block id: 12 size: 22 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1038:1062)
  - src/nanotron/models/qwen.py (908:931)
duplicated block id: 13 size: 21 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (165:185)
  - src/nanotron/data/dataloader.py (205:225)
duplicated block id: 14 size: 21 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (965:992)
  - src/nanotron/models/qwen.py (842:869)
duplicated block id: 15 size: 20 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (745:774)
  - src/nanotron/models/starcoder2.py (987:1015)
duplicated block id: 16 size: 19 cleaned lines of code in 2 files:
  - src/nanotron/generation/sampler.py (38:67)
  - src/nanotron/generation/sampler.py (245:276)
duplicated block id: 17 size: 19 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (876:897)
  - src/nanotron/models/qwen.py (736:756)
duplicated block id: 18 size: 19 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1070:1093)
  - src/nanotron/models/qwen.py (939:959)
duplicated block id: 19 size: 18 cleaned lines of code in 2 files:
  - src/nanotron/generation/decode.py (252:275)
  - src/nanotron/generation/decode.py (584:607)
duplicated block id: 20 size: 17 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/helpers.cpp (248:264)
  - src/nanotron/data/nemo_dataset/helpers.cpp (507:523)
duplicated block id: 21 size: 17 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (431:447)
  - src/nanotron/nn/llama3_ring_attention.py (471:487)
duplicated block id: 22 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (668:687)
  - src/nanotron/models/starcoder2.py (949:967)
duplicated block id: 23 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (894:911)
  - src/nanotron/models/starcoder2.py (1335:1352)
duplicated block id: 24 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (139:164)
  - src/nanotron/parallel/pipeline_parallel/engine.py (189:214)
duplicated block id: 25 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (821:839)
  - src/nanotron/models/qwen.py (686:704)
duplicated block id: 26 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/serialize/weights.py (226:245)
  - src/nanotron/serialize/weights.py (359:378)
duplicated block id: 27 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (392:407)
  - src/nanotron/nn/llama3_ring_attention.py (431:446)
duplicated block id: 28 size: 16 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (392:407)
  - src/nanotron/nn/llama3_ring_attention.py (471:486)
duplicated block id: 29 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader_builder.py (42:56)
  - src/nanotron/data/tokenized_bytes.py (693:707)
duplicated block id: 30 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (306:320)
  - src/nanotron/nn/llama3_ring_attention.py (487:501)
duplicated block id: 31 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1251:1265)
  - src/nanotron/nn/ring_attention_lucidrain.py (1457:1471)
duplicated block id: 32 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (118:132)
  - src/nanotron/eval/evaluation_tasks.py (347:361)
duplicated block id: 33 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (450:464)
  - src/nanotron/nn/llama3_ring_attention.py (490:504)
duplicated block id: 34 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (411:425)
  - src/nanotron/nn/llama3_ring_attention.py (450:464)
duplicated block id: 35 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (134:148)
  - src/nanotron/nn/ring_attention.py (48:63)
duplicated block id: 36 size: 15 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (411:425)
  - src/nanotron/nn/llama3_ring_attention.py (490:504)
duplicated block id: 37 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/config/config.py (423:437)
  - src/nanotron/config/lighteval_config.py (14:28)
duplicated block id: 38 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (976:993)
  - src/nanotron/models/starcoder2.py (1375:1394)
duplicated block id: 39 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/helpers.cpp (395:416)
  - src/nanotron/data/nemo_dataset/helpers.cpp (643:663)
duplicated block id: 40 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (266:285)
  - src/nanotron/models/starcoder2.py (248:267)
duplicated block id: 41 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (315:329)
  - src/nanotron/nn/ring_attention.py (189:203)
duplicated block id: 42 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (152:165)
  - src/nanotron/eval/evaluation_tasks.py (212:225)
duplicated block id: 43 size: 14 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (269:282)
  - src/nanotron/nn/ring_attention.py (141:154)
duplicated block id: 44 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/data/samplers.py (243:256)
  - src/nanotron/data/samplers.py (333:346)
duplicated block id: 45 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (760:772)
  - src/nanotron/nn/ring_attention_lucidrain.py (1520:1532)
duplicated block id: 46 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (832:845)
  - src/nanotron/models/starcoder2.py (1077:1090)
duplicated block id: 47 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (427:441)
  - src/nanotron/models/starcoder2.py (949:962)
duplicated block id: 48 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (548:566)
  - src/nanotron/nn/moe.py (194:212)
duplicated block id: 49 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/data/samplers.py (194:207)
  - src/nanotron/data/samplers.py (243:256)
duplicated block id: 50 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (853:869)
  - src/nanotron/models/starcoder2.py (1375:1391)
duplicated block id: 51 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1111:1128)
  - src/nanotron/models/starcoder2.py (1494:1511)
duplicated block id: 52 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (977:994)
  - src/nanotron/models/starcoder2.py (1494:1511)
duplicated block id: 53 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (427:441)
  - src/nanotron/models/starcoder2.py (668:682)
duplicated block id: 54 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (389:407)
  - src/nanotron/nn/moe.py (99:117)
duplicated block id: 55 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (68:80)
  - src/nanotron/nn/llama3_ring_attention.py (169:181)
duplicated block id: 56 size: 13 cleaned lines of code in 2 files:
  - src/nanotron/data/samplers.py (194:207)
  - src/nanotron/data/samplers.py (333:346)
duplicated block id: 57 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (847:859)
  - src/nanotron/models/starcoder2.py (1097:1109)
duplicated block id: 58 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (309:320)
  - src/nanotron/nn/llama3_ring_attention.py (411:422)
duplicated block id: 59 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (309:320)
  - src/nanotron/nn/llama3_ring_attention.py (450:461)
duplicated block id: 60 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (877:891)
  - src/nanotron/models/starcoder2.py (1317:1331)
duplicated block id: 61 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1609:1620)
  - src/nanotron/nn/ring_attention_lucidrain.py (1648:1659)
duplicated block id: 62 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (737:750)
  - src/nanotron/models/starcoder2.py (1317:1331)
duplicated block id: 63 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (152:163)
  - src/nanotron/data/dataloader.py (187:198)
duplicated block id: 64 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (999:1013)
  - src/nanotron/models/qwen.py (872:886)
duplicated block id: 65 size: 12 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (217:232)
  - src/nanotron/parallel/pipeline_parallel/engine.py (331:348)
duplicated block id: 66 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (569:584)
  - src/nanotron/models/starcoder2.py (1048:1063)
duplicated block id: 67 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (136:146)
  - src/nanotron/nn/ring_attention.py (143:153)
duplicated block id: 68 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (542:554)
  - src/nanotron/models/starcoder2.py (783:795)
duplicated block id: 69 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1191:1201)
  - src/nanotron/nn/ring_attention_lucidrain.py (1510:1520)
duplicated block id: 70 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/nn.py (47:57)
  - src/nanotron/parallel/tensor_parallel/nn.py (113:123)
duplicated block id: 71 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/sanity_checks.py (136:148)
  - src/nanotron/sanity_checks.py (194:206)
duplicated block id: 72 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention.py (50:60)
  - src/nanotron/nn/ring_attention.py (143:153)
duplicated block id: 73 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (569:584)
  - src/nanotron/models/starcoder2.py (807:822)
duplicated block id: 74 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (136:146)
  - src/nanotron/nn/llama3_ring_attention.py (271:281)
duplicated block id: 75 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (271:281)
  - src/nanotron/nn/ring_attention.py (50:60)
duplicated block id: 76 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (542:554)
  - src/nanotron/models/starcoder2.py (1024:1036)
duplicated block id: 77 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1510:1520)
  - src/nanotron/nn/ring_attention_lucidrain.py (1586:1596)
duplicated block id: 78 size: 11 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1510:1520)
  - src/nanotron/nn/ring_attention_lucidrain.py (1625:1635)
duplicated block id: 79 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/scaling/parametrization.py (110:122)
  - src/nanotron/scaling/parametrization.py (168:183)
duplicated block id: 80 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (1075:1091)
  - src/nanotron/models/starcoder2.py (1667:1682)
duplicated block id: 81 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (837:846)
  - src/nanotron/models/starcoder2.py (1275:1284)
duplicated block id: 82 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/p2p.py (83:92)
  - src/nanotron/parallel/pipeline_parallel/p2p.py (209:218)
duplicated block id: 83 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (518:534)
  - src/nanotron/models/starcoder2.py (1000:1016)
duplicated block id: 84 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (864:874)
  - src/nanotron/models/starcoder2.py (1304:1314)
duplicated block id: 85 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (342:353)
  - src/nanotron/data/dataloader_builder.py (49:60)
duplicated block id: 86 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/helpers.cpp (332:349)
  - src/nanotron/data/nemo_dataset/helpers.cpp (591:607)
duplicated block id: 87 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (354:364)
  - src/nanotron/nn/ring_attention.py (221:231)
duplicated block id: 88 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1208:1224)
  - src/nanotron/models/starcoder2.py (1667:1682)
duplicated block id: 89 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/trainer.py (1343:1355)
  - src/nanotron/trainer.py (1384:1396)
duplicated block id: 90 size: 10 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (196:205)
  - src/nanotron/nn/ring_attention_lucidrain.py (223:232)
duplicated block id: 91 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (645:654)
  - src/nanotron/models/starcoder2.py (1077:1086)
duplicated block id: 92 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (309:317)
  - src/nanotron/eval/evaluation_tasks.py (347:355)
duplicated block id: 93 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (788:796)
  - src/nanotron/nn/ring_attention_lucidrain.py (1217:1225)
duplicated block id: 94 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (124:132)
  - src/nanotron/eval/evaluation_tasks.py (217:225)
duplicated block id: 95 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (124:132)
  - src/nanotron/eval/evaluation_tasks.py (157:165)
duplicated block id: 96 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (332:340)
  - src/nanotron/nn/llama3_ring_attention.py (468:476)
duplicated block id: 97 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (332:340)
  - src/nanotron/nn/llama3_ring_attention.py (487:495)
duplicated block id: 98 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (559:567)
  - src/nanotron/models/starcoder2.py (797:805)
duplicated block id: 99 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/nn.py (47:55)
  - src/nanotron/parallel/tensor_parallel/nn.py (185:193)
duplicated block id: 100 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/sanity_checks.py (126:137)
  - src/nanotron/sanity_checks.py (245:254)
duplicated block id: 101 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (157:165)
  - src/nanotron/eval/evaluation_tasks.py (353:361)
duplicated block id: 102 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (49:57)
  - src/nanotron/eval/evaluation_tasks.py (347:355)
duplicated block id: 103 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/nn.py (113:121)
  - src/nanotron/parallel/tensor_parallel/nn.py (185:193)
duplicated block id: 104 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (306:314)
  - src/nanotron/nn/llama3_ring_attention.py (332:340)
duplicated block id: 105 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention.py (15:23)
  - src/nanotron/nn/ring_attention.py (96:104)
duplicated block id: 106 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (217:225)
  - src/nanotron/eval/evaluation_tasks.py (353:361)
duplicated block id: 107 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (645:654)
  - src/nanotron/models/starcoder2.py (832:841)
duplicated block id: 108 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (118:126)
  - src/nanotron/eval/evaluation_tasks.py (309:317)
duplicated block id: 109 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (468:476)
  - src/nanotron/nn/llama3_ring_attention.py (487:495)
duplicated block id: 110 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (518:533)
  - src/nanotron/models/starcoder2.py (759:774)
duplicated block id: 111 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (105:115)
  - src/nanotron/nn/llama3_ring_attention.py (233:243)
duplicated block id: 112 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (559:567)
  - src/nanotron/models/starcoder2.py (1038:1046)
duplicated block id: 113 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/nn.py (69:78)
  - src/nanotron/parallel/tensor_parallel/nn.py (136:144)
duplicated block id: 114 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (306:314)
  - src/nanotron/nn/llama3_ring_attention.py (468:476)
duplicated block id: 115 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/data/tokenized_bytes.py (269:277)
  - src/nanotron/data/tokenized_bytes.py (443:451)
duplicated block id: 116 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (49:57)
  - src/nanotron/eval/evaluation_tasks.py (118:126)
duplicated block id: 117 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (307:332)
  - src/nanotron/models/starcoder2.py (61:86)
duplicated block id: 118 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (787:795)
  - src/nanotron/models/qwen.py (664:672)
duplicated block id: 119 size: 9 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (49:57)
  - src/nanotron/eval/evaluation_tasks.py (309:317)
duplicated block id: 120 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (997:1005)
  - src/nanotron/models/starcoder2.py (1548:1556)
duplicated block id: 121 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1158:1165)
  - src/nanotron/nn/ring_attention_lucidrain.py (1460:1467)
duplicated block id: 122 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/generation/sampler.py (109:118)
  - src/nanotron/generation/sampler.py (252:261)
duplicated block id: 123 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (612:619)
  - src/nanotron/models/starcoder2.py (885:892)
duplicated block id: 124 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (702:710)
  - src/nanotron/models/starcoder2.py (1123:1130)
duplicated block id: 125 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (163:170)
  - src/nanotron/nn/llama3_ring_attention.py (368:375)
duplicated block id: 126 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (127:134)
  - src/nanotron/nn/llama3_ring_attention.py (262:269)
duplicated block id: 127 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (122:129)
  - src/nanotron/parallel/pipeline_parallel/engine.py (182:189)
duplicated block id: 128 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (339:349)
  - src/nanotron/models/starcoder2.py (615:623)
duplicated block id: 129 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1131:1139)
  - src/nanotron/models/starcoder2.py (1548:1556)
duplicated block id: 130 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (230:237)
  - src/nanotron/models/qwen.py (357:366)
duplicated block id: 131 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1158:1165)
  - src/nanotron/nn/ring_attention_lucidrain.py (1254:1261)
duplicated block id: 132 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (122:129)
  - src/nanotron/parallel/pipeline_parallel/engine.py (242:249)
duplicated block id: 133 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (182:189)
  - src/nanotron/parallel/pipeline_parallel/engine.py (242:249)
duplicated block id: 134 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (1260:1267)
  - src/nanotron/models/starcoder2.py (1407:1414)
duplicated block id: 135 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (336:345)
  - src/nanotron/models/llama.py (721:728)
duplicated block id: 136 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (290:297)
  - src/nanotron/models/starcoder2.py (268:275)
duplicated block id: 137 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/helpers.cpp (298:310)
  - src/nanotron/data/nemo_dataset/helpers.cpp (557:569)
duplicated block id: 138 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/config/config.py (58:65)
  - src/nanotron/config/config.py (71:78)
duplicated block id: 139 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (744:751)
  - src/nanotron/nn/ring_attention_lucidrain.py (1501:1508)
duplicated block id: 140 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/helpers.cpp (314:330)
  - src/nanotron/data/nemo_dataset/helpers.cpp (574:589)
duplicated block id: 141 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/p2p.py (103:110)
  - src/nanotron/parallel/pipeline_parallel/p2p.py (252:259)
duplicated block id: 142 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (212:221)
  - src/nanotron/models/starcoder2.py (179:188)
duplicated block id: 143 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (656:664)
  - src/nanotron/models/starcoder2.py (843:851)
duplicated block id: 144 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/__init__.py (663:674)
  - src/nanotron/data/nemo_dataset/blendable_dataset.py (76:87)
duplicated block id: 145 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/generation/sampler.py (109:118)
  - src/nanotron/generation/sampler.py (176:185)
duplicated block id: 146 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/generation/sampler.py (43:52)
  - src/nanotron/generation/sampler.py (176:185)
duplicated block id: 147 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (342:349)
  - src/nanotron/data/tokenized_bytes.py (700:707)
duplicated block id: 148 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/generation/sampler.py (43:52)
  - src/nanotron/generation/sampler.py (109:118)
duplicated block id: 149 size: 8 cleaned lines of code in 2 files:
  - src/nanotron/generation/sampler.py (176:185)
  - src/nanotron/generation/sampler.py (252:261)
duplicated block id: 150 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (946:955)
  - src/nanotron/models/qwen.py (821:832)
duplicated block id: 151 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (776:782)
  - src/nanotron/nn/ring_attention_lucidrain.py (1545:1551)
duplicated block id: 152 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (185:191)
  - src/nanotron/eval/evaluation_tasks.py (355:361)
duplicated block id: 153 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/__init__.py (65:71)
  - src/nanotron/data/nemo_dataset/__init__.py (261:267)
duplicated block id: 154 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/serialize/weights.py (84:91)
  - src/nanotron/serialize/weights.py (248:255)
duplicated block id: 155 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/state.py (186:192)
  - src/nanotron/parallel/pipeline_parallel/state.py (280:286)
duplicated block id: 156 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/functional.py (637:643)
  - src/nanotron/parallel/tensor_parallel/functional.py (656:662)
duplicated block id: 157 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (155:161)
  - src/nanotron/data/dataloader.py (168:174)
duplicated block id: 158 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (74:80)
  - src/nanotron/nn/ring_attention.py (98:104)
duplicated block id: 159 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (528:535)
  - src/nanotron/nn/llama3_ring_attention.py (594:601)
duplicated block id: 160 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/sanity_checks.py (66:74)
  - src/nanotron/sanity_checks.py (216:224)
duplicated block id: 161 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (74:80)
  - src/nanotron/nn/ring_attention.py (17:23)
duplicated block id: 162 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (136:142)
  - src/nanotron/models/qwen.py (570:576)
duplicated block id: 163 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (528:535)
  - src/nanotron/nn/ring_attention.py (376:383)
duplicated block id: 164 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (126:132)
  - src/nanotron/eval/evaluation_tasks.py (185:191)
duplicated block id: 165 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (185:191)
  - src/nanotron/eval/evaluation_tasks.py (219:225)
duplicated block id: 166 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (88:94)
  - src/nanotron/eval/evaluation_tasks.py (152:158)
duplicated block id: 167 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (175:181)
  - src/nanotron/nn/ring_attention.py (17:23)
duplicated block id: 168 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (380:386)
  - src/nanotron/nn/ring_attention.py (243:249)
duplicated block id: 169 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (367:373)
  - src/nanotron/nn/ring_attention.py (234:240)
duplicated block id: 170 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (155:161)
  - src/nanotron/data/dataloader.py (208:214)
duplicated block id: 171 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (175:181)
  - src/nanotron/nn/ring_attention.py (98:104)
duplicated block id: 172 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/scaling/parametrization.py (156:163)
  - src/nanotron/scaling/parametrization.py (227:234)
duplicated block id: 173 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/samplers.py (258:266)
  - src/nanotron/data/samplers.py (347:355)
duplicated block id: 174 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (864:870)
  - src/nanotron/models/qwen.py (881:887)
duplicated block id: 175 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (190:196)
  - src/nanotron/data/dataloader.py (208:214)
duplicated block id: 176 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (598:604)
  - src/nanotron/models/llama.py (615:621)
duplicated block id: 177 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (162:168)
  - src/nanotron/nn/ring_attention.py (89:95)
duplicated block id: 178 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention.py (310:317)
  - src/nanotron/nn/ring_attention.py (376:383)
duplicated block id: 179 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention.py (90:96)
  - src/nanotron/nn/ring_attention.py (235:241)
duplicated block id: 180 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (159:165)
  - src/nanotron/eval/evaluation_tasks.py (185:191)
duplicated block id: 181 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (370:376)
  - src/nanotron/data/dataloader_builder.py (75:82)
duplicated block id: 182 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (88:94)
  - src/nanotron/eval/evaluation_tasks.py (212:218)
duplicated block id: 183 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/logging/timers.py (321:328)
  - src/nanotron/logging/timers.py (345:352)
duplicated block id: 184 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (332:340)
  - src/nanotron/data/dataloader_builder.py (40:47)
duplicated block id: 185 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/sanity_checks.py (79:85)
  - src/nanotron/sanity_checks.py (226:233)
duplicated block id: 186 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/dataloader.py (168:174)
  - src/nanotron/data/dataloader.py (190:196)
duplicated block id: 187 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/functional.py (21:32)
  - src/nanotron/parallel/pipeline_parallel/functional.py (59:70)
duplicated block id: 188 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/data/clm_collator.py (85:97)
  - src/nanotron/data/clm_collator.py (232:244)
duplicated block id: 189 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (149:159)
  - src/nanotron/parallel/pipeline_parallel/engine.py (267:278)
duplicated block id: 190 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1412:1418)
  - src/nanotron/nn/ring_attention_lucidrain.py (1426:1432)
duplicated block id: 191 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (866:872)
  - src/nanotron/models/starcoder2.py (1124:1130)
duplicated block id: 192 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/starcoder2.py (348:354)
  - src/nanotron/models/starcoder2.py (423:429)
duplicated block id: 193 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (704:710)
  - src/nanotron/models/starcoder2.py (866:872)
duplicated block id: 194 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (341:347)
  - src/nanotron/nn/ring_attention.py (211:217)
duplicated block id: 195 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/parallel/pipeline_parallel/engine.py (199:209)
  - src/nanotron/parallel/pipeline_parallel/engine.py (267:278)
duplicated block id: 196 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (594:601)
  - src/nanotron/nn/ring_attention.py (310:317)
duplicated block id: 197 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/logging/__init__.py (4:10)
  - src/nanotron/logging/base.py (22:28)
duplicated block id: 198 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (88:95)
  - src/nanotron/nn/llama3_ring_attention.py (186:192)
duplicated block id: 199 size: 7 cleaned lines of code in 2 files:
  - src/nanotron/generation/decode.py (217:223)
  - src/nanotron/generation/decode.py (555:561)
duplicated block id: 200 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (169:174)
  - src/nanotron/nn/llama3_ring_attention.py (335:340)
duplicated block id: 201 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/functional.py (548:553)
  - src/nanotron/parallel/tensor_parallel/functional.py (699:704)
duplicated block id: 202 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/flex_attention.py (261:266)
  - src/nanotron/nn/flex_attention.py (329:334)
duplicated block id: 203 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/__init__.py (650:657)
  - src/nanotron/data/nemo_dataset/__init__.py (684:691)
duplicated block id: 204 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (796:802)
  - src/nanotron/nn/ring_attention_lucidrain.py (1132:1138)
duplicated block id: 205 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (987:992)
  - src/nanotron/models/llama.py (1008:1013)
duplicated block id: 206 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/__init__.py (76:81)
  - src/nanotron/data/nemo_dataset/__init__.py (272:277)
duplicated block id: 207 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/data/tokenized_bytes.py (262:267)
  - src/nanotron/data/tokenized_bytes.py (436:441)
duplicated block id: 208 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/generation/decode.py (295:300)
  - src/nanotron/generation/decode.py (617:622)
duplicated block id: 209 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/models/llama.py (1060:1065)
  - src/nanotron/models/starcoder2.py (1436:1441)
duplicated block id: 210 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/parallel/tensor_parallel/functional.py (32:37)
  - src/nanotron/parallel/tensor_parallel/functional.py (111:116)
duplicated block id: 211 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/eval/evaluation_tasks.py (59:64)
  - src/nanotron/eval/evaluation_tasks.py (319:324)
duplicated block id: 212 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (309:314)
  - src/nanotron/nn/llama3_ring_attention.py (392:397)
duplicated block id: 213 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (335:340)
  - src/nanotron/nn/llama3_ring_attention.py (411:416)
duplicated block id: 214 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention_lucidrain.py (1214:1219)
  - src/nanotron/nn/ring_attention_lucidrain.py (1552:1557)
duplicated block id: 215 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (163:168)
  - src/nanotron/nn/ring_attention.py (235:240)
duplicated block id: 216 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (335:340)
  - src/nanotron/nn/llama3_ring_attention.py (392:397)
duplicated block id: 217 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (309:314)
  - src/nanotron/nn/llama3_ring_attention.py (431:436)
duplicated block id: 218 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/data/clm_collator.py (101:107)
  - src/nanotron/data/clm_collator.py (247:254)
duplicated block id: 219 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (169:174)
  - src/nanotron/nn/llama3_ring_attention.py (392:397)
duplicated block id: 220 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (169:174)
  - src/nanotron/nn/llama3_ring_attention.py (431:436)
duplicated block id: 221 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (68:73)
  - src/nanotron/nn/llama3_ring_attention.py (490:495)
duplicated block id: 222 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/ring_attention.py (41:46)
  - src/nanotron/nn/ring_attention.py (134:139)
duplicated block id: 223 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (169:174)
  - src/nanotron/nn/llama3_ring_attention.py (411:416)
duplicated block id: 224 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (169:174)
  - src/nanotron/nn/llama3_ring_attention.py (471:476)
duplicated block id: 225 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/nn/llama3_ring_attention.py (169:174)
  - src/nanotron/nn/llama3_ring_attention.py (450:455)
duplicated block id: 226 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/models/qwen.py (297:302)
  - src/nanotron/models/qwen.py (317:322)
duplicated block id: 227 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/trainer.py (368:373)
  - src/nanotron/trainer.py (407:412)
duplicated block id: 228 size: 6 cleaned lines of code in 2 files:
  - src/nanotron/data/nemo_dataset/__init__.py (666:674)
  - src/nanotron/data/tokenized_bytes.py (615:620)
duplicated block id: 229 size: 6 cleaned lines of code in 2 files:
  -
src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (471:476) duplicated block id: 230 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (127:132) - src/nanotron/eval/evaluation_tasks.py (319:324) duplicated block id: 231 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (160:165) - src/nanotron/eval/evaluation_tasks.py (319:324) duplicated block id: 232 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (987:992) - src/nanotron/models/qwen.py (881:886) duplicated block id: 233 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (392:397) duplicated block id: 234 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/qwen.py (728:734) - src/nanotron/models/starcoder2.py (1308:1314) duplicated block id: 235 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (59:64) - src/nanotron/eval/evaluation_tasks.py (220:225) duplicated block id: 236 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (1008:1013) - src/nanotron/models/qwen.py (864:869) duplicated block id: 237 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (411:416) duplicated block id: 238 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (59:64) - src/nanotron/eval/evaluation_tasks.py (186:191) duplicated block id: 239 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (431:436) duplicated block id: 240 size: 6 cleaned lines of code in 2 files: - src/nanotron/serialize/main.py (72:77) - src/nanotron/serialize/main.py (83:88) duplicated block id: 241 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py 
(456:461) - src/nanotron/nn/ring_attention.py (189:194) duplicated block id: 242 size: 6 cleaned lines of code in 2 files: - src/nanotron/serialize/main.py (72:77) - src/nanotron/serialize/main.py (94:99) duplicated block id: 243 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (450:455) duplicated block id: 244 size: 6 cleaned lines of code in 2 files: - src/nanotron/data/tokenized_bytes.py (350:355) - src/nanotron/data/tokenized_bytes.py (398:403) duplicated block id: 245 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (59:64) - src/nanotron/eval/evaluation_tasks.py (160:165) duplicated block id: 246 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (59:64) - src/nanotron/eval/evaluation_tasks.py (127:132) duplicated block id: 247 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (450:455) - src/nanotron/nn/llama3_ring_attention.py (471:476) duplicated block id: 248 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (169:174) - src/nanotron/nn/llama3_ring_attention.py (490:495) duplicated block id: 249 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (496:501) - src/nanotron/nn/ring_attention.py (189:194) duplicated block id: 250 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (186:191) - src/nanotron/eval/evaluation_tasks.py (319:324) duplicated block id: 251 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (220:225) - src/nanotron/eval/evaluation_tasks.py (319:324) duplicated block id: 252 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/qwen.py (599:604) - src/nanotron/models/starcoder2.py (1174:1179) duplicated block id: 253 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (431:436) - 
src/nanotron/nn/llama3_ring_attention.py (490:495) duplicated block id: 254 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (535:540) - src/nanotron/models/starcoder2.py (776:781) duplicated block id: 255 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (1008:1013) - src/nanotron/models/starcoder2.py (1386:1391) duplicated block id: 256 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (335:340) duplicated block id: 257 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (42:51) - src/nanotron/models/qwen.py (25:34) duplicated block id: 258 size: 6 cleaned lines of code in 2 files: - src/nanotron/data/samplers.py (277:284) - src/nanotron/data/samplers.py (361:368) duplicated block id: 259 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (319:324) - src/nanotron/eval/evaluation_tasks.py (356:361) duplicated block id: 260 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (498:503) - src/nanotron/models/llama.py (508:515) duplicated block id: 261 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (417:422) - src/nanotron/nn/ring_attention.py (189:194) duplicated block id: 262 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/starcoder2.py (541:547) - src/nanotron/models/starcoder2.py (553:558) duplicated block id: 263 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (68:73) - src/nanotron/nn/llama3_ring_attention.py (309:314) duplicated block id: 264 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (411:416) - src/nanotron/nn/llama3_ring_attention.py (431:436) duplicated block id: 265 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (215:223) - src/nanotron/models/qwen.py (341:349) duplicated block id: 266 size: 6 
cleaned lines of code in 2 files: - src/nanotron/data/nemo_dataset/blendable_dataset.py (79:87) - src/nanotron/data/tokenized_bytes.py (615:620) duplicated block id: 267 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (392:397) - src/nanotron/nn/llama3_ring_attention.py (490:495) duplicated block id: 268 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (411:416) - src/nanotron/nn/llama3_ring_attention.py (471:476) duplicated block id: 269 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/ring_attention_lucidrain.py (321:326) - src/nanotron/nn/ring_attention_lucidrain.py (1663:1668) duplicated block id: 270 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (431:436) - src/nanotron/nn/llama3_ring_attention.py (450:455) duplicated block id: 271 size: 6 cleaned lines of code in 2 files: - src/nanotron/parallel/pipeline_parallel/engine.py (218:223) - src/nanotron/parallel/pipeline_parallel/engine.py (307:313) duplicated block id: 272 size: 6 cleaned lines of code in 2 files: - src/nanotron/serialize/optimizer.py (268:273) - src/nanotron/serialize/optimizer.py (303:308) duplicated block id: 273 size: 6 cleaned lines of code in 2 files: - src/nanotron/serialize/main.py (83:88) - src/nanotron/serialize/main.py (94:99) duplicated block id: 274 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/layer_norm.py (7:13) - src/nanotron/nn/layer_norm.py (39:45) duplicated block id: 275 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/starcoder2.py (359:364) - src/nanotron/models/starcoder2.py (391:396) duplicated block id: 276 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (751:757) - src/nanotron/models/qwen.py (624:630) duplicated block id: 277 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (868:874) - src/nanotron/models/qwen.py (728:734) duplicated block id: 278 size: 6 cleaned lines of 
code in 2 files: - src/nanotron/logging/base.py (178:183) - src/nanotron/logging/base.py (210:215) duplicated block id: 279 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/llama.py (666:673) - src/nanotron/models/llama.py (710:717) duplicated block id: 280 size: 6 cleaned lines of code in 2 files: - src/nanotron/data/nemo_dataset/helpers.cpp (355:367) - src/nanotron/data/nemo_dataset/helpers.cpp (610:622) duplicated block id: 281 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (335:340) - src/nanotron/nn/llama3_ring_attention.py (450:455) duplicated block id: 282 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/qwen.py (881:886) - src/nanotron/models/starcoder2.py (1386:1391) duplicated block id: 283 size: 6 cleaned lines of code in 2 files: - src/nanotron/config/models_config.py (79:84) - src/nanotron/config/models_config.py (133:138) duplicated block id: 284 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (335:340) - src/nanotron/nn/llama3_ring_attention.py (431:436) duplicated block id: 285 size: 6 cleaned lines of code in 2 files: - src/nanotron/parallel/tensor_parallel/nn.py (99:104) - src/nanotron/parallel/tensor_parallel/nn.py (172:177) duplicated block id: 286 size: 6 cleaned lines of code in 2 files: - src/nanotron/parallel/pipeline_parallel/engine.py (307:313) - src/nanotron/parallel/pipeline_parallel/engine.py (332:337) duplicated block id: 287 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (303:308) - src/nanotron/nn/ring_attention.py (181:186) duplicated block id: 288 size: 6 cleaned lines of code in 2 files: - src/nanotron/generation/decode.py (286:291) - src/nanotron/generation/decode.py (610:615) duplicated block id: 289 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (368:373) - src/nanotron/nn/ring_attention.py (90:95) duplicated block id: 290 size: 6 cleaned lines of 
code in 2 files: - src/nanotron/data/tokenized_bytes.py (39:44) - src/nanotron/data/tokenized_bytes.py (54:59) duplicated block id: 291 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/one_job_runner.py (44:49) - src/nanotron/eval/one_job_runner.py (53:58) duplicated block id: 292 size: 6 cleaned lines of code in 2 files: - src/nanotron/config/models_config.py (68:73) - src/nanotron/config/models_config.py (123:128) duplicated block id: 293 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/starcoder2.py (510:515) - src/nanotron/parallel/tensor_parallel/nn.py (117:122) duplicated block id: 294 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/ring_attention_lucidrain.py (1220:1225) - src/nanotron/nn/ring_attention_lucidrain.py (1559:1564) duplicated block id: 295 size: 6 cleaned lines of code in 2 files: - src/nanotron/eval/evaluation_tasks.py (59:64) - src/nanotron/eval/evaluation_tasks.py (356:361) duplicated block id: 296 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (392:397) - src/nanotron/nn/llama3_ring_attention.py (411:416) duplicated block id: 297 size: 6 cleaned lines of code in 2 files: - src/nanotron/data/dataloader.py (334:340) - src/nanotron/data/tokenized_bytes.py (693:698) duplicated block id: 298 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/ring_attention_lucidrain.py (791:796) - src/nanotron/nn/ring_attention_lucidrain.py (1559:1564) duplicated block id: 299 size: 6 cleaned lines of code in 2 files: - src/nanotron/s3_checkpoints/s3_mover.py (240:245) - src/nanotron/s3_checkpoints/s3_mover.py (293:298) duplicated block id: 300 size: 6 cleaned lines of code in 2 files: - src/nanotron/models/starcoder2.py (510:515) - src/nanotron/parallel/tensor_parallel/nn.py (51:56) duplicated block id: 301 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (169:174) - src/nanotron/nn/llama3_ring_attention.py (309:314) duplicated block id: 302 
size: 6 cleaned lines of code in 2 files: - src/nanotron/models/starcoder2.py (1514:1519) - src/nanotron/models/starcoder2.py (1536:1541) duplicated block id: 303 size: 6 cleaned lines of code in 2 files: - src/nanotron/nn/llama3_ring_attention.py (392:397) - src/nanotron/nn/llama3_ring_attention.py (450:455)