duplicated block id: 1 size: 124 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (27:250)
- dataset-construction/src/ndb_data/sample_questions_250.py (27:250)

duplicated block id: 2 size: 72 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (159:242)
- modelling/src/neuraldb/convert_spj_to_predictions.py (122:205)

duplicated block id: 3 size: 65 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (27:117)
- dataset-construction/src/ndb_data/sample_questions_50.py (27:117)

duplicated block id: 4 size: 63 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (37:107)
- modelling/src/neuraldb/evaluation/postprocess_spj.py (36:106)

duplicated block id: 5 size: 62 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (27:115)
- dataset-construction/src/ndb_data/sample_questions_500.py (27:115)

duplicated block id: 6 size: 61 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (27:113)
- dataset-construction/src/ndb_data/sample_questions_500.py (27:113)

duplicated block id: 7 size: 61 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (27:113)
- dataset-construction/src/ndb_data/sample_questions_1000.py (27:113)

duplicated block id: 8 size: 61 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (117:251)
- dataset-construction/src/ndb_data/sample_questions_500.py (117:251)

duplicated block id: 9 size: 61 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (27:113)
- dataset-construction/src/ndb_data/sample_questions_50.py (27:113)

duplicated block id: 10 size: 61 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (27:113)
- dataset-construction/src/ndb_data/sample_questions_500.py (27:113)

duplicated block id: 11 size: 58 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (245:327)
- modelling/src/neuraldb/convert_spj_to_predictions.py (208:290)

duplicated block id: 12 size: 54 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (32:101)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (32:101)

duplicated block id: 13 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_50.py (27:72)

duplicated block id: 14 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_250.py (27:72)

duplicated block id: 15 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_500.py (27:72)

duplicated block id: 16 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_50.py (27:72)

duplicated block id: 17 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_100.py (27:72)

duplicated block id: 18 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_500.py (27:72)

duplicated block id: 19 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_1000.py (27:72)

duplicated block id: 20 size: 35 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (27:72)
- dataset-construction/src/ndb_data/sample_questions_250.py (27:72)

duplicated block id: 21 size: 32 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (160:250)
- dataset-construction/src/ndb_data/sample_questions_50.py (154:244)

duplicated block id: 22 size: 32 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (160:250)
- dataset-construction/src/ndb_data/sample_questions_50.py (154:244)

duplicated block id: 23 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (163:244)
- dataset-construction/src/ndb_data/sample_questions_500.py (170:251)

duplicated block id: 24 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (74:115)
- dataset-construction/src/ndb_data/sample_questions_500.py (74:115)

duplicated block id: 25 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (169:250)
- dataset-construction/src/ndb_data/sample_questions_500.py (170:251)

duplicated block id: 26 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (169:250)
- dataset-construction/src/ndb_data/sample_questions_1000.py (170:251)

duplicated block id: 27 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (74:115)
- dataset-construction/src/ndb_data/sample_questions_500.py (74:115)

duplicated block id: 28 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (170:251)
- dataset-construction/src/ndb_data/sample_questions_50.py (163:244)

duplicated block id: 29 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (170:251)
- dataset-construction/src/ndb_data/sample_questions_250.py (169:250)

duplicated block id: 30 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (74:115)
- dataset-construction/src/ndb_data/sample_questions_1000.py (74:115)

duplicated block id: 31 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (169:250)
- dataset-construction/src/ndb_data/sample_questions_500.py (170:251)

duplicated block id: 32 size: 26 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (74:115)
- dataset-construction/src/ndb_data/sample_questions_250.py (74:115)

duplicated block id: 33 size: 25 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (51:80)
- modelling/src/neuraldb/convert_spj_to_predictions.py (79:108)

duplicated block id: 34 size: 25 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (74:113)
- dataset-construction/src/ndb_data/sample_questions_50.py (74:113)

duplicated block id: 35 size: 25 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (74:113)
- dataset-construction/src/ndb_data/sample_questions_100.py (74:113)

duplicated block id: 36 size: 25 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (74:113)
- dataset-construction/src/ndb_data/sample_questions_50.py (74:113)

duplicated block id: 37 size: 25 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (74:113)
- dataset-construction/src/ndb_data/sample_questions_250.py (74:113)

duplicated block id: 38 size: 24 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/construction/make_database_initial.py (41:73)
- dataset-construction/src/ndb_data/construction/make_database_initial_cache.py (42:74)

duplicated block id: 39 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (162:188)
- dataset-construction/src/ndb_data/generation/question_to_db.py (210:237)

duplicated block id: 40 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (163:240)
- dataset-construction/src/ndb_data/sample_questions_250.py (173:250)

duplicated block id: 41 size: 23 cleaned lines of code in 2 files:
- modelling/src/neuraldb/convert_spj_to_predictions.py (125:151)
- modelling/src/neuraldb/convert_spj_to_predictions.py (173:200)

duplicated block id: 42 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (163:240)
- dataset-construction/src/ndb_data/sample_questions_500.py (174:251)

duplicated block id: 43 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (163:240)
- dataset-construction/src/ndb_data/sample_questions_50.py (167:244)

duplicated block id: 44 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (210:237)
- modelling/src/neuraldb/convert_spj_to_predictions.py (125:151)

duplicated block id: 45 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (163:240)
- dataset-construction/src/ndb_data/sample_questions_1000.py (174:251)

duplicated block id: 46 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (162:188)
- modelling/src/neuraldb/convert_spj_to_predictions.py (173:200)

duplicated block id: 47 size: 23 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (163:240)
- dataset-construction/src/ndb_data/sample_questions_100.py (173:250)

duplicated block id: 48 size: 21 cleaned lines of code in 2 files:
- modelling/src/neuraldb/modelling/neuraldb_trainer.py (222:242)
- modelling/src/neuraldb/modelling/neuraldb_trainer.py (252:272)

duplicated block id: 49 size: 19 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (138:158)
- modelling/src/neuraldb/evaluation/postprocess_spj.py (116:136)

duplicated block id: 50 size: 18 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (122:152)
- dataset-construction/src/ndb_data/sample_questions_50.py (122:152)

duplicated block id: 51 size: 18 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (122:152)
- dataset-construction/src/ndb_data/sample_questions_100.py (128:158)

duplicated block id: 52 size: 18 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (128:158)
- dataset-construction/src/ndb_data/sample_questions_50.py (122:152)

duplicated block id: 53 size: 18 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (128:158)
- dataset-construction/src/ndb_data/sample_questions_50.py (122:152)

duplicated block id: 54 size: 18 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (53:74)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (80:101)

duplicated block id: 55 size: 18 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (53:74)
- modelling/src/neuraldb/final_scoring_with_dbsize.py (80:101)

duplicated block id: 56 size: 18 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (122:152)
- dataset-construction/src/ndb_data/sample_questions_250.py (128:158)

duplicated block id: 57 size: 17 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/map_kelm.py (226:247)
- dataset-construction/src/ndb_data/generation/map_kelm.py (294:315)

duplicated block id: 58 size: 17 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (118:136)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (122:140)

duplicated block id: 59 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (132:158)
- dataset-construction/src/ndb_data/sample_questions_500.py (133:159)

duplicated block id: 60 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (133:159)
- dataset-construction/src/ndb_data/sample_questions_50.py (126:152)

duplicated block id: 61 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (132:158)
- dataset-construction/src/ndb_data/sample_questions_1000.py (133:159)

duplicated block id: 62 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (126:152)
- dataset-construction/src/ndb_data/sample_questions_500.py (133:159)

duplicated block id: 63 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (133:159)
- dataset-construction/src/ndb_data/sample_questions_250.py (132:158)

duplicated block id: 64 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (126:152)
- dataset-construction/src/ndb_data/sample_questions_1000.py (133:159)

duplicated block id: 65 size: 15 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/instance_generator.py (171:188)
- modelling/src/neuraldb/dataset/instance_generator/instance_generator.py (242:259)

duplicated block id: 66 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (126:152)
- dataset-construction/src/ndb_data/sample_questions_500.py (133:159)

duplicated block id: 67 size: 15 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (132:158)
- dataset-construction/src/ndb_data/sample_questions_500.py (133:159)

duplicated block id: 68 size: 15 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (160:175)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (163:178)

duplicated block id: 69 size: 14 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/map_kelm.py (262:278)
- dataset-construction/src/ndb_data/generation/map_kelm.py (330:346)

duplicated block id: 70 size: 12 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (32:44)
- modelling/src/neuraldb/convert_spj_to_predictions.py (60:72)

duplicated block id: 71 size: 11 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (82:95)
- modelling/src/neuraldb/final_scoring_with_dbsize.py (122:133)

duplicated block id: 72 size: 11 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_spj.py (41:51)
- modelling/src/neuraldb/evaluation/postprocess_spj.py (58:68)

duplicated block id: 73 size: 11 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (42:52)
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (59:69)

duplicated block id: 74 size: 11 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (42:52)
- modelling/src/neuraldb/evaluation/postprocess_spj.py (58:68)

duplicated block id: 75 size: 11 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (82:95)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (126:137)

duplicated block id: 76 size: 11 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (59:69)
- modelling/src/neuraldb/evaluation/postprocess_spj.py (41:51)

duplicated block id: 77 size: 10 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (33:42)
- dataset-construction/src/ndb_data/generation/question_to_db.py (58:67)

duplicated block id: 78 size: 10 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/construction/make_database_finalize.py (82:95)
- dataset-construction/src/ndb_data/construction/make_questions.py (487:500)

duplicated block id: 79 size: 10 cleaned lines of code in 2 files:
- modelling/src/neuraldb/convert_spj_to_predictions.py (61:70)
- modelling/src/neuraldb/convert_spj_to_predictions.py (86:95)

duplicated block id: 80 size: 10 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/construction/make_questions.py (131:141)
- dataset-construction/src/ndb_data/construction/make_questions.py (186:196)

duplicated block id: 81 size: 10 cleaned lines of code in 2 files:
- modelling/src/neuraldb/convert_spj_to_predictions.py (341:350)
- modelling/src/neuraldb/convert_spj_to_predictions.py (384:393)

duplicated block id: 82 size: 10 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/spj_generator.py (157:166)
- modelling/src/neuraldb/dataset/instance_generator/spj_generator.py (202:211)

duplicated block id: 83 size: 10 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (104:114)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (107:117)

duplicated block id: 84 size: 10 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (58:67)
- modelling/src/neuraldb/convert_spj_to_predictions.py (61:70)

duplicated block id: 85 size: 10 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (33:42)
- modelling/src/neuraldb/convert_spj_to_predictions.py (86:95)

duplicated block id: 86 size: 9 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (117:126)
- dataset-construction/src/ndb_data/sample_questions_500.py (117:126)

duplicated block id: 87 size: 9 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/map_kelm.py (250:260)
- dataset-construction/src/ndb_data/generation/map_kelm.py (318:328)

duplicated block id: 88 size: 9 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (117:126)
- dataset-construction/src/ndb_data/sample_questions_1000.py (117:126)

duplicated block id: 89 size: 9 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (117:126)
- dataset-construction/src/ndb_data/sample_questions_500.py (117:126)

duplicated block id: 90 size: 9 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (179:191)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (182:194)

duplicated block id: 91 size: 9 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/spj_generator.py (137:149)
- modelling/src/neuraldb/dataset/instance_generator/spj_generator.py (172:183)

duplicated block id: 92 size: 9 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (29:39)
- modelling/src/neuraldb/final_scoring_with_dbsize.py (30:40)

duplicated block id: 93 size: 9 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (117:126)
- dataset-construction/src/ndb_data/sample_questions_250.py (117:126)

duplicated block id: 94 size: 8 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (199:206)
- dataset-construction/src/ndb_data/generation/question_to_db.py (248:255)

duplicated block id: 95 size: 8 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (147:154)
- modelling/src/neuraldb/convert_spj_to_predictions.py (111:118)

duplicated block id: 96 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/convert_spj_to_predictions.py (162:169)
- modelling/src/neuraldb/convert_spj_to_predictions.py (211:218)

duplicated block id: 97 size: 8 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (199:206)
- modelling/src/neuraldb/convert_spj_to_predictions.py (211:218)

duplicated block id: 98 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator_maxtok.py (41:48)
- modelling/src/neuraldb/dataset/instance_generator/perfectir_generator.py (37:44)

duplicated block id: 99 size: 8 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (248:255)
- modelling/src/neuraldb/convert_spj_to_predictions.py (162:169)

duplicated block id: 100 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/instance_generator.py (40:47)
- modelling/src/neuraldb/dataset/instance_generator/spj_generator.py (32:39)

duplicated block id: 101 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/data_collator_seq2seq.py (100:107)
- modelling/src/neuraldb/dataset/data_collator_seq2seq.py (163:170)

duplicated block id: 102 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (202:210)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (246:254)

duplicated block id: 103 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (113:120)
- modelling/src/neuraldb/evaluation/postprocess_spj.py (99:106)

duplicated block id: 104 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator_maxtok.py (41:48)
- modelling/src/neuraldb/dataset/instance_generator/wholedb_generator.py (35:42)

duplicated block id: 105 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator.py (37:44)
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator_maxtok.py (41:48)

duplicated block id: 106 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/perfectir_generator.py (37:44)
- modelling/src/neuraldb/dataset/instance_generator/wholedb_generator.py (35:42)

duplicated block id: 107 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (100:107)
- modelling/src/neuraldb/evaluation/postprocess_baselines.py (113:120)

duplicated block id: 108 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator.py (37:44)
- modelling/src/neuraldb/dataset/instance_generator/perfectir_generator.py (37:44)

duplicated block id: 109 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator.py (37:44)
- modelling/src/neuraldb/dataset/instance_generator/wholedb_generator.py (35:42)

duplicated block id: 110 size: 8 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (31:39)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (32:40)

duplicated block id: 111 size: 7 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (120:126)
- dataset-construction/src/ndb_data/sample_questions_50.py (114:120)

duplicated block id: 112 size: 7 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (120:126)
- dataset-construction/src/ndb_data/sample_questions_50.py (114:120)

duplicated block id: 113 size: 7 cleaned lines of code in 2 files:
- modelling/src/neuraldb/convert_spj_to_predictions.py (329:336)
- modelling/src/neuraldb/convert_spj_to_predictions.py (373:381)

duplicated block id: 114 size: 7 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (216:222)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (269:275)

duplicated block id: 115 size: 7 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (198:205)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (247:254)

duplicated block id: 116 size: 7 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize.py (198:205)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (203:210)

duplicated block id: 117 size: 7 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/template_first_db.py (133:142)
- dataset-construction/src/ndb_data/generation/template_first_db.py (174:184)

duplicated block id: 118 size: 7 cleaned lines of code in 2 files:
- ssg/ssg_prediction.py (40:46)
- ssg/train_ssg.py (41:47)

duplicated block id: 119 size: 7 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (120:126)
- dataset-construction/src/ndb_data/sample_questions_50.py (114:120)

duplicated block id: 120 size: 7 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (114:120)
- dataset-construction/src/ndb_data/sample_questions_500.py (120:126)

duplicated block id: 121 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_250.py (57:63)

duplicated block id: 122 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (41:51)
- modelling/src/neuraldb/final_scoring_with_dbsize.py (64:75)

duplicated block id: 123 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_500.py (57:63)

duplicated block id: 124 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_250.py (101:107)

duplicated block id: 125 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_50.py (57:63)

duplicated block id: 126 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_250.py (101:107)

duplicated block id: 127 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (276:283)
- modelling/src/neuraldb/convert_spj_to_predictions.py (222:229)

duplicated block id: 128 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_100.py (101:107)

duplicated block id: 129 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_50.py (101:107)

duplicated block id: 130 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_100.py (101:107)

duplicated block id: 131 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (259:266)
- modelling/src/neuraldb/convert_spj_to_predictions.py (239:246)

duplicated block id: 132 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/construction/make_database_initial_cache.py (111:118)
- dataset-construction/src/ndb_data/construction/make_database_initial_cache.py (122:129)

duplicated block id: 133 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_250.py (57:63)

duplicated block id: 134 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator.py (28:34)
- modelling/src/neuraldb/dataset/instance_generator/externalir_generator_maxtok.py (28:34)

duplicated block id: 135 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_250.py (57:63)

duplicated block id: 136 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_1000.py (101:107)

duplicated block id: 137 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (57:63)
- dataset-construction/src/ndb_data/sample_questions.py (101:107)

duplicated block id: 138 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_500.py (57:63)

duplicated block id: 139 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_100.py (57:63)

duplicated block id: 140 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_500.py (57:63)

duplicated block id: 141 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_500.py (101:107)

duplicated block id: 142 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_50.py (101:107)

duplicated block id: 143 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/template_first_db.py (110:115)
- dataset-construction/src/ndb_data/generation/template_first_db.py (150:155)

duplicated block id: 144 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (232:237)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (241:246)

duplicated block id: 145 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_50.py (101:107)

duplicated block id: 146 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_500.py (57:63)

duplicated block id: 147 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (232:237)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (251:256)

duplicated block id: 148 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (241:246)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (251:256)

duplicated block id: 149 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_50.py (101:107)

duplicated block id: 150 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_500.py (57:63)

duplicated block id: 151 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_250.py (101:107)

duplicated block id: 152 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/generation/question_to_db.py (259:266)
- dataset-construction/src/ndb_data/generation/question_to_db.py (276:283)

duplicated block id: 153 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_500.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_500.py (101:107)

duplicated block id: 154 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_50.py (101:107)

duplicated block id: 155 size: 6 cleaned lines of code in 2 files:
- ssg/ssg_prediction.py (30:37)
- ssg/train_ssg.py (31:38)

duplicated block id: 156 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_1000.py (101:107)

duplicated block id: 157 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/retriever/dpr.py (69:76)
- modelling/src/neuraldb/retriever/tfidf.py (51:58)

duplicated block id: 158 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_1000.py (101:107)

duplicated block id: 159 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_50.py (57:63)

duplicated block id: 160 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_1000.py (57:63)

duplicated block id: 161 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/final_scoring.py (41:51)
- modelling/src/neuraldb/final_scoring_with_dbsize_sweep.py (64:75)

duplicated block id: 162 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_1000.py (57:63)

duplicated block id: 163 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_500.py (101:107)

duplicated block id: 164 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_50.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_500.py (101:107)

duplicated block id: 165 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_1000.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_500.py (101:107)

duplicated block id: 166 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_250.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_50.py (57:63)

duplicated block id: 167 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (101:107)
- dataset-construction/src/ndb_data/sample_questions_50.py (57:63)

duplicated block id: 168 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions_100.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_250.py (101:107)

duplicated block id: 169 size: 6 cleaned lines of code in 2 files:
- dataset-construction/src/ndb_data/sample_questions.py (57:63)
- dataset-construction/src/ndb_data/sample_questions_500.py (101:107)

duplicated block id: 170 size: 6 cleaned lines of code in 2 files:
- modelling/src/neuraldb/convert_spj_to_predictions.py (222:229)
- modelling/src/neuraldb/convert_spj_to_predictions.py (239:246)
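
Every entry in this report follows the same pattern (block id, size in cleaned lines, then a list of `path (start:end)` locations), so it can be summarised with a short script. The following is a minimal sketch, not part of the tool that produced the report: the names `Block`, `parse_report`, and `duplicated_lines_per_pair` are hypothetical, and the regular expressions assume the exact wording seen above. Whitespace is matched loosely, so the parser should accept the report whether entries are broken across lines or run together.

```python
import re
from collections import Counter
from typing import NamedTuple


class Block(NamedTuple):
    """One 'duplicated block' entry (hypothetical container, for illustration)."""
    block_id: int
    size: int
    locations: list  # list of (path, start_line, end_line) tuples


# Header wording copied from the report; whitespace may include line breaks.
HEADER = re.compile(
    r"duplicated block id:\s*(\d+)\s+size:\s*(\d+)\s+"
    r"cleaned lines of code\s+in\s+\d+\s+files:"
)
# A location is "- <path> (<start>:<end>)". The hyphen must be followed by
# whitespace, so hyphens inside paths are not mistaken for list markers.
LOCATION = re.compile(r"-\s+(\S+)\s+\((\d+):(\d+)\)")


def parse_report(text):
    """Split the report at each header and collect the locations that follow it."""
    headers = list(HEADER.finditer(text))
    blocks = []
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        locations = [
            (m.group(1), int(m.group(2)), int(m.group(3)))
            for m in LOCATION.finditer(body)
        ]
        blocks.append(Block(int(header.group(1)), int(header.group(2)), locations))
    return blocks


def duplicated_lines_per_pair(blocks):
    """Sum block sizes per set of files involved.

    Blocks in this report can cover overlapping line ranges, so the totals
    are an upper bound rather than an exact count of duplicated lines.
    """
    totals = Counter()
    for block in blocks:
        files = tuple(sorted({path for path, _, _ in block.locations}))
        totals[files] += block.size
    return totals
```

Calling `duplicated_lines_per_pair(parse_report(report_text)).most_common(10)` would list the file groups with the most reported duplication. A key containing a single path indicates duplication within one file.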