Summary: 21 instances, 21 unique

| Text | Count |
|---|---|
| # TODO: Add the QK dotprod to get scores | 1 |
| seed = SEEDS + col_id # FIXME index the seed properly | 1 |
| # TODO: check this | 1 |
| # TODO add bias here | 1 |
| # TODO: flag use zeros for garbage | 1 |
| # TODO: tune these | 1 |
| # TODO: pass in key and value independently. | 1 |
| # TODO: check if need to return this or not | 1 |
| # TODO: this is here for BC | 1 |
| int64_t grain_size = 128; // TODO: tune this | 1 |
| # TODO: should be able to not have to pass in num_heads | 1 |
| # TODO: Build LUT | 1 |
| # TODO: update defaults for use_razavi_pinverse and inv_iterations | 1 |
| // TODO: compute grad only if they require grad | 1 |
| // TODO investigate misaligned address errors in values ptr | 1 |
| # TODO assume we have (N, S, hs) instead of (B, nh, S, hs), with N = B x nh | 1 |
| # TODO this is not always true, but is a fast approximation for now | 1 |
| # FIXME: Should we still accept a mask in that case ? | 1 |
| # TODO: flag use zeros for garbage | 1 |
| # TODO: link to a drawing of what these tensors are | 1 |
| # FIXME: @lefaudeux tensor shape changes are not well handled, see shape3 | 1 |
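A minimal sketch of how the "instances" and "unique" tallies could be reproduced from the comment texts listed in this report. It assumes that "unique" means exact string equality; the report does not state its actual rule. Under that assumption the two identical "# TODO: flag use zeros for garbage" entries collapse into one, giving 20 distinct texts rather than the reported 21. The reporting tool therefore presumably tells entries apart by something not shown here, such as file or line location. The names `ROWS` and `summarize` are hypothetical.

```python
from collections import Counter

# The 21 comment texts as listed in this report, in the same order.
ROWS = [
    "# TODO: Add the QK dotprod to get scores",
    "seed = SEEDS + col_id # FIXME index the seed properly",
    "# TODO: check this",
    "# TODO add bias here",
    "# TODO: flag use zeros for garbage",
    "# TODO: tune these",
    "# TODO: pass in key and value independently.",
    "# TODO: check if need to return this or not",
    "# TODO: this is here for BC",
    "int64_t grain_size = 128; // TODO: tune this",
    "# TODO: should be able to not have to pass in num_heads",
    "# TODO: Build LUT",
    "# TODO: update defaults for use_razavi_pinverse and inv_iterations",
    "// TODO: compute grad only if they require grad",
    "// TODO investigate misaligned address errors in values ptr",
    "# TODO assume we have (N, S, hs) instead of (B, nh, S, hs), with N = B x nh",
    "# TODO this is not always true, but is a fast approximation for now",
    "# FIXME: Should we still accept a mask in that case ?",
    "# TODO: flag use zeros for garbage",
    "# TODO: link to a drawing of what these tensors are",
    "# FIXME: @lefaudeux tensor shape changes are not well handled, see shape3",
]


def summarize(rows):
    """Return (total instances, distinct texts, per-text counts).

    Assumption: two rows count as the same entry only if their text is
    byte-for-byte identical.
    """
    counts = Counter(rows)
    return len(rows), len(counts), counts


total, distinct, counts = summarize(ROWS)
print(f"{total} instances, {distinct} distinct texts")
for text, n in counts.items():
    if n > 1:
        # Report any text that occurs more than once.
        print(f"repeated x{n}: {text}")
```

Running it prints "21 instances, 20 distinct texts" followed by the one repeated comment. A scanner that keys entries on (file, line, text) instead of text alone would count both occurrences separately.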