DSA Question

#33

by ghostplant - opened 13 days ago

13 days ago

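When predicting each next token, does DSA select its top-k by locally appending to the previous step's top-k result, or does it select the top-k from the entire context history? Judging from the model definition, it seems to be the latter (selecting the top-k from the entire context history).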
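The two alternatives in the question can be made concrete with a toy sketch. This is plain Python and not the actual DSA or DeepSeek code. `full_topk` re-selects from every past position at each step. `incremental_topk` only considers the previous step's selection plus the newest token. The per-step score lists are hypothetical stand-ins for indexer scores. Because they depend on the current query, a token dropped at one step can become relevant again later, and only full re-selection can recover it.

```python
import heapq

def full_topk(scores, k):
    """Strategy 1: re-select the top-k from the entire context history.
    scores[s] is a (hypothetical) relevance score of past token s for the current query."""
    return set(heapq.nlargest(k, range(len(scores)), key=scores.__getitem__))

def incremental_topk(prev_selected, scores, k):
    """Strategy 2: locally append to the previous step's top-k set.
    Only the previous selection plus the newest token are candidates,
    so a token evicted earlier can never be selected again."""
    candidates = set(prev_selected) | {len(scores) - 1}
    return set(heapq.nlargest(k, candidates, key=scores.__getitem__))

def run(step_scores, k):
    """Apply both strategies step by step; return (full, incremental) per step."""
    prev = set()
    history = []
    for scores in step_scores:
        full = full_topk(scores, k)
        prev = incremental_topk(prev, scores, k)
        history.append((full, prev))
    return history

# Hypothetical scores: one list per decoding step, covering all tokens so far.
STEP_SCORES = [
    [0.5],
    [0.5, 0.4],
    [0.9, 0.8, 0.1],       # token 2 scores low and is dropped by both strategies
    [0.1, 0.2, 0.9, 0.3],  # the new query now strongly favors token 2
]

if __name__ == "__main__":
    for t, (full, inc) in enumerate(run(STEP_SCORES, k=2)):
        print(t, sorted(full), sorted(inc))
```

At the last step the full re-selection picks {2, 3}, while the incremental scheme is stuck with {1, 3} because token 2 was evicted one step earlier.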
