Yes, we are. This is how the transformer is trained, as is any other network (e.g. an RNN) that autoregressively generates a sequence of tokens. During training (and at inference), the model predicts the next token given the current one plus all previous ones. Schematically:

    input:  c_1 ... c_m   z_1  z_2  ...  z_{n-1}
    target:               z_1  z_2  ...  z_n

Each output position predicts the token one step ahead. Because the conditioning tokens c_1 ... c_m come first, even z_1 has context to be predicted from, which is why the target does not need to treat the first element specially.
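For concreteness, here is a minimal sketch of that teacher-forcing step in PyTorch. It reuses the names from the snippet quoted below (`z_indices`, and `c_indices` for the conditioning tokens); the toy `model`, the shapes, and the cross-entropy loss are illustrative assumptions, not the repo's literal code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the GPT: embeds tokens and maps every position to logits
# over the codebook. (A real model would also apply causal self-attention.)
vocab_size = 1024
model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))

B, m, n = 2, 4, 16                                 # batch, #cond tokens, #codes
c_indices = torch.randint(0, vocab_size, (B, m))   # conditioning tokens
z_indices = torch.randint(0, vocab_size, (B, n))   # image codebook indices

# Concatenate conditioning and codes, then drop the last token: the output
# at position i is the prediction for token i+1 (teacher forcing).
cz_indices = torch.cat((c_indices, z_indices), dim=1)
logits = model(cz_indices[:, :-1])                 # (B, m+n-1, vocab_size)

# Keep only the outputs that predict z tokens. Because the conditioning
# comes first, even z_1 has context, so the target is simply all of z_indices.
logits = logits[:, c_indices.shape[1] - 1:]        # (B, n, vocab_size)
target = z_indices                                 # (B, n)

loss = F.cross_entropy(logits.reshape(-1, vocab_size), target.reshape(-1))
```

All positions are trained in parallel this way: each `z` token is predicted from the conditioning tokens plus the `z` tokens that precede it, not just the last one.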
---
Hello,
I am trying to understand the lines below. Could you elaborate on how the transformer is trained here?
```python
# target includes all sequence elements (no need to handle first one
# differently because we are conditioning)
target = z_indices
```
Using the features and all of the indices, what exactly are we trying to predict? Isn't the target all of the `z_indices` that we are already giving to the transformer? Or are we just predicting the last `z_index` given the features and the previous `z_indices`?