I think there is a typo in this section: https://huggingface.co/learn/llm-course/chapter1/4?fw=pt#the-original-architecture
"Note that the first attention layer in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful as different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may be helpful to determine the best translation of a given word."
The word INPUT should actually be OUTPUT for the first attention layer, correct?
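For context, here is a minimal sketch of what the two attention layers in a decoder block compute. This is a hypothetical PyTorch-style illustration, not the course's actual code: the first layer attends over the decoder's own sequence under a causal mask, the second attends over the encoder output.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Minimal decoder block: masked self-attention + encoder-decoder cross-attention."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, enc_out: torch.Tensor) -> torch.Tensor:
        # First attention layer: each position attends only to earlier
        # positions of the decoder sequence (the tokens produced so far),
        # enforced by a causal mask (True = position is not attended to).
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + attn_out)

        # Second attention layer: queries come from the decoder, but keys and
        # values come from the encoder output, so every position can look at
        # the whole input sentence.
        attn_out, _ = self.cross_attn(x, enc_out, enc_out)
        return self.norm2(x + attn_out)

if __name__ == "__main__":
    block = DecoderBlock(d_model=16, n_heads=4)
    enc_out = torch.randn(2, 7, 16)  # encoder output: the whole input sentence
    dec_seq = torch.randn(2, 5, 16)  # decoder sequence generated so far
    print(block(dec_seq, enc_out).shape)  # torch.Size([2, 5, 16])
```

Note that in this sketch `x` is both the decoder's input and its past output: during autoregressive generation, each predicted token is fed back as the decoder's input at the next step, which may be why the course text uses "inputs" where "outputs" might also read naturally.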