Chapter 1 "how transformers work" #953
Open
@bangasaksham20

Description

I think there is a typo in this section: https://huggingface.co/learn/llm-course/chapter1/4?fw=pt#the-original-architecture

"Note that the first attention layer in a decoder block pays attention to all (past) inputs to the decoder, but the second attention layer uses the output of the encoder. It can thus access the whole input sentence to best predict the current word. This is very useful as different languages can have grammatical rules that put the words in different orders, or some context provided later in the sentence may be helpful to determine the best translation of a given word."

The word INPUT should actually be OUTPUT for the first attention layer, correct?
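
For context, here is a minimal PyTorch sketch of how the two attention layers in a decoder block fit together; the class name, dimensions, and module structure are only illustrative, not the course's actual code. The first layer attends only to the decoder's own (past) tokens via a causal mask, while the second layer attends over the encoder output and so can see the whole input sentence.

```python
import torch
import torch.nn as nn

class DecoderBlockSketch(nn.Module):
    """Illustrative decoder block: masked self-attention over the decoder's
    own (previously generated) tokens, then cross-attention over the encoder
    output. Hypothetical names and sizes, not the course's implementation."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        # First attention layer: attends only to past decoder positions.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Second attention layer: queries come from the decoder, keys/values
        # from the encoder output, so it can access the whole input sentence.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_tokens, enc_output):
        T = dec_tokens.size(1)
        # Causal mask: True = position not allowed to attend (future tokens).
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        x, _ = self.self_attn(dec_tokens, dec_tokens, dec_tokens, attn_mask=causal)
        x, _ = self.cross_attn(x, enc_output, enc_output)
        return x

# Tiny check with random tensors (batch=1, 5 decoder tokens, 7 source tokens).
block = DecoderBlockSketch()
dec = torch.randn(1, 5, 64)
enc = torch.randn(1, 7, 64)
print(block(dec, enc).shape)  # torch.Size([1, 5, 64])
```

Either way, the self-attention layer operates on the decoder-side tokens, which is why the wording "inputs to the decoder" vs. "(past) outputs" is confusing.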
