Build the encoders, decoders and put everything together
Thanks for this great post.
I think the dropout layer is missing from the residual connections in both the encoder and the decoder in this implementation. In the paper, dropout is applied to the output of each sub-layer before it is added to the sub-layer input and normalized, i.e.:
```
attention_output = self.norm(
    self.add([
        input,
        # dropout on the sub-layer output, before the residual add & norm
        self.dropout(self.attention(input, input, input, training=training), training=training),
    ])
)
```
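For context, here is a minimal sketch of how a complete encoder layer might wire dropout into both residual connections, following the residual-dropout scheme of "Attention Is All You Need" (Section 5.4). The class name EncoderLayer and the attribute names (self.attention, self.ffn, self.dropout1, self.dropout2) are illustrative assumptions, not the post's exact code:

```
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import (
    Layer, Dense, Dropout, LayerNormalization, MultiHeadAttention,
)

class EncoderLayer(Layer):
    # Hypothetical encoder layer; names and hyperparameters are
    # illustrative assumptions, not the post's exact code.
    def __init__(self, num_heads=8, d_model=512, d_ff=2048, rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.attention = MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads
        )
        self.ffn = keras.Sequential([Dense(d_ff, activation="relu"), Dense(d_model)])
        self.norm1 = LayerNormalization()
        self.norm2 = LayerNormalization()
        # Residual dropout: applied to each sub-layer's output
        # before it is added to the sub-layer input and normalized.
        self.dropout1 = Dropout(rate)
        self.dropout2 = Dropout(rate)

    def call(self, x, training=False):
        # Self-attention sub-layer: dropout sits between the sub-layer
        # output and the residual addition.
        attn_out = self.attention(x, x, x, training=training)
        x = self.norm1(x + self.dropout1(attn_out, training=training))
        # Feed-forward sub-layer with the same residual-dropout pattern.
        ffn_out = self.ffn(x)
        x = self.norm2(x + self.dropout2(ffn_out, training=training))
        return x
```

As a quick smoke test, `EncoderLayer()(tf.random.uniform((2, 10, 512)), training=True)` should return a tensor of the same shape, with dropout active only when `training=True`.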