Discussion about this post

Leo

Thanks for this great post.

I think the dropout layer is missing from the residual connection in both the encoder and the decoder in this implementation.

```
attention_output = self.norm(
    self.add([
        input,
        self.dropout(self.attention(input, input, input, training=training), training=training),
    ]),
    training=training,
)
```
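
For context, here is a minimal sketch of how the full self-attention sub-layer could look with dropout applied before the residual add, following the paper's "residual dropout" description. The layer names (self.attention, self.dropout, self.add, self.norm) follow the snippet above; the class name, constructor arguments, and dropout rate are assumptions for illustration, not the post's actual code.

```python
import tensorflow as tf
from tensorflow.keras import layers


class SelfAttentionSubLayer(layers.Layer):
    """Post-norm Transformer sub-layer: LayerNorm(x + Dropout(Attention(x)))."""

    def __init__(self, num_heads, key_dim, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.attention = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)
        self.dropout = layers.Dropout(dropout_rate)
        self.add = layers.Add()
        self.norm = layers.LayerNormalization()

    def call(self, input, training=False):
        # Dropout is applied to the sub-layer output *before* it is added to the
        # sub-layer input and normalized ("Attention Is All You Need", Sec. 5.4).
        attention_output = self.attention(input, input, input, training=training)
        attention_output = self.dropout(attention_output, training=training)
        return self.norm(self.add([input, attention_output]))
```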
