Thanks for this great post.
I think the dropout layer is missing from the residual connections in both the encoder and the decoder in this implementation. Something like this would add it:
```
attention_output = self.norm(
    self.add([
        input,
        # apply dropout to the attention output before the residual add
        self.dropout(self.attention(input, input, input, training=training), training=training),
    ]),
    training=training,
)
```
Thanks for the comment. I checked the implementation in the official TensorFlow tutorial, https://www.tensorflow.org/text/tutorials/transformer#the_feed_forward_network, and there is no dropout layer between the attention and the residual connection there either. But we can add a dropout layer wherever we want; this is flexible.
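For example, a small wrapper layer could make the dropout rate configurable (a rough sketch, not code from the post or the tutorial; the class and argument names are just illustrative):
```
import tensorflow as tf

class AddAndNorm(tf.keras.layers.Layer):
    """Residual add + layer norm, with optional dropout on the sub-layer output."""

    def __init__(self, dropout_rate=0.0, **kwargs):
        super().__init__(**kwargs)
        # rate=0.0 effectively disables dropout, so the same layer covers both cases
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.add = tf.keras.layers.Add()
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, x, sublayer_output, training=False):
        # drop units of the sub-layer output, then add the residual and normalize
        return self.norm(self.add([x, self.dropout(sublayer_output, training=training)]))
```
With `dropout_rate=0.0` this matches the TF tutorial's behaviour; a nonzero rate matches the paper.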
Yep, the TF implementation doesn't have dropout.
The Annotated Transformer (https://nlp.seas.harvard.edu/annotated-transformer/#encoder-and-decoder-stacks) has dropout, and the original paper also mentions dropout in the residual connection.
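The paper applies dropout to the output of each sub-layer before it is added to the sub-layer input and normalized, which in Keras-style pseudocode looks roughly like this (names are illustrative):
```
# Sub-layer connection as described in the paper:
#   LayerNorm(x + Dropout(Sublayer(x)))
def sublayer_connection(x, sublayer, dropout, layernorm, training=False):
    return layernorm(x + dropout(sublayer(x), training=training))
```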
It's not a big deal :)