Fan, can you share your experiment setup behind the claim that "these 2 params do have a big impact on the final performance, around 5% improvement on the recall rate metrics"? The reason I ask is that I failed to make them work in my setup, which is: (1) each tower is just an embedding lookup table, (2) the features are just user id and movie title, (3) the MovieLens 100K dataset. There are two variants: (1) sampling bias correction as the baseline, and (2) baseline + L2 normalization + temperature. I used Keras Tuner to tune the learning rate and temperature. However, I can't make variant 2 (baseline + L2 + temperature) outperform variant 1 even with extensive tuning. Here is the colab link in case you are interested: https://drive.google.com/file/d/1i3suC8hE0zK3p5slM5TQKlzSjvMLtsqK/view?usp=sharing. I wonder if it's because the model is too simple and the dataset is too small.
Sorry, I just saw your comment. The result isn't from MovieLens data; it comes from a real industry model trained on a real production dataset and features, which are much more complex than MovieLens. Your assumption should be right. To get a more reliable result, you could try tuning on the Criteo dataset, which is much bigger than MovieLens.
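For readers following along, here is a minimal sketch (not the code from the colab above) of what the "baseline + L2 + temperature" variant can look like in a TensorFlow Recommenders style two-tower model. The tower definitions, the `temperature` value, and the `candidate_probability` feature name are illustrative assumptions; only the `candidate_sampling_probability` argument of the retrieval task is the library's own hook for the sampling-bias (logQ) correction.

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs


class TwoTowerModel(tfrs.Model):
    """Sketch of 'sampling bias correction + L2 normalization + temperature'."""

    def __init__(self, user_model, item_model, temperature=0.05):
        super().__init__()
        self.user_model = user_model      # e.g. an embedding lookup over user ids
        self.item_model = item_model      # e.g. an embedding lookup over movie titles
        self.temperature = temperature
        # In-batch softmax retrieval task; per-batch sampling probabilities
        # are passed in for the logQ correction.
        self.task = tfrs.tasks.Retrieval()

    def compute_loss(self, features, training=False):
        user_emb = self.user_model(features["user_id"])
        item_emb = self.item_model(features["movie_title"])

        # Variant 2: L2-normalize both towers so the dot product becomes a
        # cosine similarity, then sharpen the logits with a temperature < 1.
        user_emb = tf.math.l2_normalize(user_emb, axis=-1) / self.temperature
        item_emb = tf.math.l2_normalize(item_emb, axis=-1)

        return self.task(
            user_emb,
            item_emb,
            # Streaming frequency estimate of each in-batch candidate,
            # used to correct the in-batch negative sampling bias.
            candidate_sampling_probability=features["candidate_probability"],
        )
```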
No worries, thanks for the reply. Happy Chinese New Year!
Thanks for the great explanation; it answers questions about streaming frequency estimation that have been on my mind for years. There is still one more puzzle, and can you help me confirm my understanding: is it possible that a batch contains the same item multiple times, and that all of those duplicated items in the same batch get the same item frequency estimate?
Yes. Actually, the paper doesn't consider this situation. In my implementation, duplicated candidates are processed with the same step parameters, so their estimation results are the same.
Yep, the original paper uses sampling without replacement in its simulation section. :-( Your code actually made me realize that the batch update handles the duplication.
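To make the duplicate-handling point concrete, here is a minimal NumPy sketch of the streaming frequency estimator: array `A` holds the step at which a hashed item was last seen, array `B` holds a moving-average estimate of the gap between consecutive occurrences, and the estimated sampling probability is `1 / B`. The bucket count, hash, and `alpha` are illustrative assumptions. Because the batch update reads `A` once before writing it, duplicated items in the same batch see the same previous step and therefore receive the same estimate, which is the behavior discussed above.

```python
import numpy as np


class StreamingFrequencyEstimator:
    """Streaming item-frequency estimation with a batch update.

    A[h(y)]: last training step at which item y was seen.
    B[h(y)]: moving-average estimate of the gap between occurrences of y.
    Estimated sampling probability of y in a batch: 1 / B[h(y)].
    """

    def __init__(self, num_buckets=2**20, alpha=0.01):
        self.alpha = alpha
        self.A = np.zeros(num_buckets, dtype=np.int64)    # last-seen step
        self.B = np.ones(num_buckets, dtype=np.float64)   # estimated gap

    def _hash(self, item_ids):
        # Illustrative hash; any hash into [0, num_buckets) works.
        return np.asarray(item_ids) % len(self.A)

    def batch_update(self, item_ids, step):
        """Update with one batch and return estimated probabilities.

        A is read once before it is written, so duplicated items in the
        same batch see the same previous step, get the same gap, and end
        up with the same frequency estimate.
        """
        idx = self._hash(item_ids)
        gap = step - self.A[idx]                  # identical for duplicates
        self.B[idx] = (1 - self.alpha) * self.B[idx] + self.alpha * gap
        self.A[idx] = step
        return 1.0 / self.B[idx]


# Example: item 3 appears twice in the batch; both copies get the same estimate.
est = StreamingFrequencyEstimator()
probs = est.batch_update(np.array([3, 7, 3]), step=100)
assert probs[0] == probs[2]
```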