6 Comments
Zhong Zhang

Fan, can you share your experiment setup for the claim that "this 2 params do have a big impact on the final performance, around 5% improvement on the recall rate metrics"? I ask because I could not get them to work in my own setup, which is: (1) each tower is just an embedding lookup table, (2) the only features are user id and movie title, and (3) the dataset is MovieLens 100K. I compared two variants: (1) sampling bias correction as the baseline, and (2) baseline + L2 normalization + temperature. I used KerasTuner to tune the learning rate and temperature. However, I can't make variant 2 outperform variant 1, even with extensive tuning. Here is the Colab link in case you are interested: https://drive.google.com/file/d/1i3suC8hE0zK3p5slM5TQKlzSjvMLtsqK/view?usp=sharing. I wonder if this is because the model is too simple and the dataset is too small.
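For readers following along, here is a minimal sketch of how the two variants differ, assuming an in-batch softmax loss with a log-probability (logQ) correction. The function names, the `item_probs` argument, and the pure-Python list representation are illustrative assumptions, not code from the post or from the Colab above.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def in_batch_softmax_loss(user_embs, item_embs, item_probs,
                          normalize=False, temperature=1.0):
    """Mean in-batch softmax loss with sampling bias correction.

    user_embs[i] is paired with item_embs[i] as the positive; every other
    item in the batch acts as a negative. item_probs[j] is the estimated
    probability that item j is sampled into a batch (hypothetical input).
    """
    if normalize:  # variant 2: L2-normalize both towers' outputs
        user_embs = [l2_normalize(u) for u in user_embs]
        item_embs = [l2_normalize(v) for v in item_embs]
    total = 0.0
    for i, u in enumerate(user_embs):
        # Dot product scaled by temperature, then corrected by -log(p_j).
        logits = [
            sum(a * b for a, b in zip(u, v)) / temperature - math.log(p)
            for v, p in zip(item_embs, item_probs)
        ]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(user_embs)

users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
probs = [0.5, 0.5]
baseline = in_batch_softmax_loss(users, items, probs)
variant = in_batch_softmax_loss(users, items, probs,
                                normalize=True, temperature=0.1)
```

With `normalize=True` the dot products are bounded in [-1, 1], so the temperature is what controls how sharp the softmax can get; that is why the two settings are usually tuned together.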

Fan

Sorry, I just saw your comment. The result isn't from MovieLens data; it comes from a real industry model trained on a real dataset and features, both much more complex than MovieLens. Your guess is probably right: the model is too simple and the dataset is too small. To get a more reliable result, you can try tuning on the Criteo dataset, which is much bigger than MovieLens.

Zhong Zhang

No worries, thanks for the reply. Happy Chinese New Year!

Zhong Zhang

Thanks for the great explanation. It answers many questions about streaming frequency estimation that I have had for years. One puzzle remains, so can you help me confirm my understanding? A batch can certainly contain the same item multiple times; do all of those duplicates within the same batch get the same item frequency estimate?

Fan

Yes. Actually, the paper doesn't consider this situation. In my implementation, every occurrence of the same candidate in a batch is computed with the same step parameters, so the estimate is the same for all of them.
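To make the duplicate handling concrete, here is a minimal sketch of a batched streaming frequency estimator. It is not Fan's code: the class name, the dict-based tables (used here instead of hashed arrays), and the zero initialization are assumptions, and the moving-average update follows the paper's rule as I understand it.

```python
class StreamingFrequencyEstimator:
    """Tracks, per item, a moving average of the steps between occurrences."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha    # moving-average rate
        self.last_step = {}   # item -> last step at which it was seen
        self.delta = {}       # item -> estimated steps between occurrences

    def update(self, batch_item_ids, step):
        """Update once per unique item; return one delta per batch position."""
        for item in set(batch_item_ids):  # duplicates collapse to one update
            prev_step = self.last_step.get(item, 0)
            prev_delta = self.delta.get(item, 0.0)
            self.delta[item] = ((1 - self.alpha) * prev_delta
                                + self.alpha * (step - prev_step))
            self.last_step[item] = step
        # Every occurrence of an item reads the same, already-updated value.
        return [self.delta[item] for item in batch_item_ids]

est = StreamingFrequencyEstimator(alpha=0.5)
first = est.update(["a", "a", "b"], step=1)   # "a" appears twice
second = est.update(["a"], step=3)
```

The estimated sampling probability of an item is then `1 / delta`. Because the update loops over `set(batch_item_ids)`, an item that appears several times in one batch is updated once for that step, and each of its positions receives the same estimate.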

Zhong Zhang

Yep, the original paper uses sampling without replacement in its simulation section. :-( Your code actually made me realize that the batch update handles the duplicates.
