I'm curious how this model performs in real production, since it doesn't use any other features at all. IMO it would probably work best as a backbone model that extracts a user sequence embedding, which would then be fed into another CTR model along with all the other features. What do you think?
Yes. BERT is way too heavy for real-time inference in a recommendation scenario. It could serve as a backbone for offline long-term user sequence modeling, similar to Pinterest's PinnerFormer model, which uses a Transformer to extract long-term user features.
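To make that split concrete, here's a rough sketch of the two-stage setup we're describing (PyTorch; the module names, dimensions, and mean-pooling choice are just my own illustrative assumptions, not anything from the paper): a Transformer backbone encodes the item-ID sequence into one user embedding offline, and only the lightweight CTR head runs at request time with the other features.

```python
# Minimal sketch, assuming an offline Transformer backbone + online CTR head.
import torch
import torch.nn as nn


class SequenceBackbone(nn.Module):
    """Encodes a user's item-ID sequence into a single user embedding (run offline in batch)."""

    def __init__(self, num_items: int, dim: int = 64, num_layers: int = 2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len); 0 is treated as padding.
        pad_mask = item_ids.eq(0)
        h = self.encoder(self.item_emb(item_ids), src_key_padding_mask=pad_mask)
        # Mean-pool over non-padded positions to get one vector per user.
        valid = (~pad_mask).unsqueeze(-1).float()
        return (h * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1.0)


class CtrModel(nn.Module):
    """Downstream CTR head: user embedding + other features -> click probability."""

    def __init__(self, emb_dim: int = 64, other_dim: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + other_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, user_emb: torch.Tensor, other_feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mlp(torch.cat([user_emb, other_feats], dim=-1)))


# Usage: precompute user embeddings with the backbone offline, then only the
# small CTR head needs to run per request.
backbone = SequenceBackbone(num_items=10_000)
ctr = CtrModel()
seq = torch.randint(1, 10_000, (8, 50))   # 8 users, 50 recent item IDs each
other = torch.randn(8, 16)                # placeholder for all the other features
p_click = ctr(backbone(seq), other)       # shape: (8, 1)
```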
Actually, many academic papers tend to use fewer features and focus only on the modeling part, which makes them impractical for production use as-is. So we usually borrow just the novel modeling ideas and don't lean on the reported experimental results. We also favor industry papers from Google or Alibaba, since those can usually be applied to production more directly.