2 Comments
Leo

Thanks for this great post.

I have 2 questions:

1. In your implementation of the FM layer (https://github.com/caesarjuly/reginx/blob/master/trainer/models/common/feature_cross.py#L25), there are no dense feature values used. It seems dense features are represented as embeddings, but their values are not used here. I'm not sure if I missed something. If I'm wrong, how are the values of dense features used?

2. If my understanding is correct, the final user embedding and item embedding are `concat_user_vector` and `concat_item_vector` (https://github.com/caesarjuly/reginx/blob/master/trainer/tasks/generate_fm_embedding.py#L53). And they are used in ANN matching as CR. I'm curious how important `item_score` is in the final embedding matching?

thanks.

Fan

Hi Leo, thanks for the questions.

1. In my example using MovieLens data, there are no dense features and the number of features is quite small. For the DeepFM model, I simply convert the sparse features to one-hot encodings and feed them to a linear model as the linear part. Here are the features I used for the wide (linear) part: https://github.com/caesarjuly/reginx/blob/master/trainer/models/features/movielens.py#L463. For the FM and Deep parts, I convert all the features to embeddings and feed them to the network: https://github.com/caesarjuly/reginx/blob/master/trainer/models/features/movielens.py#L522. According to the DeepFM paper, "Each categorical field is represented as a vector of one-hot encoding, and each continuous field is represented as the value itself, or a vector of one-hot encoding after discretization". In a real-world example with many dense features, common practice is to either feed the dense features to the linear part directly, or discretize them and treat the resulting buckets as sparse features.
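To make the discretization option concrete, here is a minimal NumPy sketch. It is not taken from the repository: the bucket boundaries, table sizes, and the names `bucketize` and `fm_second_order` are all assumptions chosen for illustration. A continuous `age` value is mapped to a bucket id, the bucket id is looked up in an embedding table like any other sparse feature, and the resulting field embeddings go through the usual FM pairwise term.

```python
import numpy as np


def bucketize(values, boundaries):
    """Map each continuous value to an integer bucket id in 0..len(boundaries)."""
    return np.digitize(values, boundaries)


def fm_second_order(field_embeddings):
    """FM pairwise term via the sum-square trick.

    field_embeddings has shape (batch, num_fields, dim); the result has
    shape (batch,) and equals the sum of dot products over all field pairs.
    """
    sum_then_square = np.square(field_embeddings.sum(axis=1))
    square_then_sum = np.square(field_embeddings).sum(axis=1)
    return 0.5 * (sum_then_square - square_then_sum).sum(axis=1)


rng = np.random.default_rng(0)

# Hypothetical boundaries and randomly initialised embedding tables.
age_boundaries = np.array([18.0, 25.0, 35.0, 50.0])
age_table = rng.normal(size=(len(age_boundaries) + 1, 4))
genre_table = rng.normal(size=(10, 4))

ages = np.array([16.0, 30.0, 62.0])   # dense feature values
genre_ids = np.array([1, 3, 7])       # an ordinary sparse feature

# After bucketizing, the dense feature is handled exactly like a sparse one.
age_bucket_ids = bucketize(ages, age_boundaries)
fields = np.stack([age_table[age_bucket_ids], genre_table[genre_ids]], axis=1)
fm_term = fm_second_order(fields)
```

With only two fields, `fm_term` reduces to the dot product between the age-bucket embedding and the genre embedding for each row, which is a handy sanity check.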

2. Yes, your understanding is correct. `item_score` basically represents the importance and popularity of the item itself, so a popular item gets a high score. I haven't tried this in a production environment; this is my inference.
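One way to see the effect of `item_score` on matching is the sketch below. It assumes a layout that may not match the repository exactly: the item vector is the item embedding with its score appended, and the user vector is the user embedding with a constant 1 appended, so the dot product used for ANN search becomes `user_emb · item_emb + item_score`. The helper names and the numbers are hypothetical.

```python
import numpy as np


def build_user_vector(user_emb):
    """Append a constant 1 so the item's score passes through the dot product."""
    return np.append(user_emb, 1.0)


def build_item_vector(item_emb, item_score):
    """Append the item's own score (for example, its popularity term)."""
    return np.append(item_emb, item_score)


user_emb = np.array([1.0, 0.0])

# Hypothetical items: one closer to the user's taste, one more popular.
niche_emb, niche_score = np.array([0.9, 0.0]), 0.1
popular_emb, popular_score = np.array([0.5, 0.0]), 0.8

user_vec = build_user_vector(user_emb)
niche_match = user_vec @ build_item_vector(niche_emb, niche_score)
popular_match = user_vec @ build_item_vector(popular_emb, popular_score)

# Without the score dimension, only embedding similarity counts.
niche_plain = user_emb @ niche_emb
popular_plain = user_emb @ popular_emb
```

In this toy case the niche item wins on embedding similarity alone, but the popular item wins once the score dimension is included. Under this assumption, `item_score` acts as a per-item bias that is independent of the user.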
