thanks for this great post!
a correction: in section https://happystrongcoder.substack.com/i/135643135/cross-network-mixture-of-low-rank-dcn, there is no X_l in E_i(X_l).
Thank you very much for the comment. I just fixed it.
another question:
`DCN-V2 is proven capable of catching both feature-wise and bit-wise feature interactions effectively. `
I understand bit-wise feature interactions, but I don't get how this model captures feature-wise interactions. Could you elaborate? Thanks.
The paper gives a complete mathematical proof of the feature-wise and bit-wise interaction theorems in its appendix. The proof is quite long, and I didn't feel it was worth spending much time working through it. Please refer to the original paper for details.
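For rough intuition only (this is not the proof), here is a small sketch. It assumes the cross layer has the form x_{l+1} = x_0 ⊙ (W x_l + b) + x_l and that the input is a concatenation of per-feature embeddings. The function names, the number of fields, and the embedding size are all made up for illustration. The idea is that a full matrix W can be cut into blocks W[i][j], one per pair of feature embeddings, so each output field i mixes every field j before being multiplied element-wise by x_0's field i.

```python
import math
import random


def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def cross_layer(x0, xl, W, b):
    """Assumed cross layer: x_{l+1} = x0 * (W @ xl + b) + xl (element-wise *)."""
    Wx = matvec(W, xl)
    return [x0[k] * (Wx[k] + b[k]) + xl[k] for k in range(len(xl))]


def cross_layer_by_field(x0, xl, W, b, num_fields, emb_dim):
    """Same computation, written one feature embedding (field) at a time.

    Block (i, j) of W maps field j of xl into the output slot of field i,
    which is where the feature-wise (field i x field j) terms come from.
    """
    out = []
    for i in range(num_fields):
        rows = range(i * emb_dim, (i + 1) * emb_dim)
        acc = [b[r] for r in rows]
        for j in range(num_fields):
            cols = range(j * emb_dim, (j + 1) * emb_dim)
            for n, r in enumerate(rows):
                acc[n] += sum(W[r][c] * xl[c] for c in cols)
        out.extend(x0[r] * acc[n] + xl[r] for n, r in enumerate(rows))
    return out


# Hypothetical sizes: 3 features, each with a 4-dimensional embedding.
random.seed(0)
num_fields, emb_dim = 3, 4
d = num_fields * emb_dim
x0 = [random.gauss(0, 1) for _ in range(d)]
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
b = [random.gauss(0, 1) for _ in range(d)]

full = cross_layer(x0, x0, W, b)
by_field = cross_layer_by_field(x0, x0, W, b, num_fields, emb_dim)
same = all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-9)
           for u, v in zip(full, by_field))
```

The two functions return the same vector, so the block-wise (feature-wise) reading is just another way of looking at the same layer. The individual entries inside each block give the bit-wise view.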
Thanks