# Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention

![](https://i.imgur.com/6l2xTkf.png)