琥珀青葉 KBlueLeaf

@KBlueLeaf

Joined on Jan 9, 2020

  • REF: https://arxiv.org/pdf/2402.03300. Algorithm: (figure). Objective: (figure). This note is meant for learning GRPO quickly, so it relies on many simplified abstractions and intuitive descriptions.
  • Part of paper: The paper states that "A hybrid form of parallel representation and recurrent representation is available to accelerate training." This implies that all three representations (parallel, recurrent, and the hybrid) should produce the same output, up to floating-point rounding. Let's verify this... Different Representations. 1. Recurrent Representation. The recurrent form can be written as: $$ S_n = \gamma S_{n-1} + K_n^\mathsf{T}V_n $$
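The GRPO note only points to the algorithm and objective in the referenced paper. As a rough sketch of the outcome-supervision variant described there (one scalar reward per sampled output, normalized within its group, with that advantage shared by every token of the output), the per-group objective could be computed as below. The function names, the population-std choice, and the `clip_eps`/`beta` defaults are hypothetical placeholders for illustration; this is not the paper's reference code.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Outcome supervision: normalize each output's reward within its group."""
    g = len(rewards)
    mean = sum(rewards) / g
    # Population std; using the sample std instead is an equally plausible assumption.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp_theta: float, logp_ref: float) -> float:
    """Per-token KL estimate pi_ref/pi_theta - log(pi_ref/pi_theta) - 1 (never negative)."""
    log_ratio = logp_ref - logp_theta
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_objective(logp_new, logp_old, logp_ref, advantages, clip_eps=0.2, beta=0.04):
    """Clipped surrogate minus beta * KL, averaged over each output's tokens, then over the group.

    logp_new / logp_old / logp_ref: one list of per-token log-probs per sampled output o_i,
    under the current policy, the sampling (old) policy, and the frozen reference policy.
    Every token of output i shares advantages[i]. This value is maximized; a loss is its negative.
    clip_eps and beta defaults are placeholder values chosen for illustration only.
    """
    total = 0.0
    for new_i, old_i, ref_i, adv in zip(logp_new, logp_old, logp_ref, advantages):
        token_sum = 0.0
        for ln, lo, lr in zip(new_i, old_i, ref_i):
            ratio = math.exp(ln - lo)  # pi_theta / pi_theta_old for this token
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            token_sum += min(ratio * adv, clipped * adv) - beta * kl_estimate(ln, lr)
        total += token_sum / len(new_i)  # 1/|o_i| normalization
    return total / len(logp_new)  # 1/G normalization


adv = group_advantages([1.0, 0.0, 0.0, 1.0])  # roughly [1, -1, -1, 1]
```

Because the advantage is measured relative to the group mean, a group in which every output receives the same reward gets zero advantage everywhere and contributes only through the KL term.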
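One way to check the claim that the representations agree is a small numerical test. The sketch below assumes a single head with real-valued Q, K, V (any positional rotation already applied), a scalar decay `gamma`, and no normalization or gating, so it simplifies the full layer. It pairs the recurrent form S_n = γS_{n-1} + K_n^T V_n, read out as o_n = Q_n S_n (an assumption, since the preview stops before the readout), with a parallel form O = (QK^T ⊙ D)V, where D_{nm} = γ^{n-m} for n ≥ m and 0 otherwise, and with a hybrid that applies the parallel form inside each chunk while carrying the recurrent state across chunks. All function names are hypothetical.

```python
import numpy as np


def decay_mask(n: int, gamma: float) -> np.ndarray:
    """Causal decay mask: D[a, b] = gamma**(a - b) if a >= b, else 0."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    return np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)


def retention_recurrent(Q, K, V, gamma):
    """S_n = gamma * S_{n-1} + K_n^T V_n, then o_n = Q_n S_n (one step per token)."""
    S = np.zeros((Q.shape[1], V.shape[1]))
    out = np.empty((Q.shape[0], V.shape[1]))
    for n in range(Q.shape[0]):
        S = gamma * S + np.outer(K[n], V[n])  # K_n^T V_n for row vectors K_n, V_n
        out[n] = Q[n] @ S
    return out


def retention_parallel(Q, K, V, gamma):
    """O = (Q K^T * D) V, all positions at once."""
    return ((Q @ K.T) * decay_mask(Q.shape[0], gamma)) @ V


def retention_chunkwise(Q, K, V, gamma, chunk):
    """Parallel form inside each chunk, recurrent state R carried between chunks."""
    R = np.zeros((Q.shape[1], V.shape[1]))  # state after the previous chunk
    outs = []
    for s in range(0, Q.shape[0], chunk):
        q, k, v = Q[s:s + chunk], K[s:s + chunk], V[s:s + chunk]
        L = q.shape[0]  # the last chunk may be shorter
        j = np.arange(L)
        inner = ((q @ k.T) * decay_mask(L, gamma)) @ v  # terms from inside this chunk
        cross = (q @ R) * (gamma ** (j + 1))[:, None]  # decayed contribution of earlier chunks
        outs.append(inner + cross)
        R = gamma ** L * R + k.T @ (v * (gamma ** (L - 1 - j))[:, None])
    return np.concatenate(outs, axis=0)


rng = np.random.default_rng(0)
N, d_k, d_v, gamma = 10, 4, 3, 0.9
Q, K, V = (rng.standard_normal((N, d)) for d in (d_k, d_k, d_v))
o_rec = retention_recurrent(Q, K, V, gamma)
o_par = retention_parallel(Q, K, V, gamma)
o_chk = retention_chunkwise(Q, K, V, gamma, chunk=4)  # chunks of 4, 4, 2
print(np.allclose(o_rec, o_par), np.allclose(o_rec, o_chk))  # True True
```

The chunkwise version does quadratic work only within a chunk and keeps a single d_k × d_v state between chunks, which is the trade-off a hybrid form can exploit on long sequences.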