Finally, we provide an example of an entire language model: a deep sequence model spine (with repeating Mamba blocks) + language product head.
Although the recipe for forward pass has to be described inside of this https://geraldtqtk429936.bloginder.com/30548264/the-2-minute-rule-for-mamba-paper