AI-ML·Importance 6·2026-05-23·r/MachineLearning

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

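── KO (translated) ─────

Developed a Mamba1 variant called SM1 and made it usable in PyTorch.

SM1 (Scalar Mamba1) replaces the entire selective scan with two native PyTorch operations for d_state=1. The implementation is an exact closed-form solution for the d_state=1 case and substantially reduces memory consumption. It is currently being trained on 163K MIDI files, and the 130M-parameter model fits on an RTX 5060 Ti. A d_state larger than 1 is no longer needed when the tokens already encode structure.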

── EN ──────────────────

Developed a Mamba1 variant called SM1 that operates in pure PyTorch with d_state=1.

SM1 (Scalar Mamba1) replaces the entire selective scan with two native PyTorch operations when d_state=1. For that case the recurrence has an exact closed-form solution, which significantly reduces memory usage. A 130M-parameter model is currently being trained on 163K MIDI files and fits comfortably on an RTX 5060 Ti. The author's claim is that a d_state larger than 1 is only necessary when the token representation does not already encode structure.
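
To make the closed-form idea concrete, here is a minimal sketch of how a scalar linear recurrence h_t = a_t * h_(t-1) + b_t can be computed with two prefix sums instead of a sequential scan. The post does not say which two operations SM1 uses, so this is an assumption for illustration, not the author's implementation. The function names and the sample values are hypothetical, and plain Python stands in for tensor code (in PyTorch each prefix sum would plausibly be a cumulative-sum call).

```python
import math
from itertools import accumulate


def scan_sequential(log_a, b):
    """Reference loop: h_t = exp(log_a[t]) * h_{t-1} + b[t], with h_{-1} = 0."""
    h, out = 0.0, []
    for la, bt in zip(log_a, b):
        h = math.exp(la) * h + bt
        out.append(h)
    return out


def scan_closed_form(log_a, b):
    """Same recurrence without loop-carried state, using two prefix sums.

    With L_t = sum_{s<=t} log_a[s], unrolling the recurrence gives
        h_t = exp(L_t) * sum_{s<=t} b[s] * exp(-L_s).
    """
    # Prefix sum 1: cumulative log-decay L_t.
    cum_log = list(accumulate(log_a))
    # Prefix sum 2: inputs rescaled by the inverse cumulative decay.
    scaled = accumulate(bt * math.exp(-L) for bt, L in zip(b, cum_log))
    return [math.exp(L) * s for L, s in zip(cum_log, scaled)]


# Hypothetical per-step log-decays (negative, as in a stable SSM) and inputs.
log_a = [-0.1, -0.5, -0.2, -0.3]
b = [1.0, 2.0, -1.0, 0.5]

reference = scan_sequential(log_a, b)
closed = scan_closed_form(log_a, b)
matches = all(math.isclose(r, c, rel_tol=1e-9) for r, c in zip(reference, closed))
```

One caveat with this naive form: exp(-L_s) grows as the cumulative decay accumulates, so very long sequences would need chunking or a log-space formulation to stay numerically stable. Whether SM1 handles this, and how, is not stated in the post.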
