I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]