SSM
tt.ssm contains state-space-style layers adapted to traceTorch’s one-timestep recurrent interface.
Base Layer
- class tracetorch.ssm.Layer(*args: Any, **kwargs: Any)
Bases: Layer
Base class for traceTorch state-space-model layers.
SSM layers store states with an additional trailing d_state dimension. This base class adapts tt.Layer state initialization and dimension helpers so that concrete SSM layers can still operate on an arbitrary feature dimension while carrying a per-feature latent state.
- Parameters:
num_neurons (int) – number of features in the target dimension.
dim (int, default=-1) – dimension along which the layer operates.
d_state (int, default=1) – latent state size per feature.
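To make the "additional trailing d_state dimension" concrete, here is a minimal sketch in plain PyTorch. The helper name init_ssm_state is hypothetical and not part of traceTorch, and the choice to move the feature axis to the end before appending d_state is an assumption about the layout, not something this page confirms.

```python
import torch


def init_ssm_state(x: torch.Tensor, dim: int = -1, d_state: int = 1) -> torch.Tensor:
    """Hypothetical helper (not traceTorch's API): zero state for an SSM layer.

    Assumes the feature axis is moved to the end before the trailing
    d_state axis is appended.
    """
    features_last = x.movedim(dim, -1)                   # [..., num_neurons]
    return x.new_zeros((*features_last.shape, d_state))  # [..., num_neurons, d_state]


x = torch.randn(8, 32, 10)                   # num_neurons=32 sits at dim=1
state = init_ssm_state(x, dim=1, d_state=4)
print(tuple(state.shape))                    # (8, 10, 32, 4)
```

Each of the 32 features carries its own 4-element latent vector, whichever axis the features occupied in the input.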
Layers
- class tracetorch.ssm.S4(*args: Any, **kwargs: Any)
Bases: Layer
A diagonal S4-style state-space layer adapted to traceTorch.
S4 stores a per-feature latent state of size d_state and updates it one timestep at a time. It is designed for traceTorch-style composition, not as an optimized replacement for sequence-parallel S4 implementations.
- Parameters:
num_neurons (int) – number of features in the target dimension.
d_state (int, default=64) – latent state size per feature.
dim (int, default=-1) – dimension along which the layer operates.
- Variables:
state – per-feature latent SSM state.
A_log – log-parameterized diagonal dynamics.
B – input projection into the state.
C – output projection from the state.
D – skip connection scale.
log_dt – log timestep scale.
Notes
Input: tensor of shape [*, num_neurons, *] where num_neurons is at index dim.
Output: tensor with the same shape as the input.
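The variables listed above suggest a one-step update along the following lines. This is a sketch under stated assumptions, not the library's implementation: the function s4_step is hypothetical, the dynamics are taken to be real-valued with A = -exp(A_log), the discretization is assumed to be zero-order hold, and features are assumed to sit on the last axis.

```python
import torch


def s4_step(x, state, A_log, B, C, D, log_dt):
    """One timestep of a real-valued diagonal SSM (illustrative, not traceTorch's code).

    x: [..., N], state: [..., N, S], A_log/B/C: [N, S], D/log_dt: [N].
    """
    A = -torch.exp(A_log)                    # assumed sign convention: A < 0, so the state decays
    dt = torch.exp(log_dt).unsqueeze(-1)     # [N, 1], positive step size
    A_bar = torch.exp(dt * A)                # zero-order-hold discretization of A
    B_bar = (A_bar - 1.0) / A * B            # zero-order-hold discretization of B
    state = A_bar * state + B_bar * x.unsqueeze(-1)
    y = (state * C).sum(dim=-1) + D * x      # read out the state, add the skip path
    return y, state


torch.manual_seed(0)
N, S = 6, 4
A_log = torch.zeros(N, S)
B = torch.ones(N, S)
C = torch.full((N, S), 1.0 / S)
D = torch.ones(N)
log_dt = torch.full((N,), -2.0)

state = torch.zeros(2, N, S)                 # batch of 2, features last
for _ in range(5):                           # one call per timestep
    x = torch.randn(2, N)
    y, state = s4_step(x, state, A_log, B, C, D, log_dt)
print(tuple(y.shape), tuple(state.shape))
```

Parameterizing A and dt through their logarithms keeps the discretized decay factor between 0 and 1, so the recurrence stays stable regardless of the raw parameter values.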
- class tracetorch.ssm.S5(*args: Any, **kwargs: Any)
Bases: Layer
An S5-style state-space layer with a global latent state.
S5 projects the input features into a shared latent state of size d_state and projects that state back to num_neurons outputs. It processes one timestep per forward call and keeps the global state internal.
- Parameters:
num_neurons (int) – number of features in the target dimension.
d_state (int, default=64) – size of the shared latent state.
dim (int, default=-1) – dimension along which the layer operates.
- Variables:
global_state – shared latent state.
A_log – log-parameterized diagonal dynamics.
B – input projection into the global state.
C – output projection from the global state.
D – skip connection scale.
log_dt – log timestep scale.
Notes
Input: tensor of shape [*, num_neurons, *] where num_neurons is at index dim.
Output: tensor with the same shape as the input.
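The difference from the per-feature layout is that all features share one latent vector. A rough sketch of such a step follows. The function s5_step, the parameter shapes (B as [S, N], C as [N, S]), the real-valued dynamics, and the zero-order-hold discretization are all assumptions made for illustration; published S5 models typically use complex diagonal dynamics.

```python
import torch


def s5_step(x, global_state, A_log, B, C, D, log_dt):
    """One timestep with a shared latent state (illustrative, not traceTorch's code).

    x: [..., N], global_state: [..., S], A_log/log_dt: [S],
    B: [S, N], C: [N, S], D: [N].
    """
    A = -torch.exp(A_log)                        # assumed real, negative diagonal dynamics
    dt = torch.exp(log_dt)                       # positive step size per state channel
    A_bar = torch.exp(dt * A)                    # [S]
    B_scale = (A_bar - 1.0) / A                  # zero-order-hold factor applied to B
    u = x @ B.T                                  # project N features into S state channels
    global_state = A_bar * global_state + B_scale * u
    y = global_state @ C.T + D * x               # project back to N features, add skip path
    return y, global_state


torch.manual_seed(0)
N, S = 6, 16
A_log, log_dt = torch.zeros(S), torch.full((S,), -2.0)
B = torch.randn(S, N) / N ** 0.5
C = torch.randn(N, S) / S ** 0.5
D = torch.ones(N)

global_state = torch.zeros(2, S)                 # one shared state per batch element
for _ in range(5):
    x = torch.randn(2, N)
    y, global_state = s5_step(x, global_state, A_log, B, C, D, log_dt)
print(tuple(y.shape), tuple(global_state.shape))
```

Note that the state here has shape [..., d_state] rather than [..., num_neurons, d_state], so memory does not grow with the product of the two sizes.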
- class tracetorch.ssm.S6(*args: Any, **kwargs: Any)
Bases: Layer
A data-dependent S6 state-space layer adapted to traceTorch.
S6 is the selective SSM core associated with Mamba-style models, without the causal convolution and multiplicative block gate. The timestep, input, and output projections are computed from the current input, then applied to an internal per-feature state.
- Parameters:
num_neurons (int) – number of features in the target dimension.
d_state (int, default=16) – latent state size per feature.
dt_rank (int, default=-1) – rank of the timestep projection. -1 uses max(1, num_neurons // 16).
dim (int, default=-1) – dimension along which the layer operates.
- Variables:
state – per-feature latent SSM state.
x_proj – input-dependent projection producing timestep, B, and C.
dt_proj – projection from low-rank timestep features to per-feature timesteps.
A_log – log-parameterized diagonal dynamics.
D – skip connection scale.
Notes
Input: tensor of shape [*, num_neurons, *] where num_neurons is at index dim.
Output: tensor with the same shape as the input.
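A selective step of this kind can be sketched as follows. The class S6Sketch is hypothetical and only mirrors the variable names listed above. The softplus on the timestep, the simplified B discretization (dt * B), and the A_log initialization follow the commonly published Mamba formulation and are assumptions here, as is the features-last layout and the explicit state argument.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class S6Sketch(nn.Module):
    """Illustrative selective SSM step; not traceTorch's implementation."""

    def __init__(self, num_neurons: int, d_state: int = 16, dt_rank: int = -1):
        super().__init__()
        if dt_rank == -1:
            dt_rank = max(1, num_neurons // 16)      # default rule stated in the docs
        self.d_state, self.dt_rank = d_state, dt_rank
        self.x_proj = nn.Linear(num_neurons, dt_rank + 2 * d_state, bias=False)
        self.dt_proj = nn.Linear(dt_rank, num_neurons)
        init = torch.log(torch.arange(1, d_state + 1, dtype=torch.float32))
        self.A_log = nn.Parameter(init.repeat(num_neurons, 1))   # [N, S]
        self.D = nn.Parameter(torch.ones(num_neurons))

    def step(self, x, state):
        # x: [..., N], state: [..., N, S]; timestep, B and C all depend on x.
        dt_low, B, C = self.x_proj(x).split(
            [self.dt_rank, self.d_state, self.d_state], dim=-1
        )
        dt = F.softplus(self.dt_proj(dt_low)).unsqueeze(-1)   # [..., N, 1], positive
        A = -torch.exp(self.A_log)                            # [N, S]
        A_bar = torch.exp(dt * A)                             # [..., N, S]
        B_bar = dt * B.unsqueeze(-2)                          # [..., N, S]
        state = A_bar * state + B_bar * x.unsqueeze(-1)
        y = (state * C.unsqueeze(-2)).sum(dim=-1) + self.D * x
        return y, state


torch.manual_seed(0)
layer = S6Sketch(num_neurons=32)
state = torch.zeros(2, 32, layer.d_state)
with torch.no_grad():
    for _ in range(3):
        y, state = layer.step(torch.randn(2, 32), state)
print(layer.dt_rank, tuple(y.shape), tuple(state.shape))
```

Because dt, B, and C are recomputed from each input, the layer can decide per timestep how strongly to overwrite or retain its state, which is what "selective" refers to.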
- class tracetorch.ssm.Mamba(*args: Any, **kwargs: Any)
Bases: Layer
A compact Mamba-style block adapted to traceTorch.
Mamba combines an input projection, optional causal convolution buffer, SiLU gating, an S6-style selective SSM core, output projection, and residual connection. It keeps the convolution buffer and SSM state internal and processes one timestep per forward call.
- Parameters:
num_neurons (int) – number of features in the target dimension.
d_state (int, default=16) – latent SSM state size per feature.
dim (int, default=-1) – dimension along which the layer operates.
dt_rank (int, default=-1) – rank of the timestep projection. -1 uses max(1, num_neurons // 16).
conv_kernel (int, default=4) – causal convolution buffer length. Values <= 1 disable the convolution buffer.
- Variables:
ssm_state – per-feature selective SSM state.
conv_buffer – causal convolution buffer, present when conv_kernel > 1.
Notes
This is a traceTorch-compatible experimental implementation. It is not an optimized replacement for production Mamba kernels.
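The causal convolution buffer is the part of the block that differs most from a sequence-parallel implementation, so a small sketch may help. It assumes the buffer holds the last conv_kernel inputs per feature (oldest first), that the convolution is depthwise, and that SiLU is applied to its output. The function conv_buffer_step and its argument names are illustrative, not traceTorch's API. The comparison at the end checks the stepwise result against a full-sequence torch.nn.functional.conv1d with left padding.

```python
import torch
import torch.nn.functional as F


def conv_buffer_step(x, conv_buffer, weight, bias):
    """One causal depthwise-convolution step over a rolling buffer.

    x: [batch, N], conv_buffer: [batch, N, K] (oldest sample first),
    weight: [N, K], bias: [N]. All names here are illustrative.
    """
    # Drop the oldest sample and append the newest one.
    conv_buffer = torch.cat([conv_buffer[..., 1:], x.unsqueeze(-1)], dim=-1)
    y = (conv_buffer * weight).sum(dim=-1) + bias
    return y, conv_buffer


torch.manual_seed(0)
batch, N, K, T = 2, 8, 4, 6
weight, bias = torch.randn(N, K), torch.randn(N)
xs = torch.randn(batch, N, T)

conv_buffer = torch.zeros(batch, N, K)       # empty history at the first timestep
steps = []
for t in range(T):                           # one call per timestep
    y, conv_buffer = conv_buffer_step(xs[..., t], conv_buffer, weight, bias)
    steps.append(F.silu(y))
stepwise = torch.stack(steps, dim=-1)        # [batch, N, T]

# Reference: depthwise conv1d over the whole sequence with K-1 zeros of left padding.
full = F.silu(F.conv1d(F.pad(xs, (K - 1, 0)), weight.unsqueeze(1), bias, groups=N))
matches = torch.allclose(stepwise, full, atol=1e-5)
print(matches)
```

Keeping only the last K inputs is what lets the block run one timestep per call while producing the same values a causal convolution over the full sequence would.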