SSM

tt.ssm contains state-space-style layers adapted to traceTorch’s one-timestep recurrent interface.
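A one-timestep interface means each forward call consumes a single timestep and the layer carries its state between calls, so a sequence is processed with an explicit loop. The sketch below illustrates only that calling pattern, in plain Python. `ToyRecurrentLayer`, its `reset` method, and `run_sequence` are hypothetical stand-ins, not traceTorch classes. The update rule, an exponential moving average, is chosen just to make the carried state visible.

```python
from typing import Callable, List


class ToyRecurrentLayer:
    """Hypothetical stand-in for a one-timestep layer with internal state."""

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay
        self.state = 0.0  # carried across calls, like an SSM layer's latent state

    def __call__(self, x: float) -> float:
        # Consume exactly one timestep and update the internal state.
        self.state = self.decay * self.state + (1.0 - self.decay) * x
        return self.state

    def reset(self) -> None:
        # Hypothetical reset hook: clear state between independent sequences.
        self.state = 0.0


def run_sequence(layer: Callable[[float], float], xs: List[float]) -> List[float]:
    # Drive the layer one timestep at a time; there is no sequence-parallel scan.
    return [layer(x) for x in xs]


layer = ToyRecurrentLayer(decay=0.5)
print(run_sequence(layer, [1.0, 1.0, 1.0]))  # [0.5, 0.75, 0.875]
```

Because the state lives inside the layer, the same loop works for any layer in this module. Only the per-step update changes.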

Base Layer

class tracetorch.ssm.Layer(*args: Any, **kwargs: Any)

Bases: tt.Layer

Base class for traceTorch state-space-model layers.

SSM layers store states with an additional trailing d_state dimension. This base class adapts tt.Layer state initialization and dimension helpers so concrete SSM layers can still operate on an arbitrary feature dimension while carrying a per-feature latent state.

Parameters:
  • num_neurons (int) – number of features in the target dimension.

  • dim (int, default=-1) – dimension along which the layer operates.

  • d_state (int, default=1) – latent state size per feature.
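SSM states carry an extra trailing d_state axis on top of the layer's activation shape. The sketch below assumes the state simply appends that axis to the input shape; the exact layout traceTorch uses is not specified here. `ssm_state_shape` is a hypothetical helper, not part of the library.

```python
from typing import Sequence, Tuple


def ssm_state_shape(
    input_shape: Sequence[int], num_neurons: int, dim: int = -1, d_state: int = 1
) -> Tuple[int, ...]:
    """Hypothetical helper: the input shape plus a trailing d_state axis."""
    ndim = len(input_shape)
    # Normalize a negative dim the same way Python/PyTorch indexing does.
    axis = dim if dim >= 0 else dim + ndim
    if not 0 <= axis < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim}-D input")
    if input_shape[axis] != num_neurons:
        raise ValueError(
            f"expected {num_neurons} features at dim {dim}, got {input_shape[axis]}"
        )
    # Every activation element gets its own d_state-sized latent vector.
    return tuple(input_shape) + (d_state,)


print(ssm_state_shape((8, 32), num_neurons=32, d_state=4))  # (8, 32, 4)
print(ssm_state_shape((8, 32, 10), num_neurons=32, dim=1, d_state=4))  # (8, 32, 10, 4)
```

Keeping d_state last means the feature axis stays at the same index in both the activation and the state, whatever dim is.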

Layers

class tracetorch.ssm.S4(*args: Any, **kwargs: Any)

Bases: Layer

A diagonal S4-style state-space layer adapted to traceTorch.

S4 stores a per-feature latent state of size d_state and updates it one timestep at a time. It is designed for traceTorch-style composition, not as an optimized replacement for sequence-parallel S4 implementations.

Parameters:
  • num_neurons (int) – number of features in the target dimension.

  • d_state (int, default=64) – latent state size per feature.

  • dim (int, default=-1) – dimension along which the layer operates.

Variables:
  • state – per-feature latent SSM state.

  • A_log – log-parameterized diagonal dynamics.

  • B – input projection into the state.

  • C – output projection from the state.

  • D – skip connection scale.

  • log_dt – log timestep scale.

Notes

  • Input: tensor of shape [*, num_neurons, *], where num_neurons is at index dim.

  • Output: tensor with the same shape as the input.
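The variables listed above suggest a standard diagonal SSM recurrence. The sketch below assumes the common S4D-style real parameterization, which the source does not confirm:

* A = -exp(A_log), so the state decays.
* dt = exp(log_dt) per feature.
* Zero-order-hold discretization: A_bar = exp(dt*A) and B_bar = (A_bar - 1)/A * B.
* Update h <- A_bar*h + B_bar*x, then output y = sum(C*h) + D*x.

It handles a single batch element with plain Python lists. `s4_step` and the parameter shapes in the comments are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def s4_step(
    x: List[float],       # one timestep, length num_neurons
    state: Matrix,        # num_neurons x d_state
    A_log: Matrix,        # num_neurons x d_state
    B: Matrix,            # num_neurons x d_state
    C: Matrix,            # num_neurons x d_state
    D: List[float],       # num_neurons
    log_dt: List[float],  # num_neurons
) -> Tuple[List[float], Matrix]:
    """Hypothetical diagonal SSM step with zero-order-hold discretization."""
    y: List[float] = []
    new_state: Matrix = []
    for i, x_i in enumerate(x):
        dt = math.exp(log_dt[i])
        h_i: List[float] = []
        y_i = D[i] * x_i  # skip connection
        for n in range(len(state[i])):
            a = -math.exp(A_log[i][n])           # negative real pole
            a_bar = math.exp(dt * a)             # discretized decay
            b_bar = (a_bar - 1.0) / a * B[i][n]  # ZOH input scale
            h = a_bar * state[i][n] + b_bar * x_i
            h_i.append(h)
            y_i += C[i][n] * h
        new_state.append(h_i)
        y.append(y_i)
    return y, new_state


# One feature, d_state=1: a constant input settles at the DC gain -B/A = 1.
state = [[0.0]]
params = dict(A_log=[[0.0]], B=[[1.0]], C=[[1.0]], D=[0.0], log_dt=[0.0])
for _ in range(50):
    y, state = s4_step([1.0], state, **params)
print(round(y[0], 6))  # 1.0
```

Each feature's state evolves independently. The A_log/B/C parameters of one feature never touch another feature's state.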

class tracetorch.ssm.S5(*args: Any, **kwargs: Any)

Bases: Layer

An S5-style state-space layer with a global latent state.

S5 projects the input features into a shared latent state of size d_state and projects that state back to num_neurons outputs. It processes one timestep per forward call and keeps the global state internal.

Parameters:
  • num_neurons (int) – number of features in the target dimension.

  • d_state (int, default=64) – size of the shared latent state.

  • dim (int, default=-1) – dimension along which the layer operates.

Variables:
  • global_state – shared latent state.

  • A_log – log-parameterized diagonal dynamics.

  • B – input projection into the global state.

  • C – output projection from the global state.

  • D – skip connection scale.

  • log_dt – log timestep scale.

Notes

  • Input: tensor of shape [*, num_neurons, *], where num_neurons is at index dim.

  • Output: tensor with the same shape as the input.
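In S5 the latent state is a single vector of size d_state shared by all features. B writes every input feature into it, and C reads every output feature from it. The sketch below makes several assumptions:

* A real diagonal parameterization, A = -exp(A_log), with a per-state dt = exp(log_dt).
* Zero-order hold for both A and B.
* The matrix shapes given in the comments.

The original S5 uses complex eigenvalues, so this is a real-valued simplification. `s5_step` is a hypothetical name.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def s5_step(
    x: List[float],       # one timestep, length num_neurons
    h: List[float],       # shared state, length d_state
    A_log: List[float],   # d_state
    log_dt: List[float],  # d_state
    B: Matrix,            # d_state x num_neurons
    C: Matrix,            # num_neurons x d_state
    D: List[float],       # num_neurons
) -> Tuple[List[float], List[float]]:
    """Hypothetical S5-style step: every feature reads and writes one global state."""
    new_h: List[float] = []
    for n in range(len(h)):
        a = -math.exp(A_log[n])
        a_bar = math.exp(math.exp(log_dt[n]) * a)
        # Project all input features into state n; this is where features mix.
        u = sum(B[n][i] * x_i for i, x_i in enumerate(x))
        new_h.append(a_bar * h[n] + (a_bar - 1.0) / a * u)
    # Read every output feature from the same shared state, plus the skip term.
    y = [
        sum(C[i][n] * new_h[n] for n in range(len(new_h))) + D[i] * x[i]
        for i in range(len(x))
    ]
    return y, new_h


# Input only on feature 0 still reaches feature 1 through the shared state.
y, h = s5_step(
    x=[1.0, 0.0], h=[0.0], A_log=[0.0], log_dt=[0.0],
    B=[[1.0, 1.0]], C=[[1.0], [1.0]], D=[0.0, 0.0],
)
print([round(v, 4) for v in y])  # [0.6321, 0.6321]
```

This cross-feature mixing is the main behavioral difference from a per-feature diagonal layer.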

class tracetorch.ssm.S6(*args: Any, **kwargs: Any)

Bases: Layer

A data-dependent S6 state-space layer adapted to traceTorch.

S6 is the selective SSM core associated with Mamba-style models, without the causal convolution and multiplicative block gate. The timestep and the state input and output projections (B and C) are computed from the current input and then applied to an internal per-feature state.

Parameters:
  • num_neurons (int) – number of features in the target dimension.

  • d_state (int, default=16) – latent state size per feature.

  • dt_rank (int, default=-1) – rank of the timestep projection. -1 uses max(1, num_neurons // 16).

  • dim (int, default=-1) – dimension along which the layer operates.

Variables:
  • state – per-feature latent SSM state.

  • x_proj – projection of the current input that produces the low-rank timestep features, B, and C.

  • dt_proj – projection from low-rank timestep features to per-feature timesteps.

  • A_log – log-parameterized diagonal dynamics.

  • D – skip connection scale.

Notes

  • Input: tensor of shape [*, num_neurons, *], where num_neurons is at index dim.

  • Output: tensor with the same shape as the input.
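What makes S6 selective is that dt, B, and C are recomputed from each input instead of being fixed parameters. The sketch below follows the recurrence used in the reference Mamba code, which the source does not confirm for traceTorch:

* x_proj maps the input to dt_rank timestep features plus B and C.
* dt_proj and a softplus turn those features into a positive per-feature dt.
* A = -exp(A_log) and A_bar = exp(dt*A).
* B is discretized with the simplified dt*B rule.

The weight names, bias placement, and `s6_step` are hypothetical, and traceTorch's internals may differ. `resolve_dt_rank` mirrors the documented dt_rank=-1 default.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def resolve_dt_rank(num_neurons: int, dt_rank: int = -1) -> int:
    # Documented default: -1 means max(1, num_neurons // 16).
    return max(1, num_neurons // 16) if dt_rank == -1 else dt_rank


def matvec(W: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def s6_step(
    x: List[float],     # one timestep, length N = num_neurons
    state: Matrix,      # N x d_state
    W_x: Matrix,        # (dt_rank + 2 * d_state) x N, hypothetical x_proj weight
    W_dt: Matrix,       # N x dt_rank, hypothetical dt_proj weight
    b_dt: List[float],  # N, hypothetical dt_proj bias
    A_log: Matrix,      # N x d_state
    D: List[float],     # N
) -> Tuple[List[float], Matrix]:
    """Hypothetical selective-SSM step: dt, B and C depend on the current input."""
    d_state = len(state[0])
    dt_rank = len(W_dt[0])
    proj = matvec(W_x, x)
    dt_feats = proj[:dt_rank]
    B = proj[dt_rank:dt_rank + d_state]  # input-dependent, shared across features
    C = proj[dt_rank + d_state:]
    dt = [softplus(z + b) for z, b in zip(matvec(W_dt, dt_feats), b_dt)]
    y: List[float] = []
    new_state: Matrix = []
    for i, x_i in enumerate(x):
        h_i = [
            math.exp(dt[i] * -math.exp(A_log[i][n])) * state[i][n]
            + dt[i] * B[n] * x_i  # simplified (Euler) B discretization
            for n in range(d_state)
        ]
        new_state.append(h_i)
        y.append(sum(C[n] * h_i[n] for n in range(d_state)) + D[i] * x_i)
    return y, new_state


W_x = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # dt feature, B and C all read feature 0
args = dict(W_dt=[[0.0], [0.0]], b_dt=[0.0, 0.0], A_log=[[0.0], [0.0]], D=[0.0, 0.0])
zero = [[0.0], [0.0]]
y_on, _ = s6_step([1.0, 2.0], zero, W_x, **args)
y_off, _ = s6_step([0.0, 2.0], zero, W_x, **args)
print([round(v, 4) for v in y_on])  # [0.6931, 1.3863]
print(y_off)  # [0.0, 0.0]
```

In the usage lines, feature 0 controls B and C. The same value on feature 1 is either written into the state or ignored, depending on the current input.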

class tracetorch.ssm.Mamba(*args: Any, **kwargs: Any)

Bases: Layer

A compact Mamba-style block adapted to traceTorch.

Mamba combines an input projection, an optional causal convolution buffer, SiLU gating, an S6-style selective SSM core, an output projection, and a residual connection. It keeps the convolution buffer and SSM state internal and processes one timestep per forward call.
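The order of operations described above can be sketched as a composition of per-timestep callables. The stand-ins `in_proj`, `conv_step`, `ssm_step`, and `out_proj` are hypothetical placeholders for the block's internals. Two details are assumptions about traceTorch's version, taken from the original Mamba design: the input projection splits into an SSM branch and a gate branch, and SiLU is applied after the convolution.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]


def silu(z: float) -> float:
    return z / (1.0 + math.exp(-z))


def mamba_block_step(
    x: Vec,
    in_proj: Callable[[Vec], Tuple[Vec, Vec]],  # returns (ssm branch, gate branch)
    conv_step: Callable[[Vec], Vec],            # causal conv over buffered timesteps
    ssm_step: Callable[[Vec], Vec],             # S6-style selective core, stateful
    out_proj: Callable[[Vec], Vec],
) -> Vec:
    """Hypothetical one-timestep Mamba-style block."""
    u, z = in_proj(x)
    u = [silu(v) for v in conv_step(u)]             # conv, then SiLU activation
    y = ssm_step(u)                                 # selective SSM on the activated branch
    y = [a * silu(g) for a, g in zip(y, z)]         # multiplicative SiLU gate
    return [r + o for r, o in zip(x, out_proj(y))]  # residual connection


def identity(v: Vec) -> Vec:
    return list(v)


out = mamba_block_step(
    [0.0, 1.0],
    in_proj=lambda v: (list(v), list(v)),
    conv_step=identity,
    ssm_step=identity,
    out_proj=identity,
)
print([round(v, 4) for v in out])  # [0.0, 1.5344]
```

With every stand-in set to the identity, the output reduces to x + silu(x) * silu(x). This makes the gate and the residual path easy to check by hand.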

Parameters:
  • num_neurons (int) – number of features in the target dimension.

  • d_state (int, default=16) – latent SSM state size per feature.

  • dim (int, default=-1) – dimension along which the layer operates.

  • dt_rank (int, default=-1) – rank of the timestep projection. -1 uses max(1, num_neurons // 16).

  • conv_kernel (int, default=4) – causal convolution buffer length. Values <= 1 disable the convolution buffer.

Variables:
  • ssm_state – per-feature selective SSM state.

  • conv_buffer – causal convolution buffer, present when conv_kernel > 1.
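When inputs arrive one timestep at a time, a causal convolution only needs the last conv_kernel inputs, and a rolling buffer holds exactly those. The sketch below makes three assumptions:

* The convolution is depthwise, with one kernel per feature, as in Mamba.
* The buffer starts filled with zeros.
* A disabled buffer passes its input through unchanged.

`CausalConvBuffer` is a hypothetical name. Its `conv_kernel <= 1` branch follows the documented rule that such values disable the buffer.

```python
from collections import deque
from typing import Deque, List, Optional

Vec = List[float]


class CausalConvBuffer:
    """Hypothetical depthwise causal convolution fed one timestep at a time."""

    def __init__(self, num_features: int, conv_kernel: int, weight: List[Vec]) -> None:
        # weight[i][k] is the tap for feature i; k = conv_kernel - 1 is the newest input.
        self.enabled = conv_kernel > 1
        self.weight = weight
        self.buffer: Optional[Deque[Vec]] = None
        if self.enabled:
            # Zero history, so early outputs only reflect inputs that have arrived.
            self.buffer = deque(
                ([0.0] * num_features for _ in range(conv_kernel)), maxlen=conv_kernel
            )

    def step(self, x: Vec) -> Vec:
        if not self.enabled:
            return list(x)  # assumption: a disabled buffer passes the input through
        assert self.buffer is not None
        self.buffer.append(list(x))  # the oldest timestep drops off automatically
        return [
            sum(w_k * frame[i] for w_k, frame in zip(self.weight[i], self.buffer))
            for i in range(len(x))
        ]


# Kernel of ones over 3 timesteps: a running sum of the last three inputs.
conv = CausalConvBuffer(num_features=1, conv_kernel=3, weight=[[1.0, 1.0, 1.0]])
print([conv.step([t])[0] for t in (1.0, 2.0, 3.0, 4.0)])  # [1.0, 3.0, 6.0, 9.0]
```

`deque(maxlen=...)` makes the buffer fixed-size. Each step therefore costs O(conv_kernel) per feature no matter how long the sequence runs.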

Notes

This is a traceTorch-compatible experimental implementation. It is not an optimized replacement for production Mamba kernels.