RNN

tt.rnn contains conventional recurrent layers adapted to traceTorch’s hidden-state contract.

Base Layer

class tracetorch.rnn.Layer(*args: Any, **kwargs: Any)[source]

Bases: Layer

The superclass used for all RNN layers.

Parameters:

num_neurons (int) – the number of neurons in the layer. This value sizes any hidden states that are initialized and any parameters that are registered via the tracetorch methods.
dim (int, default=-1) – the dimension along which the layer operates.

Notes

Inherits from tt.core.Layer, but doesn’t add any new methods. Check tt.core.Layer to see available methods.

Layers

class tracetorch.rnn.SimpleRNN(*args: Any, **kwargs: Any)[source]

Bases: Layer

A simple RNN layer, akin to Jordan and Elman networks. Uses the input and the previous timestep’s output to compute the current timestep’s output.

Parameters:

in_features (int) – number of input features.
out_features (int) – number of output features. This is the value used as num_neurons for superclass initialization.
dim (int, default=-1) – the dimension along which the layer operates.

Variables:

H – the hidden state. Stores the previous timestep’s output.
lin – the linear layer used to calculate the output.

Notes

Input: tensor of shape [*,in_features,*] where in_features is at index dim.
Output: tensor of shape [*,out_features,*] where out_features is at index dim.

Concatenates the hidden state H (the previous timestep’s output) with the input x and passes the result through a linear layer. The output is bounded to (-1, 1) by tanh, recorded into H, and returned. Pseudocode looks as follows:

H = tanh(linear(concatenate(H, x)))
return H
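
The recurrence above can be traced by hand with a dependency-free sketch in plain Python. This is not traceTorch's implementation: the helper names (linear, simple_rnn_step), the weight values, and the zero initial hidden state are all assumptions made for illustration, and lists stand in for tensors.

```python
import math

def linear(weights, bias, v):
    """Affine map y = W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def simple_rnn_step(weights, bias, h, x):
    """One timestep: H = tanh(linear(concatenate(H, x)))."""
    return [math.tanh(z) for z in linear(weights, bias, h + x)]

# Hypothetical sizes and weights: in_features=2, out_features=1, so the
# weight matrix has out_features rows and out_features + in_features
# columns (hidden part first, matching concatenate(H, x)).
weights = [[0.5, 1.0, -1.0]]
bias = [0.0]

h = [0.0]  # assumed zero initial hidden state
outputs = []
for x in ([1.0, 0.0], [1.0, 0.0]):  # the same input at two timesteps
    h = simple_rnn_step(weights, bias, h, x)
    outputs.append(h[0])

# The two outputs differ even though the inputs match, because H
# carries the previous output into the second step.
print(outputs)
```

Feeding the same input twice yields two different outputs, which is the practical consequence of the layer keeping H between calls.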

Examples:

# Process 64->10 features along the last dimension
>>> layer = tt.rnn.SimpleRNN(64, 10)
>>> input = torch.rand(32, 64)
>>> output = layer(input)
>>> print(output.shape)
torch.Size([32, 10])

# Process 64->128 features along the channel dimension of an image
>>> layer = tt.rnn.SimpleRNN(64, 128, -3)
>>> input = torch.rand(32, 64, 28, 28)  # [B, C, H, W] shape
>>> output = layer(input)
>>> print(output.shape)
torch.Size([32, 128, 28, 28])

forward(x)[source]: Computes the forward pass.

class tracetorch.rnn.LSTM(*args: Any, **kwargs: Any)[source]

Bases: Layer

A Long Short-Term Memory (LSTM) layer. Uses input, forget, and output gates together with a cell state to handle long-term dependencies.

Parameters:

in_features (int) – number of input features.
out_features (int) – number of output features (automatically becomes the hidden and cell state size). This is the value used as num_neurons for superclass initialization.
dim (int, default=-1) – the dimension along which the layer operates.

Variables:

H – the hidden state. Stores the previous timestep’s output.
C – the cell state. Stores long-term memory information.
gate_layers – linear layer computing the three gates and the cell candidate simultaneously.

Notes

Input: tensor of shape [*,in_features,*] where in_features is at index dim.
Output: tensor of shape [*,out_features,*] where out_features is at index dim.

Computes the input, forget, and output gates and the cell candidate from the concatenated hidden state and input. The forget gate controls what to discard from the cell state, the input gate controls what new information to add, and the output gate controls what to expose as the hidden state. Records the results into H and C and returns H. Pseudocode looks as follows:

i, f, o, g = chunk(gate_layers(concatenate(H, x)), 4)
i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
C = f * C + i * tanh(g)
H = o * tanh(C)
return H
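
A single LSTM step can be worked through numerically with a dependency-free sketch in plain Python. It follows the standard LSTM formulation, in which the sigmoid squashes the three gates and tanh alone shapes the cell candidate, and it keeps the i, f, o, g ordering of the pseudocode. It is not traceTorch's implementation: the helper names (sigmoid, linear, lstm_step), the weights, and the zero initial states are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(weights, bias, v):
    """Affine map y = W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def lstm_step(weights, bias, h, c, x):
    """One timestep; returns the new (H, C)."""
    n = len(h)
    z = linear(weights, bias, h + x)          # 4 * n pre-activations
    i = [sigmoid(v) for v in z[:n]]           # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
    o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
    g = [math.tanh(v) for v in z[3 * n:]]     # cell candidate
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

# Hypothetical setup: in_features=1, out_features=1. All weights are
# zero and only the candidate has a bias, so every gate sits at 0.5
# and the candidate equals tanh(1) regardless of the input.
weights = [[0.0, 0.0] for _ in range(4)]
bias = [0.0, 0.0, 0.0, 1.0]

h, c = [0.0], [0.0]  # assumed zero initial hidden and cell states
h, c = lstm_step(weights, bias, h, c, [3.0])
print(h, c)
```

With these values the cell state becomes 0.5 * tanh(1) and the hidden state 0.5 * tanh(C), which makes each term of the update easy to check by hand.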

Examples:

# Process 64->32 features along the last dimension
>>> layer = tt.rnn.LSTM(64, 32)
>>> input = torch.rand(16, 64)
>>> output = layer(input)
>>> print(output.shape)
torch.Size([16, 32])

# Process 64->128 features along the channel dimension of an image
>>> layer = tt.rnn.LSTM(64, 128, -3)
>>> input = torch.rand(32, 64, 28, 28)  # [B, C, H, W] shape
>>> output = layer(input)
>>> print(output.shape)
torch.Size([32, 128, 28, 28])

class tracetorch.rnn.GRU(*args: Any, **kwargs: Any)[source]

Bases: Layer

A Gated Recurrent Unit (GRU) layer. Uses reset and update gates to control information flow, offering a simpler alternative to LSTM.

Parameters:

in_features (int) – number of input features.
out_features (int) – number of output features (automatically becomes the hidden state size). This is the value used as num_neurons for superclass initialization.
dim (int, default=-1) – the dimension along which the layer operates.

Variables:

H – the hidden state. Stores the previous timestep’s output.
gate_layers – linear layer computing reset and update gates.
candidate_layer – linear layer computing the candidate hidden state.

Notes

Input: tensor of shape [*,in_features,*] where in_features is at index dim.
Output: tensor of shape [*,out_features,*] where out_features is at index dim.

Computes the reset and update gates from the concatenated hidden state and input. The reset gate controls how much of the previous hidden state feeds into the candidate, while the update gate balances old and new information. Records the result into H and returns it. Pseudocode looks as follows:

reset_gate, update_gate = chunk(sigmoid(gate_layers(concatenate(H, x))), 2)
candidate = tanh(candidate_layer(concatenate(H * reset_gate, x)))
H = H * (1 - update_gate) + update_gate * candidate
return H
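
The update rule can be checked by hand with a dependency-free sketch in plain Python that mirrors the pseudocode line by line. It is not traceTorch's implementation: the helper names (sigmoid, linear, gru_step), the all-zero weights, and the starting hidden state are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(weights, bias, v):
    """Affine map y = W v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def gru_step(gate_w, gate_b, cand_w, cand_b, h, x):
    """One timestep; returns the new H."""
    n = len(h)
    gates = [sigmoid(v) for v in linear(gate_w, gate_b, h + x)]
    reset, update = gates[:n], gates[n:]  # chunk into two halves
    scaled_h = [hj * rj for hj, rj in zip(h, reset)]
    cand = [math.tanh(v) for v in linear(cand_w, cand_b, scaled_h + x)]
    return [hj * (1.0 - uj) + uj * cj
            for hj, uj, cj in zip(h, update, cand)]

# Hypothetical setup: in_features=1, out_features=2, all weights and
# biases zero. Both gates are then 0.5 and the candidate is 0, so each
# step halves the hidden state.
gate_w = [[0.0, 0.0, 0.0] for _ in range(4)]  # 2 * n rows, n + in columns
gate_b = [0.0] * 4
cand_w = [[0.0, 0.0, 0.0] for _ in range(2)]  # n rows, n + in columns
cand_b = [0.0] * 2

h = [1.0, -2.0]  # assumed starting hidden state
history = []
for x in ([0.3], [0.7]):
    h = gru_step(gate_w, gate_b, cand_w, cand_b, h, x)
    history.append(h)
print(history)
```

Because the update gate is fixed at 0.5 here, the new hidden state is an even blend of the old state and a zero candidate, which shows the interpolation role of update_gate in isolation.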

Examples:

# Process 64->32 features along the last dimension
>>> layer = tt.rnn.GRU(64, 32)
>>> input = torch.rand(16, 64)
>>> output = layer(input)
>>> print(output.shape)
torch.Size([16, 32])

# Process 64->128 features along the channel dimension of an image
>>> layer = tt.rnn.GRU(64, 128, -3)
>>> input = torch.rand(32, 64, 28, 28)  # [B, C, H, W] shape
>>> output = layer(input)
>>> print(output.shape)
torch.Size([32, 128, 28, 28])

forward(x)[source]: Computes the forward pass.