RNN
tt.rnn contains conventional recurrent layers adapted to traceTorch’s hidden-state contract.
Base Layer
- class tracetorch.rnn.Layer(*args: Any, **kwargs: Any)[source]
Bases: Layer
The superclass used for all RNN layers.
- Parameters:
num_neurons (int) – the number of neurons the layer is considered to have. When initializing any hidden states or registering parameters via the tracetorch methods, this is the value used.
dim (int, default=-1) – the dimension along which the layer operates.
Notes
Inherits from tt.core.Layer but doesn’t add any new methods. Check tt.core.Layer to see the available methods.
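To make the dim parameter more concrete, here is a minimal sketch in plain PyTorch of what "operating along a dimension" can mean for a linear map. It does not use tracetorch, and the helper name apply_along_dim is hypothetical; how traceTorch handles dim internally is an assumption here, not something the reference above states.

```python
import torch
import torch.nn as nn


def apply_along_dim(linear: nn.Linear, x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Apply `linear` to the features stored at index `dim` of `x`.

    Hypothetical helper for illustration only, not part of tracetorch.
    """
    x = x.movedim(dim, -1)      # bring the feature axis to the end
    y = linear(x)               # nn.Linear always acts on the last axis
    return y.movedim(-1, dim)   # put the resized feature axis back in place


linear = nn.Linear(64, 128)
images = torch.rand(32, 64, 28, 28)  # [B, C, H, W]
out = apply_along_dim(linear, images, dim=-3)
print(out.shape)  # torch.Size([32, 128, 28, 28])
```

Every axis other than dim is treated as a batch axis, which is why the spatial size 28 x 28 is unchanged.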
Layers
- class tracetorch.rnn.SimpleRNN(*args: Any, **kwargs: Any)[source]
Bases: Layer
A simple RNN layer, akin to Jordan and Elman networks. Uses the input and the previous timestep’s output to compute the current timestep’s output.
- Parameters:
in_features (int) – number of input features.
out_features (int) – number of output features. This is the value used as num_neurons for superclass initialization.
dim (int, default=-1) – the dimension along which the layer operates.
- Variables:
H – the hidden state. Stores the previous timestep’s output.
lin – the linear layer used to calculate the output.
Notes
Input: tensor of shape [*, in_features, *] where in_features is at index dim.
Output: tensor of shape [*, out_features, *] where out_features is at index dim.
Concatenates the hidden state H (the previous timestep’s output) to the input x and processes both via a linear layer. The output is bounded by tanh. Records the result into H and returns it. Pseudocode looks as follows:
    H = tanh(linear(concatenate(H, x)))
    return H
Examples:
    >>> import torch
    >>> import tracetorch as tt
    # Process 64->10 features along the last dimension
    >>> layer = tt.rnn.SimpleRNN(64, 10)
    >>> input = torch.rand(32, 64)
    >>> output = layer(input)
    >>> print(output.shape)
    torch.Size([32, 10])
    # Process 64->128 features along the channel dimension of an image
    >>> layer = tt.rnn.SimpleRNN(64, 128, -3)
    >>> input = torch.rand(32, 64, 28, 28)  # [B, C, H, W] shape
    >>> output = layer(input)
    >>> print(output.shape)
    torch.Size([32, 128, 28, 28])
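The SimpleRNN pseudocode can be sketched in plain PyTorch as follows. This is a stand-in for illustration, not the tracetorch implementation: the class name SimpleRNNSketch, the lazy zero initialization of H, and the use of movedim to honor dim are all assumptions.

```python
import torch
import torch.nn as nn


class SimpleRNNSketch(nn.Module):
    """Hypothetical stand-in for the documented update rule (not tracetorch code)."""

    def __init__(self, in_features: int, out_features: int, dim: int = -1):
        super().__init__()
        self.out_features = out_features
        self.dim = dim
        # One linear layer over the concatenation [H, x]
        self.lin = nn.Linear(out_features + in_features, out_features)
        self.H = None  # created on the first call

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.movedim(self.dim, -1)  # features last, so nn.Linear applies
        if self.H is None:
            # Assumption: the hidden state starts at zero
            self.H = x.new_zeros((*x.shape[:-1], self.out_features))
        # H = tanh(linear(concatenate(H, x)))
        self.H = torch.tanh(self.lin(torch.cat([self.H, x], dim=-1)))
        return self.H.movedim(-1, self.dim)


layer = SimpleRNNSketch(64, 10)
for _ in range(5):  # five timesteps; H carries over between calls
    out = layer(torch.rand(32, 64))
print(out.shape)  # torch.Size([32, 10])
```

Because the state lives on the module, each call consumes one timestep, which matches the one-input-per-call usage in the examples above.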
- class tracetorch.rnn.LSTM(*args: Any, **kwargs: Any)[source]
Bases: Layer
A Long Short-Term Memory (LSTM) layer. Uses input, forget, and output gates and a cell state to handle long-term dependencies.
- Parameters:
in_features (int) – number of input features.
out_features (int) – number of output features (automatically becomes the hidden and cell state size). This is the value used as num_neurons for superclass initialization.
dim (int, default=-1) – the dimension along which the layer operates.
- Variables:
H – the hidden state. Stores the previous timestep’s output.
C – the cell state. Stores long-term memory information.
gate_layers – linear layer computing all four gates simultaneously.
Notes
Input: tensor of shape [*, in_features, *] where in_features is at index dim.
Output: tensor of shape [*, out_features, *] where out_features is at index dim.
Computes the input, forget, and output gates and the cell candidate from the concatenated hidden state and input. The forget gate controls what to discard from the cell state, the input gate controls what new information to add, and the output gate controls what to expose as the hidden state. Records the results into H and C and returns H. Pseudocode looks as follows:
    i, f, o, g = chunk(sigmoid(gate_layers(concatenate(H, x))), 4)
    C = f * C + i * tanh(g)
    H = o * tanh(C)
    return H
Examples:
    >>> import torch
    >>> import tracetorch as tt
    # Process 64->32 features along the last dimension
    >>> layer = tt.rnn.LSTM(64, 32)
    >>> input = torch.rand(16, 64)
    >>> output = layer(input)
    >>> print(output.shape)
    torch.Size([16, 32])
    # Process 64->128 features along the channel dimension of an image
    >>> layer = tt.rnn.LSTM(64, 128, -3)
    >>> input = torch.rand(32, 64, 28, 28)  # [B, C, H, W] shape
    >>> output = layer(input)
    >>> print(output.shape)
    torch.Size([32, 128, 28, 28])
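The LSTM pseudocode can likewise be sketched in plain PyTorch. The class name LSTMSketch, the zero initialization of H and C, and the movedim handling of dim are assumptions made for illustration. The sketch deliberately keeps the order of operations from the pseudocode, in which sigmoid is applied to all four chunks before tanh(g); the textbook LSTM applies tanh directly to the candidate pre-activation, and which variant traceTorch actually implements is not confirmed here.

```python
import torch
import torch.nn as nn


class LSTMSketch(nn.Module):
    """Hypothetical stand-in that mirrors the documented pseudocode line by line."""

    def __init__(self, in_features: int, out_features: int, dim: int = -1):
        super().__init__()
        self.out_features = out_features
        self.dim = dim
        # A single linear layer produces all four chunks at once
        self.gate_layers = nn.Linear(out_features + in_features, 4 * out_features)
        self.H = None
        self.C = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.movedim(self.dim, -1)
        if self.H is None:
            # Assumption: hidden and cell states start at zero
            self.H = x.new_zeros((*x.shape[:-1], self.out_features))
            self.C = torch.zeros_like(self.H)
        gates = torch.sigmoid(self.gate_layers(torch.cat([self.H, x], dim=-1)))
        i, f, o, g = gates.chunk(4, dim=-1)
        self.C = f * self.C + i * torch.tanh(g)  # forget old, add new
        self.H = o * torch.tanh(self.C)          # expose part of the cell
        return self.H.movedim(-1, self.dim)


layer = LSTMSketch(64, 32)
for _ in range(5):  # five timesteps
    out = layer(torch.rand(16, 64))
print(out.shape)  # torch.Size([16, 32])
```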
- class tracetorch.rnn.GRU(*args: Any, **kwargs: Any)[source]
Bases: Layer
A Gated Recurrent Unit (GRU) layer. Uses reset and update gates to control information flow, offering a simpler alternative to LSTM.
- Parameters:
in_features (int) – number of input features.
out_features (int) – number of output features (automatically becomes the hidden state size). This is the value used as num_neurons for superclass initialization.
dim (int, default=-1) – the dimension along which the layer operates.
- Variables:
H – the hidden state. Stores the previous timestep’s output.
gate_layers – linear layer computing reset and update gates.
candidate_layer – linear layer computing the candidate hidden state.
Notes
Input: tensor of shape [*, in_features, *] where in_features is at index dim.
Output: tensor of shape [*, out_features, *] where out_features is at index dim.
Computes the reset and update gates from the concatenated hidden state and input. The reset gate controls how much of the previous hidden state to forget, while the update gate balances old and new information. Records the result into H and returns it. Pseudocode looks as follows:
    reset_gate, update_gate = chunk(sigmoid(gate_layers(concatenate(H, x))), 2)
    candidate = tanh(candidate_layer(concatenate(H * reset_gate, x)))
    H = H * (1 - update_gate) + update_gate * candidate
    return H
Examples:
    >>> import torch
    >>> import tracetorch as tt
    # Process 64->32 features along the last dimension
    >>> layer = tt.rnn.GRU(64, 32)
    >>> input = torch.rand(16, 64)
    >>> output = layer(input)
    >>> print(output.shape)
    torch.Size([16, 32])
    # Process 64->128 features along the channel dimension of an image
    >>> layer = tt.rnn.GRU(64, 128, -3)
    >>> input = torch.rand(32, 64, 28, 28)  # [B, C, H, W] shape
    >>> output = layer(input)
    >>> print(output.shape)
    torch.Size([32, 128, 28, 28])
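The GRU pseudocode can be sketched the same way in plain PyTorch. As with the other sketches, the class name GRUSketch, the zero-initialized H, and the movedim handling of dim are assumptions for illustration and not the tracetorch source.

```python
import torch
import torch.nn as nn


class GRUSketch(nn.Module):
    """Hypothetical stand-in that mirrors the documented pseudocode."""

    def __init__(self, in_features: int, out_features: int, dim: int = -1):
        super().__init__()
        self.out_features = out_features
        self.dim = dim
        # Reset and update gates share one linear layer
        self.gate_layers = nn.Linear(out_features + in_features, 2 * out_features)
        self.candidate_layer = nn.Linear(out_features + in_features, out_features)
        self.H = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.movedim(self.dim, -1)
        if self.H is None:
            # Assumption: the hidden state starts at zero
            self.H = x.new_zeros((*x.shape[:-1], self.out_features))
        gates = torch.sigmoid(self.gate_layers(torch.cat([self.H, x], dim=-1)))
        reset_gate, update_gate = gates.chunk(2, dim=-1)
        # The candidate sees only the part of H that the reset gate lets through
        candidate = torch.tanh(
            self.candidate_layer(torch.cat([self.H * reset_gate, x], dim=-1))
        )
        # Convex combination of the old state and the candidate
        self.H = self.H * (1 - update_gate) + update_gate * candidate
        return self.H.movedim(-1, self.dim)


layer = GRUSketch(64, 32)
for _ in range(5):  # five timesteps
    out = layer(torch.rand(16, 64))
print(out.shape)  # torch.Size([16, 32])
```

Since the update is a convex combination of the previous state and a tanh-bounded candidate, a zero-initialized H stays within [-1, 1] at every step.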