tabensemb.model.AbstractNN.training_step_end#
method
- AbstractNN.training_step_end(step_output: Tensor | Dict[str, Any]) Tensor | Dict[str, Any]#
Use this when training with dp because
training_step()will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.Note
If you later switch to ddp or some other mode, this will still be called so that you don’t have to change your code
# pseudocode sub_batches = split_batches_for_dp(batch) step_output = [training_step(sub_batch) for sub_batch in sub_batches] training_step_end(step_output)
- Parameters:
step_output¶ – What you return in training_step for each batch part.
- Returns:
Anything
When using the DP strategy, only a portion of the batch is inside the training_step:
def training_step(self, batch, batch_idx): # batch is 1/num_gpus big x, y = batch out = self(x) # softmax uses only a portion of the batch in the denominator loss = self.softmax(out) loss = nce_loss(loss) return loss
If you wish to do something with all the parts of the batch, then use this method to do it:
def training_step(self, batch, batch_idx): # batch is 1/num_gpus big x, y = batch out = self.encoder(x) return {"pred": out} def training_step_end(self, training_step_outputs): gpu_0_pred = training_step_outputs[0]["pred"] gpu_1_pred = training_step_outputs[1]["pred"] gpu_n_pred = training_step_outputs[n]["pred"] # this softmax now uses the full batch loss = nce_loss([gpu_0_pred, gpu_1_pred, gpu_n_pred]) return loss
See also
See the Multi GPU Training guide for more details.