
Implementation of model abstract class.

The idea of this class is to provide a standard interface for training and evaluating models and to avoid code duplication. It is set up in a modular fashion so that a model can override key components of the training process (e.g. the implementation of the network via get_net, the criterion via get_criterion, and how batches from the dataset are preprocessed via process_batch).


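local MyModel = torch.class('MyModel', 'tl.Model')

-- names of the options that must be passed to the constructor
function MyModel:required_params()
  return {'d_in', 'd_hid'}
end

-- the network: two stacked linear layers
function MyModel:get_net()
  return nn.Sequential()
      :add(nn.Linear(self.opt.d_in, self.opt.d_hid))
      :add(nn.Linear(self.opt.d_hid, 1))
end

-- the loss used for training and evaluation
function MyModel:get_criterion()
  return nn.MSECriterion()
end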


Model:__init(opt)



Arguments:

  • opt (table): a key-value map of parameters for the model.

If you need a more specific constructor, add it to the implementation of the child class. In practice, it is often sufficient to override the functions get_net, get_criterion, and initialize.


Model:initialize()

Initializes the model.

By default, this initializes all parameters uniformly between -0.08 and 0.08 and resets the gradients to 0.


Returns:

  • (Model) initialized model
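The default initialization described above can be illustrated with a minimal sketch in plain Lua. The helper `uniform_init` is hypothetical and operates on ordinary Lua arrays rather than Torch tensors; it is not the library's implementation. With Torch tensors, a similar effect is usually obtained by calling `uniform(-0.08, 0.08)` and `zero()` on the flat parameter and gradient tensors returned by `getParameters()`.

```lua
-- Hypothetical helper (not part of the library): fills a flat array of
-- parameters with values drawn uniformly from [-scale, scale) and sets the
-- matching gradients to 0.
local function uniform_init(params, grads, scale)
  scale = scale or 0.08
  for i = 1, #params do
    -- math.random() with no arguments returns a float in [0, 1)
    params[i] = (math.random() * 2 - 1) * scale
    grads[i] = 0
  end
  return params, grads
end

local params, grads = uniform_init({0, 0, 0, 0}, {1, 1, 1, 1})
```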


Model:required_params()


Returns:

  • (table) required arguments for the constructor

By default, this returns an empty table. If a required argument is not supplied, the constructor aborts with an error.
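A check of this kind could look roughly like the sketch below. The helper `check_required` and its error message are assumptions made for illustration; the library's actual constructor may perform the check differently.

```lua
-- Hypothetical helper (not part of the library): raises an error if any key
-- listed in `required` is absent from the `opt` table.
local function check_required(opt, required)
  for _, name in ipairs(required) do
    if opt[name] == nil then
      error('missing required parameter: ' .. name)
    end
  end
  return true
end

-- All required keys are present, so the check passes.
local ok = check_required({d_in = 10, d_hid = 5}, {'d_in', 'd_hid'})

-- 'd_hid' is missing, so the check raises an error, caught here by pcall.
local succeeded, msg = pcall(check_required, {d_in = 10}, {'d_in', 'd_hid'})
```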


Model:get_net()


Returns:

  • (torch.Module) implementation of the network.

Note: You must override this function.


Model:get_criterion()


Returns:

  • (nn.Criterion) implementation of the criterion.

By default returns nn.CrossEntropyCriterion().

Model:process_batch(batch, pad)

Applies preprocessing to the batch object returned by Dataset.batches.


Arguments:

  • batch (table[string:table]): a map from Dataset.batches.
  • pad (int): the value used to pad variable-length examples in batch.X.


Returns:

  • (table[string:table]) padded batch

By default, this pads the X field using Dataset.pad and converts the Y field to a Tensor. You may want to do different things here, such as converting tensors to CUDA or padding a different field.
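The padding step can be sketched in plain Lua as follows. The helper `pad_sequences` is hypothetical and works on nested Lua tables; it pads on the right, which is an assumption, since the text does not say on which side Dataset.pad places the padding.

```lua
-- Hypothetical helper (not part of the library): right-pads each sequence in
-- `seqs` with `pad` so that every row is as long as the longest sequence.
local function pad_sequences(seqs, pad)
  pad = pad or 0
  -- first pass: find the length of the longest sequence
  local max_len = 0
  for _, seq in ipairs(seqs) do
    max_len = math.max(max_len, #seq)
  end
  -- second pass: copy each sequence and fill the remainder with `pad`
  local padded = {}
  for i, seq in ipairs(seqs) do
    local row = {}
    for j = 1, max_len do
      row[j] = seq[j] or pad
    end
    padded[i] = row
  end
  return padded
end

local X = pad_sequences({{1, 2, 3}, {4}, {5, 6}}, 0)
-- X is {{1, 2, 3}, {4, 0, 0}, {5, 6, 0}}
```

Once every row has the same length, the nested table can be converted into a single 2D Tensor for the network.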

Model:train(dataset, opt, optimize, optim_opt)

Trains on a Dataset instance.


Arguments:

  • dataset (Dataset): dataset to train on.
  • opt (table): training options.
  • optimize (optim.optimizer): optimizer for training. Optional. Default: optim.adam.
  • optim_opt (table): optimizer options. Optional.


Returns:

  • (number) average loss per example

opt specifies:

  • batch_size: the number of examples per batch to fetch from dataset. By default this is 128.

  • silent: whether to prevent progress updates (e.g. via a progress bar). By default this is false.

  • pad: the integer used for padding variable-length sequences. By default this is 0.


d = Dataset{X = X, Y = Y}
loss = model:train(d, {silent=true, batch_size=10}, optim.adam, {learningRate=1e-3})

Model:evaluate(dataset, opt)

Evaluates on a Dataset instance.


Arguments:

  • dataset (Dataset): dataset to evaluate on.
  • opt (table): evaluation options.


Returns:

  • (number, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor) evaluation results

opt specifies:

  • batch_size: the number of examples per batch to fetch from dataset. By default this is 128.

  • silent: whether to prevent progress updates (e.g. via a progress bar). By default this is false.

  • pad: the integer used for padding variable-length sequences. By default this is 0.

Returns the following:

  • loss: average loss per example

  • pred: a Tensor containing the predictions made

  • targ: a Tensor containing the ground truth

  • max_scores: a Tensor containing the max scores for each prediction

  • raw_scores: a Tensor containing the raw scores for each prediction


d = Dataset{X = X, Y = Y}
loss, pred, targ, max_scores, raw_scores = model:evaluate(d, {silent=true, batch_size=10})

Model:fit(dataset, opt, callbacks, progress, optim, optim_opt)

Trains and evaluates a model.


Arguments:

  • dataset (table[string:Dataset]): a map of datasets.
  • opt (table): training options. Optional.
  • callbacks (table[string:function]): a map of callback functions that are run after each epoch. Optional.
  • progress (function): returns whether this epoch is an improvement over the best results seen so far. Optional.
  • optim (optim.optimizer): optimizer for train. Optional. Default: optim.adam.
  • optim_opt (table): optimizer options for train. Optional.


Returns:

  • (table, table) the best evaluation results seen during training, and the training history of all evaluation results.

dataset contains:

  • train: the Dataset to train on.

  • dev: the development Dataset to evaluate on. Used for early stopping.

  • test: the Dataset to test on. Optional. If specified, it is evaluated at the end of training.

opt contains:

  • batch_size: the number of examples per batch to fetch from dataset. By default this is 128.

  • silent: whether to prevent progress updates (e.g. via a progress bar). By default this is false.

  • patience: the number of sub-optimal epochs to tolerate before early stopping. Default is 5.

  • n_epoch: the maximum number of epochs to train for. Default is 30.

  • save: where to save progress. If not specified then no saving will be done.

Callback functions take the following arguments:

  • split: the name of the split being run

  • res: the evaluation results for the split

If a callback returns values, then the values will be stored in the evaluation results for that epoch and printed to stdout.

progress is a function that takes as arguments:

  • curr: the evaluation results for the current epoch

  • best: the best evaluation result so far

and returns whether curr is better than best. By default, this compares the loss field.

d = {
  train=Dataset{X = Xtrain, Y = Ytrain},
  dev=Dataset{X = Xdev, Y = Ydev},
  test=Dataset{X = Xtest, Y = Ytest},
}
best_scores, train_hist = model:fit(d, {silent=true, batch_size=10})
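To make the interaction between progress and patience more concrete, here is a minimal sketch in plain Lua. The `progress` function mirrors the default comparison on the loss field described above. The `early_stop` helper is hypothetical and is not the library's training loop: it assumes that training stops once `patience` consecutive epochs fail to improve on the best result, and the exact counting used by fit may differ.

```lua
-- Comparison in the style of the default described above: lower loss is better.
local function progress(curr, best)
  return curr.loss < best.loss
end

-- Hypothetical sketch of patience-based early stopping over a list of
-- per-epoch dev results. Returns the best result and the epoch it came from.
local function early_stop(history, patience)
  local best, best_epoch, bad = nil, 0, 0
  for epoch, curr in ipairs(history) do
    if best == nil or progress(curr, best) then
      -- new best result: remember it and reset the counter
      best, best_epoch, bad = curr, epoch, 0
    else
      -- no improvement: stop once the patience budget is used up
      bad = bad + 1
      if bad >= patience then break end
    end
  end
  return best, best_epoch
end

local dev_results = {
  {loss = 0.9}, {loss = 0.7}, {loss = 0.8}, {loss = 0.75}, {loss = 0.6},
}
local best, best_epoch = early_stop(dev_results, 2)
```

In this run the loop stops after epoch 4, because epochs 3 and 4 both fail to improve on epoch 2, so the lower loss at epoch 5 is never reached. A custom progress function with the same signature, for example one that compares an accuracy field added by a callback, could be passed to fit in place of the default.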