Operations

Some commands, such as train or fork, accept an optional series of operations through the --operations option.

Operations provide a flexible way to design data augmentation pipelines: you compose a sequence of primitive data operations, each specified as a case-insensitive, Python-like function call, and they are applied to the data in order.

For instance, consider a configuration file that specifies the following data augmentation operations:

train_config.yaml
cmd: train
input: data.json
operations:
  - sub(value=mean, axis=feature)
  - div(value=std, axis=feature)
  - mulrand(range=(-0.1, 0.1), log=true)
  - mul(value=std, axis=feature)
  - add(value=mean, axis=feature)

config.toml
cmd = "train"
input = "data.json"
operations = [
  "sub(value=mean, axis=feature)",
  "div(value=std, axis=feature)",
  "mulrand(range=(-0.1, 0.1), log=true)",
  "mul(value=std, axis=feature)",
  "add(value=mean, axis=feature)",
]

info

Note that text-like argument values such as mean, std, or feature are not enclosed in quotes, even in the TOML form, where each whole operation is a quoted string.
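The syntax rules above (case-insensitive, Python-like calls whose text values are left unquoted) can be illustrated with a small parser. This is a hypothetical sketch built on Python's standard ast module, not the tool's actual parser; the names parse_operation and _convert are invented here, and treating bare words as lowercase strings and true/false as booleans is an assumption.

```python
import ast


def _convert(node):
    """Turn one argument-value AST node into a plain Python value."""
    if isinstance(node, ast.Constant):           # numbers, and Python's True/False/None
        return node.value
    if isinstance(node, ast.Name):               # bare, unquoted words
        word = node.id.lower()                   # case-insensitive (assumption)
        if word in ("true", "false"):
            return word == "true"
        return word                              # e.g. mean, std, feature
    if isinstance(node, (ast.Tuple, ast.List)):  # e.g. range=(-0.1, 0.1)
        return tuple(_convert(item) for item in node.elts)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.UAdd)):
        value = _convert(node.operand)           # signed numbers such as -0.1
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("a sign can only precede a number")
        return -value if isinstance(node.op, ast.USub) else value
    raise ValueError(f"unsupported argument value: {ast.dump(node)}")


def parse_operation(text):
    """Parse e.g. 'sub(value=mean, axis=feature)' into ('sub', {...})."""
    call = ast.parse(text.strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a function call: {text!r}")
    if call.args:
        raise ValueError("this sketch only supports keyword arguments")
    kwargs = {}
    for keyword in call.keywords:
        if keyword.arg is None:                  # reject **mapping arguments
            raise ValueError("**kwargs is not supported")
        kwargs[keyword.arg.lower()] = _convert(keyword.value)
    return call.func.id.lower(), kwargs
```

For instance, parse_operation("DIV(Value=STD, Axis=Feature)") yields ("div", {"value": "std", "axis": "feature"}). Reusing Python's own expression grammar keeps the sketch short; a real implementation may well use a dedicated grammar instead.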

In this example, the following series of operations will be applied to the input data:

Subtract the feature-wise mean.
Divide by the feature-wise standard deviation.
Multiply (i.e., scale) by a random factor between 2 ** -0.1 and 2 ** 0.1 (with log=true, the range (-0.1, 0.1) is given as base-2 exponents).
Multiply by the feature-wise standard deviation.
Add the feature-wise mean to bring each feature back to its initial scale.
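To make the effect of this pipeline concrete, here is a minimal pure-Python sketch. It is not the tool's implementation, and the names feature_stats and augment are hypothetical. It assumes that mean and std are computed once per feature from the input data, that mulrand draws an independent exponent uniformly from the range for each value, and that constant features (zero standard deviation) are left unchanged.

```python
import random
import statistics


def feature_stats(rows):
    """Per-feature (column-wise) mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def augment(rows, low=-0.1, high=0.1, rng=random):
    """Apply sub/div/mulrand/mul/add as in the example configuration."""
    # Assumption: statistics come from the input data and are reused for
    # both the forward (sub, div) and inverse (mul, add) steps.
    means, stds = feature_stats(rows)
    out = []
    for row in rows:
        new_row = []
        for x, mu, sigma in zip(row, means, stds):
            sigma = sigma if sigma > 0 else 1.0     # constant feature: avoid division by zero
            z = (x - mu) / sigma                    # sub(value=mean), div(value=std)
            # mulrand(range=(low, high), log=true); uniform exponent is an assumption
            z *= 2 ** rng.uniform(low, high)
            new_row.append(z * sigma + mu)          # mul(value=std), add(value=mean)
        out.append(new_row)
    return out


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
augmented = augment(data, rng=random.Random(0))
```

Because the standardization is undone exactly, the net effect on each value is x' = mean + factor * (x - mean): values are stretched or shrunk around the feature mean by at most about 7% (2 ** 0.1 is roughly 1.072). Seeding a random.Random instance, as above, makes a run reproducible.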

caution

For any augmentation pipeline, make sure your data always remains in a realistic range. Poorly designed data augmentation pipelines can lead to worse training outcomes.
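One way to act on this advice is to compare augmented data against the range observed in the original data. The helper below is hypothetical and not part of the tool; using each feature's observed minimum and maximum, optionally widened by a margin, as the "realistic" range is an assumption, and domain-specific bounds may be more appropriate.

```python
def out_of_range(original, augmented, margin=0.0):
    """Return (row, feature) indices of augmented values that fall outside the
    per-feature [min, max] of the original data, widened on each side by
    `margin` times that feature's observed range."""
    bounds = []
    for col in zip(*original):
        lo, hi = min(col), max(col)
        pad = margin * (hi - lo)
        bounds.append((lo - pad, hi + pad))

    flagged = []
    for i, row in enumerate(augmented):
        for j, value in enumerate(row):
            lo, hi = bounds[j]
            if not lo <= value <= hi:
                flagged.append((i, j))
    return flagged
```

An empty result means every augmented value stays within the (widened) observed range; a non-empty one points at the samples and features worth inspecting before training.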