Operations
Some commands, such as train or fork, support passing a series of optional operations via the --operations option.
Operations provide a flexible way to design data augmentation pipelines by composing a sequence of primitive data operations that are applied to the data in order. Each operation is specified as a case-insensitive, Python-like function call.
For instance, consider a configuration file that specifies the following data augmentation operations:
YAML:
cmd: train
input: data.json
operations:
- sub(value=mean, axis=feature)
- div(value=std, axis=feature)
- mulrand(range=(-0.1, 0.1), log=true)
- mul(value=std, axis=feature)
- add(value=mean, axis=feature)
TOML:
cmd = "train"
input = "data.json"
operations = [
"sub(value=mean, axis=feature)",
"div(value=std, axis=feature)",
"mulrand(range=(-0.1, 0.1), log=true)",
"mul(value=std, axis=feature)",
"add(value=mean, axis=feature)",
]
Note that text-like values such as mean or std are not in quotes.
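To make the syntax concrete, here is a minimal sketch of how such an operation string could be read into a name and a dictionary of arguments. It is not the tool's actual parser: the function name `parse_operation` is hypothetical, and lowercasing every name and treating bare `true`/`false` as booleans are assumptions made for illustration.

```python
import ast


def parse_operation(text):
    """Parse 'name(key=value, ...)' into (name, {key: value}). Hypothetical helper."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a function call: {text!r}")

    def convert(value):
        # Bare words such as mean, std or true arrive as names, not strings.
        if isinstance(value, ast.Name):
            word = value.id.lower()
            # Assumption: true/false are booleans, other words stay as text.
            return {"true": True, "false": False}.get(word, word)
        if isinstance(value, ast.Tuple):
            return tuple(convert(item) for item in value.elts)
        # Numbers (including negative ones) are ordinary Python literals.
        return ast.literal_eval(value)

    # Lowercase names so that SUB(Value=Mean) and sub(value=mean) are equivalent.
    args = {kw.arg.lower(): convert(kw.value) for kw in node.keywords}
    return node.func.id.lower(), args


print(parse_operation("sub(value=mean, axis=feature)"))
# ('sub', {'value': 'mean', 'axis': 'feature'})
print(parse_operation("MulRand(range=(-0.1, 0.1), log=true)"))
# ('mulrand', {'range': (-0.1, 0.1), 'log': True})
```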
In this example, the following series of operations will be applied to the input data:
- Subtract the feature-wise mean.
- Divide by the feature-wise standard deviation.
- Multiply (i.e., scale) by some normally distributed random value between 2 ** -0.1 and 2 ** 0.1.
- Multiply by the feature-wise standard deviation.
- Add the feature-wise mean to bring the features back to their initial scale.
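The steps above can be sketched in NumPy as follows. This only illustrates the arithmetic and is not the tool's implementation. It assumes the data is a 2-D array of shape (samples, features), that "feature-wise" means one mean and one standard deviation per feature column, and that an independent random factor is drawn for every element. For simplicity the exponent is drawn uniformly from the range, which may differ from the distribution the tool actually uses. The function name `augment` and its parameters are hypothetical.

```python
import numpy as np


def augment(data, low=-0.1, high=0.1, seed=None):
    """Standardize, scale by a random factor 2 ** u, then undo the standardization."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Assumption: statistics are computed per feature column.
    mean = data.mean(axis=0)
    std = data.std(axis=0)

    z = (data - mean) / std                           # sub(mean), div(std)
    # Assumption: uniform exponent, one independent draw per element.
    exponent = rng.uniform(low, high, size=z.shape)
    z = z * 2.0 ** exponent                           # mulrand(range, log=true)
    return z * std + mean                             # mul(std), add(mean)


data = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
augmented = augment(data, seed=0)
print(augmented.shape)
```

Because the random scaling happens in standardized space, each value's deviation from its feature mean is stretched or shrunk by at most a factor of 2 ** 0.1 (about 7%), so features with very different scales are perturbed proportionally.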
For any augmentation pipeline, make sure your data always remains in a realistic range. Poorly designed data augmentation pipelines might result in worse training outcomes.