Datasets

Debug datasets

class torchelie.datasets.ColoredColumns(*size, transform=None)

A dataset of procedurally generated images of randomly colorized columns.

Parameters:
  • *size (int) – size of images
  • transform (transforms or None) – the image transforms to apply to the generated pictures
class torchelie.datasets.ColoredRows(*size, transform=None)

A dataset of procedurally generated images of randomly colorized rows.

Parameters:
  • *size (int) – size of images
  • transform (transforms or None) – the image transforms to apply to the generated pictures
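To make "procedurally generated" concrete, here is a minimal pure-Python sketch of the idea behind both debug datasets. It is not torchelie's implementation: the helper names `colored_columns` and `colored_rows`, the `seed` argument, and the nested-list image format are all assumptions made for illustration.

```python
import random


def colored_columns(height, width, seed=None):
    """Return a height x width image (nested lists of RGB tuples)
    in which every pixel of a given column shares one random color."""
    rng = random.Random(seed)
    column_colors = [tuple(rng.random() for _ in range(3)) for _ in range(width)]
    # Every row repeats the same sequence of per-column colors.
    return [list(column_colors) for _ in range(height)]


def colored_rows(height, width, seed=None):
    """Same idea, but each row gets a single random color."""
    rng = random.Random(seed)
    row_colors = [tuple(rng.random() for _ in range(3)) for _ in range(height)]
    return [[color] * width for color in row_colors]


columns_image = colored_columns(4, 6, seed=0)
rows_image = colored_rows(3, 5, seed=1)
```

Images like these are handy for debugging because a model or a transform pipeline that mangles spatial structure becomes visible at a glance.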

Dataset wrappers

class torchelie.datasets.HorizontalConcatDataset(datasets)

Concatenates multiple datasets. Whereas torchvision’s ConcatDataset only concatenates samples, torchelie’s also relabels classes. A vertical concat like torchvision’s adds more examples per class; a horizontal concat merges datasets so that the result covers more classes.

Parameters: datasets (list of Dataset) – the datasets to concatenate
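The relabeling can be sketched in plain Python as follows. This is an illustrative stand-in, not torchelie's code: `HorizontalConcatSketch` and `ToyDataset` are hypothetical names, and the sketch assumes each dataset exposes a `classes` list and returns `(sample, label)` pairs.

```python
class ToyDataset:
    """Hypothetical minimal dataset of (sample, label) pairs."""

    def __init__(self, samples, classes):
        self.samples = samples
        self.classes = classes

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


class HorizontalConcatSketch:
    """Concatenate datasets and shift labels so classes do not collide."""

    def __init__(self, datasets):
        self.datasets = datasets
        self.classes = []
        self.class_offsets = []
        for ds in datasets:
            # Labels of this dataset start after all previous classes.
            self.class_offsets.append(len(self.classes))
            self.classes.extend(ds.classes)

    def __len__(self):
        return sum(len(ds) for ds in self.datasets)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        for ds, offset in zip(self.datasets, self.class_offsets):
            if index < len(ds):
                sample, label = ds[index]
                return sample, label + offset
            index -= len(ds)


animals = ToyDataset([("cat.png", 0), ("dog.png", 1)], ["cat", "dog"])
vehicles = ToyDataset([("car.png", 0), ("bus.png", 1)], ["car", "bus"])
merged = HorizontalConcatSketch([animals, vehicles])
```

In this sketch, `car.png` keeps its position in the concatenation but its label moves from 0 to 2, because the two animal classes come first.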
class torchelie.datasets.PairedDataset(dataset1, dataset2)

A dataset that returns all possible pairs of samples from two datasets.

Parameters:
  • dataset1 (Dataset) – a dataset
  • dataset2 (Dataset) – another dataset
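One way to enumerate every pair with a single flat index is shown below. This is a sketch under assumptions: the name `PairedSketch`, the index ordering, and the `(item1, item2)` return format are illustrative and may differ from what torchelie actually does.

```python
class PairedSketch:
    """Expose all len(dataset1) * len(dataset2) pairs through one index."""

    def __init__(self, dataset1, dataset2):
        self.dataset1 = dataset1
        self.dataset2 = dataset2

    def __len__(self):
        return len(self.dataset1) * len(self.dataset2)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # The quotient selects the item of dataset1, the remainder the
        # item of dataset2, so each pair is visited exactly once.
        i, j = divmod(index, len(self.dataset2))
        return self.dataset1[i], self.dataset2[j]


paired = PairedSketch(["a", "b"], [1, 2, 3])
all_pairs = [paired[k] for k in range(len(paired))]
```

Note that the length grows multiplicatively, so pairing two large datasets yields a very large index space even though no pair is materialized ahead of time.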
class torchelie.datasets.MixUpDataset(dataset, alpha=0.4)

Linearly mixes two samples and their labels from a dataset according to the MixUp algorithm.

https://arxiv.org/abs/1905.02249

Parameters:
  • dataset (Dataset) – the dataset
  • alpha (float) – the alpha that parameterizes the beta distribution from which the blending factor is sampled
class torchelie.datasets.NoexceptDataset(ds)

Wraps a dataset and absorbs the exceptions it raises. Useful, for instance, with a large downloaded dataset that contains corrupted samples.

Parameters: ds (Dataset) – a dataset
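The text above does not say what is returned in place of a failing sample. The sketch below assumes one plausible policy, falling back to the next index that loads successfully; that policy, and the names `NoexceptSketch` and `FlakyDataset`, are assumptions for illustration rather than torchelie's documented behavior.

```python
class FlakyDataset:
    """Hypothetical dataset whose sample 1 is corrupted."""

    def __len__(self):
        return 4

    def __getitem__(self, index):
        if index == 1:
            raise IOError("corrupted sample")
        return index * 10


class NoexceptSketch:
    """Swallow exceptions from the wrapped dataset (assumed fallback policy)."""

    def __init__(self, ds):
        self.ds = ds

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Try the requested index first, then each following one at most once.
        for offset in range(len(self.ds)):
            candidate = (index + offset) % len(self.ds)
            try:
                return self.ds[candidate]
            except Exception:
                continue
        raise RuntimeError("every sample raised an exception")


safe = NoexceptSketch(FlakyDataset())
```

With this policy, asking for the corrupted index 1 silently returns sample 2 instead, which keeps a training loop running at the cost of slightly over-representing the neighbors of bad samples.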
class torchelie.datasets.WithIndexDataset(ds)

Wraps a dataset and also returns the index of the accessed element. The original dataset’s attributes are transparently accessible.

Parameters: ds (Dataset) – a dataset
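A minimal sketch of such a wrapper follows. The name `WithIndexSketch`, the choice to append the index as the last tuple element, and the `LabeledToy` helper are assumptions; the position of the index in torchelie's actual return value is not stated here.

```python
class LabeledToy:
    """Hypothetical dataset with a `classes` attribute."""

    classes = ["cat", "dog"]

    def __init__(self):
        self.items = [("cat.png", 0), ("dog.png", 1)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]


class WithIndexSketch:
    """Return each sample together with the index it was read from."""

    def __init__(self, ds):
        self.ds = ds

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, index):
        sample = self.ds[index]
        if not isinstance(sample, tuple):
            sample = (sample,)
        # Assumption: the index is appended as the last element.
        return sample + (index,)

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward to the wrapped
        # dataset. The guard avoids infinite recursion if `ds` is unset.
        if name == "ds":
            raise AttributeError(name)
        return getattr(self.ds, name)


indexed = WithIndexSketch(LabeledToy())
```

Having the index alongside each sample is useful when per-sample state, such as a running loss or a cached feature, must be stored outside the dataset.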
class torchelie.datasets.CachedDataset(ds, transform=None, device='cpu')

Wraps a dataset and lazily caches the elements returned by the underlying dataset.

Parameters:
  • ds (Dataset) – A dataset
  • transform (Callable) – transform to apply on cached elements
  • device – the device on which the cache is allocated
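Lazy caching means an element is read from the underlying dataset only the first time it is requested. The sketch below illustrates this with a plain dictionary; `CachedSketch` and `CountingDataset` are hypothetical names, the `device` argument is left out, and applying `transform` on every access after the cache lookup is an assumption based on the parameter description.

```python
class CountingDataset:
    """Hypothetical dataset that counts how often it is actually read."""

    def __init__(self):
        self.reads = 0

    def __len__(self):
        return 3

    def __getitem__(self, index):
        self.reads += 1
        return index * 10


class CachedSketch:
    """Cache raw elements on first access, then serve them from memory."""

    def __init__(self, ds, transform=None):
        self.ds = ds
        self.transform = transform
        self.cache = {}

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, index):
        if index not in self.cache:
            # First access: read from the wrapped dataset and remember it.
            self.cache[index] = self.ds[index]
        item = self.cache[index]
        if self.transform is not None:
            # Assumption: the transform runs on the cached element each time.
            item = self.transform(item)
        return item


source = CountingDataset()
cached = CachedSketch(source, transform=lambda value: value * 2)
first = cached[1]
second = cached[1]
```

Keeping the transform outside the cache lets random augmentations differ between epochs while the expensive loading step still happens only once.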
class torchelie.datasets.Subset(ds, ratio, remap_unused_classes=False)

Create a subset that is a random ratio of a dataset.

Parameters:
  • ds (Dataset) – the dataset to sample from. Must have a .samples member like torchvision’s datasets.
  • ratio (float) – a value between 0 and 1, the subsampling ratio.
  • remap_unused_classes (boolean) – if True, classes not represented in the subset will not be considered. Remaining classes will be numbered from 0 to N.
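The effect of `ratio` and `remap_unused_classes` can be sketched on a plain list of `(path, label)` pairs, mimicking the `.samples` member mentioned above. The function `subset_samples` and its `seed` argument are hypothetical; rounding the subset size down is also an assumption.

```python
import random


def subset_samples(samples, ratio, remap_unused_classes=False, seed=None):
    """Keep a random fraction of (path, label) pairs, optionally relabeling."""
    rng = random.Random(seed)
    count = int(len(samples) * ratio)  # assumption: round down
    kept = rng.sample(samples, count)
    if remap_unused_classes:
        # Renumber the labels that survived so they are contiguous from 0.
        present = sorted({label for _, label in kept})
        new_label = {old: new for new, old in enumerate(present)}
        kept = [(path, new_label[label]) for path, label in kept]
    return kept


toy_samples = [("a.png", 0), ("b.png", 5), ("c.png", 9), ("d.png", 5)]
half = subset_samples(toy_samples, 0.5, seed=0)
remapped = subset_samples(toy_samples, 1.0, remap_unused_classes=True, seed=0)
```

Remapping matters when the subset feeds a classifier whose output layer is sized by the number of classes: without it, labels of classes that were sampled away would leave unused output units.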

Functions

torchelie.datasets.mixup(x1, x2, y1, y2, num_classes, mixer=None, alpha=0.4)

Mixes samples x1 and x2 with respective labels y1 and y2 according to MixUp:

\(\lambda \sim \text{Beta}(\alpha, \alpha)\)

\(x = \lambda x_1 + (1-\lambda) x_2\)

\(y = \lambda y_1 + (1 - \lambda) y_2\)

Parameters:
  • x1 (tensor) – sample 1
  • x2 (tensor) – sample 2
  • y1 (tensor) – label 1
  • y2 (tensor) – label 2
  • num_classes (int) – number of classes
  • mixer (Distribution, optional) – a distribution to sample lambda from. If unspecified, the distribution will be a Beta(alpha, alpha)
  • alpha (float) – if mixer is unspecified, used to parameterize the Beta distribution
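The three formulas above can be sketched with the standard library alone. This is not torchelie's implementation: `mixup_sketch` works on plain lists instead of tensors, assumes `y1` and `y2` are integer class indices that are one-hot encoded using `num_classes`, and additionally returns the sampled lambda for inspection.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class index as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_sketch(x1, x2, y1, y2, num_classes, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels with a Beta-sampled factor."""
    lam = rng.betavariate(alpha, alpha)  # lambda ~ Beta(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [
        lam * a + (1 - lam) * b
        for a, b in zip(one_hot(y1, num_classes), one_hot(y2, num_classes))
    ]
    return x, y, lam


mixed_x, mixed_y, lam = mixup_sketch(
    [0.0, 0.0], [1.0, 1.0], y1=0, y2=2, num_classes=3, rng=random.Random(0)
)
```

With a small alpha such as 0.4, the Beta distribution concentrates near 0 and 1, so most mixed samples stay close to one of the two originals; the mixed label always sums to 1.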