Models

GAN from GauGAN

class torchelie.models.VggImg2ImgGeneratorDebug(in_noise, out_ch, side_ch=1)

A VGG-based image decoder that decodes a latent / noise vector into an image, conditioned on another image, in the spirit of Pix2Pix or GauGAN. The architecture is closest to GauGAN: it is not an encoder-decoder and it uses SPADE for conditioning.

Parameters:
  • in_noise (int) – dimension of the input latent
  • out_ch (int) – number of channels of the output image, 3 for an RGB image
  • side_ch (int) – number of channels of the conditioning image, 3 for an RGB image
forward(x, y)

Generate images

Parameters:
  • x (2D tensor) – input latent vectors
  • y (4D tensor) – conditioning images
Returns:

the generated images as a 4D tensor
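
A minimal usage sketch based on the signature above; the batch size, latent dimension and 32x32 conditioning resolution are illustrative assumptions:

    import torch
    import torchelie.models as tmodels

    G = tmodels.VggImg2ImgGeneratorDebug(in_noise=32, out_ch=3, side_ch=1)

    z = torch.randn(8, 32)             # 2D batch of latent vectors
    cond = torch.randn(8, 1, 32, 32)   # 4D batch of conditioning images
    imgs = G(z, cond)                  # 4D batch of generated RGB images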

class torchelie.models.VggClassCondGeneratorDebug(in_noise, out_ch, num_classes)

A VGG-based image decoder that decodes a latent / noise vector into an image, conditioned on a class label (through conditional batchnorm).

Parameters:
  • in_noise (int) – dimension of the input latent
  • out_ch (int) – number of channels of the output image, 3 for an RGB image
  • num_classes (int) – number of classes the generator can be conditioned on
forward(x, y)

Generate images

Parameters:
  • x (2D tensor) – latent vectors
  • y (1D tensor) – class labels
Returns:

generated images as a 4D tensor
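
A usage sketch for the class-conditional variant; the latent dimension and the 10-class setup are illustrative assumptions:

    import torch
    import torchelie.models as tmodels

    G = tmodels.VggClassCondGeneratorDebug(in_noise=32, out_ch=3, num_classes=10)

    z = torch.randn(8, 32)                # 2D batch of latent vectors
    labels = torch.randint(0, 10, (8,))   # 1D batch of class labels
    imgs = G(z, labels)                   # 4D batch of generated RGB images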

GAN from Pix2Pix

torchelie.models.patch_discr(arch, in_ch=3, out_ch=1, norm=None)

Construct a PatchGAN discriminator

Parameters:
  • arch (list of ints) – a list of filter counts. For instance, [64, 128, 256] generates a PatchGAN with 3 conv layers with 64, 128 and 256 kernels respectively.
  • in_ch (int) – number of input channels, 3 for RGB images
  • out_ch (int) – number of output channels, 1 for fake / real discriminator
  • norm (fn) – a normalization layer ctor
Returns:

the specified PatchGAN as a CondSeq
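
A sketch of how the architecture list is used; the 70x70 input resolution is an arbitrary choice for illustration:

    import torch
    import torchelie.models as tmodels

    # Three conv layers with 64, 128 and 256 filters, as in the example above.
    D = tmodels.patch_discr([64, 128, 256], in_ch=3, out_ch=1)

    x = torch.randn(4, 3, 70, 70)
    patch_scores = D(x)   # one real / fake score per spatial patch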

torchelie.models.proj_patch_discr(arch, num_classes, in_ch=3, out_ch=1, norm=None)

Construct a PatchGAN discriminator with projection

Parameters:
  • arch (list of ints) – a list of filter counts. For instance, [64, 128, 256] generates a PatchGAN with 3 conv layers with 64, 128 and 256 kernels respectively.
  • num_classes (int) – number of classes to discriminate
  • in_ch (int) – number of input channels, 3 for RGB images
  • out_ch (int) – number of output channels, 1 for fake / real discriminator
  • norm (fn) – a normalization layer ctor
Returns:

the specified PatchGAN as a CondSeq
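
A hedged sketch of the conditional variant, assuming the returned module takes the class labels as a second forward argument, as ProjectionDiscr does below:

    import torch
    import torchelie.models as tmodels

    D = tmodels.proj_patch_discr([64, 128, 256], num_classes=10, in_ch=3)

    x = torch.randn(4, 3, 70, 70)
    y = torch.randint(0, 10, (4,))
    scores = D(x, y)   # assumption: labels are passed alongside the images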

torchelie.models.Patch286(in_ch=3, out_ch=1, norm=...)

Patch Discriminator from pix2pix

Parameters:
  • in_ch (int) – input channels, 3 for pictures
  • out_ch (int) – output channels, 1 for binary real / fake classification
  • norm (function) – the normalization layer to use
torchelie.models.Patch70(in_ch=3, out_ch=1, norm=...)

Patch Discriminator from pix2pix

Parameters:
  • in_ch (int) – input channels, 3 for pictures
  • out_ch (int) – output channels, 1 for binary real / fake classification
  • norm (function) – the normalization layer to use
torchelie.models.Patch32(in_ch=3, out_ch=1, norm=...)

Patch Discriminator from pix2pix

Parameters:
  • in_ch (int) – input channels, 3 for pictures
  • out_ch (int) – output channels, 1 for binary real / fake classification
  • norm (function) – the normalization layer to use
torchelie.models.ProjPatch32(in_ch=3, out_ch=1, norm=..., num_classes=10)

Patch Discriminator from pix2pix, with projection for conditional GANs

Parameters:
  • in_ch (int) – input channels, 3 for pictures
  • out_ch (int) – output channels, 1 for binary real / fake classification
  • norm (function) – the normalization layer to use
  • num_classes (int) – how many classes to discriminate
torchelie.models.Patch16(in_ch=3, out_ch=1, norm=...)

Patch Discriminator from pix2pix

Parameters:
  • in_ch (int) – input channels, 3 for pictures
  • out_ch (int) – output channels, 1 for binary real / fake classification
  • norm (function) – the normalization layer to use
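
A usage sketch for the predefined discriminators above; the input resolution and the 10-class conditional setup are illustrative assumptions:

    import torch
    import torchelie.models as tmodels

    D = tmodels.Patch70()             # defaults: RGB in, 1 real / fake channel out
    x = torch.randn(4, 3, 256, 256)
    scores = D(x)                     # per-patch real / fake scores

    # Conditional variant; labels assumed to be passed as the second argument.
    Dc = tmodels.ProjPatch32(num_classes=10)
    y = torch.randint(0, 10, (4,))
    cond_scores = Dc(x, y)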


Other GANs

class torchelie.models.AutoGAN(arch, n_skip_max=2, in_noise=256, out_ch=3, batchnorm_in_output=False)

Generator discovered in AutoGAN: Neural Architecture Search for Generative Adversarial Networks.

Parameters:
  • arch (list) – architecture specification: a list of output channels, one per block. Each block doubles the resolution of the generated image. Example: [512, 256, 128, 64, 32].
  • n_skip_max (int) – maximum number of previous blocks that can feed skip connections.
  • in_noise (int) – dimension of the input noise vector
  • out_ch (int) – number of channels of the output image
  • batchnorm_in_output (bool) – whether to place a batchnorm just before projecting to RGB. I have found results better with False, but the official AutoGAN implementation uses it.
forward(z)

Forward pass

Parameters:z (tensor) – A batch of noise vectors
Returns:generated batch of images
torchelie.models.autogan_32(in_noise, out_ch=3)
torchelie.models.autogan_64(in_noise, out_ch=3)
torchelie.models.autogan_128(in_noise, out_ch=3)
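
A usage sketch; the noise dimension and batch size are illustrative assumptions:

    import torch
    import torchelie.models as tmodels

    # Convenience constructor for 32x32 images (e.g. CIFAR-sized).
    G = tmodels.autogan_32(in_noise=128)
    z = torch.randn(16, 128)
    imgs = G(z)   # 4D batch of generated 3-channel images

    # The architecture can also be spelled out directly: each entry is a
    # block's output channels, and each block doubles the resolution.
    G2 = tmodels.AutoGAN([256, 128, 64], in_noise=128, out_ch=3)
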
torchelie.models.snres_discr(arch, in_ch=3, out_ch=1)

Make a resnet discriminator with spectral norm, using SNResidualDiscrBlock.

Parameters:
  • arch (list) – a list of ints to specify output channels of the blocks, and ‘D’ to downsample. Example: [32, ‘D’, 64, ‘D’]
  • in_ch (int) – number of input channels
  • out_ch (int) – number of output channels
Returns:

the specified discriminator
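
A sketch of the architecture specification: integers set a block's output channels and 'D' inserts a downsampling step. The 32x32 input resolution is chosen for illustration:

    import torch
    import torchelie.models as tmodels

    D = tmodels.snres_discr([32, 'D', 64, 'D', 128], in_ch=3, out_ch=1)

    x = torch.randn(8, 3, 32, 32)
    scores = D(x)   # real / fake scores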

torchelie.models.snres_projdiscr(arch, num_classes, in_ch=3)

Make a resnet discriminator with spectral norm and projection, using SNResidualDiscrBlock.

Parameters:
  • arch (list) – a list of ints to specify output channels of the blocks, and ‘D’ to downsample. Example: [32, ‘D’, 64, ‘D’]
  • in_ch (int) – number of input channels
  • num_classes (int) – number of classes in the dataset
Returns:

the specified projection discriminator
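
The projection variant is built the same way; as with ProjectionDiscr below, the class labels are assumed here to be passed as a second forward argument:

    import torch
    import torchelie.models as tmodels

    D = tmodels.snres_projdiscr([32, 'D', 64, 'D'], num_classes=10, in_ch=3)

    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    scores = D(x, y)   # assumption: (images, labels) call signature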

torchelie.models.snres_discr_4l(in_ch=3, out_ch=1)

Make a 4-layer resnet discriminator with spectral norm, using SNResidualDiscrBlock.

Parameters:
  • in_ch (int) – number of input channels
  • out_ch (int) – number of output channels
Returns:

the specified discriminator

torchelie.models.snres_projdiscr_4l(num_classes, in_ch=3)

Make a 4-layer resnet discriminator with spectral norm and projection, using SNResidualDiscrBlock.

Parameters:
  • in_ch (int) – number of input channels
  • num_classes (int) – number of classes in the dataset
Returns:

the specified projection discriminator

torchelie.models.snres_discr_5l(in_ch=3, out_ch=1)

Make a 5-layer resnet discriminator with spectral norm, using SNResidualDiscrBlock.

Parameters:
  • in_ch (int) – number of input channels
  • out_ch (int) – number of output channels
Returns:

the specified discriminator

torchelie.models.snres_projdiscr_5l(num_classes, in_ch=3)

Make a 5-layer resnet discriminator with spectral norm and projection, using SNResidualDiscrBlock.

Parameters:
  • in_ch (int) – number of input channels
  • num_classes (int) – number of classes in the dataset
Returns:

the specified projection discriminator

Convolutional

torchelie.models.VggBNBone(arch, in_ch=3, leak=0, block=<function Conv2dBNReLU>, debug=False)

Construct a VGG net

How to specify a VGG architecture:

It’s a list of blocks specifications. Blocks are either:

  • ‘M’ for maxpool of kernel size 2 and stride 2
  • ‘A’ for average pool of kernel size 2 and stride 2
  • ‘U’ for nearest neighbors upsampling (scale factor 2)
  • an integer ch for a block with ch output channels
Parameters:
  • arch (list) – architecture specification
  • in_ch (int) – number of input channels
  • leak (float) – negative slope of the leaky ReLUs
  • block (fn) – block ctor
Returns:

A VGG instance
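
A sketch of the specification format; the 32x32 input is an arbitrary illustration:

    import torch
    import torchelie.models as tmodels

    # 64-channel block, maxpool, 128-channel block, maxpool, 256-channel block.
    bone = tmodels.VggBNBone([64, 'M', 128, 'M', 256], in_ch=3)

    feats = bone(torch.randn(4, 3, 32, 32))   # feature maps at 1/4 resolution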

torchelie.models.ResNetBone(arch, head, block, in_ch=3, debug=False)

Construct a ResNet.

How to specify an architecture:

It’s a list of block specifications. Each element is a string of the form “output channels:stride”. For instance “64:2” is a block with input stride 2 and 64 output channels.

Parameters:
  • arch (list) – the architecture specification
  • head (fn) – the module ctor to build for the first conv
  • block (fn) – the residual block to use ctor
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to insert debug layers between blocks
Returns:

A Resnet instance
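
A hedged sketch of the specification format. The head and block constructors shown (torchelie.nn.Conv2dBNReLU and torchelie.nn.ResBlock) are assumptions about the library's block names and may need to be swapped for the constructors actually available:

    import torch
    import torchelie.nn as tnn
    import torchelie.models as tmodels

    # "64:1" = 64 output channels, stride 1; "128:2" = 128 channels, stride 2.
    bone = tmodels.ResNetBone(
        ['64:1', '128:2', '256:2'],
        head=tnn.Conv2dBNReLU,   # assumed name for the stem conv block
        block=tnn.ResBlock,      # assumed name for the residual block
        in_ch=3)

    feats = bone(torch.randn(4, 3, 32, 32))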

torchelie.models.VectorCondResNetBone(arch, head, hidden, in_ch=3, debug=False)

A resnet with a vector side condition.

Parameters:
  • arch (list) – the architecture specification
  • head (fn) – the module ctor to build for the first conv
  • hidden (int) – the hidden size of condition projection
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to insert debug layers between blocks
Returns:

A Resnet instance

torchelie.models.ClassCondResNetBone(*args, **kwargs)

A resnet with a class side condition.

Parameters:
  • arch (list) – the architecture specification
  • head (fn) – the module ctor to build for the first conv
  • hidden (int) – the hidden size of the side label embedding
  • num_classes (int) – the number of possible labels in the side condition
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to insert debug layers between blocks
Returns:

A Resnet instance

class torchelie.models.UNetBone(arch, in_ch=3, out_ch=1)

Configurable UNet model.

Note: Not all input sizes are valid. Make sure that the model can decode an image of the same size first.

Parameters:
  • arch (list) – an architecture specification made of: an int for a conv block with that many output channels, 'U' for upsampling + conv, 'D' for downsampling (maxpooling)
  • in_ch (int) – number of input channels
  • out_ch (int) – number of output channels
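
A sketch of the specification format; as noted above not all input sizes are valid, so the 64x64 input here is only an assumption:

    import torch
    import torchelie.models as tmodels

    # conv(32), downsample, conv(64), downsample, conv(128),
    # then upsample + conv back through 64 and 32 channels.
    net = tmodels.UNetBone([32, 'D', 64, 'D', 128, 'U', 64, 'U', 32],
                           in_ch=3, out_ch=1)

    out = net(torch.randn(1, 3, 64, 64))
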
class torchelie.models.Attention56Bone(in_ch=3)

Attention56 bone

Parameters:in_ch (int) – number of channels in the images

Image classifiers

torchelie.models.VggDebug(num_classes, in_ch=1, debug=False)

A not so small VGG classifier for testing purposes

Parameters:
  • num_classes (int) – number of output classes
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to add debug layers
Returns:

a VGG instance
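
A usage sketch; the single input channel matches the default, and the 32x32 input size is an assumption:

    import torch
    import torchelie.models as tmodels

    clf = tmodels.VggDebug(num_classes=10, in_ch=1)   # e.g. grayscale digits
    logits = clf(torch.randn(8, 1, 32, 32))           # class scores, one per class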

torchelie.models.ResNetDebug(num_classes, in_ch=3, debug=False)

A not so big predefined resnet classifier for debugging purposes.

Parameters:
  • num_classes (int) – the number of output classes
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to print additional debug info
Returns:

a resnet instance

torchelie.models.PreactResNetDebug(num_classes, in_ch=3, debug=False)

A not so big predefined preactivation resnet classifier for debugging purposes.

Parameters:
  • num_classes (int) – the number of output classes
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to print additional debug info
Returns:

a resnet instance

torchelie.models.VectorCondResNetDebug(vector_size, in_ch=3, debug=False)

A not so big predefined vector-conditioned resnet classifier for debugging purposes.

Parameters:
  • vector_size (int) – size of the conditioning vector
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to print additional debug info
Returns:

a resnet instance

torchelie.models.ClassCondResNetDebug(num_classes, num_cond_classes, in_ch=3, debug=False)

A not so big predefined class-conditional resnet classifier for debugging purposes.

Parameters:
  • num_cond_classes (int) – the number of possible labels in the side condition
  • num_classes (int) – the number of output classes
  • in_ch (int) – number of input channels, 3 for RGB images
  • debug (bool) – whether to print additional debug info
Returns:

a resnet instance

torchelie.models.attention56(num_classes, in_ch=3)

Build an Attention56 network

Parameters:
  • num_classes (int) – number of classes
  • in_ch (int) – number of channels in the images

Image Segmenter

torchelie.models.UNet(in_ch=3, out_ch=1)

Instantiate the UNet network specified in U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al., 2015)

Valid input sizes include: 572x572 and 132x132.

Parameters:
  • in_ch (int) – number of input channels
  • out_ch (int) – number of output channels
Returns:

An instantiated UNet
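
A usage sketch using one of the listed valid input sizes:

    import torch
    import torchelie.models as tmodels

    net = tmodels.UNet(in_ch=3, out_ch=1)
    x = torch.randn(1, 3, 572, 572)   # 572x572 is listed as a valid size
    mask = net(x)                     # single-channel output map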

class torchelie.models.Hourglass(noise_dim=32, down_channels=[128, 128, 128, 128, 128], skip_channels=4, down_kernel=[3, 3, 3, 3, 3], up_kernel=[3, 3, 3, 3, 3], upsampling='bilinear', pad=..., relu=...)

Hourglass model from Deep Image Prior.
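
A hedged sketch of Deep Image Prior style usage, assuming the model maps a 4D noise tensor with noise_dim channels to an image of the same spatial size:

    import torch
    import torchelie.models as tmodels

    net = tmodels.Hourglass(noise_dim=32)
    noise = torch.randn(1, 32, 256, 256)   # fixed noise input (assumed shape)
    img = net(noise)                       # image predicted from the noise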

Classification heads

class torchelie.models.Classifier(feat_extractor, feature_size, num_classes)

A classification head added on top of a feature extraction model.

Parameters:
  • feat_extractor (nn.Module) – a feature extraction model
  • feature_size (int) – the number of features in the last layer of the feature extractor
  • num_classes (int) – the number of output classes
forward(*xs)

Forward pass

Parameters:*xs – arguments for feat_extractor
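
A sketch of composing a bone with a classification head; the feature_size value is assumed here to match the bone's final channel count:

    import torch
    import torchelie.models as tmodels

    bone = tmodels.VggBNBone([64, 'M', 128, 'M', 256], in_ch=3)
    clf = tmodels.Classifier(bone, feature_size=256, num_classes=10)

    logits = clf(torch.randn(8, 3, 32, 32))   # arguments are forwarded to the bone
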
class torchelie.models.Classifier1(feat_extractor, feature_size, num_classes)

A one-layer classification head added on top of a feature extraction model.

Parameters:
  • feat_extractor (nn.Module) – a feature extraction model
  • feature_size (int) – the number of features in the last layer of the feature extractor
  • num_classes (int) – the number of output classes
forward(*xs)

Forward pass

Parameters:*xs – arguments for feat_extractor
class torchelie.models.ProjectionDiscr(feat_extractor, feature_size, num_classes)

A classification head for conditional GAN discriminators, implementing the projection discriminator from https://arxiv.org/abs/1802.05637

Parameters:
  • feat_extractor (nn.Module) – a feature extraction model
  • feature_size (int) – the number of features in the last layer of the feature extractor
  • num_classes (int) – the number of output classes
forward(x, y)

Forward pass

Parameters:
  • x – argument for feat_extractor
  • y – class label
torchelie.models.PerceptualNet(layers)

Make a VGG16 with appropriately named layers that records intermediate activations.

Parameters:layers (list of str) – the names of the layers for which to save the activations.
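
A hedged sketch assuming VGG16-style layer names such as 'conv1_1' and 'conv3_2'; the exact naming scheme and how the recorded activations are exposed are assumptions here:

    import torch
    import torchelie.models as tmodels

    # Ask for activations at two (assumed) layer names.
    net = tmodels.PerceptualNet(['conv1_1', 'conv3_2'])
    out = net(torch.randn(1, 3, 224, 224))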

PixelCNN

class torchelie.models.PixelCNN(hid, sz, channels=3, n_layer=3)

A PixelCNN model with 6 blocks

Parameters:
  • hid (int) – the number of hidden channels in the blocks
  • sz ((int, int)) – the size of the images to learn. Must be square
  • channels (int) – number of channels in the data. 3 for RGB images
forward(x)

A forward pass for training

sample(temp, N)

Sample a batch of images

Parameters:
  • temp (float) – the sampling temperature
  • N (int) – number of images to generate in the batch
Returns:

A batch of images
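
A usage sketch; the hidden size, image size, input range and sampling temperature are illustrative assumptions:

    import torch
    import torchelie.models as tmodels

    model = tmodels.PixelCNN(hid=64, sz=(32, 32), channels=3)

    # Training-style forward pass on a batch of images (assumed to be in [0, 1]).
    x = torch.rand(8, 3, 32, 32)
    out = model(x)

    # Autoregressive sampling of 4 images at temperature 1.0.
    samples = model.sample(1.0, 4)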