Giovanni Bricconi


What’s an Autoencoder?


This term pops up quite often in papers, but what’s an Autoencoder? It is a machine learning model trained so that it is able to reconstruct its input. This seems quite crazy at first: it is just x = f(x), so what is the point of creating a model that is equivalent to the identity? Chapter 14 of Deep Learning by Ian Goodfellow explains in 25 pages why this can become useful. This article is just a short summary, and you are invited to follow the link and read the whole chapter.

As usual, let x be a vector of inputs. Internally the autoencoder is divided into two parts: an encoder, which computes h = encode(x), and a decoder, which computes y = decode(h); ideally you should obtain x = y = decode(encode(x)). The crucial point is that the hidden vector h does not have the same dimension as x and can have useful properties. For instance, x can be thousands of pixels from an image, while h can be composed of just tens of elements. When the size of h is smaller than the size of x we speak of undercomplete autoencoders.
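To make this concrete, here is a minimal sketch of an undercomplete autoencoder (assuming PyTorch; the 1000-pixel input, the 30-element code and the layer sizes are just illustrative numbers, not taken from the book):

    import torch
    import torch.nn as nn

    class UndercompleteAutoencoder(nn.Module):
        def __init__(self, input_dim=1000, hidden_dim=30):
            super().__init__()
            # encoder: compresses the input x into a much smaller code h
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 128),
                nn.ReLU(),
                nn.Linear(128, hidden_dim),
            )
            # decoder: kept simple, it just tries to rebuild x from h
            self.decoder = nn.Linear(hidden_dim, input_dim)

        def forward(self, x):
            h = self.encoder(x)
            return self.decoder(h), h

    model = UndercompleteAutoencoder()
    x = torch.rand(16, 1000)                 # a batch of 16 fake 1000-pixel inputs
    y, h = model(x)
    loss = nn.functional.mse_loss(y, x)      # reconstruction error between y and x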

If we allow the decoder to be powerful and complex, like a deep neural network, we will end up with a model that just learns the identity function, and this will not be useful. We will instead allow only the encoder to be complex; the decoder must be simple. We are therefore interested in obtaining a useful representation h of the complex input, and we use the decoder part only because we want to do unsupervised learning. We do not need to define and classify all our inputs along the h dimensions by hand; we just want the model to discover by itself an h that has the useful properties we are interested in.

Which useful properties do we want to impose on h? Sparsity is an interesting one: if h is sparse, a small change in the input won’t influence the h representation much. The encoder will extract a brief representation of the input, and in practice we will use this representation to compare different inputs with each other. At the chapter’s end there is a reference to semantic hashing, recognizing similar texts by comparing their h vectors; an interesting topic I would like to describe in my next posts. The decoder can also be used as a generative model: given a change in the h state you can check what the corresponding reconstruction looks like, which is useful to visualize what the model is considering.
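For example, once trained, two inputs can be compared through their codes instead of their raw pixels (a small sketch reusing the model defined above; cosine similarity is just one possible choice of distance):

    import torch.nn.functional as F

    # encode two inputs and compare them in h-space rather than pixel space
    h1 = model.encoder(x[0:1])
    h2 = model.encoder(x[1:2])
    similarity = F.cosine_similarity(h1, h2)   # close to 1.0 means similar inputs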

Sparsity is obtained by adapting the loss function used during training:

L(x, decode(encode(x))) + regularize(h)

The regularize function can, for instance, penalize elements with too high a value. In our case we may want to obtain this with rectifier units: a ReLU naturally moves to 0 all elements of h that are near zero or negative, while keeping the value of the positive elements. The representation we obtain will become sparse.
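In code, the regularized objective above could look like this sketch (still assuming PyTorch and the model defined earlier; the L1 penalty on h and the 1e-3 weight are illustrative choices, and data_loader is a hypothetical iterator over input batches):

    import torch
    import torch.nn as nn

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for x_batch in data_loader:                      # hypothetical batch iterator
        y, h = model(x_batch)
        reconstruction = nn.functional.mse_loss(y, x_batch)
        sparsity_penalty = 1e-3 * h.abs().mean()     # regularize(h): push h elements toward 0
        loss = reconstruction + sparsity_penalty
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()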

Among autoencoders we also have denoising autoencoders. Here the idea is that we do not feed just x to the model but x + noise, and we still want the model to reconstruct x: x = decode(encode(x + noise)). By doing this we force the model to be robust to small modifications of the input; the model will actually provide a likely candidate x’, performing a kind of projection of the input vector onto the inputs seen in the past. The book gives some nice visual pictures to explain this concept, for instance figure 14.4. The autoencoder has learned to recognize one manifold of inputs, a subset of the input space; when a noisy input comes in, it is projected onto this manifold, giving the most promising candidate x’. Citing the authors:

The fact that x is drawn from the training data is crucial, because it means the autoencoder need not successfully reconstruct inputs that are not probable under the data-generating distribution

https://www.deeplearningbook.org/contents/autoencoders.html
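A denoising training step could be sketched as follows (Gaussian noise is one common choice of corruption; the 0.1 scale is arbitrary, and model, optimizer and data_loader are the assumed objects from the sketches above):

    for x_batch in data_loader:
        noisy = x_batch + 0.1 * torch.randn_like(x_batch)    # feed x + noise to the model
        y, h = model(noisy)
        loss = nn.functional.mse_loss(y, x_batch)             # but ask it to reconstruct the clean x
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()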

Another interesting idea that comes up around noise and sparsity is the following: what about using sigmoid units to compute the final h representation, and injecting a bit of noise just before them? Sigmoids saturate at extreme values, so the noise is naturally discarded if the units work far from zero. Injecting the noise forces the h elements to become nearly binary and sparse.
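A rough sketch of this trick (noise injected before the sigmoid only at training time; the unit noise scale is an assumption, not a value from the book):

    import torch

    def binary_code(pre_activation, training=True):
        # adding noise before the sigmoid pushes the network to operate in the
        # saturated regions of the sigmoid, so the code elements end up close to 0 or 1
        if training:
            pre_activation = pre_activation + torch.randn_like(pre_activation)
        return torch.sigmoid(pre_activation)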

Written by Giovanni

May 29, 2023 at 3:24 pm

Posted in Varie

One Response


  1. […] my previous posts I have described Autoencoders and Deep belief networks, these concepts are needed to understand how semantic hashing works. […]

