Giovanni Bricconi

My site on WordPress.com

RISE: Randomized Input Sampling for Explanation (of black-box models). Why a black sheep is a cow

leave a comment »

I recently attended a presentation about machine learning explainability. The team from Sopra-Steria presented their work on using submarine sonar sensors to detect internal equipment failures. Each machine in the submarine emits some noise; by analyzing these noises they wanted to detect when a component was starting to fail, for instance a defective ball bearing in a pump. Once the model was built and shown to obtain good prediction scores, they faced many questions from navy engineers. They had studied the field for years, had a lot of background, and wanted to understand why the neural network was giving a specific result. A difficult challenge, which they solved using RISE.

RISE: Randomized Input Sampling for Explanation of Black-box Models
Vitali Petsiuk, Abir Das, Kate Saenko
https://arxiv.org/abs/1806.07421

Let’s leave the submarine sound world and come to the paper, which is instead centered on image classification explainability. Look at the picture below:

Top of the picture on paper’s page 2

Here the question was: “why in this picture does the AI model detect a sheep and a cow, and not just sheep?” As these kinds of models have millions of parameters, understanding why at the parameter level is impossible. The authors used a black-box approach, which produced the 2nd and 3rd pictures, showing that the model is unable to recognize the black sheep as a sheep. Since the approach treats the model as a black box, it can be applied to any model, not just neural networks.

The idea is surprisingly easy to explain. Given the original image we have some classification probabilities: 26% sheep and 17% cow. Let’s focus just on the cow probability; what happens if I hide a patch of the original image and reapply the same AI model? I will obtain a different probability. Let’s say 16.9% if I hide a part of the water, and 15% if I hide the black sheep’s legs.

If we repeat this patch-and-evaluate loop many, many times, we can do a pixel-by-pixel average and decide that some pixels are more important because they drive up the probability. In the end we can paint the most important pixels in red and the others in blue, obtaining the interesting picture above.
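The mask-and-average loop can be sketched in a few lines of NumPy. This is my own minimal version of the idea, not the authors’ code: `model` stands for any function returning the probability of one class (e.g. “cow”), and the mask shapes and parameter names are illustrative.

```python
import numpy as np

def rise_saliency(image, model, n_masks=1000, keep_prob=0.5, rng=None):
    """Saliency map for one class: a weighted average of random masks.

    `model` is assumed to take an (H, W, C) image and return a float
    probability for the class of interest. Hypothetical sketch, not the
    authors' implementation.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # Random binary mask: 1 = pixel kept, 0 = pixel hidden (blacked out).
        mask = (rng.random((h, w)) < keep_prob).astype(float)
        score = model(image * mask[..., None])
        # Pixels that are visible in high-scoring masks accumulate weight.
        saliency += score * mask
    # Normalize by the expected number of times each pixel stayed visible.
    return saliency / (n_masks * keep_prob)
```

Pixels whose presence keeps the class probability high end up with a large value (red in the paper’s figures), while pixels that do not matter average out to a lower value (blue).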

Of course I am over-simplifying the problem: how many times do I have to do this? How big must the picture patches be? How do I patch the image: turning the pixels to gray? To black? Blurring them? Turning the pixels to black and using a sort of sub-sampling grid to decide where to put the patches seems to be the best approach.
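The sub-sampling grid idea can be sketched like this: draw a small random binary grid and blow each cell up to a patch of the image, so whole patches are hidden at once instead of isolated pixels. A simplified sketch (the paper additionally upsamples the grid bilinearly with a random shift, producing smooth masks; here I use plain block upsampling to keep the idea readable, and the names are mine):

```python
import numpy as np

def grid_mask(grid_shape, cell_size, keep_prob=0.5, rng=None):
    """One mask from a small random binary grid, upsampled to image size."""
    rng = np.random.default_rng(rng)
    grid = (rng.random(grid_shape) < keep_prob).astype(float)
    # Each grid cell becomes a cell_size x cell_size patch of the image:
    # 0 means the whole patch is blacked out, 1 means it is kept.
    return np.kron(grid, np.ones((cell_size, cell_size)))

image = np.ones((224, 224, 3))          # stand-in for a real image
mask = grid_mask((7, 7), 32, rng=0)     # 224x224 mask from a 7x7 grid
masked = image * mask[..., None]        # hide patches by turning them black
```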

To evaluate RISE and compare it with other methods (such as LIME), the authors presented the “deletion” metric. Look at this picture:

Taken from figure 4

On the x-axis is a measure of how much of the original image has been hidden before applying the AI model. On the y-axis is the classification probability. Removing even a very small part of the image, if it comes from the importance hot-spot, makes the probability drop. This means that RISE is doing a good job of identifying the hot-spot.

A complementary metric can be introduced by reversing the approach: measure how much the probability rises as more and more pixels are revealed; this is the insertion metric.
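The deletion metric can be sketched as follows: black out pixels from most to least salient, re-run the model at each step, and summarize the resulting curve (the paper reports the area under it; lower is better, since a good saliency map makes the score collapse early). This is my own minimal version with hypothetical names, not the authors’ code:

```python
import numpy as np

def deletion_score(image, saliency, model, steps=10):
    """Average class score while deleting pixels in saliency order.

    `model` is assumed to take an (H, W, C) image and return a float
    probability. A lower result means the saliency map found the pixels
    that really drive the prediction.
    """
    order = np.argsort(saliency.ravel())[::-1]   # most salient pixels first
    img = image.copy()
    flat = img.reshape(-1, img.shape[-1])        # view onto img's pixels
    scores = [model(img)]
    chunk = int(np.ceil(order.size / steps))
    for i in range(steps):
        flat[order[i * chunk:(i + 1) * chunk]] = 0.0   # black out a batch
        scores.append(model(img))
    # Summary of the curve: mean score across the deletion steps.
    return float(np.mean(scores))
```

The insertion metric is the mirror image: start from an empty (all-black) image, reveal pixels in the same order, and expect the score to rise quickly, so a *higher* area is better.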

To conclude: you can obtain nice images explaining the hot-spots that made a class be chosen, but you have to evaluate the model on thousands of “altered” inputs for a single input instance. In the submarine case, a 30-second sound track required half an hour of computation to produce an explanation.

Written by Giovanni

July 22, 2023 at 9:17 am

Posted in Varie

Leave a comment