Giovanni Bricconi


The Neuro-Symbolic Concept Learner


While reading the Explainable AI paper I found a reference to the neuro-symbolic approach: extracting and working with symbols would indeed make neural network predictions human-interpretable. One of the referenced articles was about answering questions on simplified still-life scenes using neural networks, for instance "is the yellow cube made of the same material as the red cylinder?".

https://cs.stanford.edu/people/jcjohns/clevr/

The picture above is taken from the CLEVR dataset project. It provides images of simple geometric objects paired with questions and answers, to enable benchmarking of ML models. The shapes used and the structure of the questions are deliberately limited and well defined, to make the problem approachable.

Having been exposed, a long time ago, to languages like Prolog and CLIPS, I was expecting some mix of neural networks and symbolic programs to answer the questions: in my mind the two were quite complementary. Symbolic programming to analyze the question and evaluate its result, neural networks to extract the scene features… but I was wrong: in the following paper everything is done in a much more neural-network way.

Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision. ICLR 2019.

http://nscl.csail.mit.edu/

The Neuro-Symbolic Concept Learner (NS-CL) is composed of three modules: a neural perception module that extracts latent features from the scene, a semantic parser that analyzes the questions, and a program executor that provides the answer. What surprised me, and I am still not clear on how it works, is that all the modules are implemented with neural networks, so it is possible to train them in the usual neural-network way. Citing the authors:

…We propose the neuro-symbolic concept learner (NS-CL), which jointly learns
visual perception, words, and semantic language parsing from images and question-answer pairs.

Starting from an image perception module pre-trained on the CLEVR dataset, the other modules are trained in a "curricular way": the training set is structured so that in a first phase only simple questions are proposed, and in later steps things get more complicated. First come questions on object-level concepts like color and shape, then relational questions such as "how many objects are left of the red cube", and so on.

The visual perception module extracts concepts, as in the following picture taken from the paper:

Relations between objects are encoded in a similar way. Each property is a probabilistic value: the probability of being a cube, of being red, of being above the sphere… Given this probabilistic representation, it is possible to construct a program that uses the probabilities to compute the result. For instance, you can define a filter operation that selects all the cube objects, keeping the objects that have a high probability of being a cube and discarding the others. The coefficients of this filter operation are learned from the training data set.
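As a rough sketch of the idea (my own simplification, not the paper's implementation, and with hand-made probabilities), such a filter can be a differentiable elementwise product: each object's attention weight is attenuated by its probability of matching the concept.

```python
import numpy as np

# Hypothetical scene with 3 objects: the probability that each object
# matches a concept. In NS-CL these come from similarities between object
# embeddings and learned concept embeddings; here they are hard-coded.
concept_probs = {
    "cube": np.array([0.95, 0.05, 0.90]),  # objects 0 and 2 are likely cubes
    "red":  np.array([0.10, 0.85, 0.92]),  # objects 1 and 2 are likely red
}

def soft_filter(mask, concept):
    """Differentiable filter: scale each object's attention weight
    by its probability of matching the concept."""
    return mask * concept_probs[concept]

scene = np.ones(3)                       # start attending to all objects
cubes = soft_filter(scene, "cube")       # keep likely cubes
red_cubes = soft_filter(cubes, "red")    # of those, keep likely red ones
print(red_cubes)  # object 2 keeps by far the highest weight
```

Because everything stays a product of probabilities, gradients can flow back to the concept representations during training.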

A question is decomposed into a sequence of operations like: Query("color", Filter("cube", Relation("left", Filter("sphere", scene)))) → tell me the color of the cube to the left of the sphere. All the operations work with probabilities and concept embeddings.
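To make the composition concrete, here is a toy executor (a hedged sketch with invented numbers, not the paper's code) where every operation takes and returns an attention vector over the scene's objects:

```python
import numpy as np

# Toy scene with 2 objects and hand-made concept probabilities.
scene_probs = {
    "sphere": np.array([0.97, 0.03]),   # object 0 is likely the sphere
    "cube":   np.array([0.02, 0.96]),   # object 1 is likely the cube
}
# left_of[i, j] = probability that object i is left of object j.
left_of = np.array([[0.0, 0.1],
                    [0.9, 0.0]])
# Per-object color probabilities over (red, blue).
color_probs = np.array([[0.9, 0.1],    # object 0: likely red
                        [0.2, 0.8]])   # object 1: likely blue

def Filter(concept, mask):
    return mask * scene_probs[concept]

def Relation(rel_matrix, mask):
    # Probability of standing in the relation with the attended objects.
    return rel_matrix @ mask

def Query(attr_probs, mask):
    # Expected attribute distribution under the normalized attention mask.
    w = mask / mask.sum()
    return w @ attr_probs

# "What is the color of the cube left of the sphere?"
answer = Query(color_probs,
               Filter("cube",
                      Relation(left_of,
                               Filter("sphere", np.ones(2)))))
print(answer)  # distribution over (red, blue); blue dominates
```

Note how the nested calls mirror the Query/Filter/Relation structure of the parsed question: each stage is inspectable on its own, which is exactly what makes the pipeline interpretable.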

It is not clear to me how the parsing and the execution work; the authors say they used a bidirectional GRU for that. The parser, too, is trained from the questions: in my understanding, it generates candidate parse trees and discards those that, once executed, do not lead to the correct answer. This part is too short in the paper; I will try to dig more into it in the future. I also feel some examples of how the features are represented are missing.

Anyway, since the execution is decomposed into stages that have a symbolic meaning (filter, relation, …), it is easy to understand "why" the model has chosen an answer. If the answer is not correct you can look backward through the execution and see whether the attribute extraction was wrong or the problem comes from some other stage. Much more XAI-oriented than a plain neural network. There are a lot of interesting references to look at in this article; I will try to dig further.

Written by Giovanni

February 5, 2023 at 11:32 am

Posted in Varie
